Home assistant media player with voice input

I’ve been running piCorePlayer for a long time, first with LMS and more recently with Music Assistant. It’s really simple to set up and has many interesting features (like an optional built-in LMS) that I wasn’t using anyway.

I’ve been watching Home Assistant’s Year of Voice project with interest and was about to order an Atom Echo from M5Stack when I realized that the music players I already have scattered around the house should be able to process voice too without much trouble.

The first step was to replace piCorePlayer with Raspberry Pi OS so I’d be working in a more conventional Linux environment. I used Raspberry Pi Imager to create an SD card, choosing the Lite version since I wasn’t planning on using the UI, and configured SSH. Once it booted up I saw that it had already expanded the storage, so I was ready to start installing the player.

I had to get a couple of things out of the way first. I’m still using that cheap Ethernet adapter, so I had to follow this guide and reboot. Next, I noticed that the squeezelite package wants to pull in all kinds of UI packages, so I disabled that before going any further. Create a file named /etc/apt/apt.conf.d/99_norecommend and enter the following settings:

APT::Install-Recommends "false";
APT::AutoRemove::RecommendsImportant "false";
APT::AutoRemove::SuggestsImportant "false";

That will keep apt from installing recommended or suggested packages (like an X11 server).
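
If you’d rather not open an editor, here is a minimal sketch that stages the same three lines in a temporary file so they can be checked before anything under /etc is touched. The temporary file and the `tmp` variable are just for illustration.

```shell
# Stage the drop-in in a temporary file; nothing under /etc is changed yet.
tmp="$(mktemp)"
cat > "$tmp" <<'EOF'
APT::Install-Recommends "false";
APT::AutoRemove::RecommendsImportant "false";
APT::AutoRemove::SuggestsImportant "false";
EOF

# Print the staged content so it can be checked before installing it.
cat "$tmp"
```

From there, `sudo install -m 644 "$tmp" /etc/apt/apt.conf.d/99_norecommend` copies it into place, and `apt-config dump | grep -i recommends` should list the new values.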

Next, we can install squeezelite. You can type this all on one line or just cut/paste the whole block:

sudo apt-get update &&
sudo apt-get upgrade &&
sudo apt-get install squeezelite

You may want to go into /etc/default/squeezelite and see if you want to change anything. I changed the player name and increased the ALSA buffer size as recommended here.
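
As a rough illustration, the two changes might look like the excerpt below. The variable names are as I recall them from the Debian package’s default file, so check the comments in your own copy. The player name and the buffer value are hypothetical placeholders, not the values from the recommendation mentioned above.

```shell
# /etc/default/squeezelite (excerpt)
# Variable names assumed from the Debian package; values are placeholders.
SL_NAME="Kitchen"          # name the player announces to the server
SB_EXTRA_ARGS="-a 160"     # squeezelite's -a option sets the ALSA output parameters
```

The file is only read when the service starts, so the reboot in the next step (or `sudo systemctl restart squeezelite`) is what picks up the change.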

Reboot and we should see the player.

The next step is to install Wyoming Satellite. Follow the instructions on that page. I’m using an old Raspberry Pi Zero (not even a W). If you’re starting from scratch, the Zero 2 W looks like a better choice since it can do local wake word detection.

When I tried adding the Audio Enhancements I hit the following error:

Traceback (most recent call last):
  File ".../wyoming-satellite-master/script/run", line 12, in <module>
    subprocess.check_call([context.env_exe, "-m", "wyoming_satellite"] + sys.argv[1:])
  File "/usr/lib/python3.11/subprocess.py", line 413, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['.../wyoming-satellite-master/.venv/bin/python3', '-m', 'wyoming_satellite', '--name', 'Wyoming Satellite', '--uri', 'tcp://0.0.0.0:10700', '--mic-command', 'arecord -r 16000 -c 1 -f S16_LE -t raw -D default:CARD=Device_1', '--snd-command', 'aplay -r 22050 -c 1 -f S16_LE -t raw', '--wake-uri', 'tcp://...:10400', '--wake-word-name', 'ok_nabu', '--mic-auto-gain', '10', '--mic-noise-suppression', '2']' died with ...

It seems that the prebuilt binaries in the webrtc-noise-gain package aren’t compatible (at least with the Pi Zero). The solution was to build them from source, which takes a rather long time…

sudo apt-get install python3-dev
. .venv/bin/activate
pip3 install --no-binary ':all:' webrtc-noise-gain==1.2.3

It took a few hours but it works!

Play around with the audio enhancements settings to see what works for you.
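
To make that experimenting a little less error-prone, here is a small sketch of a helper that prints the two enhancement flags after a range check. The function name `enhancement_flags` is made up for this example, and the ranges (0 to 31 for auto gain, 0 to 4 for noise suppression) are as I recall them from the Wyoming Satellite README, so verify them against the version you installed.

```shell
# Hypothetical helper: print the two enhancement flags after a range check.
# Assumed ranges: auto gain 0-31, noise suppression 0-4.
enhancement_flags() {
    _gain="$1"
    _suppression="$2"

    # Reject anything that is not a plain non-negative whole number.
    case "$_gain" in ''|*[!0-9]*) echo "gain must be a whole number" >&2; return 1 ;; esac
    case "$_suppression" in ''|*[!0-9]*) echo "suppression must be a whole number" >&2; return 1 ;; esac

    # Reject values above the assumed upper limits.
    [ "$_gain" -le 31 ] || { echo "gain must be 0-31" >&2; return 1; }
    [ "$_suppression" -le 4 ] || { echo "suppression must be 0-4" >&2; return 1; }

    printf '%s\n' "--mic-auto-gain $_gain --mic-noise-suppression $_suppression"
}

# The values from the command in the traceback above.
enhancement_flags 10 2
```

The output can be appended, unquoted, to the rest of the `script/run` options, for example `script/run ... $(enhancement_flags 10 2)`, so trying a different pair of values only means changing two numbers.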

I suggest adding an automation to pause the player as soon as the wake word is detected.

2 Replies to “Home assistant media player with voice input”

  1. Hi Dave!
    What mic(s) are you actually using, and how did you solve the problem that voice recognition becomes unusable when the music that is playing gets picked up by the mic(s) again?
    For example, I tried to use the ReSpeaker Mic Array2.0 with USB.
    Unfortunately, I realized too late that it only offers a 16 kHz sampling rate, and that also applies to the speaker output via USB. The music on the provided LINE-OUT does not have HiFi quality. Sad.

    1. Hi!

      (Apologies for the delayed response. I’ve been juggling a number of projects and am just getting back to this now.)

      I’m using a real cheap mic, this one: https://www.aliexpress.us/item/3256805178134340.html. I don’t know yet whether I’m recommending it, but it does work. I’m planning on experimenting with placement to see if that makes it more usable.

      I have an automation that silences the music as soon as the wake word is recognized. Even with the wake word detection running remotely it’s near instantaneous. It does work and command recognition works much better. It helps if I’m not too far from the mic. It looks like this but you’ll need to replace the IDs:

      
      alias: Mute on wake word
      description: ""
      trigger:
        - type: turned_on
          platform: device
          device_id: 
          entity_id: 
          domain: binary_sensor
      condition: []
      action:
        - metadata: {}
          data: {}
          target:
            area_id: 
          action: media_player.media_pause
        - wait_for_trigger:
            - type: turned_off
              platform: device
              device_id: 
              entity_id: 
              domain: binary_sensor
              for:
                hours: 0
                minutes: 0
                seconds: 0
          timeout:
            hours: 0
            minutes: 0
            seconds: 10
            milliseconds: 0
        - metadata: {}
          data: {}
          target:
            area_id: 
          action: media_player.media_play
      mode: single
      
      

      Someone should probably make this a template.

      Quality is a whole ‘nother story. What I have is good enough for me but I haven’t started down that rabbit hole.

      To be continued…
