Human‐Robot Interaction

Adán edited this page Apr 9, 2024 · 1 revision

Speech

Speaker and microphone in Jetson Xavier

To use the robot's speaker, run the say node, implemented in ws/src/speech/Say.py.

To access the Jetson Xavier, connect to the robot's router (currently RoBorregosHome, password RoBorregosHome2024). Verify your connection and set your IP to match the 192.168.31.XX pattern.

The Jetson Xavier has the static IP 192.168.31.23; access it via SSH:
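The SSH command itself is missing from the page; it should look like the following, where `<user>` is a placeholder for the Jetson's actual account name (not documented here):

```shell
# Replace <user> with the Jetson Xavier's account name.
ssh <user>@192.168.31.23
```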

The HRI repo is currently located in the home-hri directory. Navigate there and access the Docker container:

docker start home-hri 
make hri.shell

Inside the container, verify the correct output device by running:

python3 -m sounddevice

Review the output and identify the device named samplerate, ALSA (in my case, index 31):

...
31 samplerate, ALSA (0 in, 128 out)
...

Pass this device number into the environment variable of OUTPUT_DEVICE_INDEX:

export OUTPUT_DEVICE_INDEX=31
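A minimal sketch of how a node can pick this index up from the environment (the function name here is illustrative, not the say node's actual code; only the variable name OUTPUT_DEVICE_INDEX comes from this page):

```python
import os

def get_output_device_index(default=None):
    # Read the index exported above; fall back to `default` (None lets
    # sounddevice use the system default device) when it is not set.
    value = os.environ.get("OUTPUT_DEVICE_INDEX")
    return int(value) if value is not None else default

os.environ["OUTPUT_DEVICE_INDEX"] = "31"  # as exported for the Jetson speaker
print(get_output_device_index())          # -> 31
```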

Testing speaker

After the previous setup steps, execute the following inside the Docker container on the Jetson Xavier:

source ws/devel/setup.bash
roslaunch speech speech.launch

And in another terminal run:

source ws/devel/setup.bash
rostopic pub /robot_text std_msgs/String "Hi, my name is Frida! I'm here to help you with your domestic tasks"
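The same message can be published from Python instead of the rostopic CLI. The topic name /robot_text and the message come from this page; the node name say_tester and the rest of the structure are an assumed sketch:

```python
# Sketch: publish text to the say node from a Python script.

GREETING = "Hi, my name is Frida! I'm here to help you with your domestic tasks"

def publish_text(text, topic="/robot_text"):
    # rospy is only available inside the sourced ROS environment,
    # so it is imported lazily here rather than at module level.
    import rospy
    from std_msgs.msg import String

    rospy.init_node("say_tester", anonymous=True)  # node name is an assumption
    pub = rospy.Publisher(topic, String, queue_size=1)
    rospy.sleep(1.0)  # give the publisher time to register with the master
    pub.publish(String(data=text))

if __name__ == "__main__":
    publish_text(GREETING)
```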

Running speech locally (for Whisper)

Whisper is a speech recognition package intended to run on a personal device with GPU capabilities.

Setup and development have been simplified with Docker, but because the speaker and microphone are used, some additional steps and customization are required.

Setup Docker container

Go to the root of the HRI repo at /home/hri. Then, using the Makefile, run:

make hri.build.cuda

The build will take some time. Once it finishes, run:

bash docker/scripts/speech.bash # Setup pulseaudio
sudo usermod -aG audio $USER # Make sure current user has access to audio resources.
sudo chmod 777 /dev/snd/* # Allow access to audio devices.

Before creating the container, fill in the .env file based on .env.example, which contains instructions for selecting devices and filling in the variables. In my case, it was easier to create the container first (to have the required libraries available), then remove it and fill in the .env:

make hri.create.cuda
make hri.shell

Check input and output devices

Go to ws/src/speech/scripts and run TestSpeaker.py. A list of devices will be displayed; select the one labeled Analog, or the one with the most detailed description (in my case, device 3).

Device 2: HD-Audio Generic: HDMI 2 (hw:0,8), 0 input channels, 8 output channels
Device 3: HD-Audio Generic: ALC294 Analog (hw:1,0), 2 input channels, 2 output channels
Device 4: hdmi, 0 input channels, 8 output channels
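The selection logic above can be sketched in plain Python. The sample entries mirror the listing; in the container you would build `devices` from sounddevice's `query_devices()` instead (the helper names below are illustrative, not part of TestSpeaker.py):

```python
# Sounddevice-style device entries mirroring the listing above.
devices = [
    {"index": 2, "name": "HD-Audio Generic: HDMI 2 (hw:0,8)",
     "max_input_channels": 0, "max_output_channels": 8},
    {"index": 3, "name": "HD-Audio Generic: ALC294 Analog (hw:1,0)",
     "max_input_channels": 2, "max_output_channels": 2},
    {"index": 4, "name": "hdmi",
     "max_input_channels": 0, "max_output_channels": 8},
]

def playback_candidates(devices):
    # Only devices with output channels can actually play audio.
    return [d for d in devices if d["max_output_channels"] > 0]

def analog_devices(devices):
    # Analog jacks typically expose both capture and playback channels.
    return [d for d in devices if "Analog" in d["name"]]

print([d["index"] for d in analog_devices(devices)])  # -> [3]
```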

Then execute python3 -m sounddevice and look for the default device, which in my case was 6.

  4 hdmi, ALSA (0 in, 8 out)
  5 pulse, ALSA (32 in, 32 out)
* 6 default, ALSA (32 in, 32 out)

Add the device information to the .env file:

# Select Index for microphone and speaker
OUTPUT_DEVICE_INDEX=6
INPUT_DEVICE_INDEX=3

Exit the container and remove it, then create it again so the new environment variables are passed in:

make hri.down
make hri.remove
make hri.create.cuda
make hri.up
make hri.shell

Inside the newly created Docker container, source the ROS workspace and run the speech launch file. It should load Whisper and start recognizing speech:

source ws/devel/setup.bash
roslaunch speech speech.launch