VN4000: Voice-Controlled Autonomous Robot with Raspberry Pi

New here? This part adds voice control to the autonomous C101 robot from Part 16 — say "go" and it drives, avoiding obstacles on its own; say "stop" and it freezes. The setup builds on Parts 12–16. If remote connection via SSH or VNC feels unfamiliar, Part 12 has the full walkthrough. Otherwise, let's go.

The Last Piece

Yes, we know. The five-step loop again. Bear with us — it's earned its place by now.

Learn to Move → Perception → Localization → Planning → Control → [repeat from Perception]

✅ Parts 13–16: the robot moves, senses obstacles, analyzes its surroundings, and steers intelligently.

But there's been something nagging. Every time we want the robot to run, we open a laptop, connect via VNC, find Thonny, press F5, and only then does anything happen. The robot is autonomous once it's running — but getting it running still requires a human hunched over a keyboard. Not exactly inspiring.

What if instead, you just set the robot on the floor, say "go", and walk away?

That's Part 17.

What We're Adding

A USB microphone (~$10 on Amazon) plugged into one of the Pi's USB ports. That's the only new hardware. Everything else is code.

The logic is straightforward: a background thread listens for voice commands continuously while the main loop handles driving and obstacle avoidance. Say "go" — the robot starts. Say "stop" — it freezes mid-track, no keyboard required.

Sounds simple. It was not simple.

The Debug Marathon (All 42km of It)

In theory, this part was just combining Part 7 (voice control) with Part 16 (autonomous driving). Copy, paste, done. In practice, that optimism lasted about thirty seconds.

km 0 — The robot ignores us completely

Plug in the USB mic, run the code, shout "GO" at the robot. Nothing. The C101 sits there, completely unbothered, judging us silently.

km 5 — At least the Pi sees the mic

arecord -l

card 1: Device [USB Audio Device], device 0: USB Audio (USB Audio)

Pi sees it. Card 1, device 0. Progress.
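If you'd rather not read the card and device numbers off the screen each time, that line can be parsed into the plughw:card,device string used in the next step. A minimal sketch, assuming the output keeps the format shown above; parse_capture_devices is our own hypothetical helper, not part of ALSA or any library:

```python
import re

# Matches lines such as:
# card 1: Device [USB Audio Device], device 0: USB Audio (USB Audio)
CARD_LINE = re.compile(r"^card (\d+): .*?\[(.+?)\], device (\d+):")

def parse_capture_devices(arecord_output):
    """Return a list of (card, device, name) tuples found in `arecord -l` output."""
    devices = []
    for line in arecord_output.splitlines():
        match = CARD_LINE.match(line.strip())
        if match:
            card, name, device = match.groups()
            devices.append((int(card), int(device), name))
    return devices

# Sample line copied from the output above
sample = "card 1: Device [USB Audio Device], device 0: USB Audio (USB Audio)"

found = parse_capture_devices(sample)
card, device, name = found[0]
alsa_name = f"plughw:{card},{device}"
print(f"{name} -> {alsa_name}")
```

On the Pi itself, the sample string could be swapped for the real output, for example via subprocess.run(["arecord", "-l"], capture_output=True, text=True).stdout.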

km 10 — "Device or resource busy"

arecord -D plughw:1,0 -f cd -t wav -d 5 test.wav

Error: Device or resource busy. Some other process has claimed the mic. Hunt down the culprits:

sudo fuser /dev/snd/*
sudo kill -9 [PID]

Kill each PID that fuser reports. Try again.

km 15 — Sound! But barely

Recording works now. Playback works. The audio is so quiet it requires complete silence and mild telepathy to hear anything. Open alsamixer, press F6 to select the USB device and F4 to show the capture controls, then crank the capture volume to 90%.

km 20 — The mic isn't the default device

After a reboot, the Pi forgets the USB mic exists and defaults back to the built-in audio. Fix: create a permanent config file:

sudo nano /etc/asound.conf

pcm.!default {
    type hw
    card 1
}
ctl.!default {
    type hw
    card 1
}

Save, reboot. The USB mic is now the default, as long as it still shows up as card 1 in arecord -l.

km 30 — "No such file or directory"

aplay test.wav refuses to find a file that demonstrably exists. ls shows it right there. file test.wav confirms it's a valid WAVE file. aplay remains unconvinced.

We never solved this one. We moved on instead — which turned out to be the right call, because aplay was never actually needed for the robot to work.

km 35 — SpeechRecognition can't understand anything

Energy threshold auto-detected at 16,731. That's not a typo. The Pi's audio environment is apparently very loud (or the mic gain is still not quite right). Google hears... something. Just not words.

Fix: set the threshold manually and disable auto-adjustment:

recognizer.energy_threshold = 500
recognizer.dynamic_energy_threshold = False
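To see why 16,731 was hopeless, it helps to know roughly what the number means. As far as we understand it, SpeechRecognition compares the RMS energy of each chunk of 16-bit audio against energy_threshold and only starts recording once a chunk exceeds it. The sketch below recomputes that kind of RMS value in plain Python on a synthetic signal; rms_energy, make_tone, and the amplitude of 2000 are illustrative assumptions, not measurements from the robot:

```python
import math
import struct

def rms_energy(raw_bytes):
    """Root-mean-square amplitude of little-endian signed 16-bit PCM audio."""
    count = len(raw_bytes) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", raw_bytes[:count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)

def make_tone(amplitude, n=1024):
    """Synthetic square wave: samples alternate between +amplitude and -amplitude."""
    values = [amplitude if i % 2 == 0 else -amplitude for i in range(n)]
    return struct.pack(f"<{n}h", *values)

# Assumed loudness for a voice picked up by a cheap USB mic
speech = make_tone(2000)
energy = rms_energy(speech)

for threshold in (16731, 500):
    print(f"threshold={threshold}: speech detected = {energy > threshold}")
```

With the auto-detected threshold, a signal at this level never counts as speech, so the recognizer keeps waiting. At 500 it does. The right value for your own mic may differ, so treat 500 as a starting point.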

km 42 — Finish line

Robot hears "go." Robot moves. Robot hears "stop." Robot stops.

The crowd goes wild. Or would, if anyone else were in the room.

Setup

Hardware: Plug a USB microphone into any USB port on the Raspberry Pi 3B. That's it for hardware.

Libraries:

sudo apt-get install -y python3-pyaudio portaudio19-dev flac
pip3 install SpeechRecognition

Note: flac is required for SpeechRecognition to encode audio before sending it to Google. Without it, recognition fails before any audio reaches Google.
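A quick way to confirm the encoder is installed before running the robot code is to look for it on the PATH. A small sketch using only the standard library (flac_available is our own helper name):

```python
import shutil

def flac_available():
    """Return the path to the flac executable, or None if it is not installed."""
    return shutil.which("flac")

path = flac_available()
if path:
    print(f"flac found at {path}")
else:
    print("flac not found: run 'sudo apt-get install -y flac' first")
```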

Connect to the Pi via SSH or VNC as usual. Full details in Part 12.

The Code

Open Thonny via VNC (Menu → Programming → Thonny Python IDE). Activate the virtual environment first:

source my_project_env/bin/activate

Save the file as VoiceRobot.py inside the my_project_env folder:

import RPi.GPIO as GPIO
import time
import cv2
import numpy as np
from picamera2 import Picamera2
import speech_recognition as sr
import threading

# GPIO setup
GPIO.setmode(GPIO.BCM)
GPIO.setwarnings(False)
ENA = 18; IN1 = 27; IN2 = 22
IN3 = 23; IN4 = 24; ENB = 13
TRIG = 17; ECHO = 25

for pin in [ENA, ENB, IN1, IN2, IN3, IN4, TRIG]:
    GPIO.setup(pin, GPIO.OUT)
GPIO.setup(ECHO, GPIO.IN)

pwm_A = GPIO.PWM(ENA, 50)
pwm_B = GPIO.PWM(ENB, 50)
pwm_A.start(0)
pwm_B.start(0)

# Camera setup
picam = Picamera2()
picam.preview_configuration.main.format = 'RGB888'
picam.configure("preview")
picam.start()
time.sleep(2)

# Global state
is_running = False
speed = 50

# --- Robot functions ---
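def get_distance(timeout=0.1):
    # Fire a 10 µs trigger pulse, then time how long ECHO stays high
    GPIO.output(TRIG, True)
    time.sleep(0.00001)
    GPIO.output(TRIG, False)
    start_time = stop_time = time.time()
    deadline = start_time + timeout
    while GPIO.input(ECHO) == 0:
        start_time = time.time()
        if start_time > deadline:
            return 0  # No echo pulse at all: treat as an obstacle rather than hang
    while GPIO.input(ECHO) == 1:
        stop_time = time.time()
        if stop_time > deadline:
            return 0  # Echo never ended: same fail-safe
    # Sound travels ~34300 cm/s; halve for the round trip
    return (stop_time - start_time) * 34300 / 2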

def set_motor(in1, in2, pwm, speed):
    GPIO.output(in1, GPIO.HIGH if speed >= 0 else GPIO.LOW)
    GPIO.output(in2, GPIO.LOW if speed >= 0 else GPIO.HIGH)
    pwm.ChangeDutyCycle(abs(speed))

def forward(s): set_motor(IN1, IN2, pwm_A, s); set_motor(IN3, IN4, pwm_B, s)
def backward(s): set_motor(IN1, IN2, pwm_A, -s); set_motor(IN3, IN4, pwm_B, -s)
def turn_left(s): set_motor(IN1, IN2, pwm_A, -s); set_motor(IN3, IN4, pwm_B, s)
def turn_right(s): set_motor(IN1, IN2, pwm_A, s); set_motor(IN3, IN4, pwm_B, -s)
def stop_robot(): set_motor(IN1, IN2, pwm_A, 0); set_motor(IN3, IN4, pwm_B, 0)

def decide_direction():
    print("Obstacle detected! Analyzing with camera...")
    frame = picam.capture_array()
    frame = cv2.flip(frame, -1)  # Remove if camera is not mounted upside down
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    h, w = edges.shape
    left_density = np.sum(edges[:, :w//2]) / 255
    right_density = np.sum(edges[:, w//2:]) / 255
    print(f"Edge density — Left: {left_density:.0f}, Right: {right_density:.0f}")
    if left_density < right_density:
        print("Left is clearer. Turning left!")
        return "left"
    else:
        print("Right is clearer. Turning right!")
        return "right"

# --- Voice command listener (runs in background thread) ---
def listen_to_commands():
    global is_running
    recognizer = sr.Recognizer()
    recognizer.energy_threshold = 500       # Set manually — auto-detection gave 16,731
    recognizer.dynamic_energy_threshold = False  # Disable auto-adjustment
    recognizer.pause_threshold = 0.5

    while True:
        try:
            with sr.Microphone() as source:
                print("Waiting for command...")
                audio = recognizer.listen(source, timeout=8, phrase_time_limit=4)

            command = recognizer.recognize_google(audio, language="en-US").lower()
            print(f"Heard: '{command}'")

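            words = command.split()
            if "stop" in words:      # Check "stop" first so it always wins
                is_running = False
                stop_robot()
                print("-> Stop!")
            elif "go" in words:      # Whole-word match: "google" or "ago" won't trigger
                is_running = True
                print("-> Go!")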

        except sr.WaitTimeoutError:
            pass  # No sound detected — keep listening
        except sr.UnknownValueError:
            pass  # Heard something but couldn't understand — keep listening
        except Exception as e:
            print(f"Error: {type(e).__name__}: {e}")

# --- Main loop ---
try:
    print("Voice control active. Say 'go' to start, 'stop' to halt. Ctrl+C to exit.")
    voice_thread = threading.Thread(target=listen_to_commands, daemon=True)
    voice_thread.start()

    while True:
        if is_running:
            dist = get_distance()
            print(f"Distance: {dist:.1f} cm")

            if dist > 20:
                forward(speed)
                time.sleep(0.06)  # Give the HC-SR04 time between pings and let the voice thread run
            else:
                stop_robot()
                backward(50)
                time.sleep(0.8)
                stop_robot()
                if is_running:  # "stop" may have arrived while reversing
                    choice = decide_direction()
                    if choice == "left":
                        turn_left(speed)
                    else:
                        turn_right(speed)
                    time.sleep(0.8)
                stop_robot()
                time.sleep(0.2)
        else:
            stop_robot()
            time.sleep(0.1)

except KeyboardInterrupt:
    print("Stopped.")
finally:
    stop_robot()
    pwm_A.stop()
    pwm_B.stop()
    GPIO.cleanup()
    picam.stop()

Press F5 to run. Nothing happens immediately — the robot is waiting for its orders.

Say "go". The C101 starts driving, avoiding obstacles, steering based on what the camera sees.

Say "stop". It freezes.

That's it. No keyboard. No laptop open in the corner. No F5. Just you, a $10 USB microphone, and a robot that does what it's told.
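One detail worth getting right is how a recognized phrase becomes a command. Google returns a transcript, which may be several words long, so matching whole words avoids accidental triggers such as "google" containing "go". A minimal sketch in which "stop" takes priority; parse_command is a hypothetical helper of ours, not part of SpeechRecognition:

```python
import re

def parse_command(transcript):
    """Map a recognized phrase to 'stop', 'go', or None.

    Matches whole words only, and checks 'stop' first so that a phrase
    containing both words always halts the robot.
    """
    words = re.findall(r"[a-z]+", transcript.lower())
    if "stop" in words:
        return "stop"
    if "go" in words:
        return "go"
    return None

for phrase in ["Go", "please stop", "google it", "don't go, stop!"]:
    print(f"{phrase!r} -> {parse_command(phrase)}")
```

Giving "stop" priority is a safety choice: if the recognizer hears both words in one phrase, a robot that halts unnecessarily is cheaper than one that drives off.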

A Note on the Voice Thread

The code uses Python's threading module to run voice listening and robot driving simultaneously. Without it, the robot would have to stop everything and wait for a voice command before moving — which defeats the purpose entirely.

daemon=True means the voice thread shuts down automatically when the main program exits. No orphaned threads lurking in the background.

The is_running variable acts as the bridge between the two threads: the voice thread sets it to True or False, and the main loop checks it at the top of every pass (every 0.1 seconds while idle) to decide what to do next.
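The same pattern can be tried on any computer, with no robot attached. The sketch below swaps the plain boolean for a threading.Event, the standard-library tool for signalling between threads, and replaces the microphone with a scripted list of commands. It illustrates the idea under those assumptions and is not the code running on the C101; fake_listener and drive_loop are made-up names:

```python
import threading
import time

running = threading.Event()   # set = robot should drive, clear = robot should stop
log = []                      # what the "robot" did on each tick, for inspection

def fake_listener(commands, delay=0.05):
    """Stand-in for the voice thread: replays scripted commands with a pause."""
    for command in commands:
        time.sleep(delay)
        if command == "stop":
            running.clear()
        elif command == "go":
            running.set()

def drive_loop(duration=0.3, tick=0.01):
    """Stand-in for the main loop: polls the flag instead of driving motors."""
    end = time.monotonic() + duration
    while time.monotonic() < end:
        log.append("driving" if running.is_set() else "idle")
        time.sleep(tick)

listener = threading.Thread(target=fake_listener, args=(["go", "stop"],), daemon=True)
listener.start()
drive_loop()
listener.join()

print(f"idle ticks: {log.count('idle')}, driving ticks: {log.count('driving')}")
print(f"final state: {log[-1]}")
```

The loop starts idle, drives once "go" arrives, and goes idle again after "stop", all without the main loop ever blocking on the listener.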

What We Just Built

Put it all together:

✅ Drives autonomously on its own power (Part 13)

✅ Stops and avoids obstacles with HC-SR04 (Part 14)

✅ Analyzes its surroundings with Pi Camera (Part 15)

✅ Chooses a turn direction based on what it sees (Part 16)

✅ Starts and stops on voice command (Part 17)

No laptop required at runtime. No remote control. No keyboard. You set it down, say a word, and it goes.

Is it Bumblebee? Not quite. But it's a robot — a real one, built from scratch, that moves and thinks and listens. That's not nothing.

Next up: Part 18 — WiFi Control Panel. A simple web interface to control the robot from any browser on the network.

Sunday, June 14, 2026

Voice-Controlled Autonomous Robot with Raspberry Pi — Part 17: Hey Robot, Move!