Feb 5, 2026 · 4 min read

Voice Agents Are Here: Talk to Your OpenClaw Bot

Send a voice note on Telegram. Your agent transcribes it, thinks, and responds with audio. Completely free. Here's the 5-minute setup.

The Stack (All Free)

🎤

Groq Whisper

Speech-to-text. Free tier. Very fast transcription.

🧠

Any LLM

Your existing AI model processes the text normally.

🔊

Edge TTS

Text-to-speech. 100+ voices. Completely free. Microsoft engine.
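
The three stages above chain together in a straight line: audio in, text, reply text, audio out. Here is a minimal Python sketch of that flow. It is not OpenClaw's actual code; `handle_voice_note` and the three stand-in stage functions are hypothetical names used only for illustration, and the real stages would call Groq Whisper, your LLM, and Edge TTS.

```python
from typing import Callable


def handle_voice_note(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    think: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Run one voice note through the three stages and return reply audio."""
    text = transcribe(audio)   # speech-to-text stage (Groq Whisper in this stack)
    reply = think(text)        # LLM stage (any model)
    return synthesize(reply)   # text-to-speech stage (Edge TTS in this stack)


# Stand-in stages so the sketch runs without any network access or API keys.
# They are placeholders, not real provider calls.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_think(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


reply_audio = handle_voice_note(
    b"hello", fake_transcribe, fake_think, fake_synthesize
)
```

Passing the stages in as plain callables keeps each one swappable, which matches the idea that the LLM in the middle can be any model you already use.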

Config (2 Minutes)

// In openclaw.json
{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [{ "provider": "groq", "model": "whisper-large-v3-turbo" }]
      }
    }
  },
  "messages": {
    "tts": {
      "auto": "inbound",
      "provider": "edge",
      "edge": {
        "enabled": true,
        "voice": "en-US-MichelleNeural"
      }
    }
  }
}

Voice Choices

Edge TTS has 100+ voices across languages. Popular picks:

en-US-MichelleNeural: Warm, confident female voice. Great default.
en-US-GuyNeural: Clear male voice. Professional tone.
en-GB-SoniaNeural: British female. Slightly formal.
en-US-AriaNeural: Natural, conversational female voice.
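
Switching voices only means changing the `voice` value under `messages.tts.edge` in the config shown earlier. As a rough sketch, assuming you manage that config as a Python dict before writing it out, a small helper could do the swap. `with_voice` and `POPULAR_VOICES` are hypothetical names for this example, not part of OpenClaw.

```python
import copy

# The four voices listed above, keyed by their Edge TTS voice name.
POPULAR_VOICES = {
    "en-US-MichelleNeural": "Warm, confident female voice",
    "en-US-GuyNeural": "Clear male voice",
    "en-GB-SoniaNeural": "British female, slightly formal",
    "en-US-AriaNeural": "Natural, conversational female voice",
}


def with_voice(config: dict, voice: str) -> dict:
    """Return a copy of config with messages.tts.edge.voice set to voice."""
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    edge = (
        updated.setdefault("messages", {})
        .setdefault("tts", {})
        .setdefault("edge", {})
    )
    edge["voice"] = voice
    return updated


# Same "messages" section as the config above, written as a Python dict.
config = {
    "messages": {
        "tts": {
            "auto": "inbound",
            "provider": "edge",
            "edge": {"enabled": True, "voice": "en-US-MichelleNeural"},
        }
    }
}

british = with_voice(config, "en-GB-SoniaNeural")
```

The helper copies the config instead of mutating it, so you can compare the before and after versions or try several voices from the same base settings.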

For premium voices, ElevenLabs offers the most natural-sounding AI voices (10k chars/mo free, then $5/mo+). Worth it if voice quality matters to your use case.

Full voice setup tutorial with troubleshooting:

Voice Setup Guide →