macOS · Fully Local · Open Source · Early Access

Voice Coding,
not Vibe Coding

Your AI agent doesn't just listen — it talks back.
Voice layer runs fully local. Open source. Zero cloud fees.

It codes. It talks. You decide.

Other voice tools are a monologue — you dictate, the machine types. HeyVox is a conversation. Your agent reads the code, does the work, and tells you what happened — like a colleague sitting next to you.

You choose the detail level: full response, a concise summary, or just the key facts. If something sounds off, pull up the full diff. If it sounds right, keep going — hands-free.

The voice layer is completely free — no API keys, no per-minute billing. STT and TTS run locally on your Mac. You bring your own AI agent.

1

Agent works, then speaks

Claude writes code, runs tests — then Herald reads you a summary of what changed.

2

You choose the verbosity

Full response, short summary, or one-liner — configurable per message or globally.

3

Sounds good? Keep going

Say your next instruction. No context switching, no reading walls of text.

4

Something off? Review the details

Pull up the full response anytime. You stay in control without losing flow.
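
The three verbosity levels from step 2 can be pictured with a small sketch. This is an illustration only: the function name `shape_for_speech`, the level names, and the sentence-splitting rule are assumptions, not HeyVox's actual code, and a real summary would more likely come from the agent than from simple truncation.

```python
import re

# Hypothetical level names; HeyVox's real config keys may differ.
LEVELS = ("full", "summary", "one-liner")

def shape_for_speech(text: str, level: str = "summary", summary_sentences: int = 3) -> str:
    """Trim an agent response before it is spoken (illustrative only)."""
    if level not in LEVELS:
        raise ValueError(f"unknown verbosity level: {level!r}")
    if level == "full":
        return text
    # Naive split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # One sentence for a one-liner, the first few for a summary.
    keep = 1 if level == "one-liner" else summary_sentences
    return " ".join(sentences[:keep])
```

Calling `shape_for_speech(response, "one-liner")` would keep only the first sentence, while `"full"` passes the response through untouched.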

Three layers, fully local

Voice in, voice out, and a HUD to see what's happening — all running on your Mac, zero cloud.

🔊

Voice OUT — Herald

Your agent finishes a task and speaks the result via Kokoro TTS. Choose verbosity: full response, summary, or one-liner. Emotional voice switching adapts tone to context. Hush auto-pauses YouTube & Spotify while it speaks.

🎙

Voice IN

Say the wake word (or hold push-to-talk) → HeyVox transcribes locally via MLX Whisper or sherpa-onnx → text is pasted directly into your agent. Works with any app.

🖥

HUD & Menu Bar

Menu bar icon shows state at a glance (idle, recording, transcribing, speaking). Frosted-glass overlay appears during recording with live waveform bars. Recent transcript history in the dropdown.
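
The four menu bar states (idle, recording, transcribing, speaking) suggest a small state machine. The sketch below is one plausible way to model it; the `VoiceState` and `advance` names and the allowed transitions are assumptions made for illustration, not taken from the HeyVox source.

```python
from enum import Enum

class VoiceState(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    TRANSCRIBING = "transcribing"
    SPEAKING = "speaking"

# Assumed transitions, for illustration only.
ALLOWED = {
    VoiceState.IDLE: {VoiceState.RECORDING, VoiceState.SPEAKING},
    VoiceState.RECORDING: {VoiceState.TRANSCRIBING, VoiceState.IDLE},
    VoiceState.TRANSCRIBING: {VoiceState.IDLE},
    # Speaking can be interrupted by a new recording in this sketch.
    VoiceState.SPEAKING: {VoiceState.IDLE, VoiceState.RECORDING},
}

def advance(current: VoiceState, target: VoiceState) -> VoiceState:
    """Return the new state, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target
```

Keeping the transitions in one table makes it easy for the menu bar icon and the overlay to stay in agreement about what the app is doing.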

Works where you work

⚡ Claude Code
💫 Cursor
🌪 Windsurf
🔁 Continue.dev
➕ Any app with a text field

Voice IN works with any app — HeyVox pastes transcribed text into whatever window has focus. Voice OUT via Herald hooks works automatically with Claude Code. Other agents can use MCP (voice_speak) to speak back.
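
For agents that speak back over MCP, a tool call is a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a request for `voice_speak`. The helper name `build_voice_speak_request` is hypothetical, and the `text` argument name is an assumption; check the tool's published schema for the real parameters.

```python
import json

def build_voice_speak_request(text: str, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request for the voice_speak tool."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "voice_speak",
            # "text" is an assumed argument name, not confirmed by the source.
            "arguments": {"text": text},
        },
    }
    return json.dumps(payload)
```

In practice an MCP client library would normally build and send this message for you; the raw form is shown only to make the shape of the call visible.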

Everything you need, nothing you don't

🔊

Herald TTS Orchestration

Kokoro TTS on Metal GPU via mlx-audio. Multi-part streaming, audio ducking, and workspace-aware queue. Your agent speaks, you listen.

📊

Configurable Verbosity

Full response, summary, or one-liner. Per-message or global. Hear what matters, skip what doesn't.

🎭

Emotional Voice Switching

Detects mood in text (alert, cheerful, thoughtful) and picks the right voice. Auto-switches languages too.

Hush Media Control

Chrome extension pauses YouTube & Spotify during TTS. Falls back to MediaRemote for native apps.

👁

Wake Word Detection

Powered by openwakeword. Always listening locally; audio is never sent to the cloud.

🗣

Local STT

MLX Whisper on Apple Silicon. sherpa-onnx fallback on Intel Macs. Fast, accurate, offline.

🎮

Push-to-Talk

Configurable key binding. Hold to record, release to transcribe.

HUD & Menu Bar

Menu bar icon with state indicator. Frosted-glass pill overlay during recording. Transcript history in dropdown.

🔧

MCP + Claude Hooks

Four MCP tools for any agent. Herald hooks for automatic Claude Code TTS. A single heyvox setup run wires everything up.

💰

Free Voice Layer

No API keys for STT or TTS. No per-minute voice billing. Open source, runs on your hardware.

🚀

Auto-Start via launchd

Runs as a macOS launch agent, so it starts at login and is ready whenever you open your Mac.

🛡

Self-Healing

Dead mic recovery in 10s with exponential backoff. Bluetooth device filtering via CoreAudio. Memory watchdog auto-restarts at 1 GB.
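
The self-healing behavior can be sketched in a few lines. Everything here is an assumption made for illustration: the function names, the base delay and attempt count, treating the 10 s figure as a cap on the retry delay, and reading 1 GB as 1024³ bytes.

```python
def backoff_delays(base: float = 0.5, factor: float = 2.0,
                   cap: float = 10.0, attempts: int = 6) -> list[float]:
    """Exponentially growing retry delays in seconds, each capped at `cap`."""
    return [min(cap, base * factor ** i) for i in range(attempts)]

# 1 GB threshold from the feature description (binary gigabyte assumed).
MEMORY_LIMIT_BYTES = 1 * 1024 ** 3

def should_restart(rss_bytes: int, limit: int = MEMORY_LIMIT_BYTES) -> bool:
    """True once resident memory reaches the watchdog limit."""
    return rss_bytes >= limit
```

With these defaults the delays come out as 0.5, 1, 2, 4, 8, and 10 seconds, so a dead mic is retried quickly at first and never waits longer than the cap.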

Up and running in minutes

For early access testers. Requires macOS 14+ and Python 3.12+. Apple Silicon is recommended; Intel Macs are supported via sherpa-onnx.

# 1. Install system dependency
brew install portaudio

# 2. Clone and install
git clone https://github.com/heyvox-dev/heyvox.git
cd heyvox
pip install -e ".[apple-silicon,chrome]"

# 3. Run setup wizard
heyvox setup

The setup wizard walks you through permissions (Microphone, Accessibility), model download, mic test, config generation, launchd service, Herald TTS hooks for Claude Code, and MCP server registration — all in one guided flow.

All processing runs on your machine. No audio ever leaves it.

HeyVox was designed privacy-first from day one: zero telemetry, zero cloud APIs for voice, and no per-minute TTS or STT billing.

openwakeword
On-device wake word detection, runs in real time
MLX Whisper
Apple Silicon speech recognition, fully offline
Kokoro TTS
Local neural text-to-speech, no API keys needed
sherpa-onnx
Intel fallback STT engine, also fully on-device

Choosing the right mic

Your microphone has a bigger impact on accuracy than the STT model.

Best

2.4 GHz USB Wireless Headsets

USB dongle headsets (Logitech, Jabra, EPOS) show up as their own USB audio device, so you get a full-quality mic channel at all times, with no OS audio switching and no echo issues.

Good

Wired Headset / Built-in Mic & Speakers

Wired headsets, the built-in Mac microphone, and built-in speakers all work reliably. Echo suppression mutes the mic during TTS when using speakers without a headset.

Good

Bluetooth for Playback + Built-in Mic

Use Bluetooth headphones for TTS playback (A2DP, full quality) while using the built-in Mac mic for voice input. Best of both worlds.

Works

Bluetooth Mic Mode

Using a Bluetooth mic switches the headset into HFP mode (reduced audio quality), but HeyVox handles it with dead-device filtering via CoreAudio, silent-mic auto-recovery, and echo suppression. USB dongles are still better, but Bluetooth works.

Join the beta

HeyVox is in early access. We’re onboarding testers who use AI coding agents daily and want a real voice workflow. Tell us what agent you use and we’ll get you set up.

Request Access