VoiceRun Transcribe

Real-time speech-to-text over WebSocket. Stream audio from a microphone or file and receive live transcription events including partial results, final transcripts, and speech activity detection.

How It Works

The Transcribe API provides a WebSocket endpoint for streaming audio and receiving real-time transcription. The protocol flow is:

  1. Connect — open a WebSocket with your API key
  2. Configure — send session.update to select model, language, and prompt
  3. Stream — send audio.append messages with base64 audio chunks
  4. Receive — get transcription.partial and transcription.completed events
  5. Close — send session.close to end the session
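The five steps above can be sketched in Python. This is a minimal sketch, assuming JSON messages keyed by the `type` values shown; all other payload fields (`model`, `language`, `audio`, `text`) are assumptions, not a documented schema:

```python
import asyncio
import base64
import json


def encode_chunk(chunk: bytes) -> str:
    """Base64-encode a raw audio chunk for an audio.append message."""
    return base64.b64encode(chunk).decode("ascii")


async def transcribe(chunks):
    # Third-party dependency (pip install websockets), imported here so the
    # encoding helper above stays usable without it.
    import websockets

    # 1. Connect with the API key
    async with websockets.connect(
        "wss://transcribe.voicerun.com/ws",
        extra_headers={"Authorization": "Bearer YOUR_API_KEY"},
    ) as ws:
        # 2. Configure model and language
        await ws.send(json.dumps(
            {"type": "session.update", "model": "nova-3", "language": "en"}
        ))
        # 3. Stream base64 audio chunks
        for chunk in chunks:
            await ws.send(json.dumps(
                {"type": "audio.append", "audio": encode_chunk(chunk)}
            ))
        await ws.send(json.dumps({"type": "session.close"}))  # 5. Close
        # 4. Receive events until the server confirms the close
        async for raw in ws:
            event = json.loads(raw)
            if event["type"] == "transcription.completed":
                print(event.get("text"))
            elif event["type"] == "session.closed":
                break
```

Run with `asyncio.run(transcribe(chunks))`, where `chunks` is an iterable of raw PCM16 byte strings.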

Authentication

Create an API key in the VoiceRun Console, then pass it as a Bearer token in the Authorization header of the WebSocket handshake.

```python
import websockets

# Note: websockets >= 14 renamed this keyword to additional_headers=
ws = await websockets.connect(
    "wss://transcribe.voicerun.com/ws",
    extra_headers={"Authorization": "Bearer YOUR_API_KEY"},
)
```

If the API key is invalid, the server closes the connection with code 4001 (Unauthorized).


Connection Flow

Client                              Server
  |                                    |
  |  ---- WebSocket connect ------>    |
  |                                    |
  |  <---- session.created ----------  |  (server sends session ID)
  |                                    |
  |  ---- session.update ---------->   |  (client sends model config)
  |                                    |
  |  <---- session.updated ----------  |  (server confirms config)
  |                                    |
  |  ---- audio.append ------------>   |  (stream audio chunks)
  |  ---- audio.append ------------>   |
  |                                    |
  |  <---- speech.started -----------  |  (VAD detected speech)
  |  <---- transcription.partial ----  |  (interim result)
  |  <---- transcription.partial ----  |
  |  <---- transcription.completed --  |  (final transcript)
  |  <---- speech.stopped -----------  |  (VAD detected silence)
  |                                    |
  |  ---- session.close ----------->   |
  |  <---- session.closed -----------  |
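Concretely, the client-side payloads in this exchange might look like the following. Only the `type` values come from the flow above; every other field name is an assumption:

```python
import base64
import json

# One 20 ms chunk of PCM16 silence at 16 kHz: 320 samples x 2 bytes = 640 bytes
pcm_chunk = b"\x00\x00" * 320

session_update = json.dumps({
    "type": "session.update",
    "model": "nova-3",   # hypothetical config fields
    "language": "en",
})

audio_append = json.dumps({
    "type": "audio.append",
    "audio": base64.b64encode(pcm_chunk).decode("ascii"),
})

session_close = json.dumps({"type": "session.close"})

print(session_close)  # -> {"type": "session.close"}
```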

Supported Models

| Provider   | Model                    | Prompt / bias mechanism               | Silence-based VAD | Advanced VAD         |
|------------|--------------------------|---------------------------------------|-------------------|----------------------|
| Deepgram   | nova-3                   | Keyterm prompting                     | Yes               | No                   |
| Deepgram   | flux-general-en          | Keyterm prompting                     | Yes               | CSR (conversational) |
| Qwen       | qwen3-asr-flash-realtime | Context (corpus text)                 | Yes               | No                   |
| OpenAI     | gpt-4o-transcribe        | Context prompt                        | Optional          | Semantic             |
| OpenAI     | gpt-4o-mini-transcribe   | Context prompt                        | Optional          | Semantic             |
| OpenAI     | gpt-realtime             | Context prompt / session instructions | Yes               | Semantic             |
| Cartesia   | ink-whisper              | None                                  | Yes               | No                   |
| ElevenLabs | scribe-v2-realtime       | Keyterm prompting                     | Yes               | No                   |
| Soniox     | stt-rt-v4                | Context                               | No                | Semantic             |
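To make the prompt / bias column concrete, here is a hypothetical pair of session.update payloads; the `keyterms` and `prompt` field names are illustrative assumptions, not the documented schema:

```python
# Keyterm prompting (e.g. Deepgram nova-3, ElevenLabs scribe-v2-realtime):
# bias recognition toward a short list of expected terms.
deepgram_cfg = {
    "type": "session.update",
    "model": "nova-3",
    "keyterms": ["VoiceRun", "PCM16", "mu-law"],  # hypothetical field
}

# Context prompting (e.g. OpenAI gpt-4o-transcribe, Soniox stt-rt-v4):
# supply free-form text the model can draw vocabulary from.
openai_cfg = {
    "type": "session.update",
    "model": "gpt-4o-transcribe",
    "prompt": "A call about the VoiceRun Transcribe API.",  # hypothetical field
}
```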

Audio Format

PCM16 (default)

  • 16-bit signed little-endian (int16)
  • Mono (1 channel)
  • Default sample rate: 16,000 Hz
  • 20ms chunk = 320 samples = 640 bytes
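The chunk arithmetic follows directly from duration × sample rate; a quick sanity check:

```python
BYTES_PER_SAMPLE = 2  # 16-bit signed int = 2 bytes

def pcm16_chunk_size(ms: int, sample_rate: int = 16_000) -> tuple[int, int]:
    """Return (samples, bytes) for a mono PCM16 chunk of the given duration."""
    samples = sample_rate * ms // 1000
    return samples, samples * BYTES_PER_SAMPLE

print(pcm16_chunk_size(20))  # -> (320, 640), matching the bullet above
```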

mulaw

  • 8-bit mu-law encoded
  • Mono (1 channel)
  • Automatically converted to PCM16 on server
  • 20ms chunk = 320 bytes at 16kHz
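The server handles this conversion, but for reference, G.711 mu-law expansion can be sketched in pure Python (this is the standard algorithm, not VoiceRun's implementation):

```python
MULAW_BIAS = 0x84  # 132, the G.711 bias

def mulaw_decode(data: bytes) -> list[int]:
    """Decode 8-bit mu-law bytes to 16-bit signed PCM samples (G.711)."""
    samples = []
    for byte in data:
        byte = ~byte & 0xFF            # mu-law bytes are stored inverted
        sign = byte & 0x80
        exponent = (byte >> 4) & 0x07
        mantissa = byte & 0x0F
        magnitude = (((mantissa << 3) + MULAW_BIAS) << exponent) - MULAW_BIAS
        samples.append(-magnitude if sign else magnitude)
    return samples

print(mulaw_decode(b"\xff\x80"))  # -> [0, 32124]: silence, then the max sample
```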