VoiceRun Completions
primfunctions.completions is the managed LLM client VoiceRun agents use to call OpenAI, Anthropic, Google, Anthropic Vertex, and Alibaba models through a single interface. It ships as part of the primfunctions package and is available in every agent runtime — you do not install it separately.
Requests are routed through the completions proxy, a VoiceRun-operated service that:
- Holds the SDK clients and warm TLS connections for each provider
- Owns the provider API keys when you use VoiceRun-managed mode
- Handles streaming, sentence assembly for TTS, tool-call reassembly, retries, and fallbacks
- Emits standardized usage + trace telemetry for billing and observability
Your handler never speaks to the provider SDKs directly. You declare intent (provider, model, messages, optional tools, fallbacks, etc.), and the proxy executes it.
What you get#
- Unified request/response shape across every provider
- Streaming with optional sentence-boundary chunking designed for voice output
- Tool / function calling with JSON Schema, including cross-provider sanitization
- Retries with exponential backoff
- Fallbacks — primary provider fails → next provider takes over without another round-trip
- Prompt caching (Anthropic cache breakpoints)
- Structured output via `response_schema`
- Provider-specific kwargs (Anthropic thinking, OpenAI service tier, Google thinking config, Alibaba search, …)
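The sentence-boundary chunking mentioned above can be illustrated with a standalone sketch. This is not VoiceRun's implementation — the function name and the boundary regex are assumptions — but it shows the core idea: buffer streamed text deltas and emit complete sentences so TTS can start speaking before the full completion arrives.

```python
import re

# Sketch of sentence-boundary chunking for voice output: accumulate streamed
# tokens, emit a chunk whenever a sentence ends, keep the partial remainder.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def chunk_sentences(token_stream):
    """Yield complete sentences as they form from a stream of text deltas."""
    buffer = ""
    for token in token_stream:
        buffer += token
        # Split on sentence boundaries; the last element is the partial tail.
        parts = SENTENCE_END.split(buffer)
        for sentence in parts[:-1]:
            yield sentence
        buffer = parts[-1]
    if buffer.strip():
        yield buffer  # flush whatever remains when the stream ends

# Tokens arrive in arbitrary fragments; sentences come out whole.
tokens = ["Hel", "lo there. How ", "are you? I", "'m fine."]
print(list(chunk_sentences(tokens)))
# → ['Hello there.', 'How are you?', "I'm fine."]
```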
Architecture#
```
handler code
  │ generate_chat_completion / generate_chat_completion_stream
  ▼
primfunctions.completions (HTTP client)
  │ POST /v1/completions (NDJSON for streaming)
  ▼
voicerun-completions-proxy
  │ cached provider SDK clients
  ▼
OpenAI │ Anthropic │ Google │ Anthropic Vertex │ Alibaba
```
The proxy is a managed service: in local-dev it runs as a Docker container; in production it runs on GKE.
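The proxy's retry and fallback behavior can be sketched in isolation. The delays, call shapes, and helper names below are assumptions for illustration, not the proxy's actual logic: each provider is retried with exponential backoff, and when one is exhausted the next provider in the chain takes over.

```python
import time

class ProviderError(Exception):
    """Stand-in for a provider-side failure (assumed name)."""

def call_with_fallbacks(providers, request, retries=3, base_delay=0.5, sleep=time.sleep):
    """Try each provider in order; retry each with exponential backoff."""
    last_error = None
    for call in providers:
        for attempt in range(retries):
            try:
                return call(request)
            except ProviderError as exc:
                last_error = exc
                # Back off 0.5s, 1s, 2s, ... before retrying this provider.
                sleep(base_delay * (2 ** attempt))
        # This provider exhausted its retries; fall through to the next one.
    raise last_error

def flaky(request):
    raise ProviderError("primary unavailable")

def healthy(request):
    return {"provider": "fallback", "text": "ok"}

# The primary always fails, so the fallback answers the request.
result = call_with_fallbacks([flaky, healthy], {"messages": []}, sleep=lambda s: None)
print(result)
# → {'provider': 'fallback', 'text': 'ok'}
```

Because the proxy holds both providers' warm connections, the real failover happens server-side without an extra round-trip from your handler.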
Installation#
You do not install primfunctions.completions directly. It is already available in every VoiceRun agent runtime (sandbox subprocess and coderunner container) and in the primvoices-agents service.
For local development against a worktree of voicerun-python, use local-dev with the --local-python flag — see the local-dev README.
Session setup#
Before you make any completion call, register the providers you intend to use via configure_provider. Typically this happens in your StartEvent handler:
```python
from primfunctions.completions import configure_provider
from primfunctions.events import StartEvent

async def handler(event, context):
    if isinstance(event, StartEvent):
        configure_provider("anthropic", voicerun_managed=True)
        configure_provider("openai", api_key=context.variables.get("OPENAI_API_KEY"))
```
`voicerun_managed=True` tells the proxy to use VoiceRun's mounted API key for that provider — your handler never sees the key. `api_key=...` registers a customer-supplied key for this session only.
Calling generate_chat_completion for a provider that has not been registered raises CompletionsProviderNotConfiguredError. This is intentional: it prevents silent key leaks and makes the set of providers a session touches explicit.
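The guard amounts to a registry check. Here is a hypothetical standalone mirror of the pattern — `configure_provider`, `generate_chat_completion`, and the error name come from the source, but the internals are assumed:

```python
# Hypothetical mirror of the configure-before-use guard: completion calls
# fail fast for any provider the session never registered.
class CompletionsProviderNotConfiguredError(Exception):
    pass

_configured = {}

def configure_provider(name, voicerun_managed=False, api_key=None):
    # Record how this session authenticates to the provider.
    _configured[name] = {"voicerun_managed": voicerun_managed, "api_key": api_key}

def generate_chat_completion(provider, **kwargs):
    if provider not in _configured:
        raise CompletionsProviderNotConfiguredError(
            f"provider {provider!r} was not registered via configure_provider"
        )
    ...  # in the real client, the request goes to the completions proxy

configure_provider("anthropic", voicerun_managed=True)
generate_chat_completion("anthropic")   # fine: registered above
try:
    generate_chat_completion("google")  # never registered this session
except CompletionsProviderNotConfiguredError as exc:
    print(exc)
```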
Next steps#
- Basic Usage — your first completion
- Connection reuse & provider configuration — `configure_provider` in depth
- Streaming — real-time output for voice
- Tool calling — function calling patterns
- Reliability — retries and fallbacks
- Advanced features — caching, structured output, provider kwargs
- API reference — full type surface
