VoiceRun Completions
primfunctions.completions is the managed LLM client VoiceRun agents use to call OpenAI, Anthropic, Google, Anthropic Vertex, and Alibaba models through a single interface. It ships as part of the primfunctions package and is available in every agent runtime — you do not install it separately.
Requests are routed through the completions proxy, a VoiceRun-operated service that:
- Holds the SDK clients and warm TLS connections for each provider
- Owns the provider API keys when you use VoiceRun-managed mode
- Handles streaming, sentence assembly for TTS, tool-call reassembly, retries, and fallbacks
- Emits standardized usage + trace telemetry for billing and observability
Your handler never speaks to the provider SDKs directly. You declare intent (provider, model, messages, optional tools, fallbacks, etc.), and the proxy executes it.
What you get#
- Unified request/response shape across every provider
- Streaming with optional sentence-boundary chunking designed for voice output
- Tool / function calling with JSON Schema, including cross-provider sanitization
- Retries with exponential backoff
- Fallbacks — primary provider fails → next provider takes over without another round-trip
- Prompt caching (Anthropic cache breakpoints)
- Structured output via `response_schema`
- Provider-specific kwargs (Anthropic thinking, OpenAI service tier, Google thinking config, Alibaba search, …)
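The sentence-boundary chunking mentioned above can be illustrated with a standalone sketch. This is not VoiceRun's implementation — the function name and the boundary regex are assumptions — but it shows the core idea: buffer streamed text deltas and emit complete sentences so TTS can start speaking before the full completion arrives.

```python
import re

# Sketch of sentence-boundary chunking for voice output: accumulate streamed
# tokens, emit a chunk whenever a sentence ends, keep the partial remainder.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def chunk_sentences(token_stream):
    """Yield complete sentences as they form from a stream of text deltas."""
    buffer = ""
    for token in token_stream:
        buffer += token
        # Split on sentence boundaries; the last element is the partial tail.
        parts = SENTENCE_END.split(buffer)
        for sentence in parts[:-1]:
            yield sentence
        buffer = parts[-1]
    if buffer.strip():
        yield buffer  # flush whatever remains when the stream ends

# Tokens arrive in arbitrary fragments; sentences come out whole.
tokens = ["Hel", "lo there. How ", "are you? I", "'m fine."]
print(list(chunk_sentences(tokens)))
# → ['Hello there.', 'How are you?', "I'm fine."]
```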
Architecture#
```
handler code
  │ generate_chat_completion / generate_chat_completion_stream
  ▼
primfunctions.completions (HTTP client)
  │ POST /v1/completions (NDJSON for streaming)
  ▼
voicerun-completions-proxy
  │ cached provider SDK clients
  ▼
OpenAI │ Anthropic │ Google │ Anthropic Vertex │ Alibaba
```
The proxy is a managed service: in local-dev it runs as a Docker container; in production it runs on GKE.
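The proxy's retry and fallback behavior can be sketched in isolation. The delays, call shapes, and helper names below are assumptions for illustration, not the proxy's actual logic: each provider is retried with exponential backoff, and when one is exhausted the next provider in the chain takes over.

```python
import time

class ProviderError(Exception):
    """Stand-in for a provider-side failure (assumed name)."""

def call_with_fallbacks(providers, request, retries=3, base_delay=0.5, sleep=time.sleep):
    """Try each provider in order; retry each with exponential backoff."""
    last_error = None
    for call in providers:
        for attempt in range(retries):
            try:
                return call(request)
            except ProviderError as exc:
                last_error = exc
                # Back off 0.5s, 1s, 2s, ... before retrying this provider.
                sleep(base_delay * (2 ** attempt))
        # This provider exhausted its retries; fall through to the next one.
    raise last_error

def flaky(request):
    raise ProviderError("primary unavailable")

def healthy(request):
    return {"provider": "fallback", "text": "ok"}

# The primary always fails, so the fallback answers the request.
result = call_with_fallbacks([flaky, healthy], {"messages": []}, sleep=lambda s: None)
print(result)
# → {'provider': 'fallback', 'text': 'ok'}
```

Because the proxy holds both providers' warm connections, the real failover happens server-side without an extra round-trip from your handler.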
Installation#
You do not install primfunctions.completions directly. It is already available in every VoiceRun agent runtime (sandbox subprocess and coderunner container) and in the primvoices-agents service.
For local development against a worktree of voicerun-python, use local-dev with the --local-python flag — see the local-dev README.
Session setup#
Before you make any completion call, register the providers you intend to use via configure_provider. Typically this happens in your StartEvent handler:
```python
from primfunctions.completions import configure_provider
from primfunctions.events import StartEvent

async def handler(event, context):
    if isinstance(event, StartEvent):
        configure_provider("anthropic", voicerun_managed=True)
        configure_provider("openai", api_key=context.variables.get("OPENAI_API_KEY"))
```
`voicerun_managed=True` tells the proxy to use VoiceRun's mounted API key for that provider — your handler never sees the key. `api_key=...` registers a customer-supplied key for this session only.
Calling generate_chat_completion for a provider that has not been registered raises CompletionsProviderNotConfiguredError. This is intentional: it prevents silent key leaks and makes the set of providers a session touches explicit.
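The guard amounts to a registry check. Here is a hypothetical standalone mirror of the pattern — `configure_provider`, `generate_chat_completion`, and the error name come from the source, but the internals are assumed:

```python
# Hypothetical mirror of the configure-before-use guard: completion calls
# fail fast for any provider the session never registered.
class CompletionsProviderNotConfiguredError(Exception):
    pass

_configured = {}

def configure_provider(name, voicerun_managed=False, api_key=None):
    # Record how this session authenticates to the provider.
    _configured[name] = {"voicerun_managed": voicerun_managed, "api_key": api_key}

def generate_chat_completion(provider, **kwargs):
    if provider not in _configured:
        raise CompletionsProviderNotConfiguredError(
            f"provider {provider!r} was not registered via configure_provider"
        )
    ...  # in the real client, the request goes to the completions proxy

configure_provider("anthropic", voicerun_managed=True)
generate_chat_completion("anthropic")   # fine: registered above
try:
    generate_chat_completion("google")  # never registered this session
except CompletionsProviderNotConfiguredError as exc:
    print(exc)
```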
Next steps#
- Basic Usage — your first completion
- Connection reuse & provider configuration — `configure_provider` in depth
- Streaming — real-time output for voice
- Tool calling — function calling patterns
- Reliability — retries and fallbacks
- Advanced features — caching, structured output, provider kwargs
- API reference — full type surface
