API Reference

Everything exported from primfunctions.completions.

from primfunctions.completions import (
    # Setup
    configure, configure_provider, close,
    # Top-level calls
    generate_chat_completion, generate_chat_completion_stream,
    # Errors
    CompletionsNotConfiguredError,
    CompletionsProviderNotConfiguredError,
    CompletionsProxyError,
    # Request / response
    ChatCompletionRequest, ChatCompletionResponse,
    FallbackRequest, RetryConfiguration, StreamOptions, ToolChoice,
    # Tools
    ToolDefinition, FunctionDefinition, ToolCall, FunctionCall,
    # Messages
    AssistantMessage, ConversationHistory, ConversationHistoryMessage,
    SystemMessage, ToolResultMessage, UserMessage,
    serialize_conversation, deserialize_conversation,
    # Cache
    CacheBreakpoint,
    # Streaming chunks
    ChatCompletionChunk, ContentDeltaChunk, ContentSentenceChunk,
    ErrorChunk, FinalResponseChunk, FinishReasonChunk,
    ToolCallChunk, UsageChunk,
    # Provider enum
    CompletionsProvider,
)

Setup

configure_provider

Register a provider for the current session. Must be called before any generate_chat_completion[_stream] for that provider — including fallback providers.

def configure_provider(
    provider: str | CompletionsProvider,
    *,
    voicerun_managed: bool = False,
    api_key: Optional[str] = None,
) -> None
  • voicerun_managed=True — the proxy uses VoiceRun's mounted key for this provider. Your handler never sees it.
  • api_key=<str> — register a customer-supplied key for this session only.

Exactly one of voicerun_managed or api_key must be truthy. Calling with both or neither raises ValueError.
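For example, registering one managed and one customer-keyed provider (customer_key is an illustrative variable holding a key you collected):

configure_provider("anthropic", voicerun_managed=True)
configure_provider(CompletionsProvider.OPENAI, api_key=customer_key)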

The first call for a given provider also kicks off a background warm request to the proxy, paying the TLS handshake cost during a quiet moment (typically while your greeting is playing).

See Provider Configuration for details.

configure

Set the proxy URL and auth token. You normally do not call this from a handler — the VoiceRun sandbox runtime configures it for you before your handler code runs.

def configure(proxy_url: str, proxy_token: str) -> None

close

Close the module-level aiohttp session. Runtimes handle this during shutdown; handler code should not call it.

async def close() -> None

Top-level calls

generate_chat_completion

Send a non-streaming completion request. Returns the full ChatCompletionResponse.

async def generate_chat_completion(
    request: ChatCompletionRequest | dict,
) -> ChatCompletionResponse

On transport-level failures it retries once with a fresh connection. Retries configured via RetryConfiguration happen server-side on the proxy.

Raises:

  • CompletionsNotConfiguredError — configure(url, token) was never called in this process (the runtime should have done this).
  • CompletionsProviderNotConfiguredError — a provider in the request (primary or fallback) was not registered via configure_provider.
  • CompletionsProxyError — the proxy returned a non-200 status; the error carries message, error_type, and status_code.
  • ValueError — api_key was set on the request body (it must go through configure_provider).
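A minimal sketch of a call with error handling, assuming configure_provider("anthropic", voicerun_managed=True) ran earlier in the session:

from primfunctions.completions import (
    ChatCompletionRequest,
    CompletionsProxyError,
    UserMessage,
    generate_chat_completion,
)

request = ChatCompletionRequest(
    provider="anthropic",
    model="claude-haiku-4-5",
    messages=[UserMessage(content="Summarize today's agenda in one sentence.")],
)

try:
    response = await generate_chat_completion(request)
    print(response.message.content)
except CompletionsProxyError as exc:
    # Non-200 from the proxy; the exception carries the proxy's details.
    print(exc.error_type, exc.status_code, exc.message)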

generate_chat_completion_stream

Send a streaming completion request. Returns an async iterable of ChatCompletionChunk.

async def generate_chat_completion_stream(
    request: ChatCompletionRequest | dict,
    stream_options: StreamOptions | dict | None = None,
) -> AsyncIterable[ChatCompletionChunk]

Connection retries fire before the first chunk only. Mid-stream errors are re-raised as CompletionsProxyError inside async for.
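A sketch of consuming the raw delta stream, reusing the request built above (the call is awaited first because the function is declared async and returns the iterable):

stream = await generate_chat_completion_stream(request)
async for chunk in stream:
    if chunk.type == "content_delta":
        print(chunk.delta, end="", flush=True)
    elif chunk.type == "response":
        final = chunk.response  # the fully-assembled ChatCompletionResponse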


Errors

class CompletionsNotConfiguredError(Exception): ...

class CompletionsProviderNotConfiguredError(Exception): ...

class CompletionsProxyError(Exception):
    message: str
    error_type: str
    status_code: Optional[int]

Request types

ChatCompletionRequest

Main request object.

@dataclass
class ChatCompletionRequest:
    provider: CompletionsProvider | str         # openai | anthropic | google | anthropic_vertex | alibaba
    model: str                                  # model id, e.g. "claude-haiku-4-5"
    messages: ConversationHistory | list[dict]  # conversation messages
    temperature: Optional[float] = None
    tools: Optional[list[ToolDefinition | dict]] = None
    tool_choice: Optional[ToolChoice] = None
    timeout: Optional[float] = None             # seconds
    max_tokens: Optional[int] = None
    response_schema: Optional[dict[str, Any]] = None  # JSON Schema for structured output
    retry: Optional[RetryConfiguration | dict] = None
    fallbacks: Optional[list[FallbackRequest | dict]] = None
    provider_kwargs: Optional[ProviderKwargs] = None

api_key is not a field. Provider keys come from configure_provider. Attempting to set api_key (or pass it in a dict request) raises ValueError.
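Dict requests are accepted alongside the dataclass form. A sketch (the message-dict shape shown assumes the common role/content layout; the model id is illustrative):

request = {
    "provider": "openai",
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 128,
}
# Adding "api_key" to this dict would raise ValueError; keys are
# registered once per session via configure_provider instead.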

FallbackRequest

Override fields for a fallback provider. Everything unset inherits from the primary.

@dataclass
class FallbackRequest:
    provider: Optional[CompletionsProvider | str] = None
    model: Optional[str] = None
    messages: Optional[ConversationHistory] = None
    temperature: Optional[float] = None
    tools: Optional[list[ToolDefinition | dict]] = None
    tool_choice: Optional[ToolChoice] = None
    timeout: Optional[float] = None
    max_tokens: Optional[int] = None
    response_schema: Optional[dict[str, Any]] = None
    retry: Optional[RetryConfiguration | dict] = None
    provider_kwargs: Optional[ProviderKwargs] = None

api_key is also not a field on fallbacks. The proxy resolves keys from configure_provider per fallback entry.
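A sketch of a primary with one fallback; both providers must have been registered via configure_provider, and the fallback inherits messages, temperature, and everything else left unset (the fallback model id is illustrative, and history is a ConversationHistory built earlier):

request = ChatCompletionRequest(
    provider="anthropic",
    model="claude-haiku-4-5",
    messages=history,
    fallbacks=[
        FallbackRequest(provider="openai", model="gpt-4o-mini"),
    ],
)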

ProviderKwargs

Provider-specific kwargs keyed by provider name. Only the entry matching the executing provider is applied.

class ProviderKwargs(TypedDict, total=False):
    openai: OpenAIKwargs
    anthropic: AnthropicKwargs
    google: GoogleKwargs
    anthropic_vertex: AnthropicVertexKwargs
    alibaba: AlibabaKwargs
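Because only the matching entry applies, it is safe to populate entries for the primary and each fallback in one dict. A sketch using the option values documented below:

provider_kwargs = {
    "anthropic": {"thinking": {"type": "enabled", "budget_tokens": 10000}},
    "openai": {"service_tier": "priority"},
}
# Pass as ChatCompletionRequest(..., provider_kwargs=provider_kwargs);
# whichever provider executes picks up only its own entry.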

OpenAIKwargs

class OpenAIKwargs(TypedDict, total=False):
    service_tier: Literal["auto", "default", "flex", "scale", "priority"]
    reasoning_effort: Literal["low", "medium", "high"]

See the OpenAI API reference.

AnthropicKwargs

class AnthropicKwargs(TypedDict, total=False):
    thinking: ThinkingConfigParam  # e.g. {"type": "enabled", "budget_tokens": 10000}

See the Anthropic messages API reference.

GoogleKwargs

class GoogleKwargs(TypedDict, total=False):
    thinking_config: ThinkingConfigDict  # e.g. {"thinking_budget": 10000, "include_thoughts": True}

See the Google GenerateContent reference.

AnthropicVertexKwargs

class AnthropicVertexKwargs(TypedDict, total=False):
    thinking: ThinkingConfigParam
    project_id: str
    region: str
    service_account_credentials: dict[str, Any]  # parsed SA JSON

AlibabaKwargs

class AlibabaKwargs(TypedDict, total=False):
    base_url: str        # override regional endpoint
    enable_search: bool  # enable Qwen's built-in web search

Regional endpoints:

  • Singapore (default): https://dashscope-intl.aliyuncs.com/compatible-mode/v1
  • Virginia (US): https://dashscope-us.aliyuncs.com/compatible-mode/v1
  • Beijing (CN): https://dashscope.aliyuncs.com/compatible-mode/v1

See the DashScope model reference.
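For example, a sketch pinning a request to the Virginia endpoint with search enabled:

provider_kwargs = {
    "alibaba": {
        "base_url": "https://dashscope-us.aliyuncs.com/compatible-mode/v1",  # Virginia (US)
        "enable_search": True,
    }
}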

RetryConfiguration

@dataclass
class RetryConfiguration:
    enabled: bool = True
    max_retries: int = 3
    retry_delay: float = 1.0         # initial delay in seconds
    backoff_multiplier: float = 2.0  # exponential backoff factor
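With the defaults, and assuming the usual exponential-backoff arithmetic (retry_delay scaled by backoff_multiplier per attempt), the proxy would wait roughly 1.0 s, 2.0 s, then 4.0 s across the three retries. A sketch of tightening this for a latency-sensitive turn:

request = ChatCompletionRequest(
    provider="anthropic",
    model="claude-haiku-4-5",
    messages=history,  # a ConversationHistory built earlier
    retry=RetryConfiguration(max_retries=1, retry_delay=0.25),
)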

StreamOptions

@dataclass
class StreamOptions:
    chunk_by_sentence: bool = False
    clean_sentences: bool = True
    min_sentence_length: int = 6
    punctuation_marks: Optional[list[str]] = None
    punctuation_language: Optional[str] = None  # en | zh | ko | ja | es | fr | it | de

Deprecated: stream_sentences is the previous name for chunk_by_sentence. It still works for backward compatibility but emits a deprecation warning via primfunctions.logger.
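A sketch of sentence-mode streaming, handy for feeding a TTS engine (request is a ChatCompletionRequest built as above; speak() is a hypothetical TTS helper):

stream = await generate_chat_completion_stream(
    request,
    stream_options=StreamOptions(chunk_by_sentence=True, punctuation_language="en"),
)
async for chunk in stream:
    if chunk.type == "content_sentence":
        await speak(chunk.sentence)  # hypothetical TTS call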


Response types

ChatCompletionResponse

@dataclass
class ChatCompletionResponse:
    message: AssistantMessage
    finish_reason: str                      # "stop" | "length" | "tool_calls" | ...
    usage: Optional[dict[str, Any]] = None  # provider-native token counts
    provider: Optional[str] = None          # the provider that actually produced this response
    model: Optional[str] = None
    request_id: Optional[str] = None        # proxy-generated request id

provider and model reflect the actual executor — useful when fallbacks fire and you want to log which provider served the turn.


Message types

Every message carries a vr_id field — an 8-character id auto-assigned on construction. It's preserved through serialize_conversation / deserialize_conversation so you can track individual messages across storage round-trips. You generally don't need to set it yourself.

UserMessage

@dataclass
class UserMessage:
    content: str
    vr_id: str  # auto-generated
    name: Optional[str] = None
    cache_breakpoint: Optional[CacheBreakpoint] = None

AssistantMessage

@dataclass
class AssistantMessage:
    content: Optional[str] = None
    vr_id: str  # auto-generated
    tool_calls: Optional[list[ToolCall]] = None
    cache_breakpoint: Optional[CacheBreakpoint] = None
    thought_signature: Optional[bytes] = None  # Google only; preserved automatically

SystemMessage

Multiple SystemMessages are collapsed into one system block for Anthropic and Google.

@dataclass
class SystemMessage:
    content: str
    vr_id: str  # auto-generated
    cache_breakpoint: Optional[CacheBreakpoint] = None

ToolResultMessage

The result of a tool call, fed back to the model on the next turn.

@dataclass
class ToolResultMessage:
    tool_call_id: str           # matches ToolCall.id
    content: dict[str, Any]     # JSON-serializable result
    vr_id: str                  # auto-generated
    name: Optional[str] = None  # function name
    cache_breakpoint: Optional[CacheBreakpoint] = None

ConversationHistory

ConversationHistory = list[ConversationHistoryMessage]

A ConversationHistoryMessage is the union UserMessage | AssistantMessage | SystemMessage | ToolResultMessage.


Tool types

ToolDefinition

@dataclass
class ToolDefinition:
    type: Literal["function"]
    function: FunctionDefinition
    cache_breakpoint: Optional[CacheBreakpoint] = None

FunctionDefinition

@dataclass
class FunctionDefinition:
    name: str
    description: str
    parameters: dict[str, Any]     # JSON Schema
    strict: Optional[bool] = None  # OpenAI strict-mode toggle
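A sketch of a complete tool definition (get_weather is an illustrative function name):

weather_tool = ToolDefinition(
    type="function",
    function=FunctionDefinition(
        name="get_weather",
        description="Look up the current weather for a city.",
        parameters={
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    ),
)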

ToolCall

Emitted by the model. Use .id when building the corresponding ToolResultMessage on the next turn.

@dataclass
class ToolCall:
    id: str
    type: Literal["function"]
    function: FunctionCall
    index: Optional[int] = None
    thought_signature: Optional[bytes] = None  # Google only
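A sketch of the round trip, assuming the request carried tools=[weather_tool] from above (run_tool is a hypothetical dispatcher returning a JSON-serializable dict):

response = await generate_chat_completion(request)
if response.finish_reason == "tool_calls":
    history.append(response.message)  # keep the assistant turn with its tool_calls
    for call in response.message.tool_calls or []:
        history.append(
            ToolResultMessage(
                tool_call_id=call.id,  # must match the ToolCall's id
                name=call.function.name,
                content=run_tool(call.function.name, call.function.arguments),
            )
        )
    # Send history back with another generate_chat_completion call.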

FunctionCall

@dataclass
class FunctionCall:
    name: str
    arguments: dict[str, Any]  # already JSON-parsed

ToolChoice

ToolChoice = Union[Literal["none", "auto", "required"], str]

  • "auto" — model decides (default)
  • "none" — no tool calls
  • "required" — must call at least one tool
  • "<function_name>" — must call that specific function

Streaming types

Every chunk has a .type: str property that lets you match on shape without importing every class:

async for chunk in stream:
    match chunk.type:
        case "content_delta":    ...  # ContentDeltaChunk
        case "content_sentence": ...  # ContentSentenceChunk
        case "tool_call":        ...  # ToolCallChunk
        case "finish_reason":    ...  # FinishReasonChunk
        case "usage":            ...  # UsageChunk
        case "response":         ...  # FinalResponseChunk
        case "error":            ...  # ErrorChunk (surfaces as CompletionsProxyError)

ChatCompletionChunk

ChatCompletionChunk = Union[
    ContentDeltaChunk,
    ContentSentenceChunk,
    ToolCallChunk,
    FinishReasonChunk,
    UsageChunk,
    FinalResponseChunk,
    ErrorChunk,
]

ContentDeltaChunk

Emitted when chunk_by_sentence=False. One per incremental token.

@dataclass
class ContentDeltaChunk:  # type == "content_delta"
    delta: str

ContentSentenceChunk

Emitted when chunk_by_sentence=True. One per complete sentence.

@dataclass
class ContentSentenceChunk:  # type == "content_sentence"
    sentence: str

ToolCallChunk

A fully-reassembled tool call. The proxy stitches together the streamed function-name + argument deltas before yielding.

@dataclass
class ToolCallChunk:  # type == "tool_call"
    tool_call: ToolCall

FinishReasonChunk

@dataclass
class FinishReasonChunk:  # type == "finish_reason"
    finish_reason: str  # "stop" | "length" | "tool_calls" | ...

UsageChunk

@dataclass
class UsageChunk:  # type == "usage"
    usage: dict[str, Any]

FinalResponseChunk

The last chunk of a successful stream. Carries the fully-assembled ChatCompletionResponse.

@dataclass
class FinalResponseChunk:  # type == "response"
    response: ChatCompletionResponse

ErrorChunk

Emitted by the proxy on a mid-stream failure. The library re-raises it as CompletionsProxyError inside async for, so you should not need to match it explicitly.

@dataclass
class ErrorChunk:  # type == "error"
    error: str
    error_type: str

Enums

CompletionsProvider

class CompletionsProvider(StrEnum):
    OPENAI = "openai"
    ANTHROPIC = "anthropic"
    GOOGLE = "google"
    ANTHROPIC_VERTEX = "anthropic_vertex"
    ALIBABA = "alibaba"

Since it's a StrEnum, CompletionsProvider.ANTHROPIC == "anthropic". Pass either form wherever a provider is expected.


Cache types

CacheBreakpoint

@dataclass
class CacheBreakpoint:
    ttl: Literal["5m", "1h"] = "5m"
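A minimal sketch, marking a long system prompt cacheable for an hour (long_system_prompt is an illustrative variable; the usage rules linked below still apply):

system = SystemMessage(
    content=long_system_prompt,
    cache_breakpoint=CacheBreakpoint(ttl="1h"),
)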

See Advanced features → Anthropic cache breakpoints for usage rules.


Utility functions

serialize_conversation

def serialize_conversation(conversation: ConversationHistory) -> list[dict[str, Any]]

deserialize_conversation

def deserialize_conversation(data: list[dict[str, Any]]) -> ConversationHistory

Use these when round-tripping conversation history through storage that wants plain dicts (e.g. context.set_completion_messages / context.get_completion_messages).
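A round-trip sketch; per Message types, vr_ids survive the trip:

raw = serialize_conversation(history)  # plain list[dict], safe to store as JSON
restored = deserialize_conversation(raw)
assert [m.vr_id for m in restored] == [m.vr_id for m in history]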
