API Reference

Everything exported from primfunctions.completions.

from primfunctions.completions import (
    # Setup
    configure, configure_provider, close,
    # Top-level calls
    generate_chat_completion, generate_chat_completion_stream,
    # Errors
    CompletionsNotConfiguredError,
    CompletionsProviderNotConfiguredError,
    CompletionsProxyError,
    # Request / response
    ChatCompletionRequest, ChatCompletionResponse,
    FallbackRequest, RetryConfiguration, StreamOptions, ToolChoice,
    # Tools
    ToolDefinition, FunctionDefinition, ToolCall, FunctionCall,
    # Messages
    AssistantMessage, ConversationHistory, ConversationHistoryMessage,
    SystemMessage, ToolResultMessage, UserMessage,
    serialize_conversation, deserialize_conversation,
    # Cache
    CacheBreakpoint,
    # Streaming chunks
    ChatCompletionChunk, ContentDeltaChunk, ContentSentenceChunk,
    ErrorChunk, FinalResponseChunk, FinishReasonChunk,
    ToolCallChunk, UsageChunk,
    # Provider enum
    CompletionsProvider,
)

Setup

configure_provider

Register a provider for the current session. Must be called before any generate_chat_completion[_stream] for that provider — including fallback providers.

def configure_provider(
    provider: str | CompletionsProvider,
    *,
    voicerun_managed: bool = False,
    api_key: Optional[str] = None,
) -> None
  • voicerun_managed=True — the proxy uses VoiceRun's mounted key for this provider. Your handler never sees it.
  • api_key=<str> — register a customer-supplied key for this session only.

Exactly one of voicerun_managed or api_key must be truthy. Calling with both or neither raises ValueError.
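For example, registering one managed and one customer-keyed provider (customer_key is an illustrative variable holding a key you collected):

configure_provider("anthropic", voicerun_managed=True)
configure_provider(CompletionsProvider.OPENAI, api_key=customer_key)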

The first call for a given provider also kicks off a background warm request to the proxy, paying the TLS handshake cost during a quiet moment (typically while your greeting is playing).

See Provider Configuration for details.

configure

Set the proxy URL and auth token. You normally do not call this from a handler — the VoiceRun sandbox runtime configures it for you before your handler code runs.

def configure(proxy_url: str, proxy_token: str) -> None

close

Close the module-level aiohttp session. Runtimes handle this during shutdown; handler code should not call it.

async def close() -> None

Top-level calls

generate_chat_completion

Send a non-streaming completion request. Returns the full ChatCompletionResponse.

async def generate_chat_completion(
    request: ChatCompletionRequest | dict,
) -> ChatCompletionResponse

On transport-level failures it retries once with a fresh connection. Retries configured via RetryConfiguration happen server-side on the proxy.

Raises:

  • CompletionsNotConfiguredError — configure(url, token) was never called in this process (the runtime should have done this).
  • CompletionsProviderNotConfiguredError — a provider in the request (primary or fallback) was not registered via configure_provider.
  • CompletionsProxyError — the proxy returned a non-200 status; the error carries message, error_type, and status_code.
  • ValueError — api_key was set on the request body (it must go through configure_provider).
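A minimal sketch of a call with error handling, assuming configure_provider("anthropic", voicerun_managed=True) ran earlier in the session:

from primfunctions.completions import (
    ChatCompletionRequest,
    CompletionsProxyError,
    UserMessage,
    generate_chat_completion,
)

request = ChatCompletionRequest(
    provider="anthropic",
    model="claude-haiku-4-5",
    messages=[UserMessage(content="Summarize today's agenda in one sentence.")],
)

try:
    response = await generate_chat_completion(request)
    print(response.message.content)
except CompletionsProxyError as exc:
    # Non-200 from the proxy; the exception carries the proxy's details.
    print(exc.error_type, exc.status_code, exc.message)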

generate_chat_completion_stream

Send a streaming completion request. Returns an async iterable of ChatCompletionChunk.

async def generate_chat_completion_stream(
    request: ChatCompletionRequest | dict,
    stream_options: StreamOptions | dict | None = None,
) -> AsyncIterable[ChatCompletionChunk]

Connection retries fire before the first chunk only. Mid-stream errors are re-raised as CompletionsProxyError inside async for.
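A sketch of consuming the raw delta stream, reusing the request built above (the call is awaited first because the function is declared async and returns the iterable):

stream = await generate_chat_completion_stream(request)
async for chunk in stream:
    if chunk.type == "content_delta":
        print(chunk.delta, end="", flush=True)
    elif chunk.type == "response":
        final = chunk.response  # the fully-assembled ChatCompletionResponse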


Errors

class CompletionsNotConfiguredError(Exception): ...

class CompletionsProviderNotConfiguredError(Exception): ...

class CompletionsProxyError(Exception):
    message: str
    error_type: str
    status_code: Optional[int]

Request types

ChatCompletionRequest

Main request object.

@dataclass
class ChatCompletionRequest:
    provider: CompletionsProvider | str         # openai | anthropic | google | anthropic_vertex | alibaba
    model: str                                  # model id, e.g. "claude-haiku-4-5"
    messages: ConversationHistory | list[dict]  # conversation messages
    temperature: Optional[float] = None
    tools: Optional[list[ToolDefinition | dict]] = None
    tool_choice: Optional[ToolChoice] = None
    timeout: Optional[float] = None             # seconds
    max_tokens: Optional[int] = None
    response_schema: Optional[dict[str, Any]] = None  # JSON Schema for structured output
    retry: Optional[RetryConfiguration | dict] = None
    fallbacks: Optional[list[FallbackRequest | dict]] = None
    provider_kwargs: Optional[ProviderKwargs] = None

api_key is not a field. Provider keys come from configure_provider. Attempting to set api_key (or pass it in a dict request) raises ValueError.
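Dict requests are accepted alongside the dataclass form. A sketch (the message-dict shape shown assumes the common role/content layout; the model id is illustrative):

request = {
    "provider": "openai",
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 128,
}
# Adding "api_key" to this dict would raise ValueError; keys are
# registered once per session via configure_provider instead.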

FallbackRequest

Override fields for a fallback provider. Everything unset inherits from the primary.

@dataclass
class FallbackRequest:
    provider: Optional[CompletionsProvider | str] = None
    model: Optional[str] = None
    messages: Optional[ConversationHistory] = None
    temperature: Optional[float] = None
    tools: Optional[list[ToolDefinition | dict]] = None
    tool_choice: Optional[ToolChoice] = None
    timeout: Optional[float] = None
    max_tokens: Optional[int] = None
    response_schema: Optional[dict[str, Any]] = None
    retry: Optional[RetryConfiguration | dict] = None
    provider_kwargs: Optional[ProviderKwargs] = None

api_key is also not a field on fallbacks. The proxy resolves keys from configure_provider per fallback entry.
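A sketch of a primary with one fallback; both providers must have been registered via configure_provider, and the fallback inherits messages, temperature, and everything else left unset (the fallback model id is illustrative, and history is a ConversationHistory built earlier):

request = ChatCompletionRequest(
    provider="anthropic",
    model="claude-haiku-4-5",
    messages=history,
    fallbacks=[
        FallbackRequest(provider="openai", model="gpt-4o-mini"),
    ],
)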

ProviderKwargs

Provider-specific kwargs keyed by provider name. Only the entry matching the executing provider is applied.

class ProviderKwargs(TypedDict, total=False):
    openai: OpenAIKwargs
    anthropic: AnthropicKwargs
    google: GoogleKwargs
    anthropic_vertex: AnthropicVertexKwargs
    alibaba: AlibabaKwargs
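Because only the matching entry applies, it is safe to populate entries for the primary and each fallback in one dict. A sketch using the option values documented below:

provider_kwargs = {
    "anthropic": {"thinking": {"type": "enabled", "budget_tokens": 10000}},
    "openai": {"service_tier": "priority"},
}
# Pass as ChatCompletionRequest(..., provider_kwargs=provider_kwargs);
# whichever provider executes picks up only its own entry.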

OpenAIKwargs

class OpenAIKwargs(TypedDict, total=False):
    service_tier: Literal["auto", "default", "flex", "scale", "priority"]
    reasoning_effort: Literal["low", "medium", "high"]

See the OpenAI API reference.

AnthropicKwargs

class AnthropicKwargs(TypedDict, total=False):
    thinking: ThinkingConfigParam  # e.g. {"type": "enabled", "budget_tokens": 10000}

See the Anthropic messages API reference.

GoogleKwargs

class GoogleKwargs(TypedDict, total=False):
    thinking_config: ThinkingConfigDict  # e.g. {"thinking_budget": 10000, "include_thoughts": True}

See the Google GenerateContent reference.

AnthropicVertexKwargs

class AnthropicVertexKwargs(TypedDict, total=False):
    thinking: ThinkingConfigParam
    project_id: str
    region: str
    service_account_credentials: dict[str, Any]  # parsed SA JSON

AlibabaKwargs

class AlibabaKwargs(TypedDict, total=False):
    base_url: str        # override regional endpoint
    enable_search: bool  # enable Qwen's built-in web search

Regional endpoints:

  • Singapore (default): https://dashscope-intl.aliyuncs.com/compatible-mode/v1
  • Virginia (US): https://dashscope-us.aliyuncs.com/compatible-mode/v1
  • Beijing (CN): https://dashscope.aliyuncs.com/compatible-mode/v1

See the DashScope model reference.
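For example, a sketch pinning a request to the Virginia endpoint with search enabled:

provider_kwargs = {
    "alibaba": {
        "base_url": "https://dashscope-us.aliyuncs.com/compatible-mode/v1",  # Virginia (US)
        "enable_search": True,
    }
}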

RetryConfiguration

@dataclass
class RetryConfiguration:
    enabled: bool = True
    max_retries: int = 3
    retry_delay: float = 1.0         # initial delay in seconds
    backoff_multiplier: float = 2.0  # exponential backoff factor
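With the defaults, and assuming the usual exponential-backoff arithmetic (retry_delay scaled by backoff_multiplier per attempt), the proxy would wait roughly 1.0 s, 2.0 s, then 4.0 s across the three retries. A sketch of tightening this for a latency-sensitive turn:

request = ChatCompletionRequest(
    provider="anthropic",
    model="claude-haiku-4-5",
    messages=history,  # a ConversationHistory built earlier
    retry=RetryConfiguration(max_retries=1, retry_delay=0.25),
)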

StreamOptions

@dataclass
class StreamOptions:
    chunk_by_sentence: bool = False
    clean_sentences: bool = True
    min_sentence_length: int = 6
    punctuation_marks: Optional[list[str]] = None
    punctuation_language: Optional[str] = None  # en | zh | ko | ja | es | fr | it | de

Deprecated: stream_sentences is the previous name for chunk_by_sentence. It still works for backward compatibility but emits a deprecation warning via primfunctions.logger.
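A sketch of sentence-mode streaming, handy for feeding a TTS engine (request is a ChatCompletionRequest built as above; speak() is a hypothetical TTS helper):

stream = await generate_chat_completion_stream(
    request,
    stream_options=StreamOptions(chunk_by_sentence=True, punctuation_language="en"),
)
async for chunk in stream:
    if chunk.type == "content_sentence":
        await speak(chunk.sentence)  # hypothetical TTS call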


Response types

ChatCompletionResponse

@dataclass
class ChatCompletionResponse:
    message: AssistantMessage
    finish_reason: str                      # "stop" | "length" | "tool_calls" | ...
    usage: Optional[dict[str, Any]] = None  # provider-native token counts
    provider: Optional[str] = None          # the provider that actually produced this response
    model: Optional[str] = None
    request_id: Optional[str] = None        # proxy-generated request id

provider and model reflect the actual executor — useful when fallbacks fire and you want to log which provider served the turn.


Message types

Every message carries a vr_id field — an 8-character id auto-assigned on construction. It's preserved through serialize_conversation / deserialize_conversation so you can track individual messages across storage round-trips. You generally don't need to set it yourself.

UserMessage

@dataclass
class UserMessage:
    content: str
    vr_id: str  # auto-generated
    name: Optional[str] = None
    cache_breakpoint: Optional[CacheBreakpoint] = None

AssistantMessage

@dataclass
class AssistantMessage:
    content: Optional[str] = None
    vr_id: str  # auto-generated
    tool_calls: Optional[list[ToolCall]] = None
    cache_breakpoint: Optional[CacheBreakpoint] = None
    thought_signature: Optional[bytes] = None  # Google only; preserved automatically

SystemMessage

Multiple SystemMessages are collapsed into one system block for Anthropic and Google.

@dataclass
class SystemMessage:
    content: str
    vr_id: str  # auto-generated
    cache_breakpoint: Optional[CacheBreakpoint] = None

ToolResultMessage

The result of a tool call, fed back to the model on the next turn.

@dataclass
class ToolResultMessage:
    tool_call_id: str           # matches ToolCall.id
    content: dict[str, Any]     # JSON-serializable result
    vr_id: str                  # auto-generated
    name: Optional[str] = None  # function name
    cache_breakpoint: Optional[CacheBreakpoint] = None

ConversationHistory

ConversationHistory = list[ConversationHistoryMessage]

A ConversationHistoryMessage is the union UserMessage | AssistantMessage | SystemMessage | ToolResultMessage.


Tool types

ToolDefinition

@dataclass
class ToolDefinition:
    type: Literal["function"]
    function: FunctionDefinition
    cache_breakpoint: Optional[CacheBreakpoint] = None

FunctionDefinition

@dataclass
class FunctionDefinition:
    name: str
    description: str
    parameters: dict[str, Any]     # JSON Schema
    strict: Optional[bool] = None  # OpenAI strict-mode toggle
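A sketch of a complete tool definition (get_weather is an illustrative function name):

weather_tool = ToolDefinition(
    type="function",
    function=FunctionDefinition(
        name="get_weather",
        description="Look up the current weather for a city.",
        parameters={
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    ),
)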

ToolCall

Emitted by the model. Use .id when building the corresponding ToolResultMessage on the next turn.

@dataclass
class ToolCall:
    id: str
    type: Literal["function"]
    function: FunctionCall
    index: Optional[int] = None
    thought_signature: Optional[bytes] = None  # Google only
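A sketch of the round trip, assuming the request carried tools=[weather_tool] from above (run_tool is a hypothetical dispatcher returning a JSON-serializable dict):

response = await generate_chat_completion(request)
if response.finish_reason == "tool_calls":
    history.append(response.message)  # keep the assistant turn with its tool_calls
    for call in response.message.tool_calls or []:
        history.append(
            ToolResultMessage(
                tool_call_id=call.id,  # must match the ToolCall's id
                name=call.function.name,
                content=run_tool(call.function.name, call.function.arguments),
            )
        )
    # Send history back with another generate_chat_completion call.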

FunctionCall

@dataclass
class FunctionCall:
    name: str
    arguments: dict[str, Any]  # already JSON-parsed

ToolChoice

ToolChoice = Union[Literal["none", "auto", "required"], str]

  • "auto" — model decides (default)
  • "none" — no tool calls
  • "required" — must call at least one tool
  • "<function_name>" — must call that specific function

Streaming types

Every chunk has a .type: str property that lets you match on shape without importing every class:

async for chunk in stream:
    match chunk.type:
        case "content_delta":    ...  # ContentDeltaChunk
        case "content_sentence": ...  # ContentSentenceChunk
        case "tool_call":        ...  # ToolCallChunk
        case "finish_reason":    ...  # FinishReasonChunk
        case "usage":            ...  # UsageChunk
        case "response":         ...  # FinalResponseChunk
        case "error":            ...  # ErrorChunk (surfaces as CompletionsProxyError)

ChatCompletionChunk

ChatCompletionChunk = Union[
    ContentDeltaChunk,
    ContentSentenceChunk,
    ToolCallChunk,
    FinishReasonChunk,
    UsageChunk,
    FinalResponseChunk,
    ErrorChunk,
]

ContentDeltaChunk

Emitted when chunk_by_sentence=False. One per incremental token.

@dataclass
class ContentDeltaChunk:  # type == "content_delta"
    delta: str

ContentSentenceChunk

Emitted when chunk_by_sentence=True. One per complete sentence.

@dataclass
class ContentSentenceChunk:  # type == "content_sentence"
    sentence: str

ToolCallChunk

A fully-reassembled tool call. The proxy stitches together the streamed function-name + argument deltas before yielding.

@dataclass
class ToolCallChunk:  # type == "tool_call"
    tool_call: ToolCall

FinishReasonChunk

@dataclass
class FinishReasonChunk:  # type == "finish_reason"
    finish_reason: str  # "stop" | "length" | "tool_calls" | ...

UsageChunk

@dataclass
class UsageChunk:  # type == "usage"
    usage: dict[str, Any]

FinalResponseChunk

The last chunk of a successful stream. Carries the fully-assembled ChatCompletionResponse.

@dataclass
class FinalResponseChunk:  # type == "response"
    response: ChatCompletionResponse

ErrorChunk

Emitted by the proxy on a mid-stream failure. The library re-raises it as CompletionsProxyError inside async for, so you should not need to match it explicitly.

@dataclass
class ErrorChunk:  # type == "error"
    error: str
    error_type: str

Enums

CompletionsProvider

class CompletionsProvider(StrEnum):
    OPENAI = "openai"
    ANTHROPIC = "anthropic"
    GOOGLE = "google"
    ANTHROPIC_VERTEX = "anthropic_vertex"
    ALIBABA = "alibaba"

Since it's a StrEnum, CompletionsProvider.ANTHROPIC == "anthropic". Pass either form wherever a provider is expected.


Cache types

CacheBreakpoint

@dataclass
class CacheBreakpoint:
    ttl: Literal["5m", "1h"] = "5m"
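A minimal sketch, marking a long system prompt cacheable for an hour (long_system_prompt is an illustrative variable; the usage rules linked below still apply):

system = SystemMessage(
    content=long_system_prompt,
    cache_breakpoint=CacheBreakpoint(ttl="1h"),
)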

See Advanced features → Anthropic cache breakpoints for usage rules.


Utility functions

serialize_conversation

def serialize_conversation(conversation: ConversationHistory) -> list[dict[str, Any]]

deserialize_conversation

def deserialize_conversation(data: list[dict[str, Any]]) -> ConversationHistory

Use these when round-tripping conversation history through storage that wants plain dicts (e.g. context.set_completion_messages / context.get_completion_messages).
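A round-trip sketch; per Message types, vr_ids survive the trip:

raw = serialize_conversation(history)  # plain list[dict], safe to store as JSON
restored = deserialize_conversation(raw)
assert [m.vr_id for m in restored] == [m.vr_id for m in history]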
