Guides
Explore these guides to learn how to build different types of agents using the VoiceRun framework. Each guide demonstrates specific concepts and patterns you can apply to your own projects.
Simple Greeting Agent#
A basic greeting agent that demonstrates fundamental concepts like event handling and audio responses.
Features#
- Demonstrates the basic structure of an agent function
- Handles a simple greeting via StartEvent
- Handles user text input via TextEvent
- Uses TextToSpeechEvent to respond to the user with audio
Code Example#
```python
from primfunctions.events import Event, StartEvent, TextEvent, TextToSpeechEvent
from primfunctions.context import Context

async def handler(event: Event, context: Context):
    if isinstance(event, StartEvent):
        yield TextToSpeechEvent(text="Hello! I will repeat what you say.", voice="nova")

    if isinstance(event, TextEvent):
        user_message = event.data.get("text", "N/A")
        yield TextToSpeechEvent(text=f"You said: {user_message}", voice="nova")
```
How It Works#
Agent Initialization: The agent function is structured to handle different types of events. It listens for both session start and user input events.
Greeting the User: When a session begins (StartEvent), the agent responds with a simple spoken greeting using TextToSpeechEvent: "Hello! I will repeat what you say."
Echoing User Input: When the agent receives a TextEvent (a transcription of the user's speech, or typed text input), it extracts the user's message and responds by repeating it back in the format "You said: ...", again using TextToSpeechEvent.
Audio Responses: All responses from the agent are delivered as audio using the specified voice ('nova'), demonstrating how to use TextToSpeechEvent for output.
Key Takeaway#
This guide highlights the essential event handling and audio response patterns for building conversational agents: greeting the user, processing input, and replying with speech.
LLM Integration#
An agent that integrates with an LLM to answer user questions intelligently using the voicerun_completions library.
Features#
- Uses voicerun_completions for unified LLM access (supports Anthropic, OpenAI, Google)
- Streams responses sentence-by-sentence for low-latency speech
- Retrieves API keys from context variables
- Maintains conversation history across turns
- Simple, production-ready async pattern
Code Example#
```python
from primfunctions.events import Event, StartEvent, TextEvent, TextToSpeechEvent
from primfunctions.context import Context
from voicerun_completions import generate_chat_completion_stream, deserialize_conversation, UserMessage

async def handler(event: Event, context: Context):
    if isinstance(event, StartEvent):
        yield TextToSpeechEvent(
            text="Hello! Ask me anything.",
            voice="nova"
        )

    if isinstance(event, TextEvent):
        user_message = event.data.get("text", "N/A")

        # Get conversation history and add new message
        messages = deserialize_conversation(context.get_completion_messages())
        messages.append(UserMessage(content=user_message))

        # Stream the response
        stream = await generate_chat_completion_stream(
            request={
                "provider": "anthropic",
                "api_key": context.variables.get("ANTHROPIC_API_KEY"),
                "model": "claude-haiku-4-5",
                "messages": messages
            },
            stream_options={
                "stream_sentences": True,
                "clean_sentences": True
            }
        )

        async for chunk in stream:
            if chunk.type == "content_sentence":
                yield TextToSpeechEvent(text=chunk.sentence, voice="nova")
            elif chunk.type == "response":
                # Save conversation history
                messages.append(chunk.response.message)
                context.set_completion_messages(messages)
```
How It Works#
Conversation History: The agent retrieves previous messages using context.get_completion_messages() and deserializes them into typed message objects. This maintains context across conversation turns.
Streaming Request: The generate_chat_completion_stream function sends the conversation to the LLM and returns a stream. The stream_sentences option buffers tokens into complete sentences for natural speech output.
Sentence-by-Sentence Output: As each sentence completes, the agent yields a TextToSpeechEvent. This provides low-latency responses—users hear the first sentence while the LLM is still generating.
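The buffering that a stream_sentences-style option performs can be approximated without the framework. This is an illustrative sketch, not the library's actual implementation: tokens accumulate in a buffer that is flushed each time a sentence terminator is followed by whitespace, so downstream speech synthesis always receives whole sentences.

```python
import re

def buffer_sentences(tokens):
    """Group a stream of text tokens into complete sentences.

    Illustrative sketch of sentence buffering: tokens are accumulated
    and a sentence is emitted each time a terminator (. ! ?) followed
    by whitespace appears; any remainder is flushed at end of stream.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        while True:
            match = re.search(r"[.!?]\s+", buffer)
            if not match:
                break
            yield buffer[:match.end()].strip()
            buffer = buffer[match.end():]
    if buffer.strip():
        yield buffer.strip()

# Tokens arrive a few characters at a time; sentences come out whole
tokens = ["Hel", "lo the", "re. How", " are you", " today?"]
sentences = list(buffer_sentences(tokens))
```

This is why the first TextToSpeechEvent can be yielded while the LLM is still generating later sentences.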
Saving History: When the stream completes, the response chunk contains the full assistant message, which is appended to the conversation history for future turns.
Switching Providers#
voicerun_completions supports multiple providers with the same interface:
```python
# Anthropic
request = {
    "provider": "anthropic",
    "api_key": context.variables.get("ANTHROPIC_API_KEY"),
    "model": "claude-haiku-4-5",
    "messages": messages
}

# OpenAI
request = {
    "provider": "openai",
    "api_key": context.variables.get("OPENAI_API_KEY"),
    "model": "gpt-4o-mini",
    "messages": messages
}

# Google
request = {
    "provider": "google",
    "api_key": context.variables.get("GOOGLE_API_KEY"),
    "model": "gemini-2.0-flash",
    "messages": messages
}
```
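Since only the provider, api_key, and model fields differ between the three requests, provider selection can be factored into a small helper. The following is a sketch; the PROVIDERS table and build_request name are illustrative, and the models are the ones shown above.

```python
# Map each provider to the context variable holding its key and a default model.
# Illustrative table; use whichever models your accounts have access to.
PROVIDERS = {
    "anthropic": {"key_var": "ANTHROPIC_API_KEY", "model": "claude-haiku-4-5"},
    "openai": {"key_var": "OPENAI_API_KEY", "model": "gpt-4o-mini"},
    "google": {"key_var": "GOOGLE_API_KEY", "model": "gemini-2.0-flash"},
}

def build_request(provider: str, variables: dict, messages: list) -> dict:
    """Build a request dict for the given provider from context variables."""
    config = PROVIDERS[provider]
    return {
        "provider": provider,
        "api_key": variables.get(config["key_var"]),
        "model": config["model"],
        "messages": messages,
    }

# Inside a handler this would be: build_request("openai", context.variables, messages)
request = build_request("openai", {"OPENAI_API_KEY": "sk-test"}, [])
```

Switching providers then becomes a one-word change (or a context variable) instead of editing the request dict by hand.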
Key Takeaway#
This guide shows how to build a conversational agent with streaming LLM responses using voicerun_completions. The unified interface makes it easy to switch providers, and sentence-based streaming provides a responsive user experience.
Tool Calling#
An agent that uses LLM tool calling to fetch data from external APIs before responding.
Features#
- Defines tools that the LLM can call to fetch external data
- Executes tool calls and returns results to the LLM
- Streams the final response after tool execution
- Maintains conversation history including tool interactions
Code Example#
```python
from primfunctions.events import Event, StartEvent, TextEvent, TextToSpeechEvent
from primfunctions.context import Context
from voicerun_completions import (
    generate_chat_completion,
    generate_chat_completion_stream,
    deserialize_conversation,
    UserMessage,
    ToolResultMessage,
)

# Simulated external API calls
async def get_weather(location: str) -> dict:
    # In production, call a real weather API
    return {"temperature": 72, "condition": "sunny", "location": location}

async def get_stock_price(symbol: str) -> dict:
    # In production, call a real stock API
    return {"symbol": symbol, "price": 185.50, "change": "+2.35"}

# Tool definitions
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g. San Francisco"
                    }
                },
                "required": ["location"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_stock_price",
            "description": "Get the current stock price for a ticker symbol",
            "parameters": {
                "type": "object",
                "properties": {
                    "symbol": {
                        "type": "string",
                        "description": "Stock ticker symbol, e.g. AAPL"
                    }
                },
                "required": ["symbol"]
            }
        }
    }
]

async def execute_tool(name: str, args: dict) -> dict:
    """Execute a tool by name and return the result."""
    if name == "get_weather":
        return await get_weather(args["location"])
    elif name == "get_stock_price":
        return await get_stock_price(args["symbol"])
    else:
        return {"error": f"Unknown tool: {name}"}

async def handler(event: Event, context: Context):
    if isinstance(event, StartEvent):
        yield TextToSpeechEvent(
            text="Hi! I can check the weather or stock prices for you.",
            voice="nova"
        )

    if isinstance(event, TextEvent):
        user_message = event.data.get("text", "N/A")

        messages = deserialize_conversation(context.get_completion_messages())
        messages.append(UserMessage(content=user_message))

        # Tool execution loop - continues until LLM responds without tool calls
        max_iterations = 10
        for _ in range(max_iterations):
            response = await generate_chat_completion({
                "provider": "anthropic",
                "api_key": context.variables.get("ANTHROPIC_API_KEY"),
                "model": "claude-haiku-4-5",
                "messages": messages,
                "tools": TOOLS,
                "tool_choice": "auto"
            })
            messages.append(response.message)

            # Speak any content the LLM returns
            if response.message.content:
                yield TextToSpeechEvent(text=response.message.content, voice="nova")

            # No tool calls - done with loop
            if not response.message.tool_calls:
                break

            # Execute tool calls and add results to messages
            for tool_call in response.message.tool_calls:
                result = await execute_tool(
                    tool_call.function.name,
                    tool_call.function.arguments
                )
                messages.append(ToolResultMessage(
                    tool_call_id=tool_call.id,
                    name=tool_call.function.name,
                    content=result
                ))

        context.set_completion_messages(messages)
```
How It Works#
Tool Definitions: Tools are defined as JSON schemas describing each function and its parameters. The LLM uses these to decide when to call tools.
Tool Execution Loop: The agent loops until the LLM responds without tool calls. This handles multi-step scenarios where the LLM needs several rounds of data fetching.
Tool Execution: For each tool call, the agent executes the function and adds the result as a ToolResultMessage. The loop continues with another LLM call that includes the results.
Final Response: Once the LLM has all the data it needs, it generates a natural language response incorporating the fetched information.
Key Takeaway#
Tool calling lets your agent fetch real-time data from external APIs, databases, or services. The LLM decides when tools are needed and how to incorporate the results into a natural response.
Background Tasks#
An agent that demonstrates how to run long-running operations in the background without blocking the main conversation flow.
Features#
- Demonstrates how to run long-running tasks in the background without blocking the main conversation
- Shows how to use context.create_task() to spawn background processes
- Illustrates proper task management and state tracking across background operations
- Demonstrates automatic task restart when previous tasks complete
- Shows how to maintain conversation flow while processing happens asynchronously
Code Example#
```python
import asyncio
import random
import time

from primfunctions.events import Event, StartEvent, TextEvent, TextToSpeechEvent
from primfunctions.context import Context
from primfunctions.logger import logger

async def background_task(context: Context):
    logger.info("Processing background task...")

    # Set initial state
    context.set_data("task_completed", False)

    # Do work...
    await asyncio.sleep(random.random() * 10)

    # Update state
    context.set_data("task_completed", True)
    context.set_data("completion_time", time.time())

    logger.info("Background task done")

async def handler(event: Event, context: Context):
    if isinstance(event, StartEvent):
        yield TextToSpeechEvent(
            text="Hello! I'll start processing your data in the background.",
            voice="brooke"
        )
        context.create_task(background_task(context))

    if isinstance(event, TextEvent):
        user_message = event.data.get("text", "N/A")

        if context.get_data("task_completed", False):
            completion_seconds_ago = int(time.time() - context.get_data("completion_time", 0))
            yield TextToSpeechEvent(
                text=f"The data is done processing. Completion was {completion_seconds_ago} seconds ago.",
                voice="brooke"
            )
            yield TextToSpeechEvent(
                text="Starting new task...",
                voice="brooke"
            )
            context.create_task(background_task(context))
        else:
            yield TextToSpeechEvent(
                text="The data is still processing.",
                voice="brooke"
            )
```
How It Works#
Background Task Definition: The background_task function uses primfunctions.logger to log progress. It's designed to run independently without blocking the main conversation flow. The function simulates work with random sleep times and tracks completion state.
Task Creation with context.create_task(): When the conversation starts, the handler calls context.create_task(background_task(context)) to spawn the background task. This immediately returns control to the main conversation while the background task runs asynchronously.
Non-Blocking Conversation Flow: The main conversation continues immediately after launching the background task. The agent can respond to user input, handle other requests, and maintain natural conversation flow while background processing happens simultaneously.
State Management Across Tasks: The background task uses context.set_data() to track task completion status and completion time. The main handler checks this state with context.get_data() to provide appropriate responses based on whether the task is still running or completed.
Task Restart Logic: When a task completes, the handler automatically starts a new background task. This demonstrates how to chain background operations and maintain continuous processing while keeping the conversation responsive.
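The non-blocking behavior described above can be reproduced with plain asyncio to see it in isolation. In this sketch a dict stands in for context data and asyncio.create_task stands in for context.create_task; it is illustrative, not the framework's implementation:

```python
import asyncio

async def background_task(state: dict):
    # Stands in for the context.set_data(...) calls in the agent
    state["task_completed"] = False
    await asyncio.sleep(0.05)  # simulated long-running work
    state["task_completed"] = True

async def main():
    state = {}
    asyncio.create_task(background_task(state))
    # Control returns immediately; the "conversation" can continue
    await asyncio.sleep(0)  # yield once so the task gets to start
    still_running = not state.get("task_completed", False)
    # In the real agent the event loop keeps running between events;
    # here we wait long enough for the task to finish
    await asyncio.sleep(0.1)
    return still_running, state["task_completed"]

still_running, completed = asyncio.run(main())
```

The key observation is that `still_running` is True immediately after the task is created, while `completed` is True once the work finishes, without the caller ever blocking on the task itself.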
Key Takeaway#
Background tasks are essential for creating responsive, professional agents that can handle complex workflows while maintaining natural conversation flow. Use them for any operation that might take more than a few seconds to complete.
When to Use Each Pattern#
Simple Greeting Agent#
Use when:
- You're just getting started with VoiceRun
- You need to understand basic event handling
- You want a foundation to build upon
- You're testing voice output and basic interactions
LLM Integration#
Use when:
- You need intelligent, context-aware responses
- You're building a conversational assistant
- You want to leverage LLM knowledge and reasoning
- You need natural language understanding
Tool Calling#
Use when:
- You need to fetch real-time data (weather, stocks, databases)
- You want the LLM to decide when to call external services
- You're building agents that interact with APIs
- You need to combine external data with natural conversation
Background Tasks#
Use when:
- You have long-running operations that shouldn't block conversation
- You're processing large datasets or files
- You have time-consuming calculations or transformations
- You need to poll for status updates or wait for external events
- You want to fire off work and let the user continue talking
Common Patterns#
Environment Variables#
All guides that need API keys or configuration use context.variables.get():
```python
api_key = context.variables.get("OPENAI_API_KEY")
```
State Management#
Use context.set_data() and context.get_data() to maintain state across events:
```python
context.set_data("user_name", "John")
name = context.get_data("user_name", "Guest")
```
Event Handling Pattern#
All guides follow the same event handling structure:
```python
async def handler(event: Event, context: Context):
    if isinstance(event, StartEvent):
        # Handle session start
        pass

    if isinstance(event, TextEvent):
        # Handle user input
        pass

    if isinstance(event, TimeoutEvent):
        # Handle timeout
        pass
```
Audio Output#
All guides use TextToSpeechEvent for voice responses:
```python
yield TextToSpeechEvent(text="Response text", voice="nova")
```
Next Steps#
- Start with the Simple Greeting Agent to understand the basics
- Move to the LLM Integration guide to add intelligence
- Add Tool Calling to fetch external data
- Explore Background Tasks when you need async operations
- Combine patterns from multiple guides in your own agents
- Review the Context Reference for advanced features like A/B testing and outcomes
