Guides
Explore these guides to learn how to build different types of agents using the VoiceRun framework. Each guide demonstrates specific concepts and patterns you can apply to your own projects.
Simple Greeting Agent#
A basic greeting agent that demonstrates fundamental concepts like event handling and audio responses.
Features#
- Demonstrates the basic structure of an agent function
- Handles a simple greeting via StartEvent
- Handles user text input via TextEvent
- Uses TextToSpeechEvent to respond to the user with audio
Code Example#
```python
from primfunctions.events import Event, StartEvent, TextEvent, TextToSpeechEvent
from primfunctions.context import Context

async def handler(event: Event, context: Context):
    if isinstance(event, StartEvent):
        yield TextToSpeechEvent(text="Hello! I will repeat what you say.", voice="nova")

    if isinstance(event, TextEvent):
        user_message = event.data.get("text", "N/A")
        yield TextToSpeechEvent(text=f"You said: {user_message}", voice="nova")
```
How It Works#
Agent Initialization: The agent function is structured to handle different types of events. It listens for both session start and user input events.
Greeting the User: When a session begins (StartEvent), the agent responds with a simple spoken greeting using TextToSpeechEvent: "Hello! I will repeat what you say."
Echoing User Input: When the agent receives a TextEvent (a transcription of the user's speech, or typed text input), it extracts the user's message and responds by repeating it back in the format "You said: ...", again using TextToSpeechEvent.
Audio Responses: All responses from the agent are delivered as audio using the specified voice ('nova'), demonstrating how to use TextToSpeechEvent for output.
Key Takeaway#
This guide highlights the essential event handling and audio response patterns for building conversational agents: greeting the user, processing input, and replying with speech.
LLM Integration#
An agent that integrates with an LLM to answer user questions intelligently using the voicerun_completions library.
Features#
- Uses voicerun_completions for unified LLM access (supports Anthropic, OpenAI, Google)
- Streams responses sentence-by-sentence for low-latency speech
- Retrieves API keys from context variables
- Maintains conversation history across turns
- Simple, production-ready async pattern
Code Example#
```python
from primfunctions.events import Event, StartEvent, TextEvent, TextToSpeechEvent
from primfunctions.context import Context
from voicerun_completions import generate_chat_completion_stream, deserialize_conversation, UserMessage

async def handler(event: Event, context: Context):
    if isinstance(event, StartEvent):
        yield TextToSpeechEvent(
            text="Hello! Ask me anything.",
            voice="nova"
        )

    if isinstance(event, TextEvent):
        user_message = event.data.get("text", "N/A")

        # Get conversation history and add new message
        messages = deserialize_conversation(context.get_completion_messages())
        messages.append(UserMessage(content=user_message))

        # Stream the response
        stream = await generate_chat_completion_stream(
            request={
                "provider": "anthropic",
                "api_key": context.variables.get("ANTHROPIC_API_KEY"),
                "model": "claude-haiku-4-5",
                "messages": messages
            },
            stream_options={
                "stream_sentences": True,
                "clean_sentences": True
            }
        )

        async for chunk in stream:
            if chunk.type == "content_sentence":
                yield TextToSpeechEvent(text=chunk.sentence, voice="nova")
            elif chunk.type == "response":
                # Save conversation history
                messages.append(chunk.response.message)
                context.set_completion_messages(messages)
```
How It Works#
Conversation History: The agent retrieves previous messages using context.get_completion_messages() and deserializes them into typed message objects. This maintains context across conversation turns.
Streaming Request: The generate_chat_completion_stream function sends the conversation to the LLM and returns a stream. The stream_sentences option buffers tokens into complete sentences for natural speech output.
Sentence-by-Sentence Output: As each sentence completes, the agent yields a TextToSpeechEvent. This provides low-latency responses—users hear the first sentence while the LLM is still generating.
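The buffering that a stream_sentences-style option performs can be approximated without the framework. This is an illustrative sketch, not the library's actual implementation: tokens accumulate in a buffer that is flushed each time a sentence terminator is followed by whitespace, so downstream speech synthesis always receives whole sentences.

```python
import re

def buffer_sentences(tokens):
    """Group a stream of text tokens into complete sentences.

    Illustrative sketch of sentence buffering: tokens are accumulated
    and a sentence is emitted each time a terminator (. ! ?) followed
    by whitespace appears; any remainder is flushed at end of stream.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        while True:
            match = re.search(r"[.!?]\s+", buffer)
            if not match:
                break
            yield buffer[:match.end()].strip()
            buffer = buffer[match.end():]
    if buffer.strip():
        yield buffer.strip()

# Tokens arrive a few characters at a time; sentences come out whole
tokens = ["Hel", "lo the", "re. How", " are you", " today?"]
sentences = list(buffer_sentences(tokens))
```

This is why the first TextToSpeechEvent can be yielded while the LLM is still generating later sentences.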
Saving History: When the stream completes, the response chunk contains the full assistant message, which is appended to the conversation history for future turns.
Switching Providers#
voicerun_completions supports multiple providers with the same interface:
```python
# Anthropic
request = {
    "provider": "anthropic",
    "api_key": context.variables.get("ANTHROPIC_API_KEY"),
    "model": "claude-haiku-4-5",
    "messages": messages
}

# OpenAI
request = {
    "provider": "openai",
    "api_key": context.variables.get("OPENAI_API_KEY"),
    "model": "gpt-4o-mini",
    "messages": messages
}

# Google
request = {
    "provider": "google",
    "api_key": context.variables.get("GOOGLE_API_KEY"),
    "model": "gemini-2.0-flash",
    "messages": messages
}
```
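Since only the provider, api_key, and model fields differ between the three requests, provider selection can be factored into a small helper. The following is a sketch; the PROVIDERS table and build_request name are illustrative, and the models are the ones shown above.

```python
# Map each provider to the context variable holding its key and a default model.
# Illustrative table; use whichever models your accounts have access to.
PROVIDERS = {
    "anthropic": {"key_var": "ANTHROPIC_API_KEY", "model": "claude-haiku-4-5"},
    "openai": {"key_var": "OPENAI_API_KEY", "model": "gpt-4o-mini"},
    "google": {"key_var": "GOOGLE_API_KEY", "model": "gemini-2.0-flash"},
}

def build_request(provider: str, variables: dict, messages: list) -> dict:
    """Build a request dict for the given provider from context variables."""
    config = PROVIDERS[provider]
    return {
        "provider": provider,
        "api_key": variables.get(config["key_var"]),
        "model": config["model"],
        "messages": messages,
    }

# Inside a handler this would be: build_request("openai", context.variables, messages)
request = build_request("openai", {"OPENAI_API_KEY": "sk-test"}, [])
```

Switching providers then becomes a one-word change (or a context variable) instead of editing the request dict by hand.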
Key Takeaway#
This guide shows how to build a conversational agent with streaming LLM responses using voicerun_completions. The unified interface makes it easy to switch providers, and sentence-based streaming provides a responsive user experience.
Tool Calling#
An agent that uses LLM tool calling to fetch data from external APIs before responding.
Features#
- Defines tools that the LLM can call to fetch external data
- Executes tool calls and returns results to the LLM
- Streams the final response after tool execution
- Maintains conversation history including tool interactions
Code Example#
```python
from primfunctions.events import Event, StartEvent, TextEvent, TextToSpeechEvent
from primfunctions.context import Context
from voicerun_completions import (
    generate_chat_completion,
    generate_chat_completion_stream,
    deserialize_conversation,
    UserMessage,
    ToolResultMessage,
)

# Simulated external API calls
async def get_weather(location: str) -> dict:
    # In production, call a real weather API
    return {"temperature": 72, "condition": "sunny", "location": location}

async def get_stock_price(symbol: str) -> dict:
    # In production, call a real stock API
    return {"symbol": symbol, "price": 185.50, "change": "+2.35"}

# Tool definitions
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g. San Francisco"
                    }
                },
                "required": ["location"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_stock_price",
            "description": "Get the current stock price for a ticker symbol",
            "parameters": {
                "type": "object",
                "properties": {
                    "symbol": {
                        "type": "string",
                        "description": "Stock ticker symbol, e.g. AAPL"
                    }
                },
                "required": ["symbol"]
            }
        }
    }
]

async def execute_tool(name: str, args: dict) -> dict:
    """Execute a tool by name and return the result."""
    if name == "get_weather":
        return await get_weather(args["location"])
    elif name == "get_stock_price":
        return await get_stock_price(args["symbol"])
    else:
        return {"error": f"Unknown tool: {name}"}

async def handler(event: Event, context: Context):
    if isinstance(event, StartEvent):
        yield TextToSpeechEvent(
            text="Hi! I can check the weather or stock prices for you.",
            voice="nova"
        )

    if isinstance(event, TextEvent):
        user_message = event.data.get("text", "N/A")

        messages = deserialize_conversation(context.get_completion_messages())
        messages.append(UserMessage(content=user_message))

        # Tool execution loop - continues until LLM responds without tool calls
        max_iterations = 10
        for _ in range(max_iterations):
            response = await generate_chat_completion({
                "provider": "anthropic",
                "api_key": context.variables.get("ANTHROPIC_API_KEY"),
                "model": "claude-haiku-4-5",
                "messages": messages,
                "tools": TOOLS,
                "tool_choice": "auto"
            })
            messages.append(response.message)

            # Speak any content the LLM returns
            if response.message.content:
                yield TextToSpeechEvent(text=response.message.content, voice="nova")

            # No tool calls - done with loop
            if not response.message.tool_calls:
                break

            # Execute tool calls and add results to messages
            for tool_call in response.message.tool_calls:
                result = await execute_tool(
                    tool_call.function.name,
                    tool_call.function.arguments
                )
                messages.append(ToolResultMessage(
                    tool_call_id=tool_call.id,
                    name=tool_call.function.name,
                    content=result
                ))

        context.set_completion_messages(messages)
```
How It Works#
Tool Definitions: Tools are defined as JSON schemas describing each function and its parameters. The LLM uses these to decide when to call tools.
Tool Execution Loop: The agent loops until the LLM responds without tool calls. This handles multi-step scenarios where the LLM needs several rounds of data fetching.
Tool Execution: For each tool call, the agent executes the function and adds the result as a ToolResultMessage. The loop continues with another LLM call that includes the results.
Final Response: Once the LLM has all the data it needs, it generates a natural language response incorporating the fetched information.
Key Takeaway#
Tool calling lets your agent fetch real-time data from external APIs, databases, or services. The LLM decides when tools are needed and how to incorporate the results into a natural response.
Background Tasks#
An agent that demonstrates how to run long-running operations in the background without blocking the main conversation flow.
Features#
- Demonstrates how to run long-running tasks in the background without blocking the main conversation
- Shows how to use context.create_task() to spawn background processes
- Illustrates proper task management and state tracking across background operations
- Demonstrates automatic task restart when previous tasks complete
- Shows how to maintain conversation flow while processing happens asynchronously
Code Example#
```python
import asyncio
import random
import time

from primfunctions.events import Event, StartEvent, TextEvent, TextToSpeechEvent
from primfunctions.context import Context
from primfunctions.logger import logger

async def background_task(context: Context):
    logger.info("Processing background task...")

    # Set initial state
    context.set_data("task_completed", False)

    # Do work...
    await asyncio.sleep(random.random() * 10)

    # Update state
    context.set_data("task_completed", True)
    context.set_data("completion_time", time.time())

    logger.info("Background task done")

async def handler(event: Event, context: Context):
    if isinstance(event, StartEvent):
        yield TextToSpeechEvent(
            text="Hello! I'll start processing your data in the background.",
            voice="brooke"
        )
        context.create_task(background_task(context))

    if isinstance(event, TextEvent):
        user_message = event.data.get("text", "N/A")

        if context.get_data("task_completed", False):
            completion_seconds_ago = int(time.time() - context.get_data("completion_time", 0))
            yield TextToSpeechEvent(
                text=f"The data is done processing. Completion was {completion_seconds_ago} seconds ago.",
                voice="brooke"
            )
            yield TextToSpeechEvent(
                text="Starting new task...",
                voice="brooke"
            )
            context.create_task(background_task(context))
        else:
            yield TextToSpeechEvent(
                text="The data is still processing.",
                voice="brooke"
            )
```
How It Works#
Background Task Definition: The background_task function uses primfunctions.logger to log progress. It's designed to run independently without blocking the main conversation flow. The function simulates work with random sleep times and tracks completion state.
Task Creation with context.create_task(): When the conversation starts, the handler calls context.create_task(background_task(context)) to spawn the background task. This immediately returns control to the main conversation while the background task runs asynchronously.
Non-Blocking Conversation Flow: The main conversation continues immediately after launching the background task. The agent can respond to user input, handle other requests, and maintain natural conversation flow while background processing happens simultaneously.
State Management Across Tasks: The background task uses context.set_data() to track task completion status and completion time. The main handler checks this state with context.get_data() to provide appropriate responses based on whether the task is still running or completed.
Task Restart Logic: When a task completes, the handler automatically starts a new background task. This demonstrates how to chain background operations and maintain continuous processing while keeping the conversation responsive.
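The non-blocking behavior described above can be reproduced with plain asyncio to see it in isolation. In this sketch a dict stands in for context data and asyncio.create_task stands in for context.create_task; it is illustrative, not the framework's implementation:

```python
import asyncio

async def background_task(state: dict):
    # Stands in for the context.set_data(...) calls in the agent
    state["task_completed"] = False
    await asyncio.sleep(0.05)  # simulated long-running work
    state["task_completed"] = True

async def main():
    state = {}
    asyncio.create_task(background_task(state))
    # Control returns immediately; the "conversation" can continue
    await asyncio.sleep(0)  # yield once so the task gets to start
    still_running = not state.get("task_completed", False)
    # In the real agent the event loop keeps running between events;
    # here we wait long enough for the task to finish
    await asyncio.sleep(0.1)
    return still_running, state["task_completed"]

still_running, completed = asyncio.run(main())
```

The key observation is that `still_running` is True immediately after the task is created, while `completed` is True once the work finishes, without the caller ever blocking on the task itself.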
Key Takeaway#
Background tasks are essential for creating responsive, professional agents that can handle complex workflows while maintaining natural conversation flow. Use them for any operation that might take more than a few seconds to complete.
When to Use Each Pattern#
Simple Greeting Agent#
Use when:
- You're just getting started with VoiceRun
- You need to understand basic event handling
- You want a foundation to build upon
- You're testing voice output and basic interactions
LLM Integration#
Use when:
- You need intelligent, context-aware responses
- You're building a conversational assistant
- You want to leverage LLM knowledge and reasoning
- You need natural language understanding
Tool Calling#
Use when:
- You need to fetch real-time data (weather, stocks, databases)
- You want the LLM to decide when to call external services
- You're building agents that interact with APIs
- You need to combine external data with natural conversation
Background Tasks#
Use when:
- You have long-running operations that shouldn't block conversation
- You're processing large datasets or files
- You have time-consuming calculations or transformations
- You need to poll for status updates or wait for external events
- You want to fire off work and let the user continue talking
Common Patterns#
Environment Variables#
All guides that need API keys or configuration use context.variables.get():
```python
api_key = context.variables.get("OPENAI_API_KEY")
```
State Management#
Use context.set_data() and context.get_data() to maintain state across events:
```python
context.set_data("user_name", "John")
name = context.get_data("user_name", "Guest")
```
Event Handling Pattern#
All guides follow the same event handling structure:
```python
async def handler(event: Event, context: Context):
    if isinstance(event, StartEvent):
        # Handle session start
        pass

    if isinstance(event, TextEvent):
        # Handle user input
        pass

    if isinstance(event, TimeoutEvent):
        # Handle timeout
        pass
```
Audio Output#
All guides use TextToSpeechEvent for voice responses:
```python
yield TextToSpeechEvent(text="Response text", voice="nova")
```
Next Steps#
- Start with the Simple Greeting Agent to understand the basics
- Move to the LLM Integration guide to add intelligence
- Add Tool Calling to fetch external data
- Explore Background Tasks when you need async operations
- Combine patterns from multiple guides in your own agents
- Review the Context Reference for advanced features like A/B testing and outcomes
