Development
The VoiceRun CLI ships tools for interactive debugging, automated testing, simulated calls, post-session evaluations, custom metrics, and A/B experiments — covering the full feedback loop on a voice agent.
Debugging
Launch the Pipeline Debugger — a visual Electron app for real-time agent testing — or place an outbound phone call directly from the CLI:
vr debug
Inside a voicerun project this pushes your code and connects the debugger to the resulting function. Outside a project, the debugger opens with no prefilled agent.
Options
| Flag | Description |
|---|---|
| --skip-push, -s | Skip pushing code, use the existing deployment |
| --environment, -e | Environment name (default: debug) |
| --headless | Run without GUI: streams JSONL events to stdout, reads text input from stdin, exports session JSON on exit |
| --output, -o | Output file path for headless session JSON (auto-generated if omitted) |
| --script | Path to a JSON file with scripted messages (array of strings). Messages are sent automatically, one per turn |
| --outbound | Place an outbound phone call instead of launching the debugger |
| --to-phone-number | Destination phone number in E.164 format (required with --outbound) |
| --from-phone-number | Caller ID / originating phone number in E.164 format |
Interactive Controls
- Enter — Send a text message to the agent
- Ctrl+C — End the session
Outbound Call Debugging
Test your agent with a real phone call:
vr debug --outbound --to-phone-number +15551234567
vr debug --outbound --to-phone-number +15551234567 --from-phone-number +15559876543
Outbound calls require Node.js for the push step but skip the debugger GUI; they return a telephonyCallId you can inspect with vr session info.
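Because both phone-number flags expect E.164, it can help to check numbers before placing a call. The following is a minimal sketch, assuming the commonly used E.164 shape (a plus sign, a non-zero leading digit, at most 15 digits in total). The helper names `is_e164` and `outbound_args` are hypothetical and not part of the VoiceRun CLI; only the flags shown above are taken from this page.

```python
import re
from typing import List, Optional

# Assumed E.164 shape: "+", a non-zero digit, then 1 to 14 more digits.
_E164 = re.compile(r"\+[1-9]\d{1,14}")


def is_e164(number: str) -> bool:
    """Return True if `number` matches the assumed E.164 pattern."""
    return _E164.fullmatch(number) is not None


def outbound_args(to_number: str, from_number: Optional[str] = None) -> List[str]:
    """Build the argument list for an outbound debug call (hypothetical helper)."""
    for label, value in (("to", to_number), ("from", from_number)):
        if value is not None and not is_e164(value):
            raise ValueError(f"{label} number is not E.164: {value!r}")
    args = ["vr", "debug", "--outbound", "--to-phone-number", to_number]
    if from_number is not None:
        args += ["--from-phone-number", from_number]
    return args
```

The returned list can be handed to `subprocess.run` so that malformed numbers fail locally rather than after the push step.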
Headless Debugging (CI / Coding Agents)
--headless runs the debugger without a window, streams JSONL events to stdout, and reads text input from stdin (one message per line). Combine with --script to drive a fully scripted test run:
vr debug --headless                                          # interactive over stdin
vr debug --headless --output session.json                    # capture the session
vr debug --headless --script test-messages.json -o out.json
The script file is a JSON array of strings — each entry is sent as one user message.
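A script file and the headless event stream can both be handled with the standard `json` module. This is a sketch only: `render_script`, `write_script`, and `parse_jsonl` are hypothetical helper names, and the event schema is not documented here, so the parser makes no assumption about which fields an event carries.

```python
import json
from pathlib import Path
from typing import Any, Iterable, Iterator, List


def render_script(messages: List[str]) -> str:
    """Serialize scripted user messages as a JSON array of strings."""
    if not all(isinstance(m, str) for m in messages):
        raise TypeError("every scripted message must be a string")
    return json.dumps(messages, indent=2)


def write_script(path: str, messages: List[str]) -> None:
    """Write the script file that --script expects."""
    Path(path).write_text(render_script(messages), encoding="utf-8")


def parse_jsonl(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one decoded value per non-empty JSONL line (schema not assumed)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)
```

For example, `write_script("test-messages.json", ["Hi", "I need to reset my password"])` produces a file suitable for `--script`, and `parse_jsonl(sys.stdin)` could consume the piped output of `vr debug --headless`.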
Testing
Run pytest against your voice agent project:
vr test
vr test validates the project, installs dependencies, optionally fetches secrets from the specified environment, and runs pytest.
Options
| Flag | Description |
|---|---|
| --environment, -e | Environment to fetch secrets from |
| --verbose, -v | Run pytest in verbose mode |
| --coverage, -c | Run with coverage reporting |
| --skip-install | Skip dependency installation |
Running Specific Tests
vr test tests/test_handler.py
vr test tests/test_handler.py --verbose --coverage
Passing Arguments to pytest
Use -- to pass additional arguments directly to pytest:
vr test -- -k "test_greeting" --tb=short
LLM Completions in Tests
Your handler's LLM completion calls — configure_provider and generate_chat_completion from primfunctions.completions — work under vr test just as they do in a deployed agent. When you exercise your handler with primfunctions' TestRunner, vr test wires the completions client up automatically.
Availability: completion support in vr test is currently available to enterprise organizations. If it is not enabled for your organization, vr test still runs your suite, but completion calls won't succeed.
Bring your own API key
In vr test, completions use your own provider API key — voicerun_managed=True is not available. Your handler reads the key from context.variables, exactly as in a deployed agent:
from primfunctions.completions import configure_provider, generate_chat_completion

# In your handler:
configure_provider("openai", api_key=context.variables.get("OPENAI_API_KEY"))
response = await generate_chat_completion({
    "provider": "openai",
    "model": "gpt-4.1-mini",
    "messages": [{"role": "user", "content": "Say hello"}],
})
Get the key into the test process — either add it to the secrets for the environment you test against (fetched with --environment), or export it before running:
OPENAI_API_KEY=sk-... vr test
Then pass it to the runner as variables, so it reaches context.variables:
import os
from primfunctions.test_runner import create_test_runner

runner = create_test_runner(
    handler,
    variables={"OPENAI_API_KEY": os.environ["OPENAI_API_KEY"]},
)
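Reading the key with `os.environ[...]` raises a bare `KeyError` when it is missing. One possible refinement, sketched here with a hypothetical helper named `collect_variables`, gathers the required variables up front and reports every missing name at once. It relies only on the standard library and assumes nothing about the runner beyond the `variables` argument shown above.

```python
import os
from typing import Dict, Iterable, Mapping, Optional


def collect_variables(
    names: Iterable[str],
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Copy the named variables out of `env` (default: os.environ).

    Raises RuntimeError listing every name that is unset or empty.
    """
    source = os.environ if env is None else env
    wanted = list(names)  # materialize so the names can be scanned twice
    missing = [name for name in wanted if not source.get(name)]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))
    return {name: source[name] for name in wanted}
```

The result could then be passed as `variables=collect_variables(["OPENAI_API_KEY"])` when creating the runner.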
If your handler configures a provider with voicerun_managed=True, those completions won't run under vr test. To exercise that code path locally, have the handler use a key from context.variables when one is present and fall back to voicerun_managed=True otherwise.
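The fallback described above might be sketched as a small pure function. `provider_options` is a hypothetical helper, and passing `voicerun_managed=True` as a keyword argument to `configure_provider` is an assumption drawn from the wording of this section rather than a confirmed signature.

```python
from typing import Any, Dict, Mapping


def provider_options(
    variables: Mapping[str, Any],
    key_name: str = "OPENAI_API_KEY",
) -> Dict[str, Any]:
    """Choose provider settings: own key if present, managed credentials otherwise."""
    api_key = variables.get(key_name)
    if api_key:
        # A key was supplied (for example under vr test), so use it directly.
        return {"api_key": api_key}
    # No key available: assume the deployed agent relies on managed credentials.
    return {"voicerun_managed": True}
```

Inside a handler this could be applied as `configure_provider("openai", **provider_options(context.variables))`, which keeps the same code path usable both under `vr test` and in a deployed agent.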
Notes
- Sign in with vr signin before running vr test.
- Because tests use your own API key, completion calls in vr test count against that provider account's usage.
Simulating an Agent
vr simulate runs a Simulation resource (defined in .voicerun/templates/) against the active release for an agent and environment. The API pre-creates N agent sessions (origin: simulation) and drives each one as a Gemini-Live caller against the live /ws/entrypoint route.
vr simulate <ENVIRONMENT>                            # List available simulations
vr simulate <ENVIRONMENT> --name happy-path          # Run a single simulation
vr simulate my-agent production --name happy-path    # Explicit agent
Options
| Flag | Description |
|---|---|
| --name | Simulation resource name. If omitted, lists available simulations and exits |
| --release, -r | Pin to a specific release ID (defaults to the latest release for the agent/environment) |
| --values, -v | Values file in .voicerun/ to overlay for local listing/preview |
| --wait | Block until every spawned session reaches a terminal status (completed or failed) |
| --yes, -y | Skip the cost-guardrail confirmation prompt (fires when spec.numberOfSimulations > 10) |
The active release's manifest is authoritative for spec.systemPrompt and spec.numberOfSimulations — local edits not yet released are previewed but not used at run time.
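In unattended runs it can be useful to know ahead of time whether the confirmation prompt will appear. The sketch below checks an already-parsed Simulation manifest against the threshold of 10 stated in the options table. `needs_confirmation` is a hypothetical helper, and it inspects whatever dictionary it is given, so it only predicts the prompt when that dictionary mirrors the released manifest.

```python
from typing import Any, Mapping

# Threshold taken from the --yes flag description: the prompt fires above 10.
CONFIRMATION_THRESHOLD = 10


def needs_confirmation(simulation: Mapping[str, Any]) -> bool:
    """Return True if spec.numberOfSimulations exceeds the guardrail threshold."""
    spec = simulation.get("spec") or {}
    count = spec.get("numberOfSimulations", 0)
    return count > CONFIRMATION_THRESHOLD
```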
Example Simulation resource
apiVersion: voicerun/v1
kind: Simulation
metadata:
  name: happy-path
spec:
  systemPrompt: |
    You are a customer calling support. Ask to reset your password.
  numberOfSimulations: 5
  voice: Aoede  # optional Gemini-Live prebuilt voice
After submitting, the CLI prints the run ID and the spawned session IDs. Inspect with:
- vr session info <id>: full session detail
- vr session transcript <id>: conversation transcript
- vr metrics session <id>: custom metrics recorded during the run
Evaluations
Evaluations are produced by Evaluator resources defined in .voicerun/templates/. Evaluators run server-side after each session completes; the CLI is read-only — define new evaluators by adding a kind: Evaluator document and shipping it via vr release.
vr evaluation list                                # List evaluations for the project agent
vr evaluation list my-agent                       # Explicit agent
vr evaluation list --session SESSION_ID           # Only evaluations for one session
vr evaluation list --status error --type judge    # Filter
vr evaluation info <EVALUATION_ID>                # Full detail for one evaluation
List Options
| Flag | Description |
|---|---|
| --session, -S | Session ID to get evaluations for |
| --status, -s | Filter by status (pending, complete, error, skipped) |
| --type, -T | Filter by eval type (judge, extraction, deterministic, script) |
| --limit, -l | Page size |
| --page, -p | Page number |
| --json, -j / --table, -t | Output format |
Example Evaluator resource
apiVersion: voicerun/v1
kind: Evaluator
metadata:
  name: resolution-judge
spec:
  evalType: judge            # judge | extraction | deterministic | script
  targetFormat: transcript   # events | transcript
  systemPrompt: |
    Score the session 1-5 on whether the agent resolved the caller's request.
  responseSchema: {}
  successCriteria: {}
  apiProvider: google
  model: gemini-3.5-flash
vr evaluation info renders deterministic evaluation details, including the assertion predicate and structured match/failure details. Skipped evaluations show their skipReason so precondition failures are auditable without an LLM call.
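Evaluation results exported with `--json` could be tallied in a few lines. This sketch assumes the output decodes to a list of objects carrying a `status` field and, for skipped evaluations, a `skipReason` field. That shape is an assumption based on the names used on this page, not a documented schema, and `summarize` is a hypothetical helper.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize(evaluations: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Count evaluations per status and collect distinct skip reasons.

    Assumes each item has a "status" key and, when skipped, a "skipReason" key.
    """
    counts = Counter(item.get("status", "unknown") for item in evaluations)
    skip_reasons = sorted(
        {
            item["skipReason"]
            for item in evaluations
            if item.get("status") == "skipped" and item.get("skipReason")
        }
    )
    return {"counts": dict(counts), "skip_reasons": skip_reasons}
```

A summary like this makes it easy to fail a CI job when any evaluation ends in `error`, or to audit why evaluations were skipped.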
Custom Metrics
vr metrics queries custom metrics emitted by agents at runtime. Time-series queries use type-appropriate aggregation server-side.
vr metrics names                           # List metric names
vr metrics names my-agent                  # Scoped to one agent
vr metrics tags                            # Discover tag keys/values
vr metrics tags --metric call_duration     # Tags for one metric
vr metrics timeseries call_duration \
  --start 2026-05-15T00:00:00Z --end 2026-05-16T00:00:00Z --step 1h
vr metrics session <SESSION_ID>            # All metrics for one session
Most metric subcommands accept --agent/-a to scope results to a single agent (or fall back to agent.lock), and support --json / --table output.
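The `--start` and `--end` values in the time-series example are UTC timestamps ending in `Z`. A rolling window can be produced with the standard library, as in the sketch below. `iso_z` and `timeseries_args` are hypothetical helpers, and only the flags shown in the example above are used.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def iso_z(moment: datetime) -> str:
    """Format an aware datetime as UTC in the YYYY-MM-DDTHH:MM:SSZ form."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def timeseries_args(metric: str, end: datetime, hours: int, step: str = "1h") -> List[str]:
    """Build arguments for a time-series query covering `hours` before `end`."""
    start = end - timedelta(hours=hours)
    return [
        "vr", "metrics", "timeseries", metric,
        "--start", iso_z(start),
        "--end", iso_z(end),
        "--step", step,
    ]
```

Passing a timezone-aware `end` avoids surprises: `astimezone` treats naive datetimes as local time.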
A/B Experiments
vr experiments reads experiment definitions and results. The agent is resolved from --agent (or positional argument) first, falling back to agent.lock.
vr experiments list                       # List experiments for the project agent
vr experiments describe greeting_style    # Variants, conversions, significance
vr experiments funnel greeting_style      # Per-variant funnel with lift
vr experiments timeseries greeting_style \
  --metric booking_completed \
  --start 2026-05-01T00:00:00Z --end 2026-05-15T00:00:00Z --step 1d
vr experiments describe shows session count, conversion metrics, stop conditions (iteration and confidence thresholds), variant performance, and statistical significance (confidence, p-value).
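To build intuition for the lift and p-value figures, the sketch below computes relative lift and a two-sided p-value with a textbook pooled two-proportion z-test. This page does not say which statistical method VoiceRun applies server-side, so treat this purely as an illustration; the function names are hypothetical.

```python
import math


def lift(control_rate: float, variant_rate: float) -> float:
    """Relative lift of the variant over control (0.2 means +20%)."""
    return (variant_rate - control_rate) / control_rate


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value from a pooled two-proportion z-test."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # Both groups converted at 0% or 100%: no detectable difference.
        return 1.0
    z = (rate_b - rate_a) / std_err
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))
```

For instance, 100 conversions out of 1000 sessions against 150 out of 1000 gives a lift of roughly 0.5 and a p-value well below 0.01 under this test.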
Development Workflow
A typical development cycle looks like this:
- Create a new project with vr init
- Write your agent handler in handler.py and declare runtime config in .voicerun/templates/deployment.yaml
- Render templates with vr render to confirm the manifest looks right
- Validate with vr validate (or vr validate -e production)
- Push your code with vr push
- Debug interactively with vr debug, run scripted tests with vr debug --headless --script, or unit-test with vr test
- Release to a non-prod environment with vr release staging
- Simulate regression scenarios with vr simulate staging --name happy-path
- Cut over by releasing to production and pointing an entrypoint: vr release production --entrypoint support-line
- Observe in production with vr session list, vr metrics, vr experiments, and vr evaluation list
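The pre-production part of this cycle could be scripted. The sketch below strings together commands named on this page and defaults to a dry run that only prints them. `staging_pipeline` and `run` are hypothetical helpers, the ordering is one reasonable reading of the list above, and the sketch assumes each `vr` command exits non-zero on failure.

```python
import subprocess
from typing import List


def staging_pipeline(environment: str = "staging", simulation: str = "happy-path") -> List[List[str]]:
    """Return the commands for validate, push, test, release, and simulate, in order."""
    return [
        ["vr", "validate"],
        ["vr", "push"],
        ["vr", "test"],
        ["vr", "release", environment],
        ["vr", "simulate", environment, "--name", simulation, "--wait"],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it too unless dry_run is set."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(command, check=True)
```

Calling `run(staging_pipeline())` prints the plan, and `run(staging_pipeline(), dry_run=False)` executes it, stopping at the first failure.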
