Development

The VoiceRun CLI ships tools for interactive debugging, automated testing, simulated calls, post-session evaluations, custom metrics, and A/B experiments — covering the full feedback loop on a voice agent.

Debugging

Launch the Pipeline Debugger — a visual Electron app for real-time agent testing — or place an outbound phone call directly from the CLI:

vr debug

Inside a voicerun project this pushes your code and connects the debugger to the resulting function. Outside a project, the debugger opens with no prefilled agent.

Options

Flag                   Description
--skip-push, -s        Skip pushing code, use the existing deployment
--environment, -e      Environment name (default: debug)
--headless             Run without GUI: streams JSONL events to stdout, reads text input from stdin, exports session JSON on exit
--output, -o           Output file path for headless session JSON (auto-generated if omitted)
--script               Path to a JSON file with scripted messages (array of strings). Each message is sent one-per-turn automatically
--outbound             Place an outbound phone call instead of launching the debugger
--to-phone-number      Destination phone number in E.164 format (required with --outbound)
--from-phone-number    Caller ID / originating phone number in E.164 format

Interactive Controls

  • Enter — Send a text message to the agent
  • Ctrl+C — End the session

Outbound Call Debugging

Test your agent with a real phone call:

vr debug --outbound --to-phone-number +15551234567
vr debug --outbound --to-phone-number +15551234567 --from-phone-number +15559876543

Outbound calls require Node.js for the push step but skip the debugger GUI; they return a telephonyCallId you can inspect with vr session info.
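Since both phone-number flags expect E.164 values, it can be useful to check a number before placing a real call. The following is a minimal sketch, not part of the VoiceRun CLI: the helper names is_e164 and outbound_debug_args are hypothetical, and the pattern only checks the general E.164 shape (a plus sign, a non-zero first digit, at most 15 digits), not whether the number is dialable.

```python
import re
from typing import List, Optional

# General E.164 shape: "+", a non-zero first digit, at most 15 digits in total.
_E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def is_e164(number: str) -> bool:
    """Return True if `number` has the general shape of an E.164 number."""
    return _E164_PATTERN.fullmatch(number) is not None


def outbound_debug_args(to_number: str, from_number: Optional[str] = None) -> List[str]:
    """Build the argument list for an outbound `vr debug` call (hypothetical helper)."""
    for number in (to_number, from_number):
        if number is not None and not is_e164(number):
            raise ValueError(f"not an E.164 number: {number!r}")
    args = ["vr", "debug", "--outbound", "--to-phone-number", to_number]
    if from_number is not None:
        args += ["--from-phone-number", from_number]
    return args
```

The returned list can be handed to subprocess.run as-is, so no shell quoting is involved.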

Headless Debugging (CI / Coding Agents)

--headless runs the debugger without a window, streams JSONL events to stdout, and reads text input from stdin (one message per line). Combine with --script to drive a fully scripted test run:

vr debug --headless                                          # interactive over stdin
vr debug --headless --output session.json                    # capture the session
vr debug --headless --script test-messages.json -o out.json

The script file is a JSON array of strings — each entry is sent as one user message.
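As an illustration of both ends of a headless run, the sketch below writes a script file in the shape described above and parses JSONL output line by line. The function names are hypothetical, and because this page does not document the event fields, the parser deliberately assumes nothing about them beyond each line being a JSON object.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, Iterator, List


def write_script(path: str, messages: List[str]) -> None:
    """Write messages as a JSON array of strings, the shape --script expects."""
    if not all(isinstance(message, str) for message in messages):
        raise TypeError("every scripted message must be a string")
    Path(path).write_text(json.dumps(messages, indent=2), encoding="utf-8")


def iter_jsonl_events(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per JSONL line, skipping blank or non-JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a stray log line mixed into stdout
        if isinstance(event, dict):
            yield event
```

One way to use this is to call write_script("test-messages.json", [...]) before the run, then pipe the stdout of vr debug --headless --script test-messages.json into a small consumer that passes sys.stdin to iter_jsonl_events.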

Testing

Run pytest against your voice agent project:

vr test

vr test validates the project, installs dependencies, optionally fetches secrets from the specified environment, and runs pytest.

Options

Flag                 Description
--environment, -e    Environment to fetch secrets from
--verbose, -v        Run pytest in verbose mode
--coverage, -c       Run with coverage reporting
--skip-install       Skip dependency installation

Running Specific Tests

vr test tests/test_handler.py
vr test tests/test_handler.py --verbose --coverage

Passing Arguments to pytest

Use -- to pass additional arguments directly to pytest:

vr test -- -k "test_greeting" --tb=short

LLM Completions in Tests

Your handler's LLM completion calls — configure_provider and generate_chat_completion from primfunctions.completions — work under vr test just as they do in a deployed agent. When you exercise your handler with primfunctions' TestRunner, vr test wires the completions client up automatically.

Availability: completion support in vr test is currently available to enterprise organizations. If it is not enabled for your organization, vr test still runs your suite — completion calls just won't succeed.

Bring your own API key

In vr test, completions use your own provider API key; voicerun_managed=True is not available. Your handler reads the key from context.variables, exactly as in a deployed agent:

from primfunctions.completions import configure_provider, generate_chat_completion

# In your handler:
configure_provider("openai", api_key=context.variables.get("OPENAI_API_KEY"))
response = await generate_chat_completion({
    "provider": "openai",
    "model": "gpt-4.1-mini",
    "messages": [{"role": "user", "content": "Say hello"}],
})

Get the key into the test process in one of two ways: add it to the secrets for the environment you test against (fetched with --environment), or export it before running:

OPENAI_API_KEY=sk-... vr test

Then pass it to the runner as variables, so it reaches context.variables:

import os

from primfunctions.test_runner import create_test_runner

runner = create_test_runner(
    handler,
    variables={"OPENAI_API_KEY": os.environ["OPENAI_API_KEY"]},
)

If your handler configures a provider with voicerun_managed=True, those completions won't run under vr test. To exercise that code path locally, have the handler use a key from context.variables when one is present and fall back to voicerun_managed=True otherwise.
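The fallback described above could be sketched as a small helper that chooses the keyword arguments for configure_provider. The name provider_options is hypothetical, and the sketch assumes voicerun_managed is passed to configure_provider as a keyword argument, which the text implies but this page does not show.

```python
from typing import Any, Dict, Mapping


def provider_options(
    variables: Mapping[str, Any], key_name: str = "OPENAI_API_KEY"
) -> Dict[str, Any]:
    """Choose keyword arguments for configure_provider (hypothetical helper).

    Uses the caller's own API key when one is present in the variables and
    otherwise falls back to the managed provider.
    """
    api_key = variables.get(key_name)
    if api_key:  # treats a missing or empty value as "no key"
        return {"api_key": api_key}
    return {"voicerun_managed": True}


# In your handler (assumption: voicerun_managed is a keyword argument):
# configure_provider("openai", **provider_options(context.variables))
```

With this shape, vr test exercises the same handler code as production: the test passes a key through variables, while a deployed agent without that variable takes the managed path.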

Notes

  • Sign in with vr signin before running vr test.
  • Because tests use your own API key, completion calls in vr test count against that provider account's usage.

Simulating an Agent

vr simulate runs a Simulation resource (defined in .voicerun/templates/) against the active release for an agent and environment. The API pre-creates N agent sessions (origin: simulation) and drives each one as a Gemini-Live caller against the live /ws/entrypoint route.

vr simulate <ENVIRONMENT>                            # List available simulations
vr simulate <ENVIRONMENT> --name happy-path          # Run a single simulation
vr simulate my-agent production --name happy-path    # Explicit agent

Options

Flag             Description
--name           Simulation resource name. If omitted, lists available simulations and exits
--release, -r    Pin to a specific release ID (defaults to the latest release for the agent/environment)
--values, -v     Values file in .voicerun/ to overlay for local listing/preview
--wait           Block until every spawned session reaches a terminal status (completed or failed)
--yes, -y        Skip the cost-guardrail confirmation prompt (fires when spec.numberOfSimulations > 10)

The active release's manifest is authoritative for spec.systemPrompt and spec.numberOfSimulations — local edits not yet released are previewed but not used at run time.

Example Simulation resource

apiVersion: voicerun/v1
kind: Simulation
metadata:
  name: happy-path
spec:
  systemPrompt: |
    You are a customer calling support. Ask to reset your password.
  numberOfSimulations: 5
  voice: Aoede  # optional Gemini-Live prebuilt voice
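Before releasing a Simulation resource, a quick local check can catch obvious mistakes and tell you whether the confirmation prompt will appear. The sketch below works on the manifest after it has been loaded into a Python dict (for example with a YAML parser of your choice). The function names and the specific checks are illustrative assumptions, not the platform's own validation; only the threshold of 10 comes from the --yes description above.

```python
from typing import Any, Dict, List

# The confirmation prompt fires when spec.numberOfSimulations > 10.
CONFIRMATION_THRESHOLD = 10


def check_simulation(manifest: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a Simulation manifest dict.

    These checks are illustrative only; the server may enforce other rules.
    """
    problems: List[str] = []
    if manifest.get("kind") != "Simulation":
        problems.append("kind must be Simulation")
    if not (manifest.get("metadata") or {}).get("name"):
        problems.append("metadata.name is missing")
    spec = manifest.get("spec") or {}
    if not str(spec.get("systemPrompt") or "").strip():
        problems.append("spec.systemPrompt is empty")
    count = spec.get("numberOfSimulations")
    if isinstance(count, bool) or not isinstance(count, int) or count < 1:
        problems.append("spec.numberOfSimulations must be a positive integer")
    return problems


def needs_confirmation(manifest: Dict[str, Any]) -> bool:
    """True when a run would trigger the cost-guardrail prompt."""
    count = (manifest.get("spec") or {}).get("numberOfSimulations", 0)
    return isinstance(count, int) and not isinstance(count, bool) and count > CONFIRMATION_THRESHOLD
```

For the happy-path example, check_simulation returns an empty list and needs_confirmation returns False, so a run would start without prompting.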

After submitting, the CLI prints the run ID and the spawned session IDs. Inspect with:

  • vr session info <id> — full session detail
  • vr session transcript <id> — conversation transcript
  • vr metrics session <id> — custom metrics recorded during the run

Evaluations

Evaluations are produced by Evaluator resources defined in .voicerun/templates/. Evaluators run server-side after each session completes, and the CLI only reads their results. To define a new evaluator, add a kind: Evaluator document and ship it via vr release.

vr evaluation list                                 # List evaluations for the project agent
vr evaluation list my-agent                        # Explicit agent
vr evaluation list --session SESSION_ID            # Only evaluations for one session
vr evaluation list --status error --type judge     # Filter
vr evaluation info <EVALUATION_ID>                 # Full detail for one evaluation

List Options

Flag                        Description
--session, -S               Session ID to get evaluations for
--status, -s                Filter by status (pending, complete, error, skipped)
--type, -T                  Filter by eval type (judge, extraction, deterministic, script)
--limit, -l                 Page size
--page, -p                  Page number
--json, -j / --table, -t    Output format

Example Evaluator resource

apiVersion: voicerun/v1
kind: Evaluator
metadata:
  name: resolution-judge
spec:
  evalType: judge            # judge | extraction | deterministic | script
  targetFormat: transcript   # events | transcript
  systemPrompt: |
    Score the session 1-5 on whether the agent resolved the caller's request.
  responseSchema: {}
  successCriteria: {}
  apiProvider: google
  model: gemini-3.5-flash

vr evaluation info renders deterministic evaluation details, including the assertion predicate and structured match/failure details. Skipped evaluations show their skipReason so precondition failures are auditable without an LLM call.

Custom Metrics

vr metrics queries custom metrics emitted by agents at runtime. Time-series queries use type-appropriate aggregation server-side.

vr metrics names                          # List metric names
vr metrics names my-agent                 # Scoped to one agent
vr metrics tags                           # Discover tag keys/values
vr metrics tags --metric call_duration    # Tags for one metric
vr metrics timeseries call_duration \
  --start 2026-05-15T00:00:00Z --end 2026-05-16T00:00:00Z --step 1h
vr metrics session <SESSION_ID>           # All metrics for one session
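The --start and --end values above are UTC timestamps in ISO 8601 form. When a query window is computed rather than typed by hand, a helper along these lines may help. It is a sketch with hypothetical function names; it uses only the flags shown above and assumes the CLI accepts the same timestamp format as in the example.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def iso_z(moment: datetime) -> str:
    """Format a timezone-aware datetime as UTC ISO 8601 with a trailing Z."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def timeseries_args(
    metric: str, end: datetime, hours: int = 24, step: str = "1h"
) -> List[str]:
    """Build `vr metrics timeseries` arguments for the window ending at `end`."""
    start = end - timedelta(hours=hours)
    return [
        "vr", "metrics", "timeseries", metric,
        "--start", iso_z(start),
        "--end", iso_z(end),
        "--step", step,
    ]
```

Calling timeseries_args("call_duration", datetime(2026, 5, 16, tzinfo=timezone.utc)) reproduces the window used in the command above.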

Most metric subcommands accept --agent/-a to scope results to a single agent (or fall back to agent.lock), and support --json / --table output.

A/B Experiments

vr experiments reads experiment definitions and results. The agent is resolved from --agent (or positional argument) first, falling back to agent.lock.

vr experiments list                        # List experiments for the project agent
vr experiments describe greeting_style     # Variants, conversions, significance
vr experiments funnel greeting_style       # Per-variant funnel with lift
vr experiments timeseries greeting_style \
  --metric booking_completed \
  --start 2026-05-01T00:00:00Z --end 2026-05-15T00:00:00Z --step 1d

vr experiments describe shows session count, conversion metrics, stop conditions (iteration and confidence thresholds), variant performance, and statistical significance (confidence, p-value).
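To build intuition for the lift and p-value figures, the sketch below applies a standard two-proportion z-test to two variants' conversion counts. This is a textbook method chosen for illustration; this page does not say which test VoiceRun uses server-side, so the numbers it reports may be computed differently. The names Comparison and compare_variants are hypothetical.

```python
import math
from typing import NamedTuple


class Comparison(NamedTuple):
    lift: float     # relative change in conversion rate versus control
    z: float        # z statistic of the pooled two-proportion test
    p_value: float  # two-sided p-value


def compare_variants(
    control_conversions: int,
    control_sessions: int,
    variant_conversions: int,
    variant_sessions: int,
) -> Comparison:
    """Two-proportion z-test (pooled variance) on conversion counts."""
    if control_sessions <= 0 or variant_sessions <= 0:
        raise ValueError("both variants need at least one session")
    rate_c = control_conversions / control_sessions
    rate_v = variant_conversions / variant_sessions
    pooled = (control_conversions + variant_conversions) / (
        control_sessions + variant_sessions
    )
    std_err = math.sqrt(
        pooled * (1 - pooled) * (1 / control_sessions + 1 / variant_sessions)
    )
    if rate_c == 0 or std_err == 0:
        raise ValueError("test undefined: control rate or variance is zero")
    z = (rate_v - rate_c) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return Comparison(lift=(rate_v - rate_c) / rate_c, z=z, p_value=p_value)
```

For example, 100 conversions in 1000 control sessions against 130 in 1000 variant sessions gives a lift of 30% with a p-value a little under 0.04, which would pass a 95% confidence threshold but not a 99% one.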

Development Workflow

A typical development cycle looks like this:

  1. Create a new project with vr init
  2. Write your agent handler in handler.py and declare runtime config in .voicerun/templates/deployment.yaml
  3. Render templates with vr render to confirm the manifest looks right
  4. Validate with vr validate (or vr validate -e production)
  5. Push your code with vr push
  6. Debug interactively with vr debug, run scripted tests with vr debug --headless --script, or unit-test with vr test
  7. Release to a non-prod environment with vr release staging
  8. Simulate regression scenarios with vr simulate staging --name happy-path
  9. Cut over by releasing to production and pointing an entrypoint: vr release production --entrypoint support-line
  10. Observe in production with vr session list, vr metrics, vr experiments, and vr evaluation list
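The non-interactive part of this cycle (steps 3 to 8) could be scripted, for instance in CI. The sketch below only strings together commands listed above; the function names are hypothetical, and it assumes that vr exits with a non-zero status when a step fails, which this page does not confirm. It defaults to a dry run that prints the commands without executing them.

```python
import shlex
import subprocess
from typing import List


def release_plan(environment: str, simulation: str) -> List[List[str]]:
    """Commands for one pre-production cycle, in the order listed above."""
    return [
        ["vr", "render"],
        ["vr", "validate"],
        ["vr", "push"],
        ["vr", "test"],
        ["vr", "release", environment],
        ["vr", "simulate", environment, "--name", simulation, "--wait"],
    ]


def run_plan(plan: List[List[str]], dry_run: bool = True) -> List[str]:
    """Print each command and, unless dry_run is set, execute it.

    Assumes a failing step exits non-zero, so check=True stops the cycle there.
    """
    shown = []
    for command in plan:
        line = shlex.join(command)
        print(line)
        shown.append(line)
        if not dry_run:
            subprocess.run(command, check=True)
    return shown
```

If the simulation spawns more than 10 sessions, an unattended run would also need --yes on the simulate step to skip the confirmation prompt.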