Development

The VoiceRun CLI ships tools for interactive debugging, automated testing, simulated calls, post-session evaluations, custom metrics, and A/B experiments — covering the full feedback loop on a voice agent.

Debugging

Launch the Pipeline Debugger — a visual Electron app for real-time agent testing — or place an outbound phone call directly from the CLI:

vr debug

Inside a VoiceRun project, vr debug pushes your code and connects the debugger to the resulting function. Outside a project, the debugger opens with no prefilled agent.

Options

Flag	Description
`--skip-push`, `-s`	Skip pushing code, use the existing deployment
`--environment`, `-e`	Environment name (default: `debug`)
`--headless`	Run without GUI — streams JSONL events to stdout, reads text input from stdin, exports session JSON on exit
`--output`, `-o`	Output file path for headless session JSON (auto-generated if omitted)
`--script`	Path to a JSON file with scripted messages (array of strings). Messages are sent automatically, one per turn
`--outbound`	Place an outbound phone call instead of launching the debugger
`--to-phone-number`	Destination phone number in E.164 format (required with `--outbound`)
`--from-phone-number`	Caller ID / originating phone number in E.164 format

Interactive Controls

Enter — Send a text message to the agent
Ctrl+C — End the session

Outbound Call Debugging

Test your agent with a real phone call:

vr debug --outbound --to-phone-number +15551234567
vr debug --outbound --to-phone-number +15551234567 --from-phone-number +15559876543

Outbound calls require Node.js for the push step but skip the debugger GUI; they return a telephonyCallId you can inspect with vr session info.
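Both phone-number flags expect E.164 format. The following is a minimal sketch of a pre-flight check, assuming only the general E.164 shape (a leading +, then at most 15 digits, the first of which is nonzero). The helper name is_e164 is hypothetical and is not part of the VoiceRun CLI.

```python
import re

# E.164 shape: "+" followed by 1 to 15 digits, the first of which is not 0.
# This checks the shape only; it does not confirm the number is assigned or reachable.
E164_PATTERN = re.compile(r"\+[1-9]\d{0,14}")


def is_e164(number: str) -> bool:
    """Return True if `number` has the general shape of an E.164 number (hypothetical helper)."""
    return E164_PATTERN.fullmatch(number) is not None


if __name__ == "__main__":
    for candidate in ["+15551234567", "15551234567", "+1 555 123 4567"]:
        print(candidate, is_e164(candidate))
```

Running a check like this before vr debug --outbound may catch a missing + or stray spaces before a call is attempted.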

Headless Debugging (CI / Coding Agents)

--headless runs the debugger without a window, streams JSONL events to stdout, and reads text input from stdin (one message per line). Combine with --script to drive a fully scripted test run:

vr debug --headless                                     # interactive over stdin
vr debug --headless --output session.json               # capture the session
vr debug --headless --script test-messages.json -o out.json

The script file is a JSON array of strings — each entry is sent as one user message.
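As a rough illustration, the script file can be generated and the headless output decoded with a few lines of Python. The helper names write_script and parse_events are hypothetical. The fields inside each JSONL event are not described in this document, so the sketch decodes each line without assuming any particular keys.

```python
import json
from pathlib import Path
from typing import Any, Iterable, Iterator


def write_script(path: str, messages: list[str]) -> None:
    """Write scripted user messages as a JSON array of strings (hypothetical helper)."""
    if not all(isinstance(message, str) for message in messages):
        raise TypeError("every scripted message must be a string")
    Path(path).write_text(json.dumps(messages, indent=2), encoding="utf-8")


def parse_events(lines: Iterable[str]) -> Iterator[Any]:
    """Decode JSONL: one JSON value per non-blank line.

    The event schema is not assumed; callers inspect the decoded values themselves.
    """
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


if __name__ == "__main__":
    write_script("test-messages.json", ["Hi, I need help", "Reset my password", "Thanks, bye"])
```

After writing the file, vr debug --headless --script test-messages.json -o out.json sends the three messages in order, and the stdout stream can be passed line by line to parse_events.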

Testing

Run pytest against your voice agent project:

vr test

vr test validates the project, installs dependencies, optionally fetches secrets from the specified environment, and runs pytest.

Options

Flag	Description
`--environment`, `-e`	Environment to fetch secrets from
`--verbose`, `-v`	Run pytest in verbose mode
`--coverage`, `-c`	Run with coverage reporting
`--skip-install`	Skip dependency installation

Running Specific Tests

vr test tests/test_handler.py
vr test tests/test_handler.py --verbose --coverage

Passing Arguments to pytest

Use -- to pass additional arguments directly to pytest:

vr test -- -k "test_greeting" --tb=short

LLM Completions in Tests

Your handler's LLM completion calls — configure_provider and generate_chat_completion from primfunctions.completions — work under vr test just as they do in a deployed agent. When you exercise your handler with primfunctions' TestRunner, vr test wires the completions client up automatically.

Availability: completion support in vr test is currently available to enterprise organizations. If it is not enabled for your organization, vr test still runs your suite — completion calls just won't succeed.

Bring your own API key

In vr test, completions use your own provider API key — voicerun_managed=True is not available. Your handler reads the key from context.variables, exactly as in a deployed agent:

from primfunctions.completions import configure_provider, generate_chat_completion

# In your handler:
configure_provider("openai", api_key=context.variables.get("OPENAI_API_KEY"))

response = await generate_chat_completion({
    "provider": "openai",
    "model": "gpt-4.1-mini",
    "messages": [{"role": "user", "content": "Say hello"}],
})

Get the key into the test process: either add it to the secrets for the environment you test against (fetched with --environment), or set it in the shell environment when you run the command:

OPENAI_API_KEY=sk-... vr test

Then pass it to the runner as variables, so it reaches context.variables:

import os
from primfunctions.test_runner import create_test_runner

runner = create_test_runner(
    handler,
    variables={"OPENAI_API_KEY": os.environ["OPENAI_API_KEY"]},
)

If your handler configures a provider with voicerun_managed=True, those completions won't run under vr test. To exercise that code path locally, have the handler use a key from context.variables when one is present and fall back to voicerun_managed=True otherwise.
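One way to sketch that fallback is a small helper that picks the keyword arguments for configure_provider. The helper name provider_kwargs is hypothetical. The sketch also assumes that configure_provider accepts api_key and voicerun_managed as keyword arguments, as the snippets and text above suggest.

```python
from typing import Any, Mapping


def provider_kwargs(
    variables: Mapping[str, Any], key_name: str = "OPENAI_API_KEY"
) -> dict[str, Any]:
    """Choose keyword arguments for configure_provider (hypothetical helper)."""
    api_key = variables.get(key_name)
    if api_key:
        # A key was supplied (for example under vr test): bring your own key.
        return {"api_key": api_key}
    # No key supplied: fall back to the managed credential of a deployed agent.
    return {"voicerun_managed": True}


# In the handler (assumes configure_provider accepts these keyword arguments):
# configure_provider("openai", **provider_kwargs(context.variables))
```

Keeping the choice in a pure function means the branch itself can be unit-tested without making a completion call.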

Notes

Sign in with vr signin before running vr test.
Because tests use your own API key, completion calls in vr test count against that provider account's usage.

Simulating an Agent

vr simulate runs a Simulation resource (defined in .voicerun/templates/) against the active release for an agent and environment. The API pre-creates N agent sessions (origin: simulation) and drives each one as a Gemini-Live caller against the live /ws/entrypoint route.

vr simulate <ENVIRONMENT>                          # List available simulations
vr simulate <ENVIRONMENT> --name happy-path        # Run a single simulation
vr simulate my-agent production --name happy-path  # Explicit agent

Options

Flag	Description
`--name`	Simulation resource name. If omitted, lists available simulations and exits
`--release`, `-r`	Pin to a specific release ID (defaults to the latest release for the agent/environment)
`--values`, `-v`	Values file in `.voicerun/` to overlay for local listing/preview
`--wait`	Block until every spawned session reaches a terminal status (`completed` or `failed`)
`--yes`, `-y`	Skip the cost-guardrail confirmation prompt (fires when `spec.numberOfSimulations > 10`)

The active release's manifest is authoritative for spec.systemPrompt and spec.numberOfSimulations — local edits not yet released are previewed but not used at run time.

Example Simulation resource

apiVersion: voicerun/v1
kind: Simulation
metadata:
  name: happy-path
spec:
  systemPrompt: |
    You are a customer calling support. Ask to reset your password.
  numberOfSimulations: 5
  voice: Aoede     # optional Gemini-Live prebuilt voice

After submitting, the CLI prints the run ID and the spawned session IDs. Inspect with:

vr session info <id> — full session detail
vr session transcript <id> — conversation transcript
vr metrics session <id> — custom metrics recorded during the run

Evaluations

Evaluations are produced by Evaluator resources defined in .voicerun/templates/. Evaluators run server-side after each session completes. The CLI only reads evaluation results; to define a new evaluator, add a kind: Evaluator document and ship it with vr release.

vr evaluation list                                  # List evaluations for the project agent
vr evaluation list my-agent                         # Explicit agent
vr evaluation list --session SESSION_ID             # Only evaluations for one session
vr evaluation list --status error --type judge      # Filter
vr evaluation info <EVALUATION_ID>                  # Full detail for one evaluation

List Options

Flag	Description
`--session`, `-S`	Session ID to get evaluations for
`--status`, `-s`	Filter by status (`pending`, `complete`, `error`, `skipped`)
`--type`, `-T`	Filter by eval type (`judge`, `extraction`, `deterministic`, `script`)
`--limit`, `-l`	Page size
`--page`, `-p`	Page number
`--json`, `-j` / `--table`, `-t`	Output format

Example Evaluator resource

apiVersion: voicerun/v1
kind: Evaluator
metadata:
  name: resolution-judge
spec:
  evalType: judge            # judge | extraction | deterministic | script
  targetFormat: transcript   # events | transcript
  systemPrompt: |
    Score the session 1-5 on whether the agent resolved the caller's request.
  responseSchema: {}
  successCriteria: {}
  apiProvider: google
  model: gemini-3.5-flash

vr evaluation info renders deterministic evaluation details, including the assertion predicate and structured match/failure details. Skipped evaluations show their skipReason so precondition failures are auditable without an LLM call.

Custom Metrics

vr metrics queries custom metrics emitted by agents at runtime. Time-series queries are aggregated server-side, using an aggregation appropriate to each metric's type.

vr metrics names                                                    # List metric names
vr metrics names my-agent                                           # Scoped to one agent
vr metrics tags                                                     # Discover tag keys/values
vr metrics tags --metric call_duration                              # Tags for one metric
vr metrics timeseries call_duration \
    --start 2026-05-15T00:00:00Z --end 2026-05-16T00:00:00Z --step 1h
vr metrics session <SESSION_ID>                                     # All metrics for one session

Most metric subcommands accept --agent/-a to scope results to a single agent; without it, they fall back to the agent recorded in agent.lock. They also support --json / --table output.
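The --start and --end values in the time-series example are UTC timestamps with a trailing Z. The sketch below builds the argument list for such a query from a time window. The helper names iso_utc and timeseries_args are hypothetical; the flags are the ones shown in the example.

```python
from datetime import datetime, timedelta, timezone


def iso_utc(moment: datetime) -> str:
    """Format a timezone-aware datetime like 2026-05-15T00:00:00Z."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def timeseries_args(metric: str, end: datetime, window: timedelta, step: str) -> list[str]:
    """Build the argv for a time-series query covering `window` up to `end` (hypothetical helper)."""
    start = end - window
    return [
        "vr", "metrics", "timeseries", metric,
        "--start", iso_utc(start),
        "--end", iso_utc(end),
        "--step", step,
    ]


if __name__ == "__main__":
    end = datetime(2026, 5, 16, tzinfo=timezone.utc)
    print(" ".join(timeseries_args("call_duration", end, timedelta(days=1), "1h")))
```

The returned list can be passed to subprocess.run, which avoids shell quoting issues.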

A/B Experiments

vr experiments reads experiment definitions and results. The agent is resolved from --agent (or positional argument) first, falling back to agent.lock.

vr experiments list                                # List experiments for the project agent
vr experiments describe greeting_style             # Variants, conversions, significance
vr experiments funnel greeting_style               # Per-variant funnel with lift
vr experiments timeseries greeting_style \
    --metric booking_completed \
    --start 2026-05-01T00:00:00Z --end 2026-05-15T00:00:00Z --step 1d

vr experiments describe shows session count, conversion metrics, stop conditions (iteration and confidence thresholds), variant performance, and statistical significance (confidence, p-value).
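This document does not say how VoiceRun computes significance. For intuition only, the sketch below uses one common approach, a pooled two-proportion z-test, to get a p-value for the difference between two variants' conversion rates, and also computes a relative lift. The function names and sample counts are illustrative assumptions, not VoiceRun's method or data.

```python
import math


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test (illustrative only)."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("each variant needs at least one session")
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # both rates are 0% or both 100%: no evidence of a difference
    z = (conv_b / n_b - conv_a / n_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


def relative_lift(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Relative change in conversion rate of variant B over variant A."""
    if conv_a == 0:
        raise ValueError("lift is undefined when variant A has no conversions")
    return (conv_b / n_b) / (conv_a / n_a) - 1


if __name__ == "__main__":
    print(two_proportion_p_value(10, 100, 40, 100))
    print(relative_lift(10, 100, 15, 100))
```

A small p-value suggests the observed difference is unlikely under equal true rates. The figures reported by vr experiments describe may use different statistics.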

Development Workflow

A typical development cycle looks like this:

1. Create a new project with vr init
2. Write your agent handler in handler.py and declare runtime config in .voicerun/templates/deployment.yaml
3. Render templates with vr render to confirm the manifest looks right
4. Validate with vr validate (or vr validate -e production)
5. Push your code with vr push
6. Debug interactively with vr debug, run scripted tests with vr debug --headless --script, or unit-test with vr test
7. Release to a non-prod environment with vr release staging
8. Simulate regression scenarios with vr simulate staging --name happy-path
9. Cut over by releasing to production and pointing an entrypoint: vr release production --entrypoint support-line
10. Observe in production with vr session list, vr metrics, vr experiments, and vr evaluation list
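The repeatable middle of this cycle (render, validate, push, test, release to staging, simulate) can be scripted. The sketch below only assembles commands that appear in the list above and defaults to a dry run. The function names are hypothetical, and the exact order your team needs may differ.

```python
import subprocess


def cycle_commands(environment: str = "staging", simulation: str = "happy-path") -> list[list[str]]:
    """Return the pre-production commands from the workflow, in order (hypothetical helper)."""
    return [
        ["vr", "render"],
        ["vr", "validate"],
        ["vr", "push"],
        ["vr", "test"],
        ["vr", "release", environment],
        ["vr", "simulate", environment, "--name", simulation],
    ]


def run_cycle(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; when dry_run is False, also run it and stop at the first failure."""
    for argv in commands:
        print("$", " ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_cycle(cycle_commands())
```

The production cutover is left out on purpose, so that releasing to production stays an explicit manual step.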