Declarative Resources

Anything under .voicerun/templates/ is rendered with Helm at vr release time and snapshotted onto the release manifest. Each YAML document declares one resource:

apiVersion: voicerun/v1
kind: Deployment | Simulation | Webhook | Evaluator
metadata:
  name: <unique per kind within the manifest>
spec:
  ...

Preview rendered output with vr render; validate spec shape with vr validate. Values referenced as {{ .Values.foo }} come from .voicerun/values.yaml (and overlays); {{ .Agent.Name }} and friends come from .voicerun/agent.yaml. Secrets are referenced as {{ Secrets.organization.NAME }} — Helm leaves the placeholder intact and the API resolves it at session start.
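
As a sketch of how these sources combine, a template might look like the following. The file name `deployment.yaml` is an assumption for illustration, and the values keys (`region`, `stt.model`) are assumed to exist in `.voicerun/values.yaml`; only the `.Values` and `.Agent.Name` reference forms come from the text above.

```yaml
# .voicerun/templates/deployment.yaml (hypothetical file name)
apiVersion: voicerun/v1
kind: Deployment
metadata:
  name: {{ .Agent.Name }}-deployment   # resolved from .voicerun/agent.yaml
spec:
  region: {{ .Values.region }}         # resolved from .voicerun/values.yaml or an overlay
  stt:
    model: {{ .Values.stt.model }}
```

Running vr render on a project with such a template should show the substituted values before anything is released.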

metadata.name must be unique within a manifest for each kind. It's the handle used by other commands (e.g. vr simulate --name <…>, vr evaluation list --type <…>).

Deployment

Runtime configuration for the agent in an environment. Every field is optional except kind/metadata.name — omitted fields take platform defaults. The mode field decides whether handler.py is required at the project root.

apiVersion: voicerun/v1
kind: Deployment
metadata:
  name: my-agent-deployment
spec:
  mode: coderunner            # 'coderunner' (handler.py sandbox) or 'relay'
  region: us-central1-a
  variables:
    LOG_LEVEL: info
    FEATURE_FLAG: "true"
  stt:
    model: flux-general-en
    language: en
    failover:
      model: nova-3
  turnTaking:
    mode: smart_turn
    externalEndpointing: 300
    smartTurnVadStopSecs: 0.4
    smartTurnStopSecs: 3.0
    smartTurnTimeout: 5.0
  tts:
    provider: cartesia
    model: sonic-2
    voice: lyric
    language: en
    speed: 1.0
  relay:                      # only honored when mode: relay
    url: wss://my-relay.example.com/ws/agent
  recording:
    enabled: false
    location: gs://my-bucket/recordings/
  redaction:
    enabled: false
  tracing:
    enabled: true

Top-level fields

Field	Type	Description
`mode`	`coderunner` \| `relay`	Runtime mode. `coderunner` (default) runs your `handler.py` in the sandbox. `relay` runs in voicerun-relay — no `handler.py` is required and `vr validate` skips the handler check automatically.
`region`	string	Cluster region (e.g. `us-central1-a`).
`variables`	map<string, string>	Values injected into `context.variables` at session start. Merged with the org/agent-environment variable scopes.
`relay`	object	Relay endpoint config. Only used when `mode: relay`.
`stt`	object	Speech-to-text config.
`turnTaking`	object	Turn-taking strategy. Sibling of `stt`/`tts` because the signal can come from STT (provider EoT), raw audio (Silero VAD, Smart Turn V3), or — eventually — semantic analyzers.
`tts`	object	Text-to-speech config.
`recording`	object	Call recording config.
`redaction`	object	PII redaction applied to traces and session events.
`tracing`	object	Distributed tracing for the call pipeline.

`spec.relay`

Only honored when mode: relay. Optional failover swaps to a backup relay endpoint on connection errors.

Field	Type	Description
`url`	string	Primary relay WebSocket URL (e.g. `wss://relay.example.com/ws/agent`).
`failover.url`	string	Backup relay WebSocket URL.
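
A minimal relay-mode Deployment might look like this. It is a sketch that assumes the dotted `failover.url` field nests as `failover:` then `url:` under `relay`; the resource name and backup hostname are hypothetical.

```yaml
apiVersion: voicerun/v1
kind: Deployment
metadata:
  name: my-agent-relay
spec:
  mode: relay                  # no handler.py is required in this mode
  relay:
    url: wss://relay.example.com/ws/agent
    failover:
      # Hypothetical backup endpoint, used on connection errors.
      url: wss://relay-backup.example.com/ws/agent
```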

`spec.stt`

Field	Type	Description
`model`	string	STT model identifier (e.g. `flux-general-en`, `nova-3`).
`language`	string	BCP-47 language code (e.g. `en`).
`prompt`	string	Prompt biasing for providers that support it.
`filter`	string	Provider-specific filter string.
`endpointing`	number	Provider-side endpointing silence threshold (ms).
`audioInputDelay`	number	Audio input delay (ms) before transcription starts.
`noiseReductionType`	string	Provider noise-reduction profile.
`eot.threshold`	number	Deepgram-style end-of-turn confidence threshold.
`eot.timeoutMs`	number	Hard cap on EoT detection (ms).
`eot.eagerThreshold`	number	Eager-EoT pre-confirmation threshold.
`vad.mode`	`server_vad` \| `semantic_vad`	OpenAI Realtime / Qwen3 VAD mode.
`vad.eagerness`	`auto` \| `low` \| `medium` \| `high`	OpenAI semantic-VAD eagerness.
`failover`	object	Same shape as `stt` (minus `failover`). Used on provider errors.
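
The dotted names in the table (`eot.threshold`, `vad.mode`) are read here as nested keys, which is an assumption based on how `failover.model` is nested in the Deployment example. The numeric values are illustrative only, not recommended settings.

```yaml
spec:
  stt:
    model: flux-general-en
    language: en
    eot:
      threshold: 0.7        # illustrative value
      timeoutMs: 5000       # illustrative value
    failover:               # same shape as stt, minus failover
      model: nova-3
      language: en
```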

`spec.turnTaking`

Field	Type	Description
`mode`	`provider` \| `silero` \| `smart_turn`	How turn boundaries are decided. `provider` uses the STT provider's built-in endpointing. `silero` runs local Silero VAD on the relay/agent. `smart_turn` runs Silero VAD + Smart Turn V3 ML.
`externalEndpointing`	number	Silence-stop threshold (ms) for `silero` / `smart_turn`.
`smartTurnVadStopSecs`	number	`smart_turn` only — VAD silence-stop window before the ML gates. Default `0.4`.
`smartTurnStopSecs`	number	`smart_turn` only — ML's per-window timeout. Default `3.0`.
`smartTurnTimeout`	number	`smart_turn` only — hard cap on the ML's running window. Default `5.0`.
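
For comparison with the `smart_turn` configuration in the Deployment example, a `silero` configuration needs only the silence threshold, since the `smartTurn*` fields apply to `smart_turn` only. The threshold value below is illustrative.

```yaml
spec:
  turnTaking:
    mode: silero
    externalEndpointing: 500   # silence-stop threshold in ms; illustrative value
```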

`spec.tts`

Field	Type	Description
`provider`	string	TTS provider (e.g. `cartesia`, `elevenlabs`).
`model`	string	Provider-specific model id (e.g. `sonic-2`).
`voice`	string	Provider-specific voice id (e.g. `lyric`).
`language`	string	BCP-47 language code.
`speed`	number	Speech rate (provider-specific scale).
`failover`	object	Same shape as `tts` (minus `failover`). Used on provider errors.

`spec.recording`

Field	Type	Description
`enabled`	boolean	Record the call audio. Default `false`.
`location`	string	Storage URI override (e.g. `gs://my-bucket/recordings/`). Leave unset to use the platform default.

`spec.redaction`

Field	Type	Description
`enabled`	boolean	Apply PII redaction to traces and session events. Default `false`.

`spec.tracing`

Field	Type	Description
`enabled`	boolean	Emit distributed traces for the call pipeline. Default `true`.

Simulation

A simulated caller used by vr simulate. The CLI submits the simulation's metadata.name; the API resolves spec from the active release's manifest, so the version that runs is always the released one, not whatever is on disk.

apiVersion: voicerun/v1
kind: Simulation
metadata:
  name: happy-path
spec:
  direction: inbound
  systemPrompt: |
    You are a customer calling this agent. Keep turns short and realistic.
  inputData:
    accountTier: premium
  numberOfSimulations: 5
  provider: gemini_live
  model: gemini-3.1-flash-live-preview
  voice: Aoede
  # Optional: pin delivery variation so a simulated caller is reproducible.
  varietySeed: 12345
  phases:
    - type: ring
      durationSecs: 8
    - type: message
      text: "Thank you for calling. All representatives are busy. Please hold."
    - type: holdMusic
      durationSecs: 30
  loopPhases: true
  humanPickupAfterSecs: 90

Top-level fields

Field	Type	Required	Description
`direction`	`inbound` \| `outbound`	no	Which side of the call to simulate. Defaults to `inbound`, where the persona is a caller dialing the agent. Use `outbound` to test agents that place calls; the persona is the callee who answers.
`systemPrompt`	string	yes	Prompt driving the simulated user's behavior.
`inputData`	object	no	Task payload persisted to each spawned session and exposed to the handler as `context.input_data`. Most useful with `direction: outbound`, but allowed for inbound simulations too.
`numberOfSimulations`	integer (1-100)	no	Number of simulated sessions spawned per `vr simulate` invocation.
`provider`	`gemini_live` \| `openai_realtime`	no	Persona engine vendor. Defaults to `gemini_live`.
`model`	string	no	Provider-specific model id. For `gemini_live`: e.g. `gemini-3.1-flash-live-preview` (default). For `openai_realtime`: e.g. `gpt-realtime` (default), `gpt-realtime-mini`.
`voice`	string	no	Provider-specific voice id. For `gemini_live`: `Aoede`, `Puck`, `Charon`, `Kore`, `Fenrir`, etc. For `openai_realtime`: `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`, `shimmer`, `verse`, `marin`, `cedar`.
`language`	string	no	BCP-47 language tag for the persona, such as `en-US` or `es-MX`. Falls back to the simulator default when omitted.
`varietySeed`	integer (0-9007199254740991)	no	Seed for the simulator's delivery variation (greeting, phrasing, mood, and verbosity). Omit for fresh randomness; set it to reproduce a specific simulated caller.
`phases`	`PhaseSpec[]`	no	Pre-pickup phase script — see below.
`loopPhases`	boolean	no	When `phases` is non-empty, restart the phase list when it ends. Default `true`.
`humanPickupAfterSecs`	integer (0-600)	no	Seconds of phase playback before the persona takes over. Omit to never auto-pickup (tests the agent's give-up logic).

For direction: outbound, omit phases, loopPhases, and humanPickupAfterSecs. Outbound simulations model the callee answering immediately, so the API rejects pre-pickup phases for outbound manifests.

Outbound example

apiVersion: voicerun/v1
kind: Simulation
metadata:
  name: appointment-reminder
spec:
  direction: outbound
  systemPrompt: |
    You are Dana Lee. You just answered your phone and are willing to talk briefly.
  inputData:
    customerName: Dana Lee
    appointmentTime: Tuesday at 3pm
    reason: confirm upcoming appointment
  numberOfSimulations: 3

`spec.phases`

Phase entries play before the simulated persona starts speaking — useful for warm-transfer testing where the outbound leg waits through ringing/queue/IVR before someone "answers".

`type`	Fields	Description
`ring`	`durationSecs` (int, 1-600)	North-American ringback tone (440+480 Hz, 2s on / 4s off).
`holdMusic`	`durationSecs` (int, 1-600)	Looping arpeggio that reads as hold music to a VAD.
`message`	`text` (non-empty string)	Pre-rendered automated-IVR speech (Gemini TTS).
`ivrMenu`	`prompt`, `options`, optional `timeoutSecs` / `maxRepeats` / `onNoInput` / `onInvalid`	Recursive IVR menu node. See below.

`ivrMenu` phase

Field	Type	Description
`prompt`	string	The IVR prompt the simulator plays.
`options`	map<DTMF digit, PhaseSpec[]>	Single-character keys (`0`-`9`, `*`, `#`) mapped to the sub-phases that fire when the agent sends that digit. Phases can themselves be `ivrMenu` entries for multi-level trees.
`timeoutSecs`	integer (1-120)	Seconds to wait for a digit after the prompt finishes. Default `8`.
`maxRepeats`	integer (0-10)	Additional re-prompts when no digit arrives. Default `2`.
`onNoInput`	`PhaseSpec[]`	Phases to run after `maxRepeats` re-prompts produce no input.
`onInvalid`	`PhaseSpec[]`	Phases to run when the agent sends a digit not in `options`.
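
Putting the fields above together, a two-level IVR tree could be sketched as follows. The resource name, prompts, and durations are made up for illustration; the digit keys are quoted so that YAML treats them as single-character strings rather than integers.

```yaml
apiVersion: voicerun/v1
kind: Simulation
metadata:
  name: ivr-billing-path
spec:
  systemPrompt: |
    You are a billing representative answering a transferred call.
    Keep turns short and realistic.
  phases:
    - type: ivrMenu
      prompt: "For billing, press 1. For support, press 2."
      timeoutSecs: 8
      maxRepeats: 2
      options:
        "1":
          - type: message
            text: "Connecting you to billing."
          - type: holdMusic
            durationSecs: 20
        "2":
          - type: ivrMenu            # nested menu for a multi-level tree
            prompt: "For hardware, press 1. For software, press 2."
            options:
              "1":
                - type: message
                  text: "Connecting you to hardware support."
              "2":
                - type: message
                  text: "Connecting you to software support."
      onNoInput:
        - type: message
          text: "We did not receive a selection. Goodbye."
      onInvalid:
        - type: message
          text: "That is not a valid option."
  loopPhases: false
  humanPickupAfterSecs: 60
```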

Webhook

Session-end webhook delivery configuration. The destination URL must be http(s).

apiVersion: voicerun/v1
kind: Webhook
metadata:
  name: my-agent-webhook
spec:
  url: https://example.com/voicerun/session-webhook
  events:
    - session.ended
  signingToken: "{{ Secrets.organization.WEBHOOK_SIGNING_TOKEN }}"

Top-level fields

Field	Type	Required	Description
`url`	string (http/https URL)	yes	Destination URL.
`events`	string[]	yes	Event triggers this webhook listens for. Only `session.ended` is supported today; the list is intentionally small so adding a new event requires explicit code review.
`signingToken`	string	no	HMAC-SHA256 signing token for outgoing deliveries. Typically supplied via `{{ Secrets.organization.NAME }}` and resolved at consume time. When absent, the worker sends an unsigned request.
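
Because templates are rendered with Helm, a Webhook can be emitted only when a URL is configured. The sketch below assumes a `webhook.url` key in the values file and a hypothetical template file name; `if` and `quote` are standard Helm template features, but whether your project's templates are organized this way is an assumption.

```yaml
{{- if .Values.webhook.url }}
# .voicerun/templates/webhook.yaml (hypothetical file name)
apiVersion: voicerun/v1
kind: Webhook
metadata:
  name: {{ .Agent.Name }}-webhook
spec:
  url: {{ .Values.webhook.url | quote }}
  events:
    - session.ended
  signingToken: "{{ Secrets.organization.WEBHOOK_SIGNING_TOKEN }}"
{{- end }}
```

With `webhook.url` left as `null`, the condition is false and no Webhook document is rendered.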

Evaluator

Scores or extracts data from a session after it completes. Results are surfaced through vr evaluation list and vr evaluation info.

apiVersion: voicerun/v1
kind: Evaluator
metadata:
  name: resolution-judge
spec:
  title: Resolution Judge
  evalType: judge
  targetFormat: transcript
  systemPrompt: |
    Score the session 1-5 on whether the agent resolved the caller's
    request. Respond with a JSON object matching the response schema.
  responseSchema:
    type: object
    properties:
      score: { type: integer, minimum: 1, maximum: 5 }
      reasoning: { type: string }
    required: [score, reasoning]
  successCriteria:
    score: { $gte: 4 }
  apiProvider: google
  model: gemini-3.5-flash

Common fields

Field	Type	Required	Description
`title`	string	yes	Human-readable title (shown in evaluation listings).
`evalType`	`judge` \| `extraction` \| `deterministic`	yes	Whether this evaluator scores a session against criteria (`judge`), extracts structured data (`extraction`), or asserts on the derived session view without an LLM (`deterministic`).
`targetFormat`	`events` \| `transcript`	no	What the evaluator sees. `events` passes the raw session-event stream; `transcript` passes a flattened user/agent transcript. (Deterministic always reads the session view; this field is ignored.)
`apiProvider`	string	no	LLM provider (e.g. `google`, `openai`, `anthropic`). Not applicable to `deterministic`.
`model`	string	no	Model id within the provider (e.g. `gemini-3.5-flash`). Not applicable to `deterministic`.
`precondition`	object	no	Optional JSON predicate evaluated against the derived session view. Sessions that don't satisfy it are recorded as `status="skipped"` with no LLM call. See Preconditions.

`judge` evaluators

Field	Type	Required	Description
`systemPrompt`	string	yes	Instructions for the judge model.
`responseSchema`	object	yes	JSON schema describing the judge's structured response.
`successCriteria`	object	yes	Criteria evaluated against the judge's response — drives the `success` flag on the resulting `Evaluation`.

`extraction` evaluators

Field	Type	Required	Description
`systemPrompt`	string	yes	Instructions describing what to extract.
`responseSchema`	object	no	Optional JSON schema constraining the extracted payload.
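
An extraction evaluator built from the fields above might look like this. The resource name, prompt, and schema properties are illustrative assumptions; the provider and model mirror the judge example.

```yaml
apiVersion: voicerun/v1
kind: Evaluator
metadata:
  name: call-reason
spec:
  title: Call Reason
  evalType: extraction
  targetFormat: transcript
  systemPrompt: |
    Extract the caller's primary reason for calling and whether a
    callback was requested. Respond with a JSON object matching the
    response schema.
  responseSchema:              # optional for extraction evaluators
    type: object
    properties:
      reason: { type: string }
      callbackRequested: { type: boolean }
    required: [reason]
  apiProvider: google
  model: gemini-3.5-flash
```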

`deterministic` evaluators

A deterministic evaluator asserts on the derived session view rather than calling an LLM: same input, same result, zero token cost. Use it for purely factual checks, such as whether a specific tool was called, whether the caller said a specific word, or whether the duration was in range.

Field	Type	Required	Description
`assertion`	object	yes	JSON predicate (same operator set as `successCriteria`) evaluated against the session view. Match → `success: true`; mismatch → `success: false` with a structured `details.failedPath` and `details.reason`.

apiVersion: voicerun/v1
kind: Evaluator
metadata:
  name: inbound-cancellation-request
spec:
  title: Inbound caller mentioned cancellation
  evalType: deterministic
  assertion:
    direction: "inbound"
    events:
      $any:
        name: "transcript_part"
        data.role: "user"
        data.content: { $icontains: "cancel" }

Session view fields the assertion (or a precondition) can read:

Field	Description
`turn_count`	Number of completed turns (count of `turn_end` events). A mid-turn hangup doesn't count.
`duration_seconds`	Seconds between `startedAt` and `endedAt`. Returns `0` when the session hasn't ended.
`direction`	`inbound` or `outbound`
`origin`	Where the session came from (e.g. `phone`, `web`, `simulation`, `native`)
`tags`	Session tags as a string array. Matchable by bare primitive (`tags: "billing"`) or by `$inc` / `$ninc`.
`environment`	`[id, name]` for the session's environment — prefers the org-scoped `environmentId`, falls back to legacy `agentEnvironmentId`. Matchable by bare primitive against either value: `environment: "production"` or `environment: "<uuid>"` both work.
`events`	Raw event list in arrival order — each element `{ name, data, timestamp }`. Use with `$any` to assert on event names or payloads (transcript content, tool arguments, etc.).

Operators:

Operator	Meaning
`$eq` / `$ne`	Equals / not equals (deep equality)
`$gt` / `$gte` / `$lt` / `$lte`	Numeric comparison
`$in` / `$nin`	Value is / is not in a literal array
`$inc` / `$ninc`	Input array does / does not contain the target value
`$contains` / `$icontains`	String input contains the substring (case-sensitive / -insensitive). Returns false when either side isn't a string.
`$any`	Input array has at least one element matching the sub-predicate. Recurses; the sub-predicate is itself a predicate object.

Dotted field paths walk nested objects. Mixing operators and field names at the same level is rejected at evaluation time.

Bare primitive vs array input. When a predicate's value is a primitive and the input field is an array, the engine does membership matching (equivalent to $inc). This lets tags: "billing" work without $inc, and lets environment: "production" match against [id, name] regardless of whether you wrote the name or the ID. Scalar-vs-scalar equality is unchanged.
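
The matching rules above can be combined in a single assertion, as in this sketch. It assumes that sibling fields must all match, which is how the other examples on this page read but is not stated explicitly; the resource name and tag value are illustrative.

```yaml
apiVersion: voicerun/v1
kind: Evaluator
metadata:
  name: prod-billing-call
spec:
  title: Production billing call from phone or web
  evalType: deterministic
  assertion:
    tags: "billing"                      # primitive vs array input: membership match
    environment: "production"            # matches either element of [id, name]
    origin: { $in: ["phone", "web"] }    # scalar input checked against a literal array
    duration_seconds: { $gte: 30 }       # numeric comparison
```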

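Event-payload checks via $any let you ask questions like "did the caller use a phrase" or "did the agent say a particular closing" without flattening the event stream up front. Because $any matches when at least one element satisfies the sub-predicate, the second example below passes if the agent says "goodbye" at any point in the session, not only in its final utterance: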

apiVersion: voicerun/v1
kind: Evaluator
metadata:
  name: refund-mentioned
spec:
  title: Caller mentioned refund
  evalType: deterministic
  assertion:
    events:
      $any:
        name: "transcript_part"
        data.role: "user"
        data.content: { $icontains: "refund" }

apiVersion: voicerun/v1
kind: Evaluator
metadata:
  name: agent-closed-with-goodbye
spec:
  title: Agent ended with a goodbye
  evalType: deterministic
  assertion:
    events:
      $any:
        name: "transcript_part"
        data.role: "agent"
        data.content: { $icontains: "goodbye" }

Preconditions

Any evaluator type can declare a precondition to gate whether it runs. If the predicate isn't satisfied, the evaluator is skipped: the row is persisted with status="skipped" and a skipReason naming the failing field, and no LLM call is made. This protects you from running expensive "did the agent handle the objection well?" evals against 1-turn hangups while keeping the skip auditable.

apiVersion: voicerun/v1
kind: Evaluator
metadata:
  name: objection-handling
spec:
  title: Objection Handling
  evalType: judge
  targetFormat: transcript
  systemPrompt: …
  responseSchema: { type: object, properties: { score: { type: integer } } }
  successCriteria: { score: { $gte: 4 } }
  precondition:
    turn_count: { $gte: 3 }
    duration_seconds: { $gte: 30 }

Filter skipped rows with vr evaluation list --status skipped or the web dashboard's status filter.
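
Preconditions are not limited to LLM-backed evaluators. In the sketch below, a deterministic evaluator uses a precondition so that outbound or simulated sessions are recorded as skipped rather than as failures; the resource name is illustrative.

```yaml
apiVersion: voicerun/v1
kind: Evaluator
metadata:
  name: inbound-refund-mentioned
spec:
  title: Inbound caller mentioned refund
  evalType: deterministic
  precondition:
    direction: "inbound"
    origin: { $ne: "simulation" }
  assertion:
    events:
      $any:
        name: "transcript_part"
        data.role: "user"
        data.content: { $icontains: "refund" }
```

Moving `direction` from the assertion into the precondition changes the outcome for outbound sessions from `success: false` to `status="skipped"`.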

Values and Secrets

Values files

.voicerun/values.yaml is the base values file used by Helm. Per-environment overlays (e.g. prod.yaml, staging.yaml) live alongside it and are pulled in with --values prod.yaml on vr release, vr render, or vr simulate.

# .voicerun/values.yaml
variables:
  LOG_LEVEL: info
region: us-central1-a
stt:
  model: flux-general-en
  language: en
tts:
  provider: cartesia
  model: sonic-2
  voice: lyric
recording:
  enabled: false
tracing:
  enabled: true
webhook:
  url: null            # leave null to skip the Webhook resource entirely
simulation:
  numberOfSimulations: 1
evaluator:
  apiProvider: google
  model: gemini-3.5-flash
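
An overlay only needs the keys that differ from the base file. The sketch below assumes a `prod.yaml` next to `values.yaml` and that vr passes `--values` through to Helm unchanged, in which case maps are merged key by key and the overlay wins on conflicts; the specific overrides are illustrative.

```yaml
# .voicerun/prod.yaml
variables:
  LOG_LEVEL: warn
recording:
  enabled: true
webhook:
  url: https://example.com/voicerun/session-webhook
simulation:
  numberOfSimulations: 5
```

Keys not mentioned in the overlay, such as `stt` and `tts`, keep their values from the base file.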

Secret placeholders

Anywhere a string value lands in the rendered manifest, you can reference an organization secret as:

signingToken: "{{ Secrets.organization.WEBHOOK_SIGNING_TOKEN }}"

Helm leaves the placeholder intact through rendering. The API resolves it against organization secrets at session start, so secrets never round-trip through the release record itself. Create secrets with vr create secret.
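
The same placeholder form should work in other string fields. As a sketch, a Deployment variable could carry a secret like this; the secret name `CRM_API_KEY` is hypothetical, and whether `variables` accepts placeholders is an assumption based on the "anywhere a string value lands" wording above.

```yaml
spec:
  variables:
    LOG_LEVEL: info
    # Hypothetical secret name; resolved by the API, not by Helm.
    CRM_API_KEY: "{{ Secrets.organization.CRM_API_KEY }}"
```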

Validation

Both vr validate and vr render run shape-only validation against rendered manifests. The validator checks that:

- Each document has a known kind (Deployment, Simulation, Webhook, Evaluator).
- spec contains only the allowed top-level keys for that kind.
- Required fields are present (e.g. Webhook.spec.url, Evaluator.spec.systemPrompt for judge, Evaluator.spec.assertion for deterministic).
- precondition and assertion (when present) are plain objects.
- Bounded fields are in range (e.g. numberOfSimulations is 1-100).
- metadata.name is unique per kind within a manifest.
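
To make the checks concrete, the rendered manifest below is a sketch of input that should fail several of them at once. The `retries` key is a hypothetical name used only to show an unknown key; the exact error messages are not shown because they are not documented here.

```yaml
apiVersion: voicerun/v1
kind: Simulation
metadata:
  name: happy-path
spec:
  # systemPrompt is missing, so a required field is absent
  numberOfSimulations: 500     # out of range: must be 1-100
  retries: 3                   # hypothetical key, not an allowed top-level key
---
apiVersion: voicerun/v1
kind: Simulation
metadata:
  name: happy-path             # duplicate name for the same kind
spec:
  systemPrompt: |
    You are a customer calling this agent.
```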

Validator-level checks don't hit the database, so they don't catch missing organization secrets or unknown providers — those are surfaced at vr release time when the API processes the manifest.
