Error Fallback (Failover)
Error Fallback is a per-environment safety net for live phone calls. When an agent stops responding or the call hits an unrecoverable error, VoiceRun can automatically hand the caller off — either by transferring the call to a phone number, or by calling a webhook that decides what happens next. Without it, a failing agent leaves the caller in silence until the call drops.
Error Fallback is configured per agent environment and applies to phone calls. It is distinct from completions retries & fallbacks, which recover from individual LLM provider failures inside your handler. The two are complementary: completions fallbacks keep a single turn alive; Error Fallback rescues the whole call when the agent as a whole stops responding.
When it triggers
A fallback fires on either of two conditions:
- No-response timeout (circuit breaker). After the caller speaks, the runtime expects the agent to start responding within Error Fallback Timeout. If it doesn't, that counts as a timeout occurrence. Occurrences are tracked across sessions for the same agent + environment in a sliding time window. Once the number of occurrences reaches the occurrence threshold within the time window, the fallback is triggered. This behaves as a circuit breaker: it rides out one-off hiccups but fails over once a systemic problem (e.g. an outage upstream) is affecting multiple calls.
- Call / stream errors. When the telephony stream or call encounters a hard error, the fallback action runs immediately for the affected call.
Occurrences are de-duplicated per session (one slow call counts once) and are cleared as soon as the agent responds successfully, so transient slowness on a single call does not accumulate toward the threshold.
Configuring in the dashboard
- Go to Agents → select your agent → Environments.
- On the environment, open Failover to bring up the Error Fallback Settings dialog. (It's also reachable from the command palette by searching "Error Fallback Settings".)
- Choose a type, fill in the value, tune the timing fields, and save.
Saving updates the environment configuration; it does not require redeploying your function.
Settings reference
| Setting | Values | Default | Notes |
|---|---|---|---|
| Error Fallback Type | Disabled, Phone, Webhook | Disabled | Selects the fallback action. Disabled means no fallback — the session is failed and the call ends. |
| Error Fallback Value | phone number or URL | — | The destination. A phone number (E.164, e.g. +15551234567) for Phone, or an HTTPS URL for Webhook. Editable only once a type is selected. |
| Error Fallback Timeout | 0–30 seconds | 0 | Seconds to wait for the agent to respond after caller input before counting a timeout occurrence. 0 disables the no-response timeout trigger. |
| Error Fallback Occurrence Threshold | 1–20 | 5 | Number of timeout occurrences (across distinct sessions, within the window) required before the fallback triggers. |
| Error Fallback Time Window | 60–3600 seconds | 300 | Sliding window over which occurrences are counted. Default is 5 minutes. |
The timeout and window fields are stored in seconds; the threshold is a plain count of occurrences.
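The ranges in the table can be checked before saving. The following is a minimal sketch, not part of any VoiceRun SDK: the class name, field names, and error strings are hypothetical, and only the allowed values and defaults come from the table above.

```python
import re
from dataclasses import dataclass
from typing import List

# E.164: "+" followed by up to 15 digits, first digit non-zero.
E164 = re.compile(r"^\+[1-9]\d{1,14}$")


@dataclass
class ErrorFallbackSettings:
    """Hypothetical container mirroring the settings table (names are illustrative)."""

    type: str = "Disabled"         # "Disabled", "Phone", or "Webhook"
    value: str = ""                # phone number or HTTPS URL
    timeout_s: int = 0             # 0 disables the no-response trigger
    occurrence_threshold: int = 5  # a count, not a duration
    time_window_s: int = 300

    def problems(self) -> List[str]:
        """Return human-readable validation problems (empty list if valid)."""
        found = []
        if self.type not in ("Disabled", "Phone", "Webhook"):
            found.append("type must be Disabled, Phone, or Webhook")
        if self.type == "Phone" and not E164.match(self.value):
            found.append("value must be an E.164 phone number")
        if self.type == "Webhook" and not self.value.startswith("https://"):
            found.append("value must be an HTTPS URL")
        if not 0 <= self.timeout_s <= 30:
            found.append("timeout must be 0-30 seconds")
        if not 1 <= self.occurrence_threshold <= 20:
            found.append("occurrence threshold must be 1-20")
        if not 60 <= self.time_window_s <= 3600:
            found.append("time window must be 60-3600 seconds")
        return found
```

For example, `ErrorFallbackSettings(type="Phone", value="+15551234567", timeout_s=10).problems()` returns an empty list, while a value such as `555-1234` is reported as not being E.164.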
Fallback actions
Phone
The active call is transferred to the configured phone number via your telephony provider (Twilio or Telnyx). Use this to route callers to a human, a voicemail line, or another agent when the primary agent is unavailable.
Error Fallback Type: Phone
Error Fallback Value: +15551234567
Webhook
VoiceRun sends an HTTP POST to the configured URL with the call details as a JSON body, and the error message appended as an error query parameter:
POST https://your-app.example.com/voicerun/error?error=<url-encoded-error-message>
Content-Type: application/json
{
"CallSid": "CA...",
"ErrorMessage": "...",
...telephony call parameters...
}
Your endpoint controls what happens next by what it returns:
- Return TwiML (an XML <Response>…</Response> document) to take over the call, for example with <Say>, <Dial>, or <Redirect>. VoiceRun applies it to the live call.
- Return anything else and the response is ignored (logged as unexpected); the call proceeds to terminate.
- A non-2xx response is treated as a webhook failure.
This lets you make a routing decision dynamically — look up the caller, pick a transfer target, play a custom message, or queue a callback.
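As a rough illustration of such a routing decision, the sketch below builds a TwiML response from the webhook payload. The routing table, phone numbers, message text, and function name are all hypothetical. It also assumes the payload carries the caller's number in a `From` field, as Twilio call parameters usually do; the payload example above only shows `CallSid` and `ErrorMessage` explicitly.

```python
import xml.etree.ElementTree as ET

# Hypothetical routing table and message, for illustration only.
PRIORITY_CALLERS = {"+15550001111": "+15552223333"}
DEFAULT_MESSAGE = "We are having trouble right now. Please call back shortly."


def build_fallback_twiml(payload: dict) -> str:
    """Return a TwiML document for an Error Fallback webhook payload.

    Assumption: the caller's number is available as payload["From"].
    """
    response = ET.Element("Response")
    target = PRIORITY_CALLERS.get(payload.get("From", ""))
    if target:
        # Known caller: transfer to their dedicated line.
        ET.SubElement(response, "Dial").text = target
    else:
        # Everyone else: play a message, then end the call.
        ET.SubElement(response, "Say").text = DEFAULT_MESSAGE
        ET.SubElement(response, "Hangup")
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body
```

An HTTP handler would return this string with a 2xx status; serving it with an XML content type such as `text/xml` is the usual convention for TwiML, though the exact requirement here is not stated in this page.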
Signature verification
When the fallback type is set to Webhook, VoiceRun auto-generates a per-environment signing secret (prefixed whsec_) and signs every error fallback request with HMAC-SHA256, using the same scheme as the main webhooks. The secret is returned on the environment in the errorFallbackWebhookSecret field.
Each request includes two headers:
| Header | Description |
|---|---|
| X-VoiceRun-Signature | sha256=<hex>, computed as HMAC-SHA256(secret, "{timestamp}.{raw_body}") |
| X-VoiceRun-Timestamp | Unix timestamp (seconds) when the request was sent |
Verify the signature before acting on the payload — see the webhook verification examples for Python and JavaScript implementations (the algorithm is identical).
Backwards compatible. Environments that pre-date this feature, or that have had their secret cleared, send unsigned requests (no signature headers). Once a secret exists on the environment, every request is signed.
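The signing scheme described above can be checked with Python's standard library alone. This is a minimal sketch under stated assumptions, and the linked verification examples remain the reference: the function name is hypothetical, the full secret string (including the whsec_ prefix) is assumed to be used as the UTF-8 HMAC key, and the 300-second replay tolerance is a value chosen for this sketch rather than one documented here.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_error_fallback_signature(
    secret: str,
    raw_body: bytes,
    signature_header: Optional[str],
    timestamp_header: Optional[str],
    tolerance_s: int = 300,       # assumption: replay window for this sketch
    now: Optional[float] = None,  # injectable clock, useful in tests
) -> bool:
    """Check X-VoiceRun-Signature against HMAC-SHA256(secret, "{timestamp}.{raw_body}")."""
    try:
        sent_at = int(timestamp_header)
    except (TypeError, ValueError):
        return False  # missing or malformed timestamp header
    current = time.time() if now is None else now
    if abs(current - sent_at) > tolerance_s:
        return False  # too old or too far in the future
    # Sign the exact bytes received; re-serialized JSON would not match.
    signed = timestamp_header.encode() + b"." + raw_body
    digest = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    expected = ("sha256=" + digest).encode()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, (signature_header or "").encode())
```

Because unsigned requests carry neither header, this function returns False for them; an endpoint serving an environment without a secret has to decide separately whether to accept such requests.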
Regenerating the secret
If the secret is compromised, rotate it via the API:
curl -X POST \
  https://api.voicerun.com/v1/agents/{agentId}/environments/{environmentId}/regenerate-error-fallback-webhook-secret \
  -H "Authorization: Bearer YOUR_API_KEY"
The response returns the new errorFallbackWebhookSecret. Update your verifying endpoint with the new value; requests signed with the old secret will fail verification immediately.
Cloning an environment that has an error fallback webhook secret generates a new secret for the clone; the source secret is never copied.
Disabled
No fallback action is taken. The session is marked as failed and the call ends. This is the default.
How occurrence tracking works
The no-response circuit breaker is evaluated by the agents runtime:
- After caller input, a timer waits Error Fallback Timeout seconds for the agent to begin responding.
- If the timer expires, a timeout occurrence is recorded against the agent + environment, keyed by session so each call contributes at most once.
- Occurrences older than the time window are discarded; the remaining count is compared to the threshold.
- If the count is below the threshold, the call keeps waiting and may record further occurrences.
- When the count reaches the threshold, the fallback action runs and the session is failed.
- When the agent responds successfully, recorded occurrences for that agent + environment are cleared.
Because occurrences are shared across concurrent sessions of the same environment, the breaker detects environment-wide problems quickly while tolerating isolated slow turns.
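The steps above can be modelled with a small in-memory tracker. This is only an illustrative sketch of the described behavior, not the runtime's actual implementation: the class and method names are hypothetical, and a real deployment would presumably keep this state in a store shared across processes.

```python
from collections import defaultdict


class OccurrenceTracker:
    """Sliding-window timeout counter keyed by (agent, environment)."""

    def __init__(self, threshold: int = 5, window_s: float = 300.0):
        self.threshold = threshold
        self.window_s = window_s
        # (agent_id, env_id) -> {session_id: time its occurrence was recorded}
        self._seen = defaultdict(dict)

    def record_timeout(self, agent_id: str, env_id: str,
                       session_id: str, now: float) -> bool:
        """Record one timeout; return True when the fallback should fire."""
        sessions = self._seen[(agent_id, env_id)]
        # Discard occurrences that have aged out of the window.
        cutoff = now - self.window_s
        for sid in [s for s, t in sessions.items() if t < cutoff]:
            del sessions[sid]
        # Keyed by session, so one slow call contributes at most once.
        sessions.setdefault(session_id, now)
        return len(sessions) >= self.threshold

    def record_success(self, agent_id: str, env_id: str) -> None:
        """A successful response clears occurrences for that environment."""
        self._seen.pop((agent_id, env_id), None)
```

With a threshold of 3, timeouts from sessions `s1`, `s1` again, and `s2` leave the count at 2, and a timeout from `s3` inside the window trips the breaker. Calling `record_success` resets the count, which is why a single healthy response is enough to keep isolated slow turns from accumulating.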
Recommendations
- Keep Timeout short enough that callers aren't left waiting (e.g. 8–15 seconds), but long enough to allow for normal LLM/TTS latency.
- Start with the defaults (threshold 5, window 300s) and tighten only if you want to fail over faster during outages.
- Prefer Webhook when you need dynamic routing decisions; prefer Phone for a simple, fixed handoff to a human or backup line.
- Pair Error Fallback with handler-level resilience: use completions retries & fallbacks so individual provider blips never reach the call-level breaker, and surface failures with ErrorEvent for observability.
Next steps
- Bring Your Own Telephony: connect Twilio/Telnyx for phone transfers
- Webhooks: webhook integration patterns
- Reliability (Retries & Fallbacks): handler-level LLM resilience
- Event Reference: ErrorEvent and other output events
