Error Fallback (Failover)

Error Fallback is a per-environment safety net for live phone calls. When an agent stops responding or the call hits an unrecoverable error, VoiceRun can automatically hand the caller off — either by transferring the call to a phone number, or by calling a webhook that decides what happens next. Without it, a failing agent leaves the caller in silence until the call drops.

Error Fallback is configured per agent environment and applies to phone calls. It is distinct from completions retries & fallbacks, which recover from individual LLM provider failures inside your handler. The two are complementary: completions fallbacks keep a single turn alive; Error Fallback rescues the whole call when the agent as a whole stops responding.


When it triggers

A fallback fires on either of two conditions:

  1. No-response timeout (circuit breaker). After the caller speaks, the runtime expects the agent to start responding within the Error Fallback Timeout. If it doesn't, that counts as a timeout occurrence. Occurrences are tracked across sessions for the same agent + environment in a sliding time window. Once the number of occurrences within the window reaches the occurrence threshold, the fallback is triggered. This behaves as a circuit breaker: it rides out one-off hiccups but fails over once a systemic problem (e.g. an upstream outage) is affecting multiple calls.

  2. Call / stream errors. When the telephony stream or call encounters a hard error, the fallback action runs immediately for the affected call.

Occurrences are de-duplicated per session (one slow call counts once) and are cleared as soon as the agent responds successfully, so transient slowness on a single call does not accumulate toward the threshold.


Configuring in the dashboard

  1. Go to Agents → select your agent → Environments.
  2. On the environment, open Failover to bring up the Error Fallback Settings dialog. (It's also reachable from the command palette by searching "Error Fallback Settings".)
  3. Choose a type, fill in the value, tune the timing fields, and save.

Saving updates the environment configuration; it does not require redeploying your function.


Settings reference

| Setting | Values | Default | Notes |
| --- | --- | --- | --- |
| Error Fallback Type | Disabled, Phone, Webhook | Disabled | Selects the fallback action. Disabled means no fallback: the session is failed and the call ends. |
| Error Fallback Value | phone number or URL | | The destination. A phone number (E.164, e.g. +15551234567) for Phone, or an HTTPS URL for Webhook. Editable only once a type is selected. |
| Error Fallback Timeout | 0–30 seconds | 0 | Seconds to wait for the agent to respond after caller input before counting a timeout occurrence. 0 disables the no-response timeout trigger. |
| Error Fallback Occurrence Threshold | 1–20 | 5 | Number of timeout occurrences (across distinct sessions, within the window) required before the fallback triggers. |
| Error Fallback Time Window | 60–3600 seconds | 300 | Sliding window over which occurrences are counted. Default is 5 minutes. |

The timeout and window fields are stored in seconds; the threshold is a count of occurrences.
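As a rough illustration of how these settings constrain each other, the sketch below checks a candidate configuration before it is saved. The `ErrorFallbackSettings` class and its field names are hypothetical (they are not a VoiceRun type), and the numeric ranges are read from the settings reference above as 0–30, 1–20, and 60–3600, so confirm them against the dashboard.

```python
from dataclasses import dataclass

# Ranges as read from the settings reference; treat them as assumptions.
RANGES = {
    "timeout_s": (0, 30),             # seconds; 0 disables the timeout trigger
    "occurrence_threshold": (1, 20),  # count of timeout occurrences
    "time_window_s": (60, 3600),      # seconds
}


@dataclass
class ErrorFallbackSettings:  # hypothetical container, not a VoiceRun type
    fallback_type: str = "Disabled"
    value: str = ""
    timeout_s: int = 0
    occurrence_threshold: int = 5
    time_window_s: int = 300

    def problems(self):
        """Return a list of validation problems (empty when the settings look sane)."""
        found = []
        if self.fallback_type not in ("Disabled", "Phone", "Webhook"):
            found.append(f"unknown type: {self.fallback_type}")
        elif self.fallback_type != "Disabled" and not self.value:
            found.append("a value is required once a type is selected")
        elif self.fallback_type == "Webhook" and not self.value.startswith("https://"):
            found.append("webhook value must be an HTTPS URL")
        for name, (low, high) in RANGES.items():
            current = getattr(self, name)
            if not low <= current <= high:
                found.append(f"{name}={current} is outside {low}..{high}")
        return found
```

Calling `ErrorFallbackSettings().problems()` on the defaults returns an empty list; a timeout of 45 seconds would be reported as out of range.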


Fallback actions

Phone

The active call is transferred to the configured phone number via your telephony provider (Twilio or Telnyx). Use this to route callers to a human, a voicemail line, or another agent when the primary agent is unavailable.

Error Fallback Type:  Phone
Error Fallback Value: +15551234567
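
Because the value must be an E.164 number, it can help to check the format before saving. The helper below is a minimal sketch: `is_e164` is a hypothetical name, and the pattern only checks shape (a plus sign, a non-zero leading digit, at most 15 digits in total), not whether the number is actually reachable.

```python
import re

# "+" followed by 2 to 15 digits, the first of which is not zero.
_E164 = re.compile(r"\+[1-9]\d{1,14}")


def is_e164(number: str) -> bool:
    """Return True when the string has the shape of an E.164 phone number."""
    return _E164.fullmatch(number) is not None
```

For example, `is_e164("+15551234567")` passes, while a number with spaces or without the leading plus sign does not.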

Webhook

VoiceRun sends an HTTP POST to the configured URL with the call details as a JSON body, and the error message appended as an error query parameter:

POST https://your-app.example.com/voicerun/error?error=<url-encoded-error-message>
Content-Type: application/json

{
  "CallSid": "CA...",
  "ErrorMessage": "...",
  ...telephony call parameters...
}

Your endpoint controls what happens next by what it returns:

  • Return TwiML (an XML <Response>…</Response> document) to take over the call — for example <Say>, <Dial>, or <Redirect>. VoiceRun applies it to the live call.
  • Return anything else and the response is ignored (logged as unexpected); the call proceeds to terminate.
  • A non-2xx response is treated as a webhook failure.

This lets you make a routing decision dynamically — look up the caller, pick a transfer target, play a custom message, or queue a callback.
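
To make the "return TwiML" path concrete, here is a minimal sketch of the response body an endpoint might produce. The function name, the apology text, and the backup number are all hypothetical; only `CallSid` and `ErrorMessage` are read from the payload, since those are the fields shown above. How you serve the string (framework, content type such as `text/xml`) is left to your stack.

```python
from xml.sax.saxutils import escape

BACKUP_NUMBER = "+15551234567"  # hypothetical transfer target


def build_fallback_twiml(payload: dict, transfer_to: str = BACKUP_NUMBER) -> str:
    """Build a TwiML document that plays a message and dials a backup line."""
    call_sid = payload.get("CallSid", "unknown")
    error = payload.get("ErrorMessage", "")
    # Keep a trace for observability; replace print with real logging.
    print(f"error fallback for call {call_sid}: {error}")
    message = "Sorry, we are having trouble. Connecting you to our team."
    # escape() protects against &, < and > in any dynamic text.
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Response>"
        f"<Say>{escape(message)}</Say>"
        f"<Dial>{escape(transfer_to)}</Dial>"
        "</Response>"
    )
```

A dynamic router would replace the fixed `transfer_to` with a lookup based on the caller, then return the resulting document with a 2xx status.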

Signature verification

When the fallback type is set to Webhook, VoiceRun auto-generates a per-environment signing secret (prefixed whsec_) and signs every error fallback request with HMAC-SHA256 — using the same scheme as the main webhooks. The secret is returned on the environment in the errorFallbackWebhookSecret field.

Each request includes two headers:

| Header | Description |
| --- | --- |
| X-VoiceRun-Signature | sha256=<hex>, computed as HMAC-SHA256(secret, "{timestamp}.{raw_body}") |
| X-VoiceRun-Timestamp | Unix timestamp (seconds) when the request was sent |

Verify the signature before acting on the payload — see the webhook verification examples for Python and JavaScript implementations (the algorithm is identical).
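
The scheme described above can be sketched in a few lines of Python. This is an illustration rather than the official implementation: it assumes the full secret string (including the `whsec_` prefix) is used as the HMAC key, and the five-minute replay tolerance and the `now` parameter are additions for this example, not documented behavior.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_signature(secret: str, timestamp: str, raw_body: bytes,
                     signature_header: str, tolerance_s: int = 300,
                     now: Optional[float] = None) -> bool:
    """Check X-VoiceRun-Signature against HMAC-SHA256(secret, "{timestamp}.{raw_body}")."""
    prefix = "sha256="
    if not signature_header.startswith(prefix):
        return False
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False
    # Assumed replay guard: reject requests far from the current time.
    current = time.time() if now is None else now
    if abs(current - sent_at) > tolerance_s:
        return False
    signed = timestamp.encode() + b"." + raw_body
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    received = signature_header[len(prefix):]
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), received.encode())
```

Pass the raw request bytes, not a re-serialized JSON object, because any change in whitespace or key order alters the digest.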

Backwards compatible. Environments that pre-date this feature, or that have had their secret cleared, send unsigned requests (no signature headers). Once a secret exists on the environment, every request is signed.

Regenerating the secret

If the secret is compromised, rotate it via the API:

curl -X POST \
  "https://api.voicerun.com/v1/agents/{agentId}/environments/{environmentId}/regenerate-error-fallback-webhook-secret" \
  -H "Authorization: Bearer YOUR_API_KEY"

The response returns the new errorFallbackWebhookSecret. Update your verifying endpoint with the new value — requests signed with the old secret will fail verification immediately.

Cloning an environment that has an error fallback webhook secret generates a new secret for the clone; the source secret is never copied.

Disabled

No fallback action is taken. The session is marked as failed and the call ends. This is the default.


How occurrence tracking works

The no-response circuit breaker is evaluated by the agents runtime:

  • After caller input, a timer waits Error Fallback Timeout seconds for the agent to begin responding.
  • If the timer expires, a timeout occurrence is recorded against the agent + environment, keyed by session so each call contributes at most once.
  • Occurrences older than the time window are discarded; the remaining count is compared to the threshold.
  • If the count is below the threshold, the call keeps waiting and may record further occurrences.
  • When the count reaches the threshold, the fallback action runs and the session is failed.
  • When the agent responds successfully, recorded occurrences for that agent + environment are cleared.

Because occurrences are shared across concurrent sessions of the same environment, the breaker detects environment-wide problems quickly while tolerating isolated slow turns.
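
The steps above can be modeled with a small in-memory tracker. This is a sketch of the described behavior, not the runtime's actual code: the class and method names are hypothetical, timestamps are passed in explicitly so the logic is easy to follow, and refreshing a session's timestamp on a repeated timeout is an assumption.

```python
class OccurrenceTracker:
    """Sliding-window timeout tracker for one agent + environment."""

    def __init__(self, threshold: int = 5, window_s: float = 300.0):
        self.threshold = threshold
        self.window_s = window_s
        self._last_timeout = {}  # session_id -> time of its latest timeout

    def record_timeout(self, session_id: str, now: float) -> bool:
        """Record a timeout occurrence; return True when the fallback should fire."""
        # Keyed by session, so one slow call contributes at most once.
        self._last_timeout[session_id] = now
        # Discard occurrences that fell out of the sliding window.
        cutoff = now - self.window_s
        self._last_timeout = {
            sid: at for sid, at in self._last_timeout.items() if at >= cutoff
        }
        return len(self._last_timeout) >= self.threshold

    def record_success(self) -> None:
        """A successful agent response clears all recorded occurrences."""
        self._last_timeout.clear()
```

With a threshold of 3, timeouts on sessions "a", "a" again, and "b" do not trip the breaker, because "a" counts once; a timeout on a third session "c" inside the window does.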


Recommendations

  • Keep Timeout short enough that callers aren't left waiting (e.g. 8–15 seconds), but long enough to allow for normal LLM/TTS latency.
  • Start with the defaults (threshold 5, window 300s) and tighten only if you want to fail over faster during outages.
  • Prefer Webhook when you need dynamic routing decisions; prefer Phone for a simple, fixed handoff to a human or backup line.
  • Pair Error Fallback with handler-level resilience: use completions retries & fallbacks so individual provider blips never reach the call-level breaker, and surface failures with ErrorEvent for observability.
