Architecture · Technical

Model Routing and Failover: Designing Multi-Provider LLM Resilience

Amestris — Boutique AI & Technology Consultancy

Relying on a single model endpoint creates a single point of failure. Providers have outages, regional incidents, quota limits, and model updates that change behavior. Multi-provider routing can improve resilience and cost control—but only if failover is deterministic, policy-driven, and observable.

The goal is not “switch providers when things break”. The goal is to define policy: which workloads can run where, with which models, at what cost and risk. Keep that policy outside application code, so it evolves without redeploying every product.

Build a routing control plane

Start with a central router (or gateway) that evaluates each request against constraints: data residency, safety requirements, tool availability, context length, latency SLOs, and cost ceilings. Routing should be explainable: record the evaluated rules and the chosen provider/model version for every call.

Useful routing signals include: intent classification (search vs drafting vs action), risk tier (public vs sensitive), user/tenant plan, region, and whether tools are enabled. Encode these as explicit fields, not implicit prompt text, so they can be logged and audited.
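The signals above can be carried as a typed structure rather than free text, so they are loggable and auditable by construction. The field and enum names below are hypothetical examples of that shape:

```python
from dataclasses import dataclass, asdict
from enum import Enum

class RiskTier(Enum):
    PUBLIC = "public"
    SENSITIVE = "sensitive"

@dataclass(frozen=True)
class RoutingSignals:
    intent: str          # e.g. "search", "drafting", "action"
    risk_tier: RiskTier
    tenant_plan: str
    region: str
    tools_enabled: bool

    def audit_record(self) -> dict:
        # Flatten to plain values so the record can be logged as-is.
        d = asdict(self)
        d["risk_tier"] = self.risk_tier.value
        return d
```

Because the fields are explicit, a change in routing behavior can be traced to a change in an input value rather than to wording buried in a prompt.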

Routing must also respect compliance boundaries. If some tenants require in-region processing, make region a hard constraint; if data must not leave a zone, use a privacy gateway that redacts or tokenises sensitive fields before requests cross boundaries.
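As a sketch of the tokenisation step, a privacy gateway might replace sensitive fields with stable pseudonymous tokens before a request crosses a boundary. The pattern (emails only) and token scheme below are illustrative; a real gateway would cover far more identifier types.

```python
import hashlib
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def tokenise(text: str, salt: str = "zone-secret") -> str:
    """Replace email addresses with deterministic tokens.

    The salt stays inside the zone, so the mapping can be reversed
    locally but the raw value never leaves.
    """
    def repl(match):
        digest = hashlib.sha256((salt + match.group()).encode()).hexdigest()[:10]
        return f"<email:{digest}>"
    return EMAIL.sub(repl, text)
```

Deterministic tokens matter: the same address maps to the same token across calls, so the downstream model can still correlate references without seeing the raw value.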

Use guardrails so providers are interchangeable only where they should be. Some models support function calling well; others excel at summarisation. Routing policy should encode capabilities rather than assuming “LLM is an LLM”.
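Encoding capabilities can be as simple as a declared set per model that routing checks before considering a candidate. Model and capability names here are placeholders:

```python
# Declared capabilities per model; routing never assumes beyond this.
CAPABILITIES: dict[str, set[str]] = {
    "model_a": {"function_calling", "long_context"},
    "model_b": {"summarisation"},
}

def eligible(required: set[str]) -> list[str]:
    """Return models whose declared capabilities cover the requirement."""
    return [m for m, caps in CAPABILITIES.items() if required <= caps]
```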

Design failover paths, not just backups

Failover needs safe fallbacks. For high-risk flows, a “fail closed” refusal may be correct. For operational flows, route to a smaller model, reduce context, or disable tools and offer a human escalation. Silent degradation is the worst outcome: users keep trusting results while quality collapses.

  • Circuit breakers. Detect provider error spikes and open a circuit to stop wasting retries. Combine with health checks that measure latency and quality, not just HTTP status.
  • Deterministic retries. Retry with bounded backoff and idempotency keys so tool actions don’t duplicate. Make “at-most-once” behavior explicit for irreversible operations.
  • Provider-specific safety. Align content filters, logging, and data retention settings across providers so a failover does not change your compliance posture.
  • Prompt portability. Version prompts per provider when needed and continuously evaluate that responses remain equivalent on critical intents.
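The first two items above can be sketched together: a count-based circuit breaker that stops calls after repeated failures, plus an idempotency cache so a retried tool action runs at most once. Thresholds and the in-memory cache are illustrative; production systems would use shared state and richer health signals.

```python
import time

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_after: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.reset_after:
            # Half-open: let one probe through and reset counters.
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()

# Idempotency: cache results by key so a retried action does not re-execute.
_results: dict[str, object] = {}

def run_once(key: str, action):
    if key not in _results:
        _results[key] = action()
    return _results[key]
```

The idempotency key should be generated by the caller before the first attempt, so retries after an ambiguous failure reuse the same key.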

Make routing quality-aware

Resilience is not only uptime. A route that is “available” but produces lower-quality or less-safe answers is still an incident. Maintain a small benchmark of critical intents and run it continuously across routes. If a route’s scores drop (groundedness, refusal correctness, tool-call validity), automatically reduce traffic or disable it until fixed.

To avoid route thrashing, introduce stickiness: keep a session on the same model unless health breaks. This stabilises tone and reduces portability issues. Also consider fairness controls so one tenant’s surge does not starve others.
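Stickiness can be a small pinning layer in front of routing: a session stays on its pinned model while that model is healthy, and only re-pins when health breaks. The in-memory pin store and stubbed health set below are assumptions for illustration.

```python
# session_id -> pinned model (a real system would use shared storage).
_pins: dict[str, str] = {}

def pick_model(session_id: str, healthy: set[str], preferred: list[str]) -> str:
    pinned = _pins.get(session_id)
    if pinned in healthy:
        return pinned                  # stay on the same model
    for model in preferred:            # health broke: re-pin in order
        if model in healthy:
            _pins[session_id] = model
            return model
    raise RuntimeError("no healthy route available")
```

Note that after a failover the session stays pinned to the new model even once the original recovers, which avoids a second mid-conversation switch.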

This is easiest when routing and evaluation share the same identifiers: provider, model version, prompt version, tools enabled. With that, you can answer “what changed?” quickly and avoid long outages caused by invisible quality regressions.

Prove it with chaos and observability

Chaos-test routing. Simulate regional outages, quota exhaustion and “slow but up” provider behavior. Run shadow deployments and canary rollouts to catch regressions in quality, safety and cost before customers do.
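One way to exercise those scenarios in tests is a chaos wrapper around the provider call that injects failures and "slow but up" latency at configurable rates. The rates, exception type, and injectable `rng`/`sleep` hooks are illustrative choices:

```python
import random

def chaotic(call, error_rate=0.1, slow_rate=0.1, delay=5.0,
            rng=random.random, sleep=lambda s: None):
    """Wrap a provider call with injected outages and latency."""
    def wrapped(*args, **kwargs):
        if rng() < error_rate:
            raise TimeoutError("injected provider outage")
        if rng() < slow_rate:
            sleep(delay)               # simulate slow-but-up behavior
        return call(*args, **kwargs)
    return wrapped
```

Making `rng` and `sleep` injectable keeps the chaos deterministic in CI, so a failover test can assert exact behavior rather than sampling.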

If you are selecting providers now, start with Choosing and Combining LLM Providers, then operationalise with routing that has clear runbooks: when to fail over, when to throttle, when to disable tools, and when to escalate to humans.

Multi-provider resilience is a product choice. Done well, routing reduces blast radius and keeps customers whole when the inevitable incident arrives.

Quick answers

What does this article cover?

How to route traffic across multiple models/providers using policy, health checks and safe fallbacks.

Who is this for?

Architecture and engineering leaders designing resilient LLM platforms across multiple providers and models.

If this topic is relevant to an initiative you are considering, Amestris can provide independent advice or architecture support. Contact hello@amestris.com.au.