Set up automatic failover between OpenAI and Anthropic
In this tutorial you’ll configure infermux with OpenAI as the primary provider and Anthropic as an automatic fallback. You’ll tune the circuit breaker so it responds appropriately to failures, simulate a provider outage, and verify that requests shift to Anthropic automatically. You’ll also learn how to monitor circuit state and verify recovery when the primary comes back.
What you need:
- infermux installed (go install github.com/greynewell/infermux/cmd/infermux@latest)
- An OpenAI API key
- An Anthropic API key
- curl and jq
Understand priority routing
infermux’s priority routing strategy always tries providers in declaration order. The first healthy provider with a closed circuit that serves the requested model wins. If OpenAI’s circuit is open, infermux skips it and selects Anthropic. Once OpenAI’s circuit closes again, it resumes receiving traffic on the next request.
This is the foundation of failover: you declare providers in priority order, configure circuit breaker thresholds appropriate for your SLA, and infermux handles the rest.
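The selection rule described above can be sketched in a few lines of Python. This is an illustration of the rule only, not infermux's actual code: the Provider class and select_provider function are hypothetical names, and the sketch assumes a provider is eligible only when it is healthy, its circuit is closed, and it serves the requested model.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    # Hypothetical in-memory view of one configured provider.
    name: str
    models: frozenset
    circuit: str = "closed"  # "closed", "open", or "half-open"
    healthy: bool = True


def select_provider(providers, model):
    """Return the first eligible provider in declaration order, or None."""
    for provider in providers:
        if provider.healthy and provider.circuit == "closed" and model in provider.models:
            return provider
    return None


# Declaration order is the priority order: openai first, anthropic second.
providers = [
    Provider("openai", frozenset({"gpt-4o", "gpt-4o-mini"})),
    Provider("anthropic", frozenset({"gpt-4o", "gpt-4o-mini"})),
]
```

With both circuits closed, select_provider(providers, "gpt-4o-mini") picks openai. Setting providers[0].circuit to "open" makes the same call return anthropic, and opening both makes it return None.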
Write the config
Set your API keys:
export OPENAI_API_KEY=sk-proj-...
export ANTHROPIC_API_KEY=sk-ant-...
Create infermux.yml:
listen: ":8080"
management_listen: ":8081"
providers:
  - name: openai
    type: openai
    api_key: "${OPENAI_API_KEY}"
    models:
      - gpt-4o
      - gpt-4o-mini
    timeout: 30s
    circuit_breaker:
      error_rate_threshold: 0.5 # open when 50% of requests fail
      consecutive_failures: 5 # or after 5 consecutive failures
      window_seconds: 30 # measured over a 30-second window
      min_requests: 5 # require at least 5 requests before evaluating rate
      recovery_window: 20s # stay open 20 seconds, then probe
      probe_timeout: 5s
  - name: anthropic
    type: anthropic
    api_key: "${ANTHROPIC_API_KEY}"
    model_aliases:
      gpt-4o: claude-opus-4-5
      gpt-4o-mini: claude-haiku-3-5
    timeout: 45s
    circuit_breaker:
      error_rate_threshold: 0.5
      consecutive_failures: 5
      window_seconds: 30
      min_requests: 5
      recovery_window: 30s
      probe_timeout: 5s
routing:
  strategy: priority
log:
  level: info
  format: json
The management_listen setting puts management endpoints on a separate port so they don’t interfere with inference traffic (and can be network-restricted separately in production).
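To make the circuit_breaker settings in the config concrete, here is a rough Python sketch of how the two trip conditions could be evaluated. It is a reading of the config comments, not infermux's implementation: the TripEvaluator class is a hypothetical name, and the sketch assumes that the consecutive-failure rule is checked first and that the error-rate rule fires when the rate is at or above the threshold.

```python
from collections import deque


class TripEvaluator:
    """Decides whether a circuit should open (illustrative only)."""

    def __init__(self, error_rate_threshold=0.5, consecutive_failures=5,
                 window_seconds=30, min_requests=5):
        self.error_rate_threshold = error_rate_threshold
        self.consecutive_failures = consecutive_failures
        self.window_seconds = window_seconds
        self.min_requests = min_requests
        self._events = deque()  # (timestamp, ok) pairs inside the window
        self._streak = 0        # current run of consecutive failures

    def record(self, ok, now):
        """Record one request outcome at time `now` (seconds)."""
        self._events.append((now, ok))
        self._streak = 0 if ok else self._streak + 1
        # Drop outcomes that have fallen out of the sliding window.
        while self._events and self._events[0][0] <= now - self.window_seconds:
            self._events.popleft()

    def should_open(self):
        """Return the reason to open the circuit, or None to keep it closed."""
        if self._streak >= self.consecutive_failures:
            return "consecutive_failures"
        total = len(self._events)
        if total >= self.min_requests:
            failures = sum(1 for _, ok in self._events if not ok)
            if failures / total >= self.error_rate_threshold:
                return "error_rate"
        return None
```

With the tutorial's values, four failures in a row do not open the circuit: the streak is below 5 and there are fewer than min_requests outcomes in the window. The fifth consecutive failure does.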
Start infermux and verify both providers are healthy
infermux serve --config infermux.yml
You should see both providers healthy:
infermux v0.4.0
providers: openai (healthy), anthropic (healthy)
listening on :8080 (management: :8081)
In a second terminal, check the provider status via the management API:
curl -s http://localhost:8081/_infermux/providers | jq '.[] | {name, healthy, circuit}'
{"name": "openai", "healthy": true, "circuit": "closed"}
{"name": "anthropic", "healthy": true, "circuit": "closed"}
Verify that requests go to OpenAI
curl -s -D - http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "hi"}]}' \
| grep X-Infermux-Provider
X-Infermux-Provider: openai
Send several requests. They should all go to OpenAI because it is the priority-1 provider and its circuit is closed.
Simulate an OpenAI outage
To simulate OpenAI going down, manually open its circuit via the management API:
curl -s -X POST http://localhost:8081/_infermux/providers/openai/circuit/open
Verify the circuit is open:
curl -s http://localhost:8081/_infermux/providers | jq '.[] | {name, circuit}'
{"name": "openai", "circuit": "open"}
{"name": "anthropic", "circuit": "closed"}
Now send a request:
curl -s -D - http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "hi"}]}' \
| grep -E "X-Infermux-Provider|X-Infermux-Model"
X-Infermux-Provider: anthropic
X-Infermux-Model: claude-haiku-3-5
Requests are now routing to Anthropic, which is translating gpt-4o-mini to claude-haiku-3-5 via the model alias. The response body is still in OpenAI format — infermux normalizes it.
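The alias lookup can be pictured as a small mapping step. The Python sketch below is an assumption about how the models and model_aliases settings combine, using a hypothetical resolve_model helper and plain dicts that mirror the tutorial's config. It is not infermux's internal API.

```python
def resolve_model(provider, requested):
    """Return the upstream model name this provider would use, or None.

    `provider` is a dict shaped like one entry under `providers:` in
    infermux.yml (hypothetical in-memory form).
    """
    # A provider that lists the model directly serves it unchanged.
    if requested in provider.get("models", []):
        return requested
    # Otherwise fall back to the alias table, if the provider has one.
    return provider.get("model_aliases", {}).get(requested)


openai_cfg = {"name": "openai", "models": ["gpt-4o", "gpt-4o-mini"]}
anthropic_cfg = {
    "name": "anthropic",
    "model_aliases": {
        "gpt-4o": "claude-opus-4-5",
        "gpt-4o-mini": "claude-haiku-3-5",
    },
}
```

Here resolve_model(anthropic_cfg, "gpt-4o-mini") gives "claude-haiku-3-5", matching the X-Infermux-Model header above, while a model that appears in neither table resolves to None and the provider would be skipped.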
Understand real circuit-breaking (not just manual control)
Manual circuit control is useful for testing and maintenance, but in production the circuit opens automatically. To see this happen, you need to generate real errors against OpenAI.
The easiest way to test this without impacting your account is to temporarily misconfigure the OpenAI API key. Stop infermux, change the config to use a bad key for OpenAI, then restart:
providers:
  - name: openai
    type: openai
    api_key: "sk-invalid-key-for-testing" # will cause 401 errors
    models: [gpt-4o-mini]
    circuit_breaker:
      consecutive_failures: 3 # open faster for testing
      min_requests: 3
Start infermux and send requests. After 3 consecutive 401s from OpenAI, the circuit opens automatically and requests shift to Anthropic.
Watch the JSON logs:
{"level":"warn","provider":"openai","error":"401 Unauthorized","consecutive_failures":1}
{"level":"warn","provider":"openai","error":"401 Unauthorized","consecutive_failures":2}
{"level":"warn","provider":"openai","error":"401 Unauthorized","consecutive_failures":3}
{"level":"warn","provider":"openai","circuit":"open","reason":"consecutive_failures","msg":"circuit opened"}
{"level":"info","provider":"anthropic","model":"claude-haiku-3-5","latency_ms":312,"msg":"request routed"}
After testing, restore the correct API key.
Verify automatic recovery
With OpenAI’s circuit opened manually (as in "Simulate an OpenAI outage" above), wait for the recovery_window to expire (20 seconds in your config) and watch the circuit move to half-open, then closed:
# Poll circuit state every 2 seconds
while true; do
curl -s http://localhost:8081/_infermux/providers \
| jq -r '.[] | select(.name=="openai") | "\(.name): \(.circuit)"'
sleep 2
done
You’ll see the state progress:
openai: open
openai: open
openai: open
openai: open
openai: open
openai: open
openai: open
openai: open
openai: open
openai: half-open ← recovery window expired, probe sent
openai: closed ← probe succeeded, back to normal
Once the circuit closes, the next request will go back to OpenAI:
curl -s -D - http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "hi"}]}' \
| grep X-Infermux-Provider
# X-Infermux-Provider: openai
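The open, half-open, closed progression above can be modeled as a small state machine. The following Python sketch is a simplified illustration with hypothetical names (CircuitState, tick). It assumes the probe result is applied on the tick after the circuit enters half-open, and that a failed probe reopens the circuit and restarts the recovery window. The real behavior is defined by infermux, not by this sketch.

```python
class CircuitState:
    """Toy model of the open -> half-open -> closed recovery path."""

    def __init__(self, recovery_window=20.0):
        self.recovery_window = recovery_window  # seconds to stay open
        self.state = "closed"
        self.opened_at = None

    def open(self, now):
        """Open the circuit at time `now` (seconds)."""
        self.state = "open"
        self.opened_at = now

    def tick(self, now, probe):
        """Advance the state machine; `probe` returns True on a successful probe."""
        if self.state == "open":
            if now - self.opened_at >= self.recovery_window:
                self.state = "half-open"
        elif self.state == "half-open":
            if probe():
                self.state = "closed"
                self.opened_at = None
            else:
                self.open(now)  # failed probe: wait out another window
        return self.state


# Poll every 2 seconds, as the shell loop above does.
circuit = CircuitState(recovery_window=20.0)
circuit.open(now=0)
history = [circuit.tick(t, probe=lambda: True) for t in range(0, 24, 2)]
```

In this run, history holds "open" for every poll before the 20-second mark, then "half-open", then "closed". The exact number of "open" lines you see in practice depends on when polling starts relative to the moment the circuit opened.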
Check what happens when both providers are down
Open both circuits:
curl -s -X POST http://localhost:8081/_infermux/providers/openai/circuit/open
curl -s -X POST http://localhost:8081/_infermux/providers/anthropic/circuit/open
Send a request:
curl -s -i http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "hi"}]}'
HTTP/1.1 503 Service Unavailable
{
"error": {
"message": "no healthy providers available for model gpt-4o-mini",
"type": "infermux_error",
"code": "no_healthy_providers"
}
}
infermux returns a 503. Your application should treat this as a transient failure and retry with backoff, or return a graceful error to the user.
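One way to handle the 503 on the client side is exponential backoff with jitter. The Python sketch below is a minimal, generic example. The call_with_backoff helper, its parameters, and the send callable are all hypothetical names chosen for illustration. send stands in for whatever function your application uses to call infermux and is assumed to return a (status_code, body) pair.

```python
import random
import time


def call_with_backoff(send, max_attempts=4, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep):
    """Call send() until it returns a non-503 status or attempts run out.

    Requires max_attempts >= 1. Returns the last (status_code, body) pair.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status != 503:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff capped at max_delay, with full jitter so
            # many clients do not retry at the same instant.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    return status, body
```

If every attempt returns 503, the function hands back that final 503 so the caller can show a graceful error. Passing a custom sleep function makes the helper easy to exercise in tests without real delays.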
Reset both circuits:
curl -s -X POST http://localhost:8081/_infermux/providers/openai/circuit/reset
curl -s -X POST http://localhost:8081/_infermux/providers/anthropic/circuit/reset
Choosing circuit breaker thresholds: The right thresholds depend on your traffic volume and SLA. A low-traffic service benefits from a min_requests floor such as 5, so that a single failed request can’t push the error rate over the threshold and open the circuit. A high-traffic service can use lower thresholds and a shorter window_seconds to detect degradation faster. The recovery_window should be long enough for the provider to recover, but short enough that traffic returns to the primary soon after it is healthy again. Start with the defaults and tune based on observed error patterns.
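A quick back-of-the-envelope check can help when picking these values. The Python sketch below uses hypothetical helper names and assumes that the error-rate rule is only evaluated once the window holds at least min_requests outcomes, as the config comments suggest. It estimates whether your traffic volume ever reaches that floor.

```python
def expected_requests_in_window(requests_per_second, window_seconds):
    """Average number of requests that land in one evaluation window."""
    return requests_per_second * window_seconds


def error_rate_rule_is_reachable(requests_per_second, window_seconds, min_requests):
    """True if typical traffic fills the window enough to evaluate the error rate."""
    return expected_requests_in_window(requests_per_second, window_seconds) >= min_requests
```

For example, a service sending about one request every 10 seconds (0.1 requests per second) sees roughly 3 requests per 30-second window, which is below min_requests: 5. For that service, the consecutive_failures setting is effectively the only rule that can open the circuit, unless you lengthen window_seconds or lower min_requests.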
What you built
A two-provider infermux setup with automatic failover: OpenAI as primary, Anthropic as fallback. The circuit breaker opens automatically when OpenAI fails, and closes automatically when it recovers. No application code was changed.
What’s next
- Circuit Breaking — full reference for circuit breaker configuration and the state machine
- Cost Optimization tutorial — add cost-weighted routing to minimize spend across your provider set
- HTTP API — full reference for the management API