Go
stable
infermux
Route inference. Survive failure.
go get github.com/greynewell/infermux
Provider-agnostic routing
OpenAI, Anthropic, Ollama, Azure OpenAI, and any OpenAI-compatible endpoint. Add providers in config, not code.
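Adding a provider is a config change, not a code change. A hypothetical `infermux.yml` fragment might look like the sketch below; the key names are illustrative, not the actual schema.

```yaml
# Illustrative sketch only -- key names are not the real infermux schema.
providers:
  - name: openai
    type: openai
    api_key_env: OPENAI_API_KEY
  - name: anthropic
    type: anthropic
    api_key_env: ANTHROPIC_API_KEY
  - name: local
    type: openai_compatible   # any OpenAI-compatible endpoint
    base_url: http://localhost:11434/v1
```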
Automatic failover
Circuit breakers open on provider failure. Requests route to healthy providers automatically. No intervention required.
Cost tracking
Per-request token accounting. Attribute spend by model, provider, and caller. Set budgets and get alerted.
OpenAI-compatible API
Drop-in replacement for the OpenAI API. Point any OpenAI client at infermux and it works.
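"Drop-in" means the only change is the base URL. The sketch below builds the same request as the curl example further down, pointed at a local infermux instead of api.openai.com; `buildChatRequest` is a hypothetical helper, not part of any library.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// buildChatRequest constructs a standard OpenAI-style chat-completions
// request against whatever base URL it is given. Nothing here is
// infermux-specific; only the base URL changes.
func buildChatRequest(baseURL string) (*http.Request, error) {
	body := []byte(`{"model": "gpt-4o", "messages": [{"role": "user", "content": "hello"}]}`)
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Point at infermux instead of the provider's API host.
	req, err := buildChatRequest("http://localhost:8080")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL) // POST http://localhost:8080/v1/chat/completions
}
```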
Load balancing
Round-robin, least-latency, and cost-weighted routing strategies. Configurable per route.
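A least-latency strategy can be sketched as picking the lowest-latency provider whose circuit breaker is still closed. This is an illustration of the idea, not infermux's implementation; the struct and field names are assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// Provider holds the observed latency and health for one upstream.
type Provider struct {
	Name      string
	LatencyMS float64
	Healthy   bool
}

// leastLatency returns the healthy provider with the lowest observed
// latency, or "" if none are healthy.
func leastLatency(ps []Provider) string {
	best, bestMS := "", math.MaxFloat64
	for _, p := range ps {
		if p.Healthy && p.LatencyMS < bestMS {
			best, bestMS = p.Name, p.LatencyMS
		}
	}
	return best
}

func main() {
	ps := []Provider{
		{"openai", 420, true},
		{"anthropic", 380, true},
		{"ollama", 95, false}, // fastest, but its breaker is open
	}
	fmt.Println(leastLatency(ps)) // anthropic
}
```

Round-robin and cost-weighted strategies swap only the selection function; the health filter stays the same.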
Zero runtime dependencies
Built on mist-go and compiled to a single static binary. Nothing to install alongside it.
Example
infermux
$ infermux serve --config infermux.yml
providers: openai (healthy), anthropic (healthy), ollama (healthy)
listening on :8080
# In another terminal:
$ curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "hello"}]}'
# Routes to cheapest available provider
# Falls back automatically if primary is down
Provider outages don't have to be incidents. infermux sits in front of your model calls and handles failover automatically: when a provider trips its circuit breaker, the router stops sending it traffic until a recovery probe succeeds.
It exposes an OpenAI-compatible HTTP API. Any client that talks to OpenAI works with infermux without code changes. Route by cost, latency, or priority. Set per-caller budgets. Get full token cost attribution without custom instrumentation.
Cost events flow into [tokentrace](/tokentrace/) automatically. If you run the full MIST stack, infermux becomes the inference layer that feeds cost and latency data into your eval pipeline.