
infermux

Route inference. Survive failure.

go get github.com/greynewell/infermux

Provider-agnostic routing

OpenAI, Anthropic, Ollama, Azure OpenAI, and any OpenAI-compatible endpoint. Add providers in config, not code.
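A hypothetical config sketch of what "providers in config" could look like. Key names here are illustrative, not infermux's documented schema; consult the project docs for the real keys.

```yaml
# infermux.yml -- illustrative only
providers:
  - name: openai
    type: openai
    api_key_env: OPENAI_API_KEY
  - name: anthropic
    type: anthropic
    api_key_env: ANTHROPIC_API_KEY
  - name: local
    type: openai-compatible      # any OpenAI-compatible endpoint
    base_url: http://localhost:11434/v1
```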

Automatic failover

Circuit breakers open on provider failure. Requests route to healthy providers automatically. No intervention required.

Cost tracking

Per-request token accounting. Attribute spend by model, provider, and caller. Set budgets and get alerted.
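Per-request attribution reduces to a small computation over token counts and per-model rates. A sketch of the idea; the prices and type names are made up for illustration.

```go
package main

import "fmt"

// Pricing holds per-million-token rates. The numbers used below are
// illustrative only, not real provider rates.
type Pricing struct {
	InputPerM  float64
	OutputPerM float64
}

// Cost attributes the spend of one request from its token counts.
func Cost(p Pricing, inputTokens, outputTokens int) float64 {
	return float64(inputTokens)/1e6*p.InputPerM +
		float64(outputTokens)/1e6*p.OutputPerM
}

func main() {
	gpt4o := Pricing{InputPerM: 2.50, OutputPerM: 10.00} // hypothetical rates
	fmt.Printf("$%.6f\n", Cost(gpt4o, 1200, 300))        // $0.006000
}
```

Summing these per request, keyed by model, provider, and caller, is all budget enforcement needs.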

OpenAI-compatible API

Drop-in replacement for the OpenAI API. Point any OpenAI client at infermux and it works.

Load balancing

Round-robin, least-latency, and cost-weighted routing strategies. Configurable per route.
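A strategy is just a selection function over healthy providers. This sketch shows least-latency selection with hypothetical types; round-robin or cost-weighted variants would swap only the selection logic.

```go
package main

import "fmt"

// Provider holds the routing signals a strategy can select on.
type Provider struct {
	Name      string
	Healthy   bool
	LatencyMs float64
}

// leastLatency returns the healthy provider with the lowest observed
// latency, or ok=false if none is healthy.
func leastLatency(ps []Provider) (best Provider, ok bool) {
	for _, p := range ps {
		if !p.Healthy {
			continue // circuit open or failing: never selected
		}
		if !ok || p.LatencyMs < best.LatencyMs {
			best, ok = p, true
		}
	}
	return best, ok
}

func main() {
	ps := []Provider{
		{Name: "openai", Healthy: false, LatencyMs: 120},
		{Name: "anthropic", Healthy: true, LatencyMs: 180},
		{Name: "ollama", Healthy: true, LatencyMs: 40},
	}
	if p, ok := leastLatency(ps); ok {
		fmt.Println("route to", p.Name) // ollama: healthy and fastest
	}
}
```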

Zero dependencies

Built on mist-go, with no third-party dependencies. Ships as a single binary with no runtime requirements.

Example

$ infermux serve --config infermux.yml
providers: openai (healthy), anthropic (healthy), ollama (healthy)
listening on :8080

# In another terminal:
$ curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "hello"}]}'

# Routes to cheapest available provider
# Falls back automatically if primary is down
Provider outages don't have to be incidents. infermux sits in front of your model calls and handles failover automatically: when a provider trips its circuit breaker, the router stops sending traffic until a recovery probe succeeds.

It exposes an OpenAI-compatible HTTP API, so any client that talks to OpenAI works with infermux without code changes. Route by cost, latency, or priority. Set per-caller budgets. Get full token cost attribution without custom instrumentation.

Cost events flow into [tokentrace](/tokentrace/) automatically. If you run the full MIST stack, infermux becomes the inference layer that feeds cost and latency data into your eval pipeline.