Overview Docs Tutorials

Route your first inference request

beginner 10 min

In this tutorial you will install infermux, write a config file with a single OpenAI provider, start the server, and send a request through the router. By the end you’ll understand how infermux works as a transparent proxy and what information it adds to each response.

What you need:


Step 1

Install infermux

Install the CLI with go install:

go install github.com/greynewell/infermux/cmd/infermux@latest

This compiles infermux and places the binary at $(go env GOPATH)/bin/infermux. Add it to your path if it isn’t already:

export PATH="$PATH:$(go env GOPATH)/bin"
infermux --version
# infermux v0.4.0
Step 2

Set your API key

export OPENAI_API_KEY=sk-proj-...

infermux reads environment variables at startup. It never writes them to disk or logs them.

Step 3

Write a config file

Create a file called infermux.yml in your working directory:

listen: ":8080"

providers:
  - name: openai
    type: openai
    api_key: "${OPENAI_API_KEY}"
    models:
      - gpt-4o-mini
      - gpt-4o

routing:
  strategy: round_robin

This config declares one provider (OpenAI), lists the models it can serve, and sets the routing strategy to round-robin. With one provider, round-robin always routes to that provider — it’s effectively a transparent proxy at this point.

Step 4

Start the server

infermux serve --config infermux.yml

You should see:

infermux v0.4.0
config: infermux.yml
providers: openai (healthy)
listening on :8080

The (healthy) status means infermux successfully ran a startup health check against the OpenAI API. If you see (unhealthy), check your OPENAI_API_KEY.

Leave this terminal running. Open a new terminal for the next step.

Step 5

Send your first request

infermux exposes the OpenAI-compatible /v1/chat/completions endpoint. Send a request the same way you’d send it to OpenAI directly — just change the host:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'

You should receive a standard OpenAI response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1741046400,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 2,
    "total_tokens": 17
  }
}
Step 6

Inspect the routing headers

Add -D - to curl to see the response headers:

curl -D - http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "hello"}]}' \
  2>/dev/null | head -20

Look for the infermux-specific headers:

HTTP/1.1 200 OK
Content-Type: application/json
X-Infermux-Provider: openai
X-Infermux-Model: gpt-4o-mini
X-Infermux-Strategy: round_robin
X-Infermux-Latency-Ms: 387
X-Infermux-Cost: 0.0000018
X-Infermux-Prompt-Tokens: 15
X-Infermux-Completion-Tokens: 2

These headers tell you:

  • Which provider served the request (openai)
  • The routing strategy used (round_robin)
  • How long the provider took to respond (387ms)
  • The estimated cost in USD ($0.0000018)
Step 7

Use infermux from Go

If your application uses the Go OpenAI client, change the base URL:

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/openai/openai-go"
    "github.com/openai/openai-go/option"
)

func main() {
    // Point the client at infermux instead of api.openai.com
    client := openai.NewClient(
        option.WithBaseURL("http://localhost:8080/v1"),
        option.WithAPIKey("any-value"), // infermux uses its own credentials
    )

    chat, err := client.Chat.Completions.New(context.Background(), openai.ChatCompletionNewParams{
        Model: openai.F(openai.ChatModelGPT4oMini),
        Messages: openai.F([]openai.ChatCompletionMessageParamUnion{
            openai.UserMessage("What is the capital of France?"),
        }),
    })
    if err != nil {
        log.Fatal(err)
    }

    fmt.Println(chat.Choices[0].Message.Content)
}

This is the full drop-in replacement: one line changes (WithBaseURL) and your application gains routing, failover, and cost tracking.

What you built: A running infermux instance routing requests to OpenAI. The response is identical to calling OpenAI directly — infermux adds routing metadata in headers but does not modify the response body.

What’s next