HTTP API

tokentrace exposes an HTTP API for ingesting spans, querying metrics, retrieving traces, and managing alert rules. Enable the server by setting HTTPServer in your tracer config or http_server in tokentrace.yml.

All endpoints return JSON except GET /metrics/prometheus, which returns Prometheus text format. Error responses use a standard envelope:

{"error": "message describing what went wrong"}
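A client can decode this envelope into a regular Go error. The following is a minimal sketch; the apiError type and parseAPIError helper are hypothetical names for illustration, not part of tokentrace.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiError mirrors the error envelope shown above.
// Hypothetical type, not part of the tokentrace package.
type apiError struct {
	Message string `json:"error"`
}

// parseAPIError turns an error response into a Go error.
// If the body is not a valid envelope, the raw body is used instead.
func parseAPIError(status int, body []byte) error {
	var e apiError
	if err := json.Unmarshal(body, &e); err != nil || e.Message == "" {
		return fmt.Errorf("tokentrace: HTTP %d: %s", status, body)
	}
	return fmt.Errorf("tokentrace: HTTP %d: %s", status, e.Message)
}
```

Call parseAPIError with the response status code and body whenever an endpoint returns a non-2xx status.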

Default base URL: http://localhost:9090.

POST /spans

Ingest one or more spans. This is the endpoint that HTTPTransport posts to. You can also call it directly to ingest spans from non-Go services or from tests.

Request body:

[
  {
    "name":          "summarize-document",
    "model":         "gpt-4o",
    "provider":      "openai",
    "prompt_tokens": 512,
    "comp_tokens":   128,
    "total_tokens":  640,
    "cost_usd":      0.00448,
    "latency_ms":    1240,
    "status":        "ok",
    "caller":        "summarizer-service",
    "attributes": {
      "eval.score":    0.87,
      "workflow":      "document-summary"
    }
  }
]

You may POST a single span (as a JSON object) or an array of spans.

Response:

HTTP 202 Accepted
{"accepted": 1}

Errors:

400 Bad Request — malformed JSON or missing required fields (model, at least one token count)
413 Request Entity Too Large — batch exceeds the configured max_batch_size (default: 1000 spans)
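Calling this endpoint directly from Go might look like the sketch below. The Span struct covers only a subset of the fields above, and the helper names and the Content-Type header are assumptions made for this example, not documented requirements.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Span covers a subset of the request body fields shown above.
// Hypothetical type for illustration.
type Span struct {
	Name         string                 `json:"name,omitempty"`
	Model        string                 `json:"model"`
	Provider     string                 `json:"provider,omitempty"`
	PromptTokens int                    `json:"prompt_tokens,omitempty"`
	CompTokens   int                    `json:"comp_tokens,omitempty"`
	LatencyMS    int                    `json:"latency_ms,omitempty"`
	Status       string                 `json:"status,omitempty"`
	Attributes   map[string]interface{} `json:"attributes,omitempty"`
}

// newSpansRequest encodes spans as a JSON array and targets POST /spans.
func newSpansRequest(baseURL string, spans []Span) (*http.Request, error) {
	body, err := json.Marshal(spans)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/spans", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	// Assumption: the server expects a JSON content type.
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// sendSpans posts the batch and treats any status other than 202 as a failure.
func sendSpans(client *http.Client, baseURL string, spans []Span) error {
	req, err := newSpansRequest(baseURL, spans)
	if err != nil {
		return err
	}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusAccepted {
		return fmt.Errorf("POST /spans: unexpected status %s", resp.Status)
	}
	return nil
}
```

For example, sendSpans(http.DefaultClient, "http://localhost:9090", spans) sends one batch. To stay under max_batch_size, a caller would split larger slices before sending.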

GET /metrics

Returns all built-in metrics for the specified time window.

Query parameters:

Parameter	Type	Default	Description
`window`	duration	`1h`	Time window: `1h`, `6h`, `24h`, `7d`, `30d`, or a Go duration string
`groupby`	string	—	Group a metric by an attribute key or built-in dimension (`model`, `caller`, `provider`)

Response:

{
  "window": "1h",
  "span_count": 312,
  "cost": {
    "total_usd": 1.42,
    "per_call_usd": 0.00456,
    "by_model": {"gpt-4o": 1.28, "gpt-4o-mini": 0.14}
  },
  "tokens": {
    "prompt_total": 156800,
    "completion_total": 39200,
    "total": 196000
  },
  "latency": {
    "p50_ms": 820,
    "p95_ms": 2140,
    "p99_ms": 3810
  },
  "errors": {
    "rate": 0.013,
    "count": 4
  },
  "quality": {
    "score_avg": 0.884,
    "score_p10": 0.71,
    "scored_span_count": 180
  }
}

GET /metrics/{name}

Returns a single metric by name.

Path parameters:

name — metric name from the Metrics Reference. Examples: cost, latency, quality, error_rate.

Query parameters: same as GET /metrics.

Examples:

GET /metrics/cost?window=24h

{
  "window": "24h",
  "total_usd": 8.74,
  "per_call_usd": 0.00421,
  "by_model": {
    "gpt-4o":      7.91,
    "gpt-4o-mini": 0.83
  },
  "span_count": 2076
}

GET /metrics/latency?window=1h&groupby=model

{
  "window": "1h",
  "by_model": {
    "gpt-4o":      {"p50_ms": 980,  "p95_ms": 2340, "p99_ms": 4100},
    "gpt-4o-mini": {"p50_ms": 320,  "p95_ms": 710,  "p99_ms": 1200}
  }
}
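The sketch below shows one way to build these URLs and decode the cost response from Go. CostMetric, metricURL, fetchCost, and decodeCost are hypothetical names; the struct fields follow the GET /metrics/cost example above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// CostMetric matches the GET /metrics/cost example response above.
type CostMetric struct {
	Window     string             `json:"window"`
	TotalUSD   float64            `json:"total_usd"`
	PerCallUSD float64            `json:"per_call_usd"`
	ByModel    map[string]float64 `json:"by_model"`
	SpanCount  int                `json:"span_count"`
}

// metricURL builds the URL for GET /metrics/{name}. Empty window or
// groupBy values are omitted so the server defaults apply.
func metricURL(baseURL, name, window, groupBy string) string {
	q := url.Values{}
	if window != "" {
		q.Set("window", window)
	}
	if groupBy != "" {
		q.Set("groupby", groupBy)
	}
	u := baseURL + "/metrics/" + url.PathEscape(name)
	if len(q) > 0 {
		u += "?" + q.Encode()
	}
	return u
}

// decodeCost parses a cost metric response body.
func decodeCost(body []byte) (CostMetric, error) {
	var m CostMetric
	err := json.Unmarshal(body, &m)
	return m, err
}

// fetchCost requests the cost metric for a window and decodes it.
func fetchCost(client *http.Client, baseURL, window string) (CostMetric, error) {
	resp, err := client.Get(metricURL(baseURL, "cost", window, ""))
	if err != nil {
		return CostMetric{}, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return CostMetric{}, err
	}
	if resp.StatusCode != http.StatusOK {
		return CostMetric{}, fmt.Errorf("GET /metrics/cost: %s: %s", resp.Status, body)
	}
	return decodeCost(body)
}
```

url.Values.Encode sorts parameters by key, so the generated query string lists groupby before window. The sketch assumes the server does not depend on parameter order.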

GET /metrics/prometheus

Returns all metrics in Prometheus text exposition format. Suitable for scraping by a Prometheus server.

Response (Content-Type: text/plain; version=0.0.4):

# HELP tokentrace_cost_total Total cost in USD.
# TYPE tokentrace_cost_total counter
tokentrace_cost_total{model="gpt-4o",provider="openai",caller="summarizer-service"} 7.91

# HELP tokentrace_latency_ms Inference latency in milliseconds.
# TYPE tokentrace_latency_ms histogram
tokentrace_latency_ms_bucket{le="100"} 12
tokentrace_latency_ms_bucket{le="500"} 148
tokentrace_latency_ms_bucket{le="1000"} 389
...

GET /traces/{id}

Retrieves all spans for a specific trace ID.

Path parameters:

id — trace ID returned by trace.ID() or present in any span’s trace_id field.

Response:

{
  "trace_id": "7f3a1c2b-e4d5-4f6a-8b9c-0d1e2f3a4b5c",
  "name": "summarize-document",
  "started_at": "2026-03-15T14:23:01.441Z",
  "ended_at": "2026-03-15T14:23:02.681Z",
  "total_cost_usd": 0.00448,
  "total_tokens": 640,
  "span_count": 1,
  "spans": [
    {
      "span_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "parent_span_id": null,
      "name": "summarize-document",
      "model": "gpt-4o",
      "provider": "openai",
      "prompt_tokens": 512,
      "comp_tokens": 128,
      "total_tokens": 640,
      "cost_usd": 0.00448,
      "latency_ms": 1240,
      "status": "ok",
      "started_at": "2026-03-15T14:23:01.441Z",
      "ended_at": "2026-03-15T14:23:02.681Z",
      "attributes": {
        "eval.score": 0.87
      }
    }
  ]
}

Errors:

404 Not Found — trace ID not found (not recorded, or outside the retention window)
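A Go client can map the 404 case to a sentinel error so callers can tell a missing trace apart from other failures. This is a sketch under assumptions: the Trace and TraceSpan structs decode only some of the fields above, and all names are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// ErrTraceNotFound is returned when the server responds with 404.
var ErrTraceNotFound = errors.New("trace not found or outside the retention window")

// TraceSpan decodes a subset of the span fields in the response above.
type TraceSpan struct {
	SpanID       string  `json:"span_id"`
	ParentSpanID *string `json:"parent_span_id"` // nil when the JSON value is null
	Model        string  `json:"model"`
	LatencyMS    int     `json:"latency_ms"`
	Status       string  `json:"status"`
}

// Trace decodes a subset of the trace fields in the response above.
type Trace struct {
	TraceID      string      `json:"trace_id"`
	Name         string      `json:"name"`
	TotalCostUSD float64     `json:"total_cost_usd"`
	TotalTokens  int         `json:"total_tokens"`
	SpanCount    int         `json:"span_count"`
	Spans        []TraceSpan `json:"spans"`
}

// decodeTrace parses a GET /traces/{id} response body.
func decodeTrace(body []byte) (*Trace, error) {
	var t Trace
	if err := json.Unmarshal(body, &t); err != nil {
		return nil, err
	}
	return &t, nil
}

// getTrace fetches a trace by ID and maps 404 to ErrTraceNotFound.
func getTrace(client *http.Client, baseURL, id string) (*Trace, error) {
	resp, err := client.Get(baseURL + "/traces/" + url.PathEscape(id))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	switch resp.StatusCode {
	case http.StatusOK:
		body, err := io.ReadAll(resp.Body)
		if err != nil {
			return nil, err
		}
		return decodeTrace(body)
	case http.StatusNotFound:
		return nil, ErrTraceNotFound
	default:
		return nil, fmt.Errorf("GET /traces/%s: unexpected status %s", id, resp.Status)
	}
}
```

Callers can then check errors.Is(err, ErrTraceNotFound) to decide whether a retry is pointless.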

GET /traces

Query traces by span attributes and built-in fields. Returns matching traces in reverse chronological order (newest first).

Query parameters:

Parameter	Type	Description
`attr.{key}`	string	Filter to spans with this attribute key/value pair
`model`	string	Filter to spans with this model
`caller`	string	Filter to spans with this caller
`status`	string	Filter by status: `ok`, `error`, `timeout`
`since`	RFC3339	Return traces started after this time
`until`	RFC3339	Return traces started before this time
`limit`	int	Maximum number of traces to return (default: 20, max: 200)
`offset`	int	Offset for pagination

Examples:

GET /traces?attr.request_id=abc123
GET /traces?model=gpt-4o&status=error&limit=50
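The query parameters above can be assembled with net/url so that values are escaped correctly. The TraceQuery type below is a hypothetical helper written for this example; it caps limit at the documented maximum of 200 on the client side.

```go
package main

import (
	"net/url"
	"strconv"
	"time"
)

// TraceQuery holds the filters accepted by GET /traces.
// Zero values are omitted from the query string.
type TraceQuery struct {
	Attrs  map[string]string // sent as attr.{key}=value
	Model  string
	Caller string
	Status string // "ok", "error", or "timeout"
	Since  time.Time
	Until  time.Time
	Limit  int
	Offset int
}

// URL renders the query as a full GET /traces URL.
func (q TraceQuery) URL(baseURL string) string {
	v := url.Values{}
	for key, val := range q.Attrs {
		v.Set("attr."+key, val)
	}
	if q.Model != "" {
		v.Set("model", q.Model)
	}
	if q.Caller != "" {
		v.Set("caller", q.Caller)
	}
	if q.Status != "" {
		v.Set("status", q.Status)
	}
	if !q.Since.IsZero() {
		v.Set("since", q.Since.UTC().Format(time.RFC3339))
	}
	if !q.Until.IsZero() {
		v.Set("until", q.Until.UTC().Format(time.RFC3339))
	}
	if q.Limit > 0 {
		limit := q.Limit
		if limit > 200 { // documented maximum
			limit = 200
		}
		v.Set("limit", strconv.Itoa(limit))
	}
	if q.Offset > 0 {
		v.Set("offset", strconv.Itoa(q.Offset))
	}
	if enc := v.Encode(); enc != "" {
		return baseURL + "/traces?" + enc
	}
	return baseURL + "/traces"
}
```

To page through results, a caller would keep the filters fixed and increase Offset by Limit on each request.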

GET /alerts

Returns all configured alert rules and their current state.

Response:

[
  {
    "name":          "hourly-cost-spike",
    "metric":        "total_cost",
    "op":            "gt",
    "threshold":     10.00,
    "window":        "1h",
    "last_evaluated": "2026-03-15T15:00:00.000Z",
    "last_fired":    "2026-03-15T14:00:00.000Z",
    "current_value": 1.42,
    "firing":        false,
    "silenced":      false,
    "silenced_until": null
  }
]

POST /alerts

Create a new alert rule at runtime without restarting the process.

Request body:

{
  "name":      "model-latency-alert",
  "metric":    "latency_p95",
  "op":        "gt",
  "threshold": 4000,
  "window":    "30m",
  "cooldown":  "1h",
  "min_spans": 10,
  "filter":    {"model": "gpt-4o"},
  "delivery": {
    "type": "http",
    "url":  "https://hooks.example.com/alerts"
  }
}

Response:

HTTP 201 Created
{"rule_id": "alert_a1b2c3d4", "name": "model-latency-alert"}
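The request body above maps naturally onto a Go struct. In this sketch the AlertRule and Delivery types and the helper functions are hypothetical. Marking cooldown, min_spans, and filter as optional is an assumption, since this page does not say which fields are required.

```go
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
)

// Delivery describes where a firing alert is sent.
type Delivery struct {
	Type string `json:"type"`
	URL  string `json:"url"`
}

// AlertRule mirrors the POST /alerts request body shown above.
type AlertRule struct {
	Name      string            `json:"name"`
	Metric    string            `json:"metric"`
	Op        string            `json:"op"`
	Threshold float64           `json:"threshold"`
	Window    string            `json:"window"`
	Cooldown  string            `json:"cooldown,omitempty"`  // assumed optional
	MinSpans  int               `json:"min_spans,omitempty"` // assumed optional
	Filter    map[string]string `json:"filter,omitempty"`    // assumed optional
	Delivery  Delivery          `json:"delivery"`
}

// encodeAlertRule serializes a rule to the JSON shape shown above.
func encodeAlertRule(rule AlertRule) ([]byte, error) {
	return json.Marshal(rule)
}

// newCreateAlertRequest builds the POST /alerts request for a rule.
func newCreateAlertRequest(baseURL string, rule AlertRule) (*http.Request, error) {
	body, err := encodeAlertRule(rule)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/alerts", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	// Assumption: the server expects a JSON content type.
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}
```

Send the request with any http.Client and expect 201 Created on success.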

POST /alerts/{name}/silence

Silence an alert rule for the specified duration.

Request body:

{"duration": "2h"}

Response:

HTTP 200 OK
{"silenced_until": "2026-03-15T17:00:00.000Z"}

DELETE /alerts/{name}

Remove an alert rule. Removal is permanent only for rules created via the API; rules defined in code or config are restored on restart.

Response: HTTP 204 No Content
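Both the silence and delete endpoints put the rule name in the path, so a client should escape it. The request builders below are a sketch with hypothetical helper names; the Content-Type header is an assumption.

```go
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
	"net/url"
)

// newSilenceRequest builds POST /alerts/{name}/silence with a duration
// such as "2h", matching the request body shown above.
func newSilenceRequest(baseURL, ruleName, duration string) (*http.Request, error) {
	body, err := json.Marshal(map[string]string{"duration": duration})
	if err != nil {
		return nil, err
	}
	target := baseURL + "/alerts/" + url.PathEscape(ruleName) + "/silence"
	req, err := http.NewRequest(http.MethodPost, target, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	// Assumption: the server expects a JSON content type.
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// newDeleteAlertRequest builds DELETE /alerts/{name}.
func newDeleteAlertRequest(baseURL, ruleName string) (*http.Request, error) {
	target := baseURL + "/alerts/" + url.PathEscape(ruleName)
	return http.NewRequest(http.MethodDelete, target, nil)
}
```

A successful silence returns 200 OK with silenced_until, and a successful delete returns 204 No Content with an empty body.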

GET /health

Health check endpoint. Returns 200 OK when the server is running and the metrics aggregator is operational.

Response:

{
  "status":       "ok",
  "version":      "0.4.0",
  "uptime_s":     3612,
  "span_count":   8941,
  "transport":    "file",
  "queue_depth":  0
}

Returns 503 Service Unavailable if the aggregator is unhealthy (e.g., disk full when using FileTransport).
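A startup script or test harness can poll this endpoint until the server reports healthy. The following is a minimal sketch with hypothetical names. It treats any status other than 200 as unhealthy and does not assume that a 503 response carries a JSON body.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// Health mirrors the GET /health response shown above.
type Health struct {
	Status     string `json:"status"`
	Version    string `json:"version"`
	UptimeS    int64  `json:"uptime_s"`
	SpanCount  int64  `json:"span_count"`
	Transport  string `json:"transport"`
	QueueDepth int    `json:"queue_depth"`
}

// parseHealth interprets a /health response. Any status other than 200
// is reported as an error without decoding the body.
func parseHealth(statusCode int, body []byte) (Health, error) {
	var h Health
	if statusCode != http.StatusOK {
		return h, fmt.Errorf("unhealthy: HTTP %d", statusCode)
	}
	err := json.Unmarshal(body, &h)
	return h, err
}

// checkHealth performs a single GET /health request.
func checkHealth(ctx context.Context, client *http.Client, baseURL string) (Health, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, baseURL+"/health", nil)
	if err != nil {
		return Health{}, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return Health{}, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return Health{}, err
	}
	return parseHealth(resp.StatusCode, body)
}

// waitHealthy polls GET /health until it succeeds or ctx is done.
// interval must be positive, because time.NewTicker panics otherwise.
func waitHealthy(ctx context.Context, client *http.Client, baseURL string, interval time.Duration) (Health, error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		h, err := checkHealth(ctx, client, baseURL)
		if err == nil {
			return h, nil
		}
		select {
		case <-ctx.Done():
			return Health{}, fmt.Errorf("gave up waiting: %w (last error: %v)", ctx.Err(), err)
		case <-ticker.C:
		}
	}
}
```

Pass a context created with context.WithTimeout to bound the total wait.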

Authentication

The HTTP server does not enforce authentication by default. To require a bearer token:

# tokentrace.yml
http_server:
  addr: ":9090"
  auth_token: "${TOKENTRACE_API_TOKEN}"

When auth_token is set, every request must include the header Authorization: Bearer <token>. Requests without a valid token receive 401 Unauthorized.
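On the client side, attaching the token is a one-line header. The helpers below are hypothetical. Reading the token from TOKENTRACE_API_TOKEN assumes the client process has the same environment variable that the server config references.

```go
package main

import (
	"net/http"
	"os"
)

// tokenFromEnv reads the API token from the environment.
// Assumption: the client shares the TOKENTRACE_API_TOKEN variable
// used in the server config above.
func tokenFromEnv() string {
	return os.Getenv("TOKENTRACE_API_TOKEN")
}

// newAuthorizedRequest builds a body-less request that carries the
// bearer token in the Authorization header.
func newAuthorizedRequest(method, target, token string) (*http.Request, error) {
	req, err := http.NewRequest(method, target, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}
```

For example, newAuthorizedRequest(http.MethodGet, "http://localhost:9090/health", tokenFromEnv()) produces a request that passes the token check.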

Next steps

Configuration — Enable and configure the HTTP server.
Alerts — Alert rule structure and delivery options.
Metrics Reference — Every metric name available at /metrics/{name}.
