tokentrace exposes an HTTP API for ingesting spans, querying metrics, retrieving traces, and managing alert rules. Enable the server by setting HTTPServer in your tracer config or http_server in tokentrace.yml.
All endpoints return JSON. Error responses use a standard envelope:
{"error": "message describing what went wrong"}
Default base URL: http://localhost:9090.
Ingest one or more spans. This is the endpoint that HTTPTransport posts to. You can also call it directly to ingest spans from non-Go services or from tests.
Request body:
[
  {
    "name": "summarize-document",
    "model": "gpt-4o",
    "provider": "openai",
    "prompt_tokens": 512,
    "comp_tokens": 128,
    "total_tokens": 640,
    "cost_usd": 0.00448,
    "latency_ms": 1240,
    "status": "ok",
    "caller": "summarizer-service",
    "attributes": {
      "eval.score": 0.87,
      "workflow": "document-summary"
    }
  }
]
You may POST a single span (as a JSON object) or an array of spans.
Response:
HTTP 202 Accepted
{"accepted": 1}
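As a sketch, spans can be ingested from any language. A minimal Python helper might look like the following (the helper names are hypothetical, and the `/spans` path is an assumption about where the ingest endpoint is mounted on the default base URL):

```python
import json
import urllib.request

BASE_URL = "http://localhost:9090"  # default base URL from above

def build_span(name, model, provider, prompt_tokens, comp_tokens, **attributes):
    """Assemble one span payload matching the request-body schema above."""
    return {
        "name": name,
        "model": model,
        "provider": provider,
        "prompt_tokens": prompt_tokens,
        "comp_tokens": comp_tokens,
        "total_tokens": prompt_tokens + comp_tokens,
        "attributes": attributes,
    }

def post_spans(spans, base_url=BASE_URL):
    """POST a batch of spans; returns the decoded {"accepted": N} body."""
    req = urllib.request.Request(
        base_url + "/spans",  # assumption: the ingest path HTTPTransport posts to
        data=json.dumps(spans).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `post_spans([build_span("summarize-document", "gpt-4o", "openai", 512, 128)])` would ingest a single-span batch.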
Errors:
400 Bad Request — malformed JSON or missing required fields (model, at least one token count)
413 Request Entity Too Large — batch exceeds the configured max_batch_size (default: 1000 spans)

Returns all built-in metrics for the specified time window.
Query parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| window | duration | 1h | Time window: 1h, 6h, 24h, 7d, 30d, or a Go duration string |
| groupby | string | — | Group a metric by an attribute key or built-in dimension (model, caller, provider) |
Response:
{
  "window": "1h",
  "span_count": 312,
  "cost": {
    "total_usd": 1.42,
    "per_call_usd": 0.00456,
    "by_model": {"gpt-4o": 1.28, "gpt-4o-mini": 0.14}
  },
  "tokens": {
    "prompt_total": 156800,
    "completion_total": 39200,
    "total": 196000
  },
  "latency": {
    "p50_ms": 820,
    "p95_ms": 2140,
    "p99_ms": 3810
  },
  "errors": {
    "rate": 0.013,
    "count": 4
  },
  "quality": {
    "score_avg": 0.884,
    "score_p10": 0.71,
    "scored_span_count": 180
  }
}
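To consume this from a script, a small Python sketch could fetch the window and reduce the cost fields shown above to a one-line summary (the `/metrics` path and helper names are assumptions for illustration):

```python
import json
import urllib.parse
import urllib.request

def get_metrics(window="1h", groupby=None, base_url="http://localhost:9090"):
    """Fetch the built-in metrics for a window, optionally grouped."""
    params = {"window": window}
    if groupby is not None:
        params["groupby"] = groupby
    url = base_url + "/metrics?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def cost_summary(metrics):
    """Reduce a metrics response to a one-line cost summary string."""
    cost = metrics["cost"]
    return "%s: $%.2f total, $%.5f/call over %d spans" % (
        metrics["window"],
        cost["total_usd"],
        cost["per_call_usd"],
        metrics["span_count"],
    )
```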
Returns a single metric by name.
Path parameters:
name — metric name from the Metrics Reference. Examples: cost, latency, quality, error_rate.

Query parameters: same as GET /metrics.
Example:
GET /metrics/cost?window=24h
{
  "window": "24h",
  "total_usd": 8.74,
  "per_call_usd": 0.00421,
  "by_model": {
    "gpt-4o": 7.91,
    "gpt-4o-mini": 0.83
  },
  "span_count": 2076
}
GET /metrics/latency?window=1h&groupby=model
{
  "window": "1h",
  "by_model": {
    "gpt-4o": {"p50_ms": 980, "p95_ms": 2340, "p99_ms": 4100},
    "gpt-4o-mini": {"p50_ms": 320, "p95_ms": 710, "p99_ms": 1200}
  }
}
Returns all metrics in Prometheus text exposition format. Suitable for scraping by a Prometheus server.
Response: Content-Type: text/plain; version=0.0.4
# HELP tokentrace_cost_total Total cost in USD.
# TYPE tokentrace_cost_total counter
tokentrace_cost_total{model="gpt-4o",provider="openai",caller="summarizer-service"} 7.91
# HELP tokentrace_latency_ms Inference latency in milliseconds.
# TYPE tokentrace_latency_ms histogram
tokentrace_latency_ms_bucket{le="100"} 12
tokentrace_latency_ms_bucket{le="500"} 148
tokentrace_latency_ms_bucket{le="1000"} 389
...
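If you point a Prometheus server at this endpoint, a minimal scrape_config sketch might look like the following. The metrics_path below is an assumption — substitute the path this endpoint is actually mounted at, and the host/port for your deployment:

```yaml
# Hypothetical scrape job for a tokentrace server on the default port.
scrape_configs:
  - job_name: "tokentrace"
    metrics_path: /metrics/prometheus   # assumption: adjust to the actual endpoint path
    scrape_interval: 30s
    static_configs:
      - targets: ["localhost:9090"]
```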
Retrieves all spans for a specific trace ID.
Path parameters:
id — trace ID returned by trace.ID() or present in any span’s trace_id field.

Response:
{
  "trace_id": "7f3a1c2b-e4d5-4f6a-8b9c-0d1e2f3a4b5c",
  "name": "summarize-document",
  "started_at": "2026-03-15T14:23:01.441Z",
  "ended_at": "2026-03-15T14:23:02.681Z",
  "total_cost_usd": 0.00448,
  "total_tokens": 640,
  "span_count": 1,
  "spans": [
    {
      "span_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "parent_span_id": null,
      "name": "summarize-document",
      "model": "gpt-4o",
      "provider": "openai",
      "prompt_tokens": 512,
      "comp_tokens": 128,
      "total_tokens": 640,
      "cost_usd": 0.00448,
      "latency_ms": 1240,
      "status": "ok",
      "started_at": "2026-03-15T14:23:01.441Z",
      "ended_at": "2026-03-15T14:23:02.681Z",
      "attributes": {
        "eval.score": 0.87
      }
    }
  ]
}
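Because the spans array nests under the trace envelope, client-side rollups are straightforward. For instance, a Python sketch aggregating per-model cost from a trace response shaped like the one above (the helper name is hypothetical):

```python
def trace_cost_by_model(trace):
    """Sum cost_usd across a trace's spans, keyed by model."""
    totals = {}
    for span in trace["spans"]:
        totals[span["model"]] = totals.get(span["model"], 0.0) + span["cost_usd"]
    return totals
```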
Errors:
404 Not Found — trace ID not found (not recorded, or outside the retention window)

Query spans by attribute value. Returns matching traces in reverse chronological order.
Query parameters:
| Parameter | Type | Description |
|---|---|---|
| attr.{key} | string | Filter to spans with this attribute key/value pair |
| model | string | Filter to spans with this model |
| caller | string | Filter to spans with this caller |
| status | string | Filter by status: ok, error, timeout |
| since | RFC3339 | Return traces started after this time |
| until | RFC3339 | Return traces started before this time |
| limit | int | Maximum number of traces to return (default: 20, max: 200) |
| offset | int | Offset for pagination |
Example:
GET /traces?attr.request_id=abc123
GET /traces?model=gpt-4o&status=error&limit=50
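The attr.{key} convention composes with the fixed parameters, so query URLs can be built programmatically. A small Python helper (hypothetical name) that produces URLs like the examples above:

```python
import urllib.parse

def traces_url(base_url="http://localhost:9090", attrs=None, **params):
    """Build a /traces query URL; attribute filters become attr.{key} pairs."""
    query = dict(params)
    for key, value in (attrs or {}).items():
        query["attr." + key] = value
    return base_url + "/traces?" + urllib.parse.urlencode(query)
```

For example, `traces_url(model="gpt-4o", status="error", limit=50)` reproduces the second example above.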
Returns all configured alert rules and their current state.
Response:
[
  {
    "name": "hourly-cost-spike",
    "metric": "total_cost",
    "op": "gt",
    "threshold": 10.00,
    "window": "1h",
    "last_evaluated": "2026-03-15T15:00:00.000Z",
    "last_fired": "2026-03-15T14:00:00.000Z",
    "current_value": 1.42,
    "firing": false,
    "silenced": false,
    "silenced_until": null
  }
]
Create a new alert rule at runtime without restarting the process.
Request body:
{
  "name": "model-latency-alert",
  "metric": "latency_p95",
  "op": "gt",
  "threshold": 4000,
  "window": "30m",
  "cooldown": "1h",
  "min_spans": 10,
  "filter": {"model": "gpt-4o"},
  "delivery": {
    "type": "http",
    "url": "https://hooks.example.com/alerts"
  }
}
Response:
HTTP 201 Created
{"rule_id": "alert_a1b2c3d4", "name": "model-latency-alert"}
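Rule creation is easy to script. The Python sketch below only builds the request object (the /alerts path and the helper name are assumptions), leaving the actual urlopen call to the caller:

```python
import json
import urllib.request

def build_alert_request(rule, base_url="http://localhost:9090", token=None):
    """Build a POST request carrying an alert-rule body like the one above."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = "Bearer " + token
    return urllib.request.Request(
        base_url + "/alerts",  # assumption: rule-creation path
        data=json.dumps(rule).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(build_alert_request(rule))` should yield the 201 response shown above.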
Silence an alert rule for a duration.
Request body:
{"duration": "2h"}
Response:
HTTP 200 OK
{"silenced_until": "2026-03-15T17:00:00.000Z"}
Remove an alert rule. Only applies to rules created via the API; rules defined in code or config are restored on restart.
Response: HTTP 204 No Content
Health check endpoint. Returns 200 OK when the server is running and the metrics aggregator is operational.
Response:
{
  "status": "ok",
  "version": "0.4.0",
  "uptime_s": 3612,
  "span_count": 8941,
  "transport": "file",
  "queue_depth": 0
}
Returns 503 Service Unavailable if the aggregator is unhealthy (e.g., disk full when using FileTransport).
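A liveness probe can key off both the status code and the body. In the Python sketch below, parsing is split out so it is testable without a running server; the /healthz path is an assumption:

```python
import json
import urllib.error
import urllib.request

def parse_health(body):
    """Interpret a health response body: healthy iff status == "ok"."""
    return json.loads(body).get("status") == "ok"

def is_healthy(base_url="http://localhost:9090"):
    """Return True when the health endpoint answers 200 with an ok status."""
    try:
        with urllib.request.urlopen(base_url + "/healthz") as resp:  # assumed path
            return parse_health(resp.read())
    except urllib.error.HTTPError:  # e.g. 503 Service Unavailable
        return False
```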
The HTTP server does not enforce authentication by default. To require a bearer token:
# tokentrace.yml
http_server:
  addr: ":9090"
  auth_token: "${TOKENTRACE_API_TOKEN}"
All requests must include Authorization: Bearer <token>. Requests without a valid token receive 401 Unauthorized.
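With auth enabled, every call in this document needs that header. One way to centralize it in Python, reading the same environment variable the config example interpolates (the helper name is hypothetical):

```python
import os
import urllib.request

def authed_request(path, base_url="http://localhost:9090", token=None):
    """Attach the bearer token the server's auth_token setting expects."""
    token = token if token is not None else os.environ.get("TOKENTRACE_API_TOKEN", "")
    req = urllib.request.Request(base_url + path)
    req.add_header("Authorization", "Bearer " + token)
    return req
```

For example, `urllib.request.urlopen(authed_request("/metrics?window=1h"))` sends an authenticated metrics query.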