Go stable

tokentrace

Trace every token.

go get github.com/greynewell/tokentrace

Token-level traces

Every inference call emits a structured trace: model, provider, prompt tokens, completion tokens, cost, latency, and custom attributes.

Quality alerts

Alert on eval score drops, not just HTTP errors. Know when your model degrades before users do.
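The difference from status-code alerting can be sketched in a few lines: a rule watches a rolling mean of eval scores rather than error rates. This is a minimal illustration of the idea, not tokentrace's actual alert API.

```go
package main

import "fmt"

// degraded reports whether the rolling mean of the last `window` eval scores
// has fallen below `threshold` -- the signal an HTTP-status alert never sees.
func degraded(scores []float64, window int, threshold float64) bool {
	if len(scores) < window {
		return false // not enough data to judge yet
	}
	sum := 0.0
	for _, s := range scores[len(scores)-window:] {
		sum += s
	}
	return sum/float64(window) < threshold
}

func main() {
	// Every one of these calls returned HTTP 200, but quality slid
	// in the last three responses.
	scores := []float64{0.92, 0.90, 0.91, 0.74, 0.71, 0.69}
	fmt.Println(degraded(scores, 3, 0.8)) // true
}
```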

Cost tracking

Per-call cost accounting. Attribute spend by caller, model, and workflow. Set budgets and enforce them.
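Per-call accounting reduces to multiplying token counts by per-model rates and checking the running total against a cap. The sketch below shows that arithmetic; the prices in the table are placeholders, not real rates, and the function names are hypothetical.

```go
package main

import "fmt"

// pricePer1K holds [prompt, completion] USD per 1K tokens.
// These numbers are illustrative placeholders, not real pricing.
var pricePer1K = map[string][2]float64{
	"gpt-4o":      {0.0025, 0.010},
	"gpt-4o-mini": {0.00015, 0.0006},
}

// callCost attributes spend to a single call from its token counts.
func callCost(model string, promptTokens, completionTokens int) float64 {
	p := pricePer1K[model]
	return float64(promptTokens)/1000*p[0] + float64(completionTokens)/1000*p[1]
}

// overBudget enforces a spend cap: would this call push the total past it?
func overBudget(spentSoFar, nextCallCost, budget float64) bool {
	return spentSoFar+nextCallCost > budget
}

func main() {
	c := callCost("gpt-4o", 1200, 150)
	fmt.Printf("call cost: $%.4f\n", c)
	fmt.Println(overBudget(9.998, c, 10.0)) // this call would breach a $10 budget
}
```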

Transport-agnostic

Write traces to files in development, HTTP in production, stdout for debugging. Same API everywhere.

Prometheus + Grafana

Export metrics to Prometheus. Pre-built Grafana dashboard for AI system observability.
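Concretely, exporting to Prometheus means serving counters in its text exposition format. The hand-rolled sketch below shows what such output looks like; the metric name is an assumption, and a real exporter would serve this at a /metrics endpoint.

```go
package main

import "fmt"

// promMetrics renders per-model cost counters in the Prometheus text
// exposition format. The metric name is illustrative, not tokentrace's.
func promMetrics(costByModel map[string]float64) string {
	out := "# TYPE tokentrace_cost_usd_total counter\n"
	for _, model := range []string{"gpt-4o", "gpt-4o-mini"} { // fixed order for display
		if c, ok := costByModel[model]; ok {
			out += fmt.Sprintf("tokentrace_cost_usd_total{model=%q} %g\n", model, c)
		}
	}
	return out
}

func main() {
	fmt.Print(promMetrics(map[string]float64{"gpt-4o": 0.38, "gpt-4o-mini": 0.04}))
}
```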

Zero dependencies

Built on mist-go. No collector, exporter, or background process required.

Example

```go
import "github.com/greynewell/tokentrace"

// Instrument an inference call
trace := tokentrace.Start("summarize-document")
resp, err := callModel(prompt)
if err != nil {
	return err
}
trace.Record(tokentrace.Span{
	Model:        "gpt-4o",
	PromptTokens: resp.Usage.PromptTokens,
	CompTokens:   resp.Usage.CompletionTokens,
	LatencyMs:    elapsed.Milliseconds(),
	Cost:         tokentrace.Cost("gpt-4o", resp.Usage),
})
trace.End()
```

```
# Query metrics
$ curl http://localhost:9090/metrics/cost?window=1h
{"total_usd": 0.42, "by_model": {"gpt-4o": 0.38, "gpt-4o-mini": 0.04}}
```
HTTP 200 doesn't mean the answer was correct. Standard APM has no signal for hallucinations, instruction drift, or quality regressions — the model returned something, and that's all your dashboards see.

tokentrace traces every inference call with the dimensions that matter: model, tokens, cost, latency, and — when you attach eval scores from [matchspec](/matchspec/) — quality. Metrics aggregate continuously. Alert rules fire when anything moves. The transport is pluggable: JSONL file in development, HTTP endpoint in production, in-memory buffer in tests. No collector required.