Go
stable
tokentrace
Trace every token.
go get github.com/greynewell/tokentrace
Token-level traces
Every inference call emits a structured trace: model, provider, prompt tokens, completion tokens, cost, latency, and custom attributes.
Quality alerts
Alert on eval score drops, not just HTTP errors. Know when your model degrades before users do.
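The idea behind eval-score alerting can be sketched in plain Go: keep a rolling window of scores and fire when the mean drops below a threshold. The `AlertRule` type and its fields below are illustrative assumptions, not tokentrace's exported API.

```go
package main

import "fmt"

// AlertRule is a hypothetical threshold rule: fire when the rolling
// mean of eval scores for a metric drops below Min.
type AlertRule struct {
	Metric string
	Min    float64
}

// Breached reports whether the mean of the score window violates the rule.
func (r AlertRule) Breached(scores []float64) bool {
	if len(scores) == 0 {
		return false
	}
	var sum float64
	for _, s := range scores {
		sum += s
	}
	return sum/float64(len(scores)) < r.Min
}

func main() {
	rule := AlertRule{Metric: "faithfulness", Min: 0.8}
	// Scores degrade even though every call returned HTTP 200.
	scores := []float64{0.92, 0.85, 0.71, 0.64}
	fmt.Println(rule.Breached(scores)) // mean is 0.78, below 0.8 → true
}
```

A rule like this fires on quality drift that a status-code alert can never see.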
Cost tracking
Per-call cost accounting. Attribute spend by caller, model, and workflow. Set budgets and enforce them.
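Per-call cost accounting reduces to token counts times per-token prices. A minimal sketch, assuming a simple per-million-token price table — the prices and the `costUSD` helper are illustrative, not tokentrace's actual rates or API:

```go
package main

import "fmt"

// pricePer1M holds illustrative USD prices per million tokens as
// (prompt, completion) pairs; real provider rates vary and change.
var pricePer1M = map[string][2]float64{
	"gpt-4o":      {2.50, 10.00},
	"gpt-4o-mini": {0.15, 0.60},
}

// costUSD computes the dollar cost of one inference call.
func costUSD(model string, promptTok, compTok int) float64 {
	p := pricePer1M[model]
	return float64(promptTok)/1e6*p[0] + float64(compTok)/1e6*p[1]
}

func main() {
	// 10k prompt tokens + 2k completion tokens at the rates above.
	fmt.Printf("$%.4f\n", costUSD("gpt-4o", 10_000, 2_000)) // prints $0.0450
}
```

Attributing spend by caller or workflow is then just summing these per-call figures under the right labels.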
Transport-agnostic
Write traces to files in development, HTTP in production, stdout for debugging. Same API everywhere.
Prometheus + Grafana
Export metrics to Prometheus. Pre-built Grafana dashboard for AI system observability.
Zero dependencies
Built on mist-go. No collector, exporter, or background process required.
Example

import (
    "time"

    "github.com/greynewell/tokentrace"
)

// Instrument an inference call
trace := tokentrace.Start("summarize-document")
start := time.Now()
resp, err := callModel(prompt)
if err != nil {
    trace.End()
    return err
}
trace.Record(tokentrace.Span{
    Model:        "gpt-4o",
    PromptTokens: resp.Usage.PromptTokens,
    CompTokens:   resp.Usage.CompletionTokens,
    LatencyMs:    time.Since(start).Milliseconds(),
    Cost:         tokentrace.Cost("gpt-4o", resp.Usage),
})
trace.End()
# Query metrics
$ curl http://localhost:9090/metrics/cost?window=1h
{"total_usd": 0.42, "by_model": {"gpt-4o": 0.38, "gpt-4o-mini": 0.04}}
HTTP 200 doesn't mean the answer was correct. Standard APM has no signal for hallucinations, instruction drift, or quality regressions — the model returned something, and that's all your dashboards see.
tokentrace traces every inference call with the dimensions that matter: model, tokens, cost, latency, and — when you attach eval scores from [matchspec](/matchspec/) — quality. Metrics aggregate continuously. Alert rules fire when anything moves.
The transport is pluggable: JSONL file in development, HTTP endpoint in production, in-memory buffer in tests. No collector required.