HTTP API

matchspec includes an HTTP API for triggering eval runs and retrieving results programmatically. Use it to integrate matchspec with pipelines that can’t run go install, or to run evals asynchronously from a service.

Starting the server

matchspec serve --port 8090

The server binds to 127.0.0.1:8090 by default. To listen on all interfaces (e.g., inside a container):

matchspec serve --host 0.0.0.0 --port 8090

Authentication

Protect the API with a bearer token by setting MATCHSPEC_API_KEY:

export MATCHSPEC_API_KEY="your-secret-token"
matchspec serve

All requests except GET /health must include the token in the Authorization header:

Authorization: Bearer your-secret-token

Requests without a valid token receive 401 Unauthorized. If MATCHSPEC_API_KEY is not set, authentication is disabled (all requests are accepted). Do not expose an unauthenticated server to the public internet.
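As a minimal client-side sketch, the header can be attached in Python with the standard library alone. The helper name `authorized_request` is hypothetical and not part of matchspec. Reading the token from the client's own MATCHSPEC_API_KEY environment variable is an assumption made for convenience. The header is left out when no token is configured, which suits a server running with authentication disabled.

```python
import os
import urllib.request
from typing import Optional


def authorized_request(url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request carrying the bearer token, if one is available.

    Hypothetical helper for illustration; not part of matchspec.
    """
    # Assumption: the client stores its token under the same variable name the server reads.
    token = token or os.environ.get("MATCHSPEC_API_KEY")
    req = urllib.request.Request(url)
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req


req = authorized_request("http://localhost:8090/suites", token="your-secret-token")
print(req.get_header("Authorization"))  # Bearer your-secret-token
```

Pass the returned object to `urllib.request.urlopen` to send it. A 401 response surfaces as `urllib.error.HTTPError`.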

POST /run

Start an eval run. Returns immediately with a run ID; the run executes asynchronously.

Request

POST /run
Content-Type: application/json
Authorization: Bearer <token>

{
  "suite": "summarization",
  "tags": ["science", "ml"],
  "concurrency": 8,
  "config": "./matchspec.yml"
}

Field	Type	Required	Description
`suite`	string	no	Name of the suite to run. Runs all suites if omitted.
`tags`	array of strings	no	Filter examples by tags.
`concurrency`	integer	no	Override concurrency for this run.
`config`	string	no	Path to matchspec.yml. Uses server default if omitted.

Response

{
  "run_id": "run-20260315-143022-a3f7",
  "status": "running",
  "suite": "summarization",
  "started_at": "2026-03-15T14:30:22Z",
  "results_url": "/results/run-20260315-143022-a3f7"
}

Field	Type	Description
`run_id`	string	Unique identifier for this run.
`status`	string	One of `"running"`, `"passed"`, `"failed"`, or `"error"`.
`suite`	string	Name of the suite being run.
`started_at`	string	ISO 8601 timestamp of when the run started.
`results_url`	string	Path on this server to poll for results (see GET /results/:id).

Status codes

Code	Meaning
`202 Accepted`	Run started.
`400 Bad Request`	Invalid request body.
`401 Unauthorized`	Missing or invalid API key.
`404 Not Found`	Named suite not found in config.
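The request above could be assembled from Python roughly as follows. This is a sketch under stated assumptions: `build_run_request` and `start_run` are hypothetical helper names, and sending an empty JSON object when every field is omitted is a guess at how "runs all suites" is requested, not confirmed server behavior.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional


def build_run_request(
    base_url: str,
    token: Optional[str] = None,
    suite: Optional[str] = None,
    tags: Optional[List[str]] = None,
    concurrency: Optional[int] = None,
    config: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /run request. Hypothetical helper."""
    fields = {"suite": suite, "tags": tags, "concurrency": concurrency, "config": config}
    # Every field is optional, so unset ones are left out of the body entirely.
    body = {key: value for key, value in fields.items() if value is not None}
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/run",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def start_run(req: urllib.request.Request) -> Dict[str, Any]:
    """Send the request and return the parsed JSON body.

    urlopen raises urllib.error.HTTPError for the 400, 401, and 404 responses.
    """
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


req = build_run_request("http://localhost:8090", token="your-secret-token", suite="summarization")
print(req.get_method(), req.full_url)  # POST http://localhost:8090/run
print(req.data.decode("utf-8"))        # {"suite": "summarization"}
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any run is triggered.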

GET /results/:id

Retrieve the status and results of a run.

Request

GET /results/run-20260315-143022-a3f7
Authorization: Bearer <token>

Response (running)

{
  "run_id": "run-20260315-143022-a3f7",
  "status": "running",
  "suite": "summarization",
  "started_at": "2026-03-15T14:30:22Z",
  "progress": {
    "total": 120,
    "completed": 47,
    "percent": 39.2
  }
}

Response (completed)

{
  "run_id": "run-20260315-143022-a3f7",
  "status": "passed",
  "suite": "summarization",
  "started_at": "2026-03-15T14:30:22Z",
  "finished_at": "2026-03-15T14:31:05Z",
  "duration_seconds": 43,
  "verdict": "PASS",
  "grader_results": [
    {
      "name": "exact_match",
      "score": 0.74,
      "threshold": 0.70,
      "passed": true,
      "n": 120,
      "ci_lower": 0.651,
      "ci_upper": 0.820
    },
    {
      "name": "semantic_similarity",
      "score": 0.91,
      "threshold": 0.85,
      "passed": true,
      "n": 120,
      "ci_lower": 0.851,
      "ci_upper": 0.950
    }
  ],
  "examples": [
    {
      "id": "ex-001",
      "input": "Summarize in one sentence: ...",
      "expected": "Researchers reduced training compute by 40%.",
      "output": "Researchers cut neural net training compute by 40% using structured pruning.",
      "scores": {
        "exact_match": 0.0,
        "semantic_similarity": 0.94
      },
      "passed": true,
      "tags": ["science", "ml"]
    }
  ],
  "model_errors": 0
}

Response (failed)

Same structure as the completed response, but with "status": "failed" and "verdict": "FAIL". Each failing grader has "passed": false.

Status codes

Code	Meaning
`200 OK`	Run found. Check `status` field for current state.
`401 Unauthorized`	Missing or invalid API key.
`404 Not Found`	Run ID not found.
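A client usually wants a short summary rather than the full payload. The sketch below shows one way to condense the responses documented above. `summarize_result` is a hypothetical helper, the sample payloads are trimmed to the fields it reads, and the failing score is invented for illustration. The shape of an "error" response is not documented here, so the helper falls back to defaults for missing fields.

```python
from typing import Any, Dict


def summarize_result(result: Dict[str, Any]) -> str:
    """Return a one-line summary of a GET /results/:id payload. Hypothetical helper."""
    status = result["status"]
    if status == "running":
        progress = result.get("progress", {})
        done, total = progress.get("completed", 0), progress.get("total", 0)
        return f"running: {done}/{total} examples"
    # Finished runs carry a verdict and per-grader outcomes.
    failing = [g["name"] for g in result.get("grader_results", []) if not g["passed"]]
    verdict = result.get("verdict", "n/a")
    detail = ", ".join(failing) if failing else "none"
    return f"{status}: verdict {verdict}, failing graders: {detail}"


# Illustrative payloads; the 0.62 score is made up to show a failing grader.
running = {"status": "running", "progress": {"total": 120, "completed": 47, "percent": 39.2}}
finished = {
    "status": "failed",
    "verdict": "FAIL",
    "grader_results": [
        {"name": "exact_match", "score": 0.62, "threshold": 0.70, "passed": False},
        {"name": "semantic_similarity", "score": 0.91, "threshold": 0.85, "passed": True},
    ],
}
print(summarize_result(running))   # running: 47/120 examples
print(summarize_result(finished))  # failed: verdict FAIL, failing graders: exact_match
```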

GET /suites

List all suites defined in the server’s config.

Request

GET /suites
Authorization: Bearer <token>

Response

{
  "suites": [
    {
      "name": "summarization",
      "harnesses": ["summarization-v2"],
      "thresholds": {
        "overall": 0.80
      }
    },
    {
      "name": "qa",
      "harnesses": ["qa-v1"],
      "thresholds": {
        "overall": 0.85
      }
    }
  ]
}
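One use for this endpoint is to confirm that a suite exists before calling POST /run, which avoids the 404 for an unknown suite. The following sketch reads a parsed response of the shape shown above. The helper names `suite_names` and `overall_thresholds` are hypothetical.

```python
from typing import Any, Dict, List


def suite_names(payload: Dict[str, Any]) -> List[str]:
    """Return the suite names from a parsed GET /suites response."""
    return [suite["name"] for suite in payload.get("suites", [])]


def overall_thresholds(payload: Dict[str, Any]) -> Dict[str, float]:
    """Map each suite name to its "overall" threshold, skipping suites without one."""
    return {
        suite["name"]: suite["thresholds"]["overall"]
        for suite in payload.get("suites", [])
        if "overall" in suite.get("thresholds", {})
    }


# Payload copied from the example response above, with harnesses omitted.
payload = {
    "suites": [
        {"name": "summarization", "thresholds": {"overall": 0.80}},
        {"name": "qa", "thresholds": {"overall": 0.85}},
    ]
}
print(suite_names(payload))         # ['summarization', 'qa']
print(overall_thresholds(payload))  # {'summarization': 0.8, 'qa': 0.85}
```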

GET /results

List recent run results.

Request

GET /results?suite=summarization&limit=10
Authorization: Bearer <token>

Query parameters

Parameter	Type	Default	Description
`suite`	string	—	Filter by suite name.
`status`	string	—	Filter by status: `passed`, `failed`, `running`, `error`.
`limit`	integer	`20`	Maximum number of results to return.
`offset`	integer	`0`	Pagination offset.

Response

{
  "runs": [
    {
      "run_id": "run-20260315-143022-a3f7",
      "status": "passed",
      "suite": "summarization",
      "started_at": "2026-03-15T14:30:22Z",
      "finished_at": "2026-03-15T14:31:05Z",
      "verdict": "PASS"
    }
  ],
  "total": 42,
  "limit": 10,
  "offset": 0
}
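The limit and offset parameters can be combined to walk through every run one page at a time. The sketch below builds the query string and pages until the reported total is reached. `results_url` and `iter_runs` are hypothetical helpers, `fetch_page` is a caller-supplied function, and the code assumes that `total` counts all runs matching the filters rather than only the current page.

```python
from typing import Any, Callable, Dict, Iterator, Optional
from urllib.parse import urlencode


def results_url(
    base_url: str,
    suite: Optional[str] = None,
    status: Optional[str] = None,
    limit: int = 20,
    offset: int = 0,
) -> str:
    """Build a GET /results URL, leaving out filters that were not given."""
    params = {"suite": suite, "status": status, "limit": limit, "offset": offset}
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base_url.rstrip('/')}/results?{query}"


def iter_runs(
    fetch_page: Callable[[int], Dict[str, Any]], limit: int = 20
) -> Iterator[Dict[str, Any]]:
    """Yield every run, requesting one page at a time.

    fetch_page(offset) must return the parsed JSON response for that offset.
    Assumption: "total" is the number of runs matching the filters overall.
    """
    offset = 0
    while True:
        page = fetch_page(offset)
        runs = page.get("runs", [])
        yield from runs
        offset += limit
        # Stop on an empty page or once the next offset passes the total.
        if not runs or offset >= page.get("total", 0):
            break


print(results_url("http://localhost:8090", suite="summarization", limit=10))
# http://localhost:8090/results?suite=summarization&limit=10&offset=0
```

In real use, `fetch_page` would request `results_url(..., offset=offset)` with the Authorization header and parse the JSON body.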

GET /health

Liveness check. Returns 200 if the server is running.

Request

GET /health

Response

{
  "status": "ok",
  "version": "0.3.1"
}

No authentication required.
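Because this endpoint needs no token, it is a convenient readiness probe, for example when a pipeline starts the server and must wait before triggering a run. The following is a minimal sketch of such a wait loop. The function names, the retry count, and the two-second request timeout are illustrative choices, not matchspec defaults.

```python
import json
import time
import urllib.request
from typing import Callable


def is_healthy(base_url: str) -> bool:
    """Return True if GET /health answers 200 with {"status": "ok"}."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/health", timeout=2) as resp:
            body = json.load(resp)
            return resp.status == 200 and isinstance(body, dict) and body.get("status") == "ok"
    except (OSError, ValueError):
        # OSError covers connection errors, timeouts, and HTTP errors raised by urllib;
        # ValueError covers a body that is not valid JSON.
        return False


def wait_until_healthy(
    base_url: str,
    attempts: int = 30,
    delay: float = 1.0,
    probe: Callable[[str], bool] = is_healthy,
) -> bool:
    """Probe the server up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(base_url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A caller might run `wait_until_healthy("http://localhost:8090")` once after starting the server and abort the pipeline if it returns False.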

Running as a daemon

To run the matchspec server as a long-lived daemon in a Docker container:

FROM golang:1.22-alpine AS builder
WORKDIR /app
RUN go install github.com/greynewell/matchspec/cmd/matchspec@latest

FROM alpine:3.19
# The official golang image sets GOPATH=/go, so go install writes the binary to /go/bin
COPY --from=builder /go/bin/matchspec /usr/local/bin/matchspec
COPY matchspec.yml /app/matchspec.yml
COPY evals/ /app/evals/

WORKDIR /app
EXPOSE 8090
# Empty placeholder; supply the real token at runtime with docker run -e
ENV MATCHSPEC_API_KEY=""
CMD ["matchspec", "serve", "--host", "0.0.0.0", "--port", "8090"]

Build and run:

docker build -t matchspec-server .
docker run -p 8090:8090 \
  -e MATCHSPEC_API_KEY="your-token" \
  -e OPENAI_API_KEY="sk-..." \
  matchspec-server

Polling for results

Since runs are asynchronous, poll /results/:id until status is no longer "running":

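# Trigger a run (-f makes curl fail on HTTP errors such as 401 or 404)
RUN_ID=$(curl -sf -X POST http://localhost:8090/run \
  -H "Authorization: Bearer $MATCHSPEC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"suite": "summarization"}' | jq -r .run_id)

# Stop early if the run could not be started
if [ -z "$RUN_ID" ] || [ "$RUN_ID" = "null" ]; then
  echo "Failed to start run" >&2
  exit 1
fi

echo "Started run: $RUN_ID"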

# Poll until complete
while true; do
  STATUS=$(curl -s "http://localhost:8090/results/$RUN_ID" \
    -H "Authorization: Bearer $MATCHSPEC_API_KEY" | jq -r .status)
  echo "Status: $STATUS"
  if [ "$STATUS" != "running" ]; then break; fi
  sleep 5
done

# Check verdict
VERDICT=$(curl -s "http://localhost:8090/results/$RUN_ID" \
  -H "Authorization: Bearer $MATCHSPEC_API_KEY" | jq -r .verdict)

echo "Verdict: $VERDICT"
[ "$VERDICT" = "PASS" ] || exit 1
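The shell loop above keeps polling for as long as the run reports "running". For callers that need an upper bound, the same loop can be written with a timeout. This Python sketch is one way to do that. `poll_until_done` is a hypothetical helper, `fetch_result` is a caller-supplied function that returns the parsed GET /results/:id response, and the 600-second default is an arbitrary choice.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    fetch_result: Callable[[], Dict[str, Any]],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Call fetch_result() until status leaves "running" or the timeout expires."""
    deadline = clock() + timeout
    while True:
        result = fetch_result()
        if result.get("status") != "running":
            return result
        # Give up if the next poll would land after the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"run still running after {timeout} seconds")
        sleep(interval)
```

The returned payload carries the final status and verdict, so a CI job can exit non-zero unless `result.get("verdict") == "PASS"`.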
