CLI Reference

The matchspec CLI provides commands for running eval suites, generating reports, initializing projects, validating configuration, and starting the HTTP API server.

Installation

go install github.com/greynewell/matchspec/cmd/matchspec@latest

See Installation for full setup instructions.

matchspec run

Run one or more eval suites.

matchspec run [path...] [flags]

If no path is given, matchspec run reads matchspec.yml from the current directory and runs all configured suites. You can also pass one or more explicit paths, each pointing to a harness file or to a directory of harnesses:

# Run all suites in matchspec.yml
matchspec run

# Run a single harness file directly
matchspec run ./evals/summarization/harness.yml

# Run all harnesses in a directory
matchspec run ./evals/summarization/

# Run multiple harnesses
matchspec run ./evals/summarization/ ./evals/qa/

Flags

Flag	Type	Default	Description
`--suite`, `-s`	string	—	Run only the named suite (as defined in `matchspec.yml`).
`--tags`	string	—	Comma-separated list of tags. Only run examples with at least one matching tag.
`--concurrency`, `-c`	integer	from config	Override concurrency for all harnesses.
`--timeout`	integer	from config	Override per-example timeout (seconds) for all harnesses.
`--output`, `-o`	string	`json`	Output format for results file: `json`, `junit`, `markdown`.
`--output-dir`	string	`.matchspec/results`	Directory to write results files.
`--no-write`	bool	false	Print results to stdout only; do not write results files.
`--show-all-failures`	bool	false	Print all failing examples, not just the first 5.
`--fail-fast`	bool	false	Stop after the first harness failure.
`--dry-run`	bool	false	Load and validate everything but don’t call the model.
`--config`	string	`./matchspec.yml`	Path to the config file.
`--verbose`, `-v`	bool	false	Print per-example results during the run.
`--quiet`, `-q`	bool	false	Suppress all output except the final verdict.

Examples

# Run with verbose per-example output
matchspec run --verbose

# Run only science-tagged examples
matchspec run --tags science

# Run the "smoke" suite only
matchspec run --suite smoke

# Write results in JUnit format for CI test reporting
matchspec run --output junit --output-dir ./test-results

# Dry run to verify config is parseable
matchspec run --dry-run

Exit codes

Code	Meaning
`0`	All suites passed all thresholds.
`1`	One or more suites failed.
`2`	Configuration error (malformed config, missing file, etc.).
`3`	Runtime error (network failure, all model calls failed, etc.).
`130`	Interrupted (SIGINT).

Exit code 0 is the only success code. Any non-zero exit code should fail a CI build.
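
In CI logs it can help to translate the exit code into a readable message before failing the build. The following is a minimal sketch in POSIX shell that assumes only the codes listed in the table above; the function name `describe_matchspec_exit` is hypothetical and not part of matchspec.

```shell
# Hypothetical helper (not part of matchspec): map a `matchspec run`
# exit code to the meaning documented in the exit code table.
describe_matchspec_exit() {
  case "$1" in
    0)   echo "passed: all suites met their thresholds" ;;
    1)   echo "failed: one or more suites failed" ;;
    2)   echo "config error: malformed config or missing file" ;;
    3)   echo "runtime error: network failure or model calls failed" ;;
    130) echo "interrupted (SIGINT)" ;;
    *)   echo "unknown exit code: $1" ;;
  esac
}

# Usage sketch: capture the code, log it, and propagate it unchanged.
#   code=0
#   matchspec run --quiet || code=$?
#   describe_matchspec_exit "$code"
#   exit "$code"
```

Propagating the original code, rather than collapsing every failure to 1, keeps the distinction between a failed suite and a broken configuration visible to whatever invoked the script.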

matchspec report

Generate a report from existing results without re-running evals.

matchspec report [results-file] [flags]

# Report on the most recent results
matchspec report

# Report on a specific results file
matchspec report .matchspec/results/summarization-20260315-143022.json

# Generate a markdown summary
matchspec report --format markdown > report.md

# Compare two runs
matchspec report --compare .matchspec/results/run-001.json .matchspec/results/run-002.json

Flags

Flag	Type	Default	Description
`--format`, `-f`	string	`text`	Output format: `text`, `markdown`, `json`.
`--compare`	string	—	Path to a second results file to diff against.
`--show-passing`	bool	false	Include passing examples in the report (default: only show failures).

matchspec init

Initialize a matchspec project in the current directory.

matchspec init [flags]

Creates a starter matchspec.yml and (optionally) example harness and dataset files:

# Create just matchspec.yml
matchspec init

# Create matchspec.yml plus example harness and dataset
matchspec init --with-examples

Flags

Flag	Type	Default	Description
`--with-examples`	bool	false	Generate example harness and dataset files.
`--force`	bool	false	Overwrite existing `matchspec.yml`.

matchspec validate

Validate all configuration files without running evals.

matchspec validate [flags]

Checks:

matchspec.yml is present and parseable
All referenced harness files exist and are valid
All referenced dataset files exist and are valid
All grader types are known
All threshold values are in range

matchspec validate
# ✓ matchspec.yml found
# ✓ harness: summarization-v2 (./evals/summarization/harness.yml)
# ✓ harness: qa-v1 (./evals/qa/harness.yml)
# ✓ dataset: summarization-basic (120 examples)
# ✓ dataset: qa-basic (80 examples)
# ✓ 2 harnesses, 2 datasets, 4 graders — all valid

When validation fails, each error is listed and the command exits with code 2:

matchspec validate
# ✓ matchspec.yml found
# ✗ harness: qa-v1: dataset file not found: ./evals/qa/dataset.yml
# ✗ harness: summarization-v2: unknown grader type "fuzzy_match"
# 2 errors found. Fix before running.
# exit code: 2
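
Because a failed validation exits with a non-zero code, `matchspec validate` can gate a run in a script. The following is a minimal sketch that assumes `matchspec` is on `PATH`; the function name `validate_then_run` is hypothetical and not part of matchspec.

```shell
# Hypothetical wrapper (not part of matchspec): validate first, and only
# run the eval suites if validation succeeds.
validate_then_run() {
  # On failure, return validate's own exit status (2 in the example above).
  matchspec validate || return $?
  # Forward any extra arguments (for example: --suite smoke) to the run.
  matchspec run "$@"
}

# Usage sketch:
#   validate_then_run --suite smoke
```

The wrapper returns whichever exit code the failing step produced, so a caller can still tell a configuration error from a failed suite.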

matchspec serve

Start the HTTP API server.

matchspec serve [flags]

See HTTP API for complete documentation.

matchspec serve --port 8090

Flags

Flag	Type	Default	Description
`--port`, `-p`	integer	`8090`	Port to listen on.
`--host`	string	`127.0.0.1`	Host to bind to. Use `0.0.0.0` to listen on all interfaces.
`--config`	string	`./matchspec.yml`	Path to the config file.

Environment variables

Variable	Description
`MATCHSPEC_CONFIG`	Path to `matchspec.yml`. Overrides `--config`.
`MATCHSPEC_OUTPUT_DIR`	Directory for results files. Overrides `--output-dir`.
`MATCHSPEC_CONCURRENCY`	Default concurrency for all harnesses. Overrides harness config.
`MATCHSPEC_LOG_LEVEL`	Log verbosity: `debug`, `info`, `warn`, `error`. Default: `info`.
`MATCHSPEC_API_KEY`	API key for the HTTP server’s authentication middleware.

Model API keys (e.g., OPENAI_API_KEY) are referenced in harness configs via api_key_env and read directly from the environment at runtime. They are not matchspec-specific environment variables; they only need to be set in the shell or CI environment where matchspec run executes.
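
Since model keys are read from the environment at runtime, a missing key may not surface until a run is already under way, so a script can check for it up front. The following is a minimal sketch that assumes the harness names `OPENAI_API_KEY` in `api_key_env`; `require_env` is a hypothetical helper, not part of matchspec. It uses `printenv` because that sees only exported variables, which are the ones a child `matchspec` process inherits.

```shell
# Hypothetical helper (not part of matchspec): fail early if an
# environment variable is unset, empty, or not exported.
require_env() {
  if [ -z "$(printenv "$1")" ]; then
    echo "missing required environment variable: $1" >&2
    return 1
  fi
}

# Usage sketch (OPENAI_API_KEY stands in for whatever api_key_env names):
#   require_env OPENAI_API_KEY && matchspec run
```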
