Configuration

matchspec is configured via a matchspec.yml file in your project root. This file defines suites, references harness files, sets thresholds, and configures output behavior.

Config file discovery

matchspec run looks for a configuration file in the following order:

1. The path specified by the --config flag
2. The path specified by the MATCHSPEC_CONFIG environment variable
3. ./matchspec.yml (current working directory)
4. $HOME/.matchspec/config.yml (global user config)

The first file found is used. If no file is found, matchspec run exits with an error unless you pass a harness file directly as an argument.
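
The lookup order above can be sketched in a few lines of Python. This is an illustration only, not matchspec's actual implementation: the function name find_config and its parameters are hypothetical, and it assumes "found" simply means the file exists on disk.

```python
from pathlib import Path
from typing import Mapping, Optional


def find_config(
    cli_path: Optional[str],
    env: Mapping[str, str],
    cwd: Path,
    home: Path,
) -> Optional[Path]:
    """Return the first candidate config file that exists, or None."""
    candidates = []
    if cli_path:                                   # 1. --config flag
        candidates.append(Path(cli_path))
    if env.get("MATCHSPEC_CONFIG"):                # 2. environment variable
        candidates.append(Path(env["MATCHSPEC_CONFIG"]))
    candidates.append(cwd / "matchspec.yml")       # 3. current working directory
    candidates.append(home / ".matchspec" / "config.yml")  # 4. global user config

    for path in candidates:
        # Assumption: a candidate counts as "found" when the file exists.
        if path.is_file():
            return path
    return None
```

Calling find_config(None, os.environ, Path.cwd(), Path.home()) would mirror a plain matchspec run with no --config flag. A None result corresponds to the error case described above.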

Full schema

# matchspec.yml — complete example with all fields

version: 1  # required; must be 1

# Global defaults applied to all suites and harnesses unless overridden.
defaults:
  concurrency: 4          # parallel model calls per harness
  timeout_seconds: 30     # per-example model call timeout
  retries: 0              # number of retries on model call failure
  retry_delay_ms: 250     # delay between retries

# Output configuration.
output:
  dir: ".matchspec/results"   # where results files are written
  format: json                # "json", "junit", or "markdown"
  retain_days: 30             # delete results older than N days (0 = keep forever)

# Statistics configuration.
statistics:
  confidence_level: 0.95      # Wilson score CI confidence level
  use_lower_bound: false      # compare threshold to CI lower bound
  min_sample_size: 0          # minimum examples required before enforcing thresholds
  min_sample_action: warn     # "warn" or "fail"

# Suite definitions.
suites:
  - name: smoke                 # required; suite identifier
    description: "Fast smoke check on a small sample."
    harnesses:
      - ./evals/smoke/harness.yml
    tags: [smoke]               # only run examples with these tags
    thresholds:
      overall: 0.90             # default threshold for all graders in this suite
      exact_match: 0.85         # per-grader-name threshold (overrides overall)
      semantic_similarity: 0.80
    statistics:                 # per-suite statistics config (overrides global)
      min_sample_size: 10
      min_sample_action: warn

  - name: production-gate
    description: "Full eval suite required to pass before deployment."
    harnesses:
      - ./evals/summarization/harness.yml
      - ./evals/qa/harness.yml
      - ./evals/classification/harness.yml
    thresholds:
      overall: 0.80
    statistics:
      confidence_level: 0.95
      use_lower_bound: true
      min_sample_size: 50
      min_sample_action: fail

Field reference

Top-level

Field	Type	Default	Description
`version`	integer	—	Required. Must be `1`.
`defaults`	object	see below	Global defaults for all harnesses.
`output`	object	see below	Output file configuration.
`statistics`	object	see below	Statistical settings.
`suites`	array	—	Required. List of suite definitions.

defaults

Field	Type	Default	Description
`concurrency`	integer	`4`	Max parallel model calls per harness.
`timeout_seconds`	integer	`30`	Per-example timeout in seconds.
`retries`	integer	`0`	Retry count on model call failure.
`retry_delay_ms`	integer	`250`	Milliseconds between retries.

output

Field	Type	Default	Description
`dir`	string	`.matchspec/results`	Directory for results files. Created if absent.
`format`	string	`json`	Results file format: `json`, `junit`, `markdown`.
`retain_days`	integer	`0`	Delete results files older than N days. 0 keeps all.

statistics

Field	Type	Default	Description
`confidence_level`	float	`0.95`	Confidence level for Wilson score interval.
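`use_lower_bound`	bool	`false`	When `true`, compare the threshold to the lower bound of the confidence interval.
`min_sample_size`	integer	`0`	Minimum number of examples required before thresholds are enforced. `0` disables the check.
`min_sample_action`	string	`warn`	What to do when the sample is smaller than `min_sample_size`: `"warn"` prints a warning, `"fail"` exits non-zero.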
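
To see why use_lower_bound matters, here is a sketch of a Wilson score interval using the standard textbook formula. The function names are hypothetical, and two details are assumptions rather than documented behavior: that the threshold is compared against the observed pass rate when use_lower_bound is false, and that the comparison is "greater than or equal".

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(passed: int, total: int, confidence_level: float = 0.95):
    """Two-sided Wilson score interval for a pass rate, as (lower, upper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    # z-score for a two-sided interval, e.g. about 1.96 for 0.95.
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    p = passed / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


def meets_threshold(passed, total, threshold,
                    confidence_level=0.95, use_lower_bound=False):
    """Compare the observed rate, or the CI lower bound, to the threshold."""
    lower, _upper = wilson_interval(passed, total, confidence_level)
    # Assumption: without use_lower_bound the raw pass rate is compared.
    score = lower if use_lower_bound else passed / total
    return score >= threshold
```

With 90 passes out of 100 and a threshold of 0.85, the observed rate of 0.90 clears the threshold, but the 95% lower bound is roughly 0.83, so the same run would fail once use_lower_bound is true. Larger samples narrow the interval, which is one reason to pair this setting with min_sample_size.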

suites[]

Field	Type	Required	Description
`name`	string	yes	Suite identifier. Used in `--suite` flag and reports.
`description`	string	no	Human-readable description.
`harnesses`	array of strings	yes	Paths to harness YAML files.
`tags`	array of strings	no	Run only examples with these tags.
`thresholds`	object	no	Pass thresholds for this suite (see below).
`statistics`	object	no	Per-suite statistics settings. Overrides the global `statistics` block.
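
One plausible reading of "per-suite statistics overrides global" is a field-by-field merge, so a suite only needs to list the settings it changes. The sketch below assumes that behavior. The helper name effective_statistics is hypothetical, and the default values are taken from the statistics table above.

```python
# Built-in defaults, copied from the statistics field reference.
STATISTICS_DEFAULTS = {
    "confidence_level": 0.95,
    "use_lower_bound": False,
    "min_sample_size": 0,
    "min_sample_action": "warn",
}


def effective_statistics(global_stats: dict, suite_stats: dict) -> dict:
    """Merge settings; later sources win: defaults < global < suite."""
    # Assumption: overrides apply per field, not by replacing the whole block.
    return {**STATISTICS_DEFAULTS, **global_stats, **suite_stats}
```

Under this assumption, the smoke suite from the full schema (min_sample_size: 10, min_sample_action: warn) would still use the global confidence_level of 0.95.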

suites[].thresholds

Field	Type	Default	Description
`overall`	float	`1.0`	Default threshold for all graders not explicitly listed.
`<grader_name>`	float	—	Per-grader-name threshold override.
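
The lookup rule in this table (a per-grader entry wins, otherwise overall, otherwise 1.0) can be expressed as a short function. The name resolve_threshold is hypothetical and the code is a sketch of the documented rule, not matchspec's own code.

```python
def resolve_threshold(thresholds: dict, grader_name: str) -> float:
    """Return the threshold that applies to one grader in a suite."""
    if grader_name in thresholds:
        return thresholds[grader_name]      # explicit per-grader override
    return thresholds.get("overall", 1.0)   # suite default, else 1.0


# Thresholds from the smoke suite in the full schema.
smoke_thresholds = {
    "overall": 0.90,
    "exact_match": 0.85,
    "semantic_similarity": 0.80,
}
```

Here resolve_threshold(smoke_thresholds, "exact_match") gives 0.85, while a grader not listed in the suite falls back to 0.90. A suite with no thresholds at all would require a perfect score of 1.0.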

Environment variable overrides

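Any config value can be overridden with an environment variable. The variable name is MATCHSPEC_ followed by the field's dot-notation path, uppercased, with dots replaced by underscores. For example, statistics.min_sample_size becomes MATCHSPEC_STATISTICS_MIN_SAMPLE_SIZE. The table below lists examples, along with three variables that do not map to a field in matchspec.yml (MATCHSPEC_CONFIG, MATCHSPEC_API_KEY, and MATCHSPEC_LOG_LEVEL):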

Variable	Overrides
`MATCHSPEC_CONFIG`	Config file path
`MATCHSPEC_OUTPUT_DIR`	`output.dir`
`MATCHSPEC_OUTPUT_FORMAT`	`output.format`
`MATCHSPEC_DEFAULTS_CONCURRENCY`	`defaults.concurrency`
`MATCHSPEC_DEFAULTS_TIMEOUT_SECONDS`	`defaults.timeout_seconds`
`MATCHSPEC_DEFAULTS_RETRIES`	`defaults.retries`
`MATCHSPEC_STATISTICS_CONFIDENCE_LEVEL`	`statistics.confidence_level`
`MATCHSPEC_STATISTICS_USE_LOWER_BOUND`	`statistics.use_lower_bound`
`MATCHSPEC_API_KEY`	HTTP API authentication key
`MATCHSPEC_LOG_LEVEL`	Log verbosity: `debug`, `info`, `warn`, `error`

Environment variables take precedence over values in matchspec.yml. Command-line flags take precedence over environment variables.
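
The naming pattern and the precedence order can be sketched together. Both helper names are hypothetical, and the sketch leaves out type conversion (environment values arrive as strings), which a real implementation would need.

```python
from typing import Any, Mapping, Optional


def env_var_name(config_path: str) -> str:
    """Map a dot-notation config path to its environment variable name."""
    return "MATCHSPEC_" + config_path.upper().replace(".", "_")


def resolve_value(
    config_path: str,
    flag_value: Optional[Any],
    env: Mapping[str, str],
    file_value: Optional[Any],
    default: Any,
) -> Any:
    """Apply the documented precedence: flag > environment > file > default."""
    if flag_value is not None:
        return flag_value
    name = env_var_name(config_path)
    if name in env:
        return env[name]  # raw string; parsing to int/bool/float is omitted
    if file_value is not None:
        return file_value
    return default
```

For instance, env_var_name("defaults.timeout_seconds") yields MATCHSPEC_DEFAULTS_TIMEOUT_SECONDS, matching the table above.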

Minimal config

The smallest valid matchspec.yml:

version: 1
suites:
  - name: default
    harnesses:
      - ./evals/harness.yml
    thresholds:
      overall: 0.80

Example: multiple environments

Use environment variables to vary how strictly thresholds are enforced between environments without maintaining multiple config files:

version: 1
suites:
  - name: production-gate
    harnesses:
      - ./evals/summarization/harness.yml
    thresholds:
      overall: 0.85

In CI before merging to main:

MATCHSPEC_STATISTICS_MIN_SAMPLE_SIZE=100 \
MATCHSPEC_STATISTICS_USE_LOWER_BOUND=true \
matchspec run --suite production-gate

In a fast smoke check on a PR (this assumes a smoke suite, like the one in the full schema, is also defined in the config):

matchspec run --suite smoke --tags smoke
