schemaflux processes all entities through 12 sequential passes before emitting any output. Each pass reads from the shared IR, writes to it, and returns control to the scheduler. Passes run in a fixed order because later passes depend on fields written by earlier ones.
The full pipeline:
0 parse read source files → entity stubs
1 slugify entity IDs → URL-safe slugs
2 sort apply weight and date ordering
3 enrich render markdown, derive section/depth
4 taxonomy group entities by tags/categories
5 relationships score entity similarity
6 graph build backlinks, parent/child, prev/next
7 url-resolve compute and resolve all URLs
8 schema-gen infer JSON Schema per entity type
9 validate enforce schema, check invariants
10 emit-html render HTML pages
11 emit-json serialize IR to JSON
12 emit-sitemap write sitemap.xml and RSS feed
Reads: source files on disk
Writes: ID, RawContent, Title, Description, Date, UpdatedAt, Weight, Draft, Layout, Permalink, Tags, Categories, Series, Frontmatter
The parse pass walks the input directory recursively, reads every .md file, splits the YAML frontmatter from the markdown body, and creates one Entity in the IR per file. Files with draft: true in their frontmatter are skipped unless the --drafts flag is passed to schemaflux build.
Configuration:
input:
dir: ./content # required; root directory to walk
include: # glob patterns to include (default: **/*.md)
- "**/*.md"
exclude: # glob patterns to exclude
- "_drafts/**"
- "README.md"
drafts: false # include draft entities
Reads: ID, Permalink, frontmatter slug
Writes: Slug
Converts the entity’s ID (relative file path without extension) into a URL-safe slug. The default strategy lowercases the path, replaces spaces and underscores with hyphens, and strips non-alphanumeric characters except hyphens and slashes. If a slug key is present in frontmatter, it overrides the computed value. If permalink is set, the slug is derived from the permalink’s final path segment.
Configuration:
passes:
slugify:
strategy: default # "default" | "preserve" | "ascii"
separator: "-" # character to use for word separation
Strategies:
| Strategy | Behavior |
|---|---|
default |
lowercase, spaces → hyphens, strip special chars |
preserve |
lowercase only; preserves underscores and dots |
ascii |
transliterate non-ASCII characters before slugifying |
Reads: Weight, Date, entity list order
Writes: normalized Weight, entity list order
Sorts the entity list by the combination of weight (ascending) and date (descending within the same weight group). Entities with weight: 0 are sorted by date. Entities with explicit weight values are placed before zero-weight entities. The sort is stable: entities with equal weight and date retain their original file system order.
Configuration:
passes:
sort:
primary: weight # "weight" | "date" | "title"
secondary: date # "date" | "title" | "weight"
direction: asc # "asc" | "desc"
Reads: RawContent, ID
Writes: Content (rendered HTML), Section, Depth, inferred Title (if missing)
Renders the markdown body to HTML using the built-in CommonMark renderer. If an entity’s Title field is empty after parsing, this pass extracts the first H1 heading from the markdown body and uses it. Section is set to the first path component of the entity’s ID relative to the input root. Depth is the number of path separators in the ID.
Configuration:
passes:
enrich:
markdown:
extensions:
- tables
- strikethrough
- autolink
- taskList
syntaxHighlight: true # wrap code blocks with highlight spans
safeHTML: false # strip raw HTML in markdown bodies
Reads: Tags, Categories, Series for all entities
Writes: Tags[].Slug, Categories[].Slug; creates synthetic taxonomy entities
For every unique tag value across all entities, this pass creates a synthetic entity representing the taxonomy term page. The taxonomy entity has its own URL (/tags/{slug}/), its Children field pre-populated with all entities bearing that tag, and Layout set to the configured taxonomy template. The same process runs for categories and series.
Taxonomy entities are inserted into the IR and processed by all subsequent passes like any other entity.
Configuration:
passes:
taxonomy:
tags:
enabled: true
urlPrefix: /tags/
layout: tag-list # template to use for tag pages
minCount: 1 # only create page if tag appears >= N times
categories:
enabled: false
urlPrefix: /categories/
layout: category-list
series:
enabled: false
urlPrefix: /series/
layout: series-list
Reads: Tags, Section, Content (for link extraction), Title for all entities
Writes: Related (top-N related entities per entity with scores)
Scores every entity pair for similarity using four signals:
The top-N entities by combined score are stored in Related for each entity. Entities with a score below minScore are excluded.
Configuration:
passes:
relationships:
topN: 5 # number of related entities to store
minScore: 0.1 # minimum similarity score to include
weights:
tagOverlap: 0.5
sectionMatch: 0.2
linkOverlap: 0.2
titleTokenOverlap: 0.1
Reads: entity list order, URL (preliminary), Content (for link extraction)
Writes: Backlinks, Parent, Children, Prev, Next
Builds the structural graph of the entity corpus. For every outbound link found in an entity’s content, if the link target resolves to another entity in the IR, the target’s Backlinks list gains an entry. Parent/child relationships are derived from URL prefix matching: entity /docs/overview/ is a child of /docs/. Prev and Next are set based on sort order within the same section.
Configuration:
passes:
graph:
backlinks: true
parentChild: true
prevNext: true
prevNextScope: section # "section" | "global" | "taxonomy"
Reads: Slug, Permalink, Section, Depth
Writes: URL (final), resolves all entity reference URLs in Backlinks, Related, Children, Parent
Computes the final canonical URL for every entity. The default URL strategy produces clean URLs: /blog/hello-world/ for a file at content/blog/hello-world.md. If permalink is set in frontmatter, that value is used verbatim. After URLs are set, all entity references throughout the IR (Backlinks, Related, etc.) are updated with the resolved URLs.
Configuration:
passes:
urlResolve:
strategy: clean # "clean" | "pretty" | "flat"
trailingSlash: true
URL strategies:
| Strategy | Example input | Example output |
|---|---|---|
clean |
blog/hello-world.md |
/blog/hello-world/ |
pretty |
blog/hello-world.md |
/blog/hello-world/index.html |
flat |
blog/hello-world.md |
/blog/hello-world.html |
Reads: Frontmatter, Section for all entities
Writes: Schema (per entity); emits schema files if configured
Groups entities by section, then for each section infers a JSON Schema from the observed frontmatter keys and value types. Fields present on every entity in the section are marked required. Fields present on some entities are marked optional. Type inference: string values → string, numeric values → number, boolean → boolean, lists → array, maps → object.
Configuration:
passes:
schemaGen:
enabled: true
emit: false # write inferred schemas to output dir
emitDir: ./_schemas/ # where to write schema files if emit: true
requiredThreshold: 1.0 # fraction of entities that must have field for it to be required
Reads: Schema, all entity fields
Writes: nothing; fails build on violations
Runs every entity against its section schema. Fails the build with a list of violations if any required field is missing. Also enforces cross-entity invariants: duplicate slugs, duplicate URLs, broken internal links, and any custom rules defined in the config.
Configuration:
passes:
validate:
schema: true # validate entities against inferred schema
uniqueSlugs: true # fail on duplicate slugs
uniqueURLs: true # fail on duplicate URLs
internalLinks: true # fail on broken internal links
rules: [] # custom validation rules (see below)
Custom rules:
passes:
validate:
rules:
- name: "require-description"
section: blog
require:
- description
- name: "date-in-past"
section: blog
assert: "entity.Date.Before(now)"
Reads: full IR Writes: HTML files to output directory
Renders every non-draft entity using its configured layout template. Taxonomy entities use the layout specified in the taxonomy pass configuration. Paginated list pages are generated for sections with paginate configured.
See Output Backends — HTML for the full template variable reference.
Configuration:
backends:
html:
enabled: true
outputDir: ./_site
templates:
dir: ./templates
default: base # layout to use when entity has no layout
pagination:
pageSize: 10
pageLayout: list
Reads: full IR Writes: JSON files to output directory
Serializes entities to JSON. By default, one JSON file is written per entity at the same URL path with a .json extension. Optionally, a single combined entities.json file containing all entities is emitted.
See Output Backends — JSON for the output schema.
Configuration:
backends:
json:
enabled: false
perEntity: false # write one JSON file per entity
combined: true # write a single entities.json
combinedFile: entities.json
fields: # fields to include (default: all)
- id
- title
- description
- url
- date
- tags
- content
Reads: full IR
Writes: sitemap.xml, feed.xml (RSS)
Writes an XML sitemap containing every non-draft entity with its URL and last-modified date. Optionally emits an RSS feed for a specified section.
Configuration:
backends:
sitemap:
enabled: true
file: sitemap.xml
changefreq: weekly # "always" | "hourly" | "daily" | "weekly" | "monthly" | "yearly" | "never"
priority: 0.5
exclude:
- /tags/**
rss:
enabled: false
section: blog
file: feed.xml
title: "My Site Feed"
description: "Recent posts"
limit: 20