Import paths:
github.com/greynewell/mist-go/health
github.com/greynewell/mist-go/lifecycle
The health package provides HTTP health check handlers for Kubernetes probes and load balancer checks. The lifecycle package handles graceful startup and shutdown: signal handling, in-flight work draining, and cleanup hook registration. Every MIST tool uses both.
h := health.New("matchspec", "1.0.0")
New takes the tool name and version. These appear in every response body.
A CheckFunc is any function that returns nil for healthy or an error for unhealthy. Register named checks that run during readiness probes:
// Check that the inference backend is reachable.
h.AddCheck("infermux", func() error {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	return pingInfermux(ctx)
})
// Check that the checkpoint directory is writable.
h.AddCheck("checkpoint-dir", func() error {
	path := filepath.Join(cfg.CheckpointDir, ".healthcheck")
	if err := os.WriteFile(path, []byte("ok"), 0600); err != nil {
		return fmt.Errorf("checkpoint dir not writable: %w", err)
	}
	os.Remove(path)
	return nil
})
mux := http.NewServeMux()
mux.Handle("GET /healthz", h.Liveness())
mux.Handle("GET /readyz", h.Readiness())
Liveness (/healthz)
The liveness handler always returns 200 OK while the process is running. It does not run dependency checks. Use this for Kubernetes livenessProbe and load balancer pings.
{
  "status": "ok",
  "tool": "matchspec",
  "version": "1.0.0",
  "uptime": "3h42m18s"
}
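As a rough illustration of how a body like this can be produced, the sketch below builds the same four fields with the standard library only. It does not use the health package: `livenessBody` and `newLivenessBody` are hypothetical names, and formatting uptime with `time.Duration`'s `String` method is an assumption about how a value such as "3h42m18s" might arise, not a statement about the package's internals.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// livenessBody is a hypothetical struct matching the response fields shown above.
type livenessBody struct {
	Status  string `json:"status"`
	Tool    string `json:"tool"`
	Version string `json:"version"`
	Uptime  string `json:"uptime"`
}

// newLivenessBody formats uptime with time.Duration's String method,
// truncated to whole seconds (an assumption made for this sketch).
func newLivenessBody(tool, version string, uptime time.Duration) livenessBody {
	return livenessBody{
		Status:  "ok",
		Tool:    tool,
		Version: version,
		Uptime:  uptime.Truncate(time.Second).String(),
	}
}

func main() {
	// In a real handler this would be time.Since(processStart).
	uptime := 3*time.Hour + 42*time.Minute + 18*time.Second + 250*time.Millisecond

	out, err := json.MarshalIndent(newLivenessBody("matchspec", "1.0.0", uptime), "", "  ")
	if err != nil {
		fmt.Println("marshal:", err)
		return
	}
	fmt.Println(string(out))
}
```

Because the body depends only on values captured at startup plus the clock, a handler built this way cannot fail on a slow dependency, which is the property a liveness probe needs.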
Readiness (/readyz)
The readiness handler runs all registered CheckFunc functions. If all pass, it returns 200 OK. If any fail, it returns 503 Service Unavailable. Use this for Kubernetes readinessProbe: it tells Kubernetes when the pod is ready to accept traffic, and a pod that fails the probe is removed from Service endpoints until it passes again.
{
  "status": "ok",
  "tool": "matchspec",
  "version": "1.0.0",
  "uptime": "3h42m18s",
  "checks": {
    "infermux": "ok",
    "checkpoint-dir": "ok"
  }
}
On failure:
{
  "status": "degraded",
  "tool": "matchspec",
  "version": "1.0.0",
  "uptime": "3h42m18s",
  "checks": {
    "infermux": "dial tcp: connection refused",
    "checkpoint-dir": "ok"
  }
}
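The mapping from check results to status code and "checks" map can be sketched with the standard library alone. This is an illustration of the behavior described above, not the health package's implementation: `CheckFunc` is redeclared locally, `evaluate`, `readiness`, and `errRefused` are hypothetical names, and the tool, version, and uptime fields are omitted for brevity.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// CheckFunc is redeclared here so the sketch compiles on its own: nil means healthy.
type CheckFunc func() error

// errRefused is a sample failure used by the demo below.
var errRefused = errors.New("dial tcp: connection refused")

// evaluate runs every check and records "ok" or the error text per check name.
// A single failure turns the overall result into "degraded" with a 503.
func evaluate(checks map[string]CheckFunc) (status string, code int, results map[string]string) {
	status, code = "ok", http.StatusOK
	results = make(map[string]string, len(checks))
	for name, check := range checks {
		if err := check(); err != nil {
			results[name] = err.Error()
			status, code = "degraded", http.StatusServiceUnavailable
			continue
		}
		results[name] = "ok"
	}
	return status, code, results
}

// readiness wraps evaluate in an HTTP handler that writes the JSON body.
func readiness(checks map[string]CheckFunc) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		status, code, results := evaluate(checks)
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(code)
		_ = json.NewEncoder(w).Encode(map[string]any{"status": status, "checks": results})
	})
}

func main() {
	h := readiness(map[string]CheckFunc{
		"infermux":       func() error { return errRefused },
		"checkpoint-dir": func() error { return nil },
	})

	// Exercise the handler in memory, without opening a port.
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/readyz", nil))
	fmt.Println(rec.Code)
	fmt.Print(rec.Body.String())
}
```

The sketch runs every check even after one fails, so the response names each failing dependency rather than only the first.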
For controlled rollouts, you can temporarily mark a tool as not ready:
// Take the tool out of rotation before maintenance.
h.SetReady(false)
// ... do maintenance work ...
h.SetReady(true)
SetReady(false) causes the readiness endpoint to return 503 without running checks, allowing you to drain traffic before an operation that would affect service quality.
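One way to picture this short-circuit is an atomic flag consulted before the checks run. The sketch below is a standalone approximation using only the standard library: `gate`, `wrap`, `countingOK`, and `probe` are hypothetical names, and the inner handler merely counts invocations to stand in for running the registered checks.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// gate is a hypothetical stand-in for the readiness switch described above.
type gate struct {
	notReady atomic.Bool // zero value means "ready"
}

// SetReady opens or closes the gate; it is safe for concurrent use.
func (g *gate) SetReady(ready bool) { g.notReady.Store(!ready) }

// wrap answers 503 while the gate is closed, so next is never invoked.
func (g *gate) wrap(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if g.notReady.Load() {
			http.Error(w, "not ready", http.StatusServiceUnavailable)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// countingOK returns a handler that increments *n and answers 200.
// It stands in for a readiness handler that runs the registered checks.
func countingOK(n *int) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		*n++
		w.WriteHeader(http.StatusOK)
	})
}

// probe issues an in-memory GET against h and returns the status code.
func probe(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/readyz", nil))
	return rec.Code
}

func main() {
	var g gate
	checksRun := 0
	h := g.wrap(countingOK(&checksRun))

	fmt.Println(probe(h), checksRun) // ready: checks run
	g.SetReady(false)
	fmt.Println(probe(h), checksRun) // gated: checks skipped
	g.SetReady(true)
	fmt.Println(probe(h), checksRun) // ready again
}
```

Skipping the checks while gated keeps the probe cheap and makes the 503 deterministic, even if every dependency happens to be healthy during the maintenance window.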
Run is the entry point for all MIST tools. It wraps your main logic with signal handling and graceful shutdown:
func main() {
	err := lifecycle.Run(func(ctx context.Context) error {
		// Your application logic here.
		// ctx is cancelled when SIGTERM or SIGINT is received.
		return server.ListenAndServe()
	})
	if err != nil && !errors.Is(err, http.ErrServerClosed) {
		log.Printf("exit: %v", err)
		os.Exit(1)
	}
}
When Run is called, it:
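1. Calls your function with a context that is cancelled on SIGTERM or SIGINT
2. Waits for in-flight work to finish (DrainGroup)
3. Runs registered shutdown hooks (OnShutdown)
Panics in your function are recovered and returned as errors.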
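The signal handling and panic recovery can be approximated with the standard library, as in the sketch below. It is a simplified stand-in, not the lifecycle package's code: `run`, `panicky`, `failing`, and `errFailed` are hypothetical names, and the drain and shutdown-hook phases are left out.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

var errFailed = errors.New("plain failure")

// run is a simplified, hypothetical stand-in for the entry point described above.
func run(fn func(ctx context.Context) error) (err error) {
	// The context is cancelled when SIGINT or SIGTERM arrives.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	// Convert a panic in fn into an ordinary error via the named return value.
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic: %v", r)
		}
	}()
	return fn(ctx)
}

// panicky simulates application logic that panics.
func panicky(ctx context.Context) error { panic("boom") }

// failing simulates application logic that returns an error normally.
func failing(ctx context.Context) error { return errFailed }

func main() {
	fmt.Println(run(panicky)) // prints "panic: boom"
	fmt.Println(run(failing)) // prints "plain failure"
}
```

Returning the panic as an error lets the caller log it and choose an exit code in one place instead of crashing with a stack trace.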
OnShutdown registers a function to run during the shutdown phase. Hooks are run in reverse registration order (LIFO, like defer):
lifecycle.Run(func(ctx context.Context) error {
	db, err := openDatabase(cfg.DSN)
	if err != nil {
		return err
	}
	lifecycle.OnShutdown(ctx, func() error {
		return db.Close()
	})
	cache := openCache()
	lifecycle.OnShutdown(ctx, func() error {
		return cache.Flush()
	})
	// On shutdown: cache.Flush runs first, then db.Close.
	return server.ListenAndServe()
})
This mirrors the defer ordering convention: register resources in the order you open them, and they close in reverse.
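The reverse ordering is easy to see in a small standalone sketch. The `hooks` type below is hypothetical and uses only the standard library. Continuing after a failed hook and joining the errors is a choice made for this sketch, not documented behavior of the lifecycle package.

```go
package main

import (
	"errors"
	"fmt"
)

// hooks is a hypothetical LIFO stack of cleanup functions.
type hooks struct {
	fns []func() error
}

// add registers fn to run during shutdown.
func (h *hooks) add(fn func() error) { h.fns = append(h.fns, fn) }

// runAll executes hooks newest-first. It keeps going after a failure
// and returns every error joined together (nil if none failed).
func (h *hooks) runAll() error {
	var errs []error
	for i := len(h.fns) - 1; i >= 0; i-- {
		if err := h.fns[i](); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...)
}

func main() {
	var h hooks
	var order []string

	// Register in the order the resources are opened.
	h.add(func() error { order = append(order, "db.Close"); return nil })
	h.add(func() error { order = append(order, "cache.Flush"); return nil })

	if err := h.runAll(); err != nil {
		fmt.Println("shutdown errors:", err)
	}
	fmt.Println(order) // prints "[cache.Flush db.Close]"
}
```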
DrainGroup returns a *sync.WaitGroup that lifecycle will wait on before running shutdown hooks. Use it to track in-flight work that must complete before cleanup:
lifecycle.Run(func(ctx context.Context) error {
	dg := lifecycle.DrainGroup(ctx)
	for msg := range incoming {
		dg.Add(1)
		go func(m *protocol.Message) {
			defer dg.Done()
			processMessage(m)
		}(msg)
	}
	return nil
})
On shutdown, lifecycle.Run waits for the drain group's counter to reach zero before running shutdown hooks. The default drain timeout is 15 seconds.
err := lifecycle.Run(
	func(ctx context.Context) error {
		return server.ListenAndServe()
	},
	lifecycle.WithDrainTimeout(30*time.Second),    // wait up to 30s for in-flight work
	lifecycle.WithShutdownTimeout(15*time.Second), // run hooks within 15s
)
If the drain timeout is exceeded, Run logs a warning, proceeds to shutdown hooks, and returns the timeout error. If the shutdown hook timeout is exceeded, hooks are interrupted.
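A sync.WaitGroup has no built-in deadline, so a bounded drain is usually built by waiting in a goroutine and racing that against a timer. The sketch below shows the pattern with the standard library only: `waitTimeout`, `newGroup`, and `errDrainTimeout` are hypothetical names, and nothing here is taken from the lifecycle package's source.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var errDrainTimeout = errors.New("drain timeout exceeded")

// newGroup returns a WaitGroup with n units of in-flight work registered.
func newGroup(n int) *sync.WaitGroup {
	wg := new(sync.WaitGroup)
	wg.Add(n)
	return wg
}

// waitTimeout waits for wg to reach zero, giving up after d.
// On timeout the helper goroutine stays blocked until the work finishes.
func waitTimeout(wg *sync.WaitGroup, d time.Duration) error {
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return nil
	case <-time.After(d):
		return errDrainTimeout
	}
}

func main() {
	quick := newGroup(1)
	go func() {
		defer quick.Done()
		time.Sleep(20 * time.Millisecond) // simulated in-flight work
	}()
	fmt.Println("quick work:", waitTimeout(quick, time.Second))

	stuck := newGroup(1) // never marked done
	fmt.Println("stuck work:", waitTimeout(stuck, 50*time.Millisecond))
}
```

Returning a distinct error on timeout lets the caller log a warning, move on to cleanup, and still surface the failure.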
A complete MIST tool main function:
func main() {
	var cfg AppConfig
	if err := config.Load("matchspec.toml", "MATCHSPEC", &cfg); err != nil {
		log.Fatal(err) // standard library log; the structured logger is not set up yet
	}
	// Named logger rather than log so it does not shadow the standard log package.
	logger := logging.New("matchspec", logging.LevelInfo)
	reg := metrics.NewRegistry()
	h := health.New("matchspec", version)
	err := lifecycle.Run(func(ctx context.Context) error {
		// Set up transport.
		inferTr := transport.NewHTTP(cfg.Infer.URL)
		lifecycle.OnShutdown(ctx, func() error { return inferTr.Close() })
		// Register readiness checks.
		h.AddCheck("infermux", func() error { return pingInfermux(inferTr) })
		// Set up HTTP server.
		srv := server.New(cfg.Server.Addr)
		srv.Handle("GET /healthz", h.Liveness())
		srv.Handle("GET /readyz", h.Readiness())
		srv.Handle("GET /metricsz", reg.Handler())
		lifecycle.OnShutdown(ctx, func() error {
			// ctx is already cancelled when hooks run. If Shutdown honors context
			// cancellation the way net/http does, it needs a fresh deadline.
			shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
			defer cancel()
			return srv.Shutdown(shutdownCtx)
		})
		logger.Info(ctx, "starting", "addr", cfg.Server.Addr)
		return srv.ListenAndServe()
	})
	if err != nil {
		logger.Error(context.Background(), "exit", "error", err)
		os.Exit(1)
	}
}