Configure Pipeline¶

Configure stage behavior and runtime policy using a strict TOML schema.

When to use¶

Use this when changing filtering, scoring, ranking, or failure behavior.

Prerequisites¶

A valid pipeline file (see First Pipeline Config)

Steps¶

Set input/output paths:

[input]
kind = "parquet"
path = "./examples/jobs_sample.parquet"

[output]
path = "./examples/jobs_scored.parquet"

Configure stages:

[stages.filter]
enabled = true
remote_only = true
min_salary = 120000.0
required_keywords = ["python", "sql"]

[stages.rate]
enabled = true
completeness_weight = 0.7
quality_weight = 0.3

[stages.match]
enabled = true
top_k = 25

Add source adapter mappings for non-canonical input schemas:

[input.adapter]
enabled = true
on_error = "null_warn"

[input.adapter.fields.location]
from = ["location_raw", "job_location"]
cast = "string"

[input.adapter.fields.remote]
from = ["remote_flag", "is_remote"]
cast = "bool"

Set runtime policy:

[runtime]
fail_fast = false
random_seed = 42

Validate before running:

$ honestroles config validate --pipeline pipeline.toml

Define reliability thresholds in a separate policy file (optional but recommended):

# reliability.toml
min_rows = 500
required_columns = ["title", "description_text", "posted_at"]

[max_null_pct]
title = 5
description_text = 10

[freshness]
column = "posted_at"
max_age_days = 14

Run:

$ honestroles reliability check --pipeline-config pipeline.toml --policy reliability.toml --strict --format table

Expected result¶

Config validation succeeds and stage/runtime values appear in normalized JSON output.

Next steps¶

Full field-level schema: Pipeline Config Schema
Reliability threshold schema: Reliability Policy Schema
Error handling behavior: Non-Fail-Fast and Recovery