Configure Pipeline

Configure stage behavior and runtime policy using a strict TOML schema.

When to use

Use this guide when you need to change how the pipeline filters, scores, or ranks records, or how it behaves on failure.

Prerequisites

Steps

  1. Set input/output paths:
[input]
kind = "parquet"
path = "./examples/jobs_sample.parquet"

[output]
path = "./examples/jobs_scored.parquet"
  1. Configure stages:
[stages.filter]
enabled = true
remote_only = true
min_salary = 120000.0
required_keywords = ["python", "sql"]

[stages.rate]
enabled = true
completeness_weight = 0.7
quality_weight = 0.3

[stages.match]
enabled = true
top_k = 25
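One plausible reading of the `[stages.rate]` and `[stages.match]` settings can be sketched in Python. This is an illustration only, not the honestroles implementation: the linear blend, the ranking by blended score, and the names `blended_score` and `top_k` are all assumptions made for the example.

```python
# Hypothetical illustration: NOT the actual honestroles scoring code.
def blended_score(completeness: float, quality: float,
                  completeness_weight: float = 0.7,
                  quality_weight: float = 0.3) -> float:
    """Linear blend of two sub-scores (an assumed formula)."""
    return completeness_weight * completeness + quality_weight * quality


def top_k(scores: dict[str, float], k: int = 25) -> list[str]:
    """Return the ids of the k highest-scoring records, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


scores = {
    "job-a": blended_score(1.0, 0.5),  # 0.7 * 1.0 + 0.3 * 0.5
    "job-b": blended_score(0.5, 1.0),  # 0.7 * 0.5 + 0.3 * 1.0
    "job-c": blended_score(0.2, 0.2),
}
print(top_k(scores, k=2))  # ['job-a', 'job-b']
```

Under this assumed blend, a larger `completeness_weight` favors records with more populated fields over records with higher quality scores, which is why `job-a` outranks `job-b`.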
  1. Add source adapter mappings for non-canonical input schemas:
[input.adapter]
enabled = true
on_error = "null_warn"

[input.adapter.fields.location]
from = ["location_raw", "job_location"]
cast = "string"

[input.adapter.fields.remote]
from = ["remote_flag", "is_remote"]
cast = "bool"
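The adapter mapping above can be pictured with a small Python sketch. It assumes that `from` lists candidate source columns in priority order, that `cast` converts the chosen value, and that `on_error = "null_warn"` means "store null and emit a warning" when a cast fails. Those semantics, along with the helper names `cast_value` and `adapt_row` and the accepted boolean spellings, are assumptions for illustration rather than documented honestroles behavior.

```python
import warnings
from typing import Any

# Hypothetical mapping that mirrors the [input.adapter.fields.*] tables above.
FIELDS = {
    "location": {"from": ["location_raw", "job_location"], "cast": "string"},
    "remote": {"from": ["remote_flag", "is_remote"], "cast": "bool"},
}

_TRUE = {"true", "1", "yes", "y"}
_FALSE = {"false", "0", "no", "n"}


def cast_value(value: Any, cast: str) -> Any:
    """Cast one raw value; raise ValueError when it cannot be interpreted."""
    if cast == "string":
        return str(value)
    if cast == "bool":
        if isinstance(value, bool):
            return value
        text = str(value).strip().lower()
        if text in _TRUE:
            return True
        if text in _FALSE:
            return False
        raise ValueError(f"cannot cast {value!r} to bool")
    raise ValueError(f"unknown cast {cast!r}")


def adapt_row(row: dict[str, Any]) -> dict[str, Any]:
    """Fill each canonical field from the first listed source column with a non-null value."""
    out: dict[str, Any] = {}
    for target, spec in FIELDS.items():
        raw = next((row[c] for c in spec["from"] if row.get(c) is not None), None)
        if raw is None:
            out[target] = None
            continue
        try:
            out[target] = cast_value(raw, spec["cast"])
        except ValueError as exc:
            # Assumed meaning of on_error = "null_warn": keep going with a null.
            warnings.warn(f"{target}: {exc}")
            out[target] = None
    return out


print(adapt_row({"job_location": "Berlin", "is_remote": "yes"}))
# {'location': 'Berlin', 'remote': True}
```

In this sketch a row whose `remote_flag` holds an unrecognized value such as `"maybe"` yields `remote = None` plus a warning instead of aborting the run.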
  1. Set runtime policy:
[runtime]
fail_fast = false
random_seed = 42
  1. Validate before running:
$ honestroles config validate --pipeline pipeline.toml
  1. Define reliability thresholds in a separate policy file (optional but recommended):
# reliability.toml
min_rows = 500
required_columns = ["title", "description_text", "posted_at"]

[max_null_pct]
title = 5
description_text = 10

[freshness]
column = "posted_at"
max_age_days = 14

Run the reliability check against this policy:

$ honestroles reliability check --pipeline-config pipeline.toml --policy reliability.toml --strict --format table
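To make the policy keys concrete, the following Python sketch evaluates the same thresholds against an in-memory list of rows. It is a rough model, not the logic behind `honestroles reliability check`: the function name `check_reliability`, the violation messages, and the choice to judge freshness by the newest `posted_at` value are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Values copied from reliability.toml above, written as a plain dict.
POLICY = {
    "min_rows": 500,
    "required_columns": ["title", "description_text", "posted_at"],
    "max_null_pct": {"title": 5, "description_text": 10},
    "freshness": {"column": "posted_at", "max_age_days": 14},
}


def check_reliability(rows, policy, now):
    """Return a list of human-readable violations (empty means pass)."""
    violations = []
    if len(rows) < policy["min_rows"]:
        violations.append(f"min_rows: {len(rows)} < {policy['min_rows']}")
    columns = set().union(*(r.keys() for r in rows)) if rows else set()
    for col in policy["required_columns"]:
        if col not in columns:
            violations.append(f"required_columns: missing {col}")
    for col, limit in policy["max_null_pct"].items():
        if rows:
            nulls = sum(1 for r in rows if r.get(col) is None)
            pct = 100.0 * nulls / len(rows)
            if pct > limit:
                violations.append(f"max_null_pct: {col} {pct:.1f}% > {limit}%")
    fresh = policy["freshness"]
    dates = [r[fresh["column"]] for r in rows if r.get(fresh["column"]) is not None]
    # Assumption: freshness is judged by the most recent timestamp in the column.
    if dates and now - max(dates) > timedelta(days=fresh["max_age_days"]):
        violations.append(
            f"freshness: newest {fresh['column']} is older than {fresh['max_age_days']} days"
        )
    return violations


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
rows = [{"title": "Data Engineer", "description_text": None,
         "posted_at": datetime(2024, 1, 1, tzinfo=timezone.utc)}]
for violation in check_reliability(rows, POLICY, now):
    print(violation)
```

The single sample row trips three of the thresholds in this model: too few rows, too many null descriptions, and a newest posting that is 30 days old.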

Expected result

Config validation succeeds, and the stage and runtime values appear in the normalized JSON output.

Next steps