Stage Contracts
Canonical stage behavior and data contracts.
Stage Order
Enabled stages execute in this fixed order:
1. clean
2. filter
3. label
4. rate
5. match
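A minimal sketch of how a fixed order could be enforced regardless of how stages are listed in configuration. `STAGE_ORDER` mirrors the list above; `ordered_enabled_stages` is a hypothetical helper used for illustration, not a documented runtime function.

```python
# Canonical stage order, as listed in this document.
STAGE_ORDER = ("clean", "filter", "label", "rate", "match")


def ordered_enabled_stages(enabled):
    """Return the enabled stage names sorted into canonical order.

    Hypothetical helper: `enabled` is any iterable of stage names.
    """
    enabled = set(enabled)
    unknown = enabled - set(STAGE_ORDER)
    if unknown:
        raise ValueError(f"unknown stages: {sorted(unknown)}")
    # Iterating over STAGE_ORDER keeps the order fixed no matter how
    # the caller listed the stages.
    return [name for name in STAGE_ORDER if name in enabled]
```

For example, `ordered_enabled_stages(["match", "clean", "rate"])` yields the stages as clean, rate, match.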
Source Data Contract
Runtime normalizes columns to include these names:
- id
- title
- company
- location
- remote
- description_text
- description_html
- skills
- salary_min
- salary_max
- apply_url
- posted_at
Before normalization, runtime resolves source aliases into canonical names using:
- Declarative source adapter from [input.adapter] (runs first)
- Built-in aliases: location_raw -> location, remote_flag -> remote
- Optional pipeline aliases from [input.aliases]
Conflict policy:
- Canonical field wins.
- Adapter/alias conflicts are recorded in diagnostics.
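The conflict policy can be illustrated with a small sketch over plain dict records. Everything here is an assumption made for illustration: `resolve_aliases` is a hypothetical function, records are modeled as dicts, an alias fills a canonical field only when that field is missing or `None`, and a conflict is counted once per record where both values are present and differ. The actual runtime may define these details differently.

```python
# Built-in aliases as listed above: source name -> canonical name.
BUILTIN_ALIASES = {"location_raw": "location", "remote_flag": "remote"}


def resolve_aliases(records, aliases=None):
    """Rename alias fields to canonical names; the canonical field wins."""
    aliases = BUILTIN_ALIASES if aliases is None else aliases
    applied = {}    # canonical name -> source name that supplied it
    conflicts = {}  # canonical name -> number of conflicting records
    resolved = []
    for record in records:
        row = dict(record)  # copy so the caller's data is untouched
        for source, canonical in aliases.items():
            if source not in row:
                continue
            value = row.pop(source)
            if row.get(canonical) is not None:
                # Canonical value already present: keep it, and record a
                # conflict when the alias disagrees (assumed definition).
                if value is not None and value != row[canonical]:
                    conflicts[canonical] = conflicts.get(canonical, 0) + 1
                continue
            row[canonical] = value
            applied[canonical] = source
        resolved.append(row)
    return resolved, {"applied": applied, "conflicts": conflicts}
```

The returned diagnostics dict uses the same `applied` and `conflicts` keys as the diagnostics payloads shown in this document, but the counting rule is only a guess at their meaning.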
Validation requires:
- title
- At least one of description_text or description_html
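A minimal sketch of the validation rule, assuming it is checked against the set of column names after alias resolution. `validate_columns` is a hypothetical helper; whether the runtime also validates per-row values is not stated here.

```python
def validate_columns(columns):
    """Return a list of validation errors; an empty list means valid."""
    present = set(columns)
    errors = []
    if "title" not in present:
        errors.append("missing required column: title")
    # Either description column satisfies the second requirement.
    if not present & {"description_text", "description_html"}:
        errors.append(
            "need at least one of: description_text, description_html"
        )
    return errors
```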
Stage Responsibilities
- Stage input/output object: JobDataset
- clean: clean text/html and apply text-level policy such as dropping null titles
- filter: apply remote_only, salary threshold, keyword filters, then run filter plugins
- label: derive base labels, then run label plugins
- rate: compute bounded rate_completeness, rate_quality, and rate_composite, then run rate plugins
- match: compute bounded fit_score, sort descending, enforce top_k, generate application_plan
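The ordering part of the match stage can be sketched as follows. This is an illustration under stated assumptions: rows are modeled as plain dicts rather than a JobDataset, bounding is done by clamping, and `rank_top_k` is a hypothetical name. How `fit_score` is computed and how `application_plan` is built are outside this sketch.

```python
def rank_top_k(rows, top_k):
    """Clamp fit_score to [0.0, 1.0], sort descending, keep top_k rows."""
    clamped = [
        {**row, "fit_score": min(1.0, max(0.0, row["fit_score"]))}
        for row in rows
    ]
    # sorted() is stable, so rows with equal scores keep their input order.
    ranked = sorted(clamped, key=lambda r: r["fit_score"], reverse=True)
    # fit_rank starts at 1 for the best-scoring row.
    return [
        {**row, "fit_rank": rank}
        for rank, row in enumerate(ranked[:top_k], start=1)
    ]
```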
Output Invariants
- rate_* metrics are bounded to [0.0, 1.0].
- fit_score is bounded to [0.0, 1.0].
- fit_rank starts at 1 and reflects descending fit_score.
- application_plan is aligned to the top_k ranked rows and returned as ApplicationPlanEntry[].
- Plugins must return a valid canonical JobDataset with all canonical fields and canonical logical dtypes preserved.
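These invariants lend themselves to a simple post-run check, for example in a plugin test. The sketch below assumes ranked output rows are available as a list of dicts in rank order; `check_output_invariants` is a hypothetical helper and does not cover the dtype or `application_plan` invariants.

```python
def check_output_invariants(rows, top_k):
    """Return a list of violation messages for ranked output rows."""
    errors = []
    if len(rows) > top_k:
        errors.append(f"expected at most {top_k} rows, got {len(rows)}")
    previous_score = None
    for position, row in enumerate(rows, start=1):
        # rate_* metrics and fit_score must lie in [0.0, 1.0].
        for name, value in row.items():
            bounded = name.startswith("rate_") or name == "fit_score"
            if bounded and (value is None or not 0.0 <= value <= 1.0):
                errors.append(f"row {position}: {name}={value} out of bounds")
        # fit_rank starts at 1 and follows row order.
        if row.get("fit_rank") != position:
            errors.append(f"row {position}: fit_rank={row.get('fit_rank')}")
        # Scores must not increase from one rank to the next.
        score = row.get("fit_score")
        if previous_score is not None and score is not None:
            if score > previous_score:
                errors.append(f"row {position}: fit_score not descending")
        previous_score = score
    return errors
```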
Diagnostics Additions
Runtime diagnostics include input_aliasing:
{
  "input_aliasing": {
    "applied": {"location": "location_raw", "remote": "remote_flag"},
    "conflicts": {"remote": 2},
    "unresolved": ["skills", "salary_min", "salary_max"]
  }
}
Runtime diagnostics also include input_adapter:
{
  "input_adapter": {
    "enabled": true,
    "applied": {"remote": "remote_flag", "posted_at": "date_posted"},
    "conflicts": {"remote": 2},
    "coercion_errors": {"posted_at": 1},
    "null_like_hits": {"location": 10},
    "unresolved": ["salary_max"],
    "error_samples": [
      {
        "field": "posted_at",
        "source": "date_posted",
        "value": "32/13/2024",
        "reason": "date_parse_failed"
      }
    ]
  }
}
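One way a caller might consume the input_adapter payload is to collect the canonical fields that had any problem. This is a sketch only: `fields_needing_attention` is a hypothetical helper, and treating `null_like_hits` as informational rather than as a problem is an assumption.

```python
def fields_needing_attention(diagnostics):
    """List canonical fields with unresolved, conflicting, or failed input."""
    adapter = diagnostics.get("input_adapter", {})
    flagged = set(adapter.get("unresolved", []))
    # Both keys map a canonical field name to a count.
    for key in ("conflicts", "coercion_errors"):
        for field, count in adapter.get(key, {}).items():
            if count > 0:
                flagged.add(field)
    return sorted(flagged)
```

Applied to the example payload above, this returns posted_at, remote, and salary_max.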