Runtime API

Public runtime API contracts for use from Python.

`HonestRolesRuntime`

Constructor entrypoint:

from honestroles import HonestRolesRuntime

runtime = HonestRolesRuntime.from_configs(
    pipeline_config_path="pipeline.toml",
    plugin_manifest_path="plugins.toml",  # optional
)

pipeline_config_path: str | Path, required
plugin_manifest_path: str | Path | None, optional

Execution:

run = runtime.run()

`PipelineRun`

run() returns a PipelineRun with the following fields:

dataset: final JobDataset
diagnostics: RuntimeDiagnostics
application_plan: tuple[ApplicationPlanEntry, ...]

`JobDataset`

JobDataset is the strict, canonical I/O object that runtime stages consume and produce.

to_polars(copy: bool = True) -> pl.DataFrame
row_count() -> int
columns() -> tuple[str, ...]
iter_records() -> Iterator[CanonicalJobRecord]
materialize_records(limit: int | None = None) -> list[CanonicalJobRecord]
validate() -> None
with_frame(frame) -> JobDataset
transform(fn) -> JobDataset
Datasets produced by the runtime or returned by plugins must retain every canonical field with its canonical logical dtype.
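To catch contract violations in your own plugins early, one option is to compare a dataset's column-to-dtype mapping against the canonical schema before returning it. The sketch below works on plain dictionaries so it runs without Polars. CANONICAL_SCHEMA (its field names and dtype labels) and check_canonical_schema are hypothetical placeholders, not the library's actual canonical set; the authoritative check is presumably JobDataset.validate().

```python
# Hypothetical canonical schema: these field names and logical dtype labels
# are placeholders, not HonestRoles' actual canonical set.
CANONICAL_SCHEMA: dict[str, str] = {
    "job_id": "string",
    "title": "string",
    "company": "string",
    "posted_at": "datetime",
}


def check_canonical_schema(schema: dict[str, str]) -> list[str]:
    """Return contract violations for a column -> dtype mapping (empty if OK)."""
    problems: list[str] = []
    for field, dtype in CANONICAL_SCHEMA.items():
        if field not in schema:
            problems.append(f"missing canonical field: {field}")
        elif schema[field] != dtype:
            problems.append(
                f"dtype mismatch for {field}: expected {dtype}, got {schema[field]}"
            )
    # Extra, non-canonical columns are deliberately not flagged in this sketch.
    return problems
```

A plugin that dropped posted_at and changed the dtype of title would produce two violations, while extra columns alone pass.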

Notes:

to_polars() is the explicit boundary between JobDataset and the underlying Polars engine; with the default copy=True it returns a clone of the frame.
rows() and select() are not part of the public JobDataset API.
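Returning a clone by default is a defensive-copy pattern: mutating the returned frame cannot corrupt the dataset's internal state. The toy FrameHolder class below is hypothetical and uses a list of dicts in place of a Polars frame. It also assumes that copy=False hands back the shared internal object, which this page does not state for to_polars().

```python
from copy import deepcopy


class FrameHolder:
    """Toy stand-in for a dataset that owns mutable tabular data (hypothetical)."""

    def __init__(self, rows: list[dict[str, object]]) -> None:
        self._rows = rows

    def to_rows(self, copy: bool = True) -> list[dict[str, object]]:
        if copy:
            # Default: return a deep copy, so caller edits cannot leak back in.
            return deepcopy(self._rows)
        # Opt-out (assumed semantics): share the internal object, skipping the copy cost.
        return self._rows


holder = FrameHolder([{"title": "Backend Engineer"}])
snapshot = holder.to_rows()
snapshot[0]["title"] = "edited"
print(holder.to_rows()[0]["title"])  # prints: Backend Engineer
```

The trade-off is memory and time spent on the copy in exchange for isolation, which is why an opt-out flag exists at all.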

Diagnostics Contract

RuntimeDiagnostics.to_dict() always includes:

input_path
stage_rows
plugin_counts
runtime
input_adapter
input_aliasing
final_rows

Diagnostics include these keys only under certain conditions:

output_path (when [output] is configured)
non_fatal_errors (when fail_fast = false and errors occur)
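A test suite can assert the diagnostics contract directly against the payload returned by RuntimeDiagnostics.to_dict(). This sketch checks only what the contract above states. missing_diagnostics_keys is a hypothetical helper, and non_fatal_errors is not checked because its presence depends on whether errors actually occurred.

```python
from typing import Any

# Keys the contract says RuntimeDiagnostics.to_dict() always includes.
ALWAYS_PRESENT: tuple[str, ...] = (
    "input_path",
    "stage_rows",
    "plugin_counts",
    "runtime",
    "input_adapter",
    "input_aliasing",
    "final_rows",
)


def missing_diagnostics_keys(diagnostics: dict[str, Any], output_configured: bool) -> list[str]:
    """Return documented keys absent from a diagnostics payload (hypothetical helper)."""
    missing = [key for key in ALWAYS_PRESENT if key not in diagnostics]
    # output_path is only guaranteed when the pipeline config has an [output] section.
    if output_configured and "output_path" not in diagnostics:
        missing.append("output_path")
    return missing
```

For example, missing_diagnostics_keys(run.diagnostics.to_dict(), output_configured=True) should return an empty list for a run whose pipeline config has an [output] section.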

Determinism

At the start of each run, the runtime seeds Python's randomness from runtime.random_seed. As a result, fixed inputs, pipeline spec, and plugins produce stable outputs across runs.
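Seeding once at run start is what makes repeated runs reproducible: the generator starts from the same state each time, so the same sequence of draws follows. The sketch assumes seeding equivalent to random.seed(...) on Python's global generator; sample_jobs is hypothetical. Note that random.seed affects only the random module, not other generators such as NumPy's.

```python
import random


def sample_jobs(job_ids: list[str], k: int, seed: int) -> list[str]:
    """Pick k job IDs after seeding Python's global RNG (hypothetical helper)."""
    # Re-seeding at "run start" resets the generator to the same state every time.
    random.seed(seed)
    return random.sample(job_ids, k)


job_ids = [f"job-{i}" for i in range(20)]
first = sample_jobs(job_ids, 5, seed=42)
second = sample_jobs(job_ids, 5, seed=42)
print(first == second)  # prints: True
```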

Ingestion API

Use sync_source(...) to ingest a single public ATS (applicant tracking system) source into canonical parquet:

from honestroles import sync_source

result = sync_source(
    source="greenhouse",
    source_ref="stripe",
)

sync_source(...) returns an IngestionResult whose fields include:

report: IngestionReport
output_parquet: resolved latest parquet path
report_file: resolved sync report path
raw_file: optional raw JSONL path (when write_raw=True)
snapshot_file: per-run snapshot parquet path
catalog_file: catalog parquet path
state_file: path of the state file written
rows_written: number of active rows written to the latest parquet

Additive (optional) request controls:

timeout_seconds
max_retries
base_backoff_seconds
user_agent
quality_policy_file
strict_quality
merge_policy (one of updated_hash, first_seen, last_seen)
retain_snapshots
prune_inactive_days
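This page does not specify how max_retries and base_backoff_seconds interact. A common interpretation is exponential backoff, where the wait before retry n (counting from 0) is base_backoff_seconds * 2**n. The sketch below assumes that policy; backoff_delays and call_with_retries are hypothetical helpers, and sleep is injectable so the schedule can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(max_retries: int, base_backoff_seconds: float) -> list[float]:
    """Wait before each retry, doubling every time (assumed exponential policy)."""
    return [base_backoff_seconds * (2 ** attempt) for attempt in range(max_retries)]


def call_with_retries(
    fetch: Callable[[], T],
    max_retries: int = 3,
    base_backoff_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(); on failure, wait and retry up to max_retries more times."""
    delays = backoff_delays(max_retries, base_backoff_seconds)
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```

Retrying on every exception keeps the sketch short; a real HTTP client would retry only transient failures such as timeouts or 5xx responses.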

Additive result/report fields include:

quality_status, quality_summary, quality_check_codes
key_field_completeness (company_non_null_pct, posted_at_non_null_pct, description_text_non_null_pct, location_or_remote_signal_pct)
stage_timings_ms, warnings
merge_policy, retained_snapshot_count, pruned_snapshot_count, pruned_inactive_count
quality_policy_source, quality_policy_hash
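The key_field_completeness metrics read as the percentage of rows with a usable value in each key field. Under that assumption they can be computed from plain records as below. The record keys location and remote, the rule that either signal counts toward location_or_remote_signal_pct, and the 0 to 100 scale are all hypothetical.

```python
from typing import Any


def _pct(count: int, total: int) -> float:
    return round(100.0 * count / total, 2) if total else 0.0


def key_field_completeness(records: list[dict[str, Any]]) -> dict[str, float]:
    """Percentage of records with each key field populated (assumed semantics)."""
    total = len(records)

    def non_null(field: str) -> int:
        return sum(1 for r in records if r.get(field) is not None)

    # Hypothetical rule: a row has a location signal if it has a location
    # value or is flagged as remote.
    location_or_remote = sum(
        1 for r in records if r.get("location") is not None or r.get("remote") is True
    )
    return {
        "company_non_null_pct": _pct(non_null("company"), total),
        "posted_at_non_null_pct": _pct(non_null("posted_at"), total),
        "description_text_non_null_pct": _pct(non_null("description_text"), total),
        "location_or_remote_signal_pct": _pct(location_or_remote, total),
    }
```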

Validation-only ingestion API:

from honestroles import validate_ingestion_source

validation = validate_ingestion_source(
    source="greenhouse",
    source_ref="stripe",
    quality_policy_file="ingest_quality.toml",
    strict_quality=True,
)
print(validation.report.status, validation.rows_evaluated)

validate_ingestion_source(...) returns an IngestionValidationResult. It writes a validation report and, optionally, the raw payload, but does not overwrite the latest parquet.

Batch ingestion from a manifest:

from honestroles import sync_sources_from_manifest

batch = sync_sources_from_manifest(manifest_path="ingest.toml", fail_fast=False)
print(batch.status, batch.total_sources, batch.fail_count)

sync_sources_from_manifest(...) returns a BatchIngestionResult that includes:

aggregate status/timing fields
per-source payloads under sources
aggregate totals (total_rows_written, total_fetched_count, total_request_count)
aggregate quality summary (quality_summary)
aggregate key field completeness (key_field_completeness)
report_file
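Judging by their names, the aggregate totals are sums over the per-source payloads. A minimal sketch of that roll-up, assuming each payload is a dict with hypothetical rows_written, fetched_count, request_count, and status keys, and a hypothetical rule that the batch status is "pass" only when every source passed:

```python
from typing import Any


def aggregate_sources(sources: list[dict[str, Any]]) -> dict[str, Any]:
    """Roll per-source payloads up into batch totals (hypothetical payload keys)."""
    fail_count = sum(1 for s in sources if s.get("status") != "pass")
    return {
        # Hypothetical rule: the batch passes only if every source passed.
        "status": "pass" if fail_count == 0 else "fail",
        "total_sources": len(sources),
        "fail_count": fail_count,
        "total_rows_written": sum(s.get("rows_written", 0) for s in sources),
        "total_fetched_count": sum(s.get("fetched_count", 0) for s in sources),
        "total_request_count": sum(s.get("request_count", 0) for s in sources),
    }
```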

Supported source values:

greenhouse
lever
ashby
workable
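Because only these four values are supported, it can be worth validating a source name before starting a sync. The helper below is hypothetical. In particular, trimming and lower-casing are this sketch's own choice and do not imply that sync_source(...) itself accepts mixed-case names.

```python
SUPPORTED_SOURCES = frozenset({"greenhouse", "lever", "ashby", "workable"})


def normalize_source(source: str) -> str:
    """Trim, lower-case, and validate a source name (hypothetical helper)."""
    name = source.strip().lower()
    if name not in SUPPORTED_SOURCES:
        raise ValueError(
            f"unsupported source {source!r}; expected one of {sorted(SUPPORTED_SOURCES)}"
        )
    return name
```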

Recommendation API

Build API-ready retrieval artifacts:

from honestroles import build_retrieval_index

index = build_retrieval_index(
    input_parquet="dist/ingest/greenhouse/stripe/jobs.parquet",
    policy_file="recommendation.toml",
)
print(index.index_id, index.index_dir)

Match jobs from an index:

from honestroles import match_jobs

matches = match_jobs(
    index_dir=index.index_dir,
    candidate_json="examples/candidate.json",
    top_k=25,
    include_excluded=True,
)
print(matches.eligible_count, len(matches.results))

Evaluate recommendation quality:

from honestroles import evaluate_relevance

evaluation = evaluate_relevance(
    index_dir=index.index_dir,
    golden_set="examples/recommend_golden_set.json",
    thresholds_file="recommend_eval.toml",
)
print(evaluation.status, evaluation.metrics)
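The contents of evaluation.metrics and the format of the thresholds file are not specified on this page. As an illustration of threshold-gated relevance evaluation, the sketch below computes precision@k and recall@k for one ranked list against a golden set of relevant job IDs and compares them with minimum values. The metric names, the dict-shaped thresholds, and the pass/fail labels are all assumptions.

```python
def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k slots filled by relevant jobs."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for job_id in ranked[:k] if job_id in relevant) / k


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant jobs that appear in the top-k results."""
    if not relevant:
        return 0.0
    return sum(1 for job_id in ranked[:k] if job_id in relevant) / len(relevant)


def evaluate(
    ranked: list[str], relevant: set[str], k: int, thresholds: dict[str, float]
) -> tuple[str, dict[str, float]]:
    metrics = {
        f"precision@{k}": precision_at_k(ranked, relevant, k),
        f"recall@{k}": recall_at_k(ranked, relevant, k),
    }
    # Hypothetical gate: every thresholded metric must meet its minimum.
    passed = all(metrics.get(name, 0.0) >= floor for name, floor in thresholds.items())
    return ("pass" if passed else "fail"), metrics
```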

Feedback primitives:

from honestroles import record_feedback_event, summarize_feedback

record_feedback_event(profile_id="jane_doe", job_id="12345", event="interviewed")
summary = summarize_feedback(profile_id="jane_doe")
print(summary.total_events, summary.weights)

NeonDB Publish API

Apply migrations:

from honestroles import migrate_neondb

result = migrate_neondb(
    database_url_env="NEON_DATABASE_URL",
    schema="honestroles_api",
)
print(result.status, result.migrations_applied)
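The database_url_env argument appears to name an environment variable rather than take the connection string itself, which keeps credentials out of code and configuration files. A minimal sketch of that lookup pattern, with a hypothetical resolve_database_url helper:

```python
import os


def resolve_database_url(database_url_env: str = "NEON_DATABASE_URL") -> str:
    """Read a connection string from the named environment variable (hypothetical)."""
    url = os.environ.get(database_url_env, "").strip()
    if not url:
        # Name the variable, never its value, so credentials stay out of logs.
        raise RuntimeError(f"environment variable {database_url_env} is not set or empty")
    return url
```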

Publish jobs and features:

from honestroles import publish_neondb_sync

result = publish_neondb_sync(
    database_url_env="NEON_DATABASE_URL",
    schema="honestroles_api",
    jobs_parquet="dist/ingest/greenhouse/stripe/jobs.parquet",
    index_dir="dist/recommend/index/<index_id>",  # replace <index_id> with a real index ID, e.g. index.index_id
    sync_report="dist/ingest/greenhouse/stripe/sync_report.json",
    require_quality_pass=True,
    full_refresh=False,
)
print(result.status, result.batch_id, result.inserted_count)
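require_quality_pass=True suggests that publishing is gated on the ingestion quality result. Below is a sketch of an equivalent pre-flight check you could run yourself. It assumes the sync report is JSON with a top-level quality_status key whose passing value is "pass" (both assumptions), and quality_passed is a hypothetical helper.

```python
from __future__ import annotations

import json
from pathlib import Path


def quality_passed(sync_report: str | Path, passing_status: str = "pass") -> bool:
    """Return True if the sync report's quality_status equals passing_status."""
    report = json.loads(Path(sync_report).read_text(encoding="utf-8"))
    # Assumed layout: quality_status is a top-level key in the report JSON.
    return report.get("quality_status") == passing_status
```

For example, you might call quality_passed("dist/ingest/greenhouse/stripe/sync_report.json") before publish_neondb_sync(...) to fail early with a clearer message.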

Verify the database contract:

from honestroles import verify_neondb_contract

result = verify_neondb_contract(
    database_url_env="NEON_DATABASE_URL",
    schema="honestroles_api",
)
print(result.status, result.check_codes)