Source Data -> honestroles Data Contract (v1)
Status: Draft
Version: 1.0.0
Date: 2026-02-14
Producer: source data pipeline
Consumer: honestroles
Purpose
Define the minimum and expected payload shape that honestroles receives from source data.
Canonical Handoff Artifact
The contract artifact is jobs_current source data from the primary datastore (or a Parquet/JSON export generated from that same row shape).
honestroles should not treat reporting CSV exports as canonical contract input.
Core Required Fields (must exist, non-null)
These are the columns honestroles validates by default.
| Field | Type | Rules |
|---|---|---|
| `job_key` | string | Primary id, format `company::source::job_id` |
| `company` | string | Stable company identifier |
| `source` | string | ATS source (`greenhouse`, `lever`, `ashby`, `workable`, `smartrecruiters`, `recruitee`, `teamtailor`, `personio`) |
| `job_id` | string | ATS-native job id |
| `title` | string | Job title |
| `location_raw` | string | Raw location text from source |
| `apply_url` | string | Apply URL |
| `ingested_at` | timestamp/string | Ingestion timestamp |
| `content_hash` | string | Stable content fingerprint for change tracking |
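The required-field rules above can be sketched in plain Python. This is an illustrative check, not the honestroles implementation: the helper names `build_job_key` and `check_core_fields`, the sample values, and the rule that `job_key` must agree with its three component fields are all assumptions made for the example.

```python
# Core fields taken from the table above.
CORE_REQUIRED_FIELDS = (
    "job_key", "company", "source", "job_id", "title",
    "location_raw", "apply_url", "ingested_at", "content_hash",
)


def build_job_key(company: str, source: str, job_id: str) -> str:
    """Compose the primary id in the company::source::job_id format."""
    return f"{company}::{source}::{job_id}"


def check_core_fields(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field in CORE_REQUIRED_FIELDS:
        if field not in record:
            problems.append(f"missing field: {field}")
        elif record[field] is None:
            problems.append(f"null field: {field}")
    # Assumption: job_key should match the fields it is derived from.
    if not problems:
        expected = build_job_key(
            record["company"], record["source"], record["job_id"]
        )
        if record["job_key"] != expected:
            problems.append(f"job_key mismatch: expected {expected}")
    return problems


# Hypothetical sample row.
row = {
    "job_key": "acme::greenhouse::12345",
    "company": "acme",
    "source": "greenhouse",
    "job_id": "12345",
    "title": "Data Engineer",
    "location_raw": "Berlin, Germany",
    "apply_url": "https://example.com/jobs/12345",
    "ingested_at": "2026-02-14T12:00:00+00:00",
    "content_hash": "abc123",
}
print(check_core_fields(row))
```

Returning a list of problems rather than raising on the first one lets a producer see every violation in a row at once.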
Standard Optional Fields (known by honestroles)
If present, these should use the listed types:
- `team`: string or null
- `remote_flag`: boolean or null
- `employment_type`: string or null
- `posted_at`: timestamp/string or null
- `updated_at`: timestamp/string or null
- `description_html`: string or null
- `description_text`: string or null
- `salary_min`: number or null
- `salary_max`: number or null
- `salary_currency`: string or null
- `salary_interval`: string or null
- `city`: string or null
- `region`: string or null
- `country`: string or null
- `remote_type`: string or null
- `skills`: array[string] or null
- `last_seen`: timestamp/string or null
- `salary_text`: string or null
- `languages`: array[string] or null
- `benefits`: array[string] or null
- `visa_sponsorship`: boolean or null
Extended Fields (source data may send; library should tolerate)
honestroles should accept and pass through unknown columns without failing validation. Current known extras from source data include:
- `application_deadline`
- `apply_email`
- `apply_url_canonical`
- `bonus_text`
- `contract_duration`
- `department`
- `education_level`
- `employment_status`
- `equity_text`
- `experience_years_max`
- `experience_years_min`
- `is_internship`
- `job_code`
- `job_function`
- `keywords`
- `latitude`
- `longitude`
- `postal_code`
- `raw_data`
- `remote_allowed`
- `remote_scope`
- `requisition_id`
- `salary_type`
- `salary_unit`
- `seniority`
- `state`
- `timezone`
- `work_arrangement`
Serialization Rules
- Timestamps:
  - DuckDB: `TIMESTAMP`
  - JSON/Parquet: ISO-8601-compatible string or native datetime type
- Arrays (`skills`, `languages`, `benefits`, `keywords`): preserve as arrays, not comma-delimited strings
- Nulls: use `NULL`/`None`/`null`, not empty string where possible
- Encoding: UTF-8
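A producer-side normalization step that follows these rules might look like the sketch below. The function name `normalize_for_json` is hypothetical, and treating a comma-delimited string as a legacy array encoding is an assumption about what upstream data could contain, not something the contract states.

```python
import json
from datetime import datetime, timezone

# Array-typed fields named in the serialization rules.
ARRAY_FIELDS = ("skills", "languages", "benefits", "keywords")


def normalize_for_json(record: dict) -> dict:
    """Return a copy of the record that follows the serialization rules."""
    out = {}
    for key, value in record.items():
        if isinstance(value, datetime):
            # Timestamps become ISO-8601 strings.
            value = value.isoformat()
        elif value == "":
            # Prefer null over an empty string.
            value = None
        elif key in ARRAY_FIELDS and isinstance(value, str):
            # Assumed legacy input: split a comma-delimited string into an array.
            value = [item.strip() for item in value.split(",") if item.strip()]
        out[key] = value
    return out


# Hypothetical sample values.
raw = {
    "ingested_at": datetime(2026, 2, 14, 12, 0, tzinfo=timezone.utc),
    "skills": "python, sql",
    "team": "",
}
normalized = normalize_for_json(raw)

# ensure_ascii=False keeps non-ASCII text intact; encode explicitly as UTF-8.
payload = json.dumps(normalized, ensure_ascii=False).encode("utf-8")
print(payload.decode("utf-8"))
```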
Validation Behavior in honestroles
`validate_source_data_contract(...)` enforces:
- required columns exist
- required columns are non-null (default)
- known format/type checks (default), including:
  - parseable timestamps
  - valid `apply_url` (http/https)
  - array columns as array-of-string values
  - boolean columns (`remote_flag`, `visa_sponsorship`) as booleans
  - salary metadata shape (`salary_currency`, `salary_interval`, and `salary_min <= salary_max`)

Format checks can be disabled with `enforce_formats=False` for ingestion transitions.
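To make the format checks concrete, here is a standalone sketch of what each one could look like for a single value. It does not reproduce the internals of `validate_source_data_contract`, which are not shown in this document; every helper name below is hypothetical, and treating a missing salary bound as acceptable is an assumption.

```python
from datetime import datetime
from urllib.parse import urlparse


def is_parseable_timestamp(value) -> bool:
    """Accept native datetimes and ISO-8601 strings."""
    if isinstance(value, datetime):
        return True
    if not isinstance(value, str):
        return False
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    try:
        datetime.fromisoformat(value)
    except ValueError:
        return False
    return True


def is_valid_apply_url(value) -> bool:
    """Require an http or https URL with a host part."""
    if not isinstance(value, str):
        return False
    try:
        parsed = urlparse(value)
    except ValueError:
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def is_string_array(value) -> bool:
    """True for a list whose items are all strings."""
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def salary_range_ok(salary_min, salary_max) -> bool:
    """Check salary_min <= salary_max; assumed to pass if either is null."""
    if salary_min is None or salary_max is None:
        return True
    return salary_min <= salary_max
```

Checks of this kind are the ones a caller would be skipping when passing `enforce_formats=False`; the existence and non-null checks on required columns are listed separately above.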
Compatibility Rules
- Producer (source data) may add columns without breaking v1.
- Consumer (`honestroles`) must:
  - fail only when required core fields are missing
  - ignore unknown extra columns
- Removing or renaming any core required field is a breaking change and requires `v2`.
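The consumer-side rule can be illustrated with a small column classifier. This is a sketch under assumptions: `classify_columns` is a hypothetical helper, and it groups every non-core column (optional or extended) together as tolerated.

```python
# Core fields from the contract's required-fields table.
CORE_REQUIRED_FIELDS = frozenset({
    "job_key", "company", "source", "job_id", "title",
    "location_raw", "apply_url", "ingested_at", "content_hash",
})


def classify_columns(columns) -> tuple[bool, list[str], list[str]]:
    """Return (compatible, missing_core, non_core) for a set of column names.

    A payload is compatible with v1 when no core field is missing;
    any other column is tolerated and passed through.
    """
    present = set(columns)
    missing_core = sorted(CORE_REQUIRED_FIELDS - present)
    non_core = sorted(present - CORE_REQUIRED_FIELDS)
    return (not missing_core, missing_core, non_core)


# A producer adding a column stays compatible with v1.
ok, missing, extra = classify_columns(
    list(CORE_REQUIRED_FIELDS) + ["seniority"]
)
print(ok, missing, extra)

# Dropping a core field breaks the contract.
broken, missing_after_drop, _ = classify_columns(
    CORE_REQUIRED_FIELDS - {"content_hash"}
)
print(broken, missing_after_drop)
```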
Recommended Producer Query
If exporting contract data from source data storage:
If a strict minimal contract payload is needed:
```sql
SELECT
  job_key,
  company,
  source,
  job_id,
  title,
  location_raw,
  apply_url,
  ingested_at,
  content_hash
FROM jobs_current;
```
Non-Contract Output
Reporting CSV exports are not contract-safe for honestroles when they omit required core fields (job_key, source, job_id, ingested_at, content_hash).