HonestRoles Library Framework
honestroles is a Python package, published on PyPI under the MIT license, and is the canonical place for all job-description data-processing logic.
- Own the data-processing domain. The library is responsible for ingestion-adjacent validation, cleaning, normalization, filtering, labeling, and rating of job-description data.
- Treat the source-data contract as input truth. All incoming datasets must follow the source-data contract, including required core fields and tolerant handling of extra fields.
- Keep schema references centralized. All code should use honestroles.schema constants instead of hard-coded column strings to avoid drift and support safe refactors.
- Design for composable pipelines. Public APIs should remain modular (clean, filter, label, rate, io) so users can run full pipelines or individual stages without side effects.
- Validate early, fail clearly. I/O boundaries must validate required columns and raise explicit, actionable errors when contract expectations are not met.
- Preserve backward compatibility in minor releases. Existing public functions, expected column names, and default behaviors should not break without a major version change and migration notes.
- Prioritize deterministic outputs. Transformations should be reproducible given the same input, with stable behavior across environments and runs.
- Keep unknown fields as pass-through data. Extra columns from upstream sources should be retained unless there is a documented reason to drop them.
- Enforce quality gates. Every behavior change should include tests, and docs should be updated when schema, API, or pipeline expectations change.
- Ship as a reusable package. Repository changes should maintain PyPI package readiness: clean module boundaries, documented public API, semantic versioning, and MIT licensing continuity.
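
Several of the principles above (centralized schema constants, early validation, pass-through of unknown fields, and side-effect-free stages) can be illustrated with a minimal sketch. This is not the honestroles API. The constant names, the required-field list, ContractError, and all three functions are hypothetical stand-ins, and plain dictionaries are used in place of a tabular structure so the example has no dependencies.

```python
from __future__ import annotations

from typing import Iterable

# Hypothetical stand-ins for constants that would live in honestroles.schema.
# The real constant names and required fields are not given in this document.
JOB_KEY = "job_key"
TITLE = "title"
DESCRIPTION = "description"
REQUIRED_FIELDS = (JOB_KEY, TITLE, DESCRIPTION)


class ContractError(ValueError):
    """Hypothetical error for records that violate the source-data contract."""


def validate_records(records: Iterable[dict]) -> list[dict]:
    """Check required fields at the I/O boundary; extra fields are tolerated."""
    checked = []
    for index, record in enumerate(records):
        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            # Name the record and the fields so the caller can act on the error.
            raise ContractError(
                f"record {index} is missing required field(s): {', '.join(missing)}"
            )
        checked.append(record)
    return checked


def clean_titles(records: list[dict]) -> list[dict]:
    """Collapse whitespace in titles, returning new dicts (inputs untouched)."""
    return [{**record, TITLE: " ".join(record[TITLE].split())} for record in records]


def filter_nonempty_descriptions(records: list[dict]) -> list[dict]:
    """Keep only records whose description contains non-whitespace text."""
    return [record for record in records if record[DESCRIPTION].strip()]


raw = [
    # "source_rank" is an extra upstream field and should pass through unchanged.
    {JOB_KEY: "a1", TITLE: "  Data   Engineer ", DESCRIPTION: "Build pipelines.", "source_rank": 3},
    {JOB_KEY: "b2", TITLE: "Analyst", DESCRIPTION: "   "},
]

# Stages compose, and each one can also be run on its own.
result = filter_nonempty_descriptions(clean_titles(validate_records(raw)))
print(result)
```

Each stage takes records and returns new records, so the same input always yields the same output and the caller's data is never modified in place. Because every field name is referenced through a constant, renaming a column would be a change in one module rather than a search across string literals.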