Filter
Filter¶
honestroles.filter provides composable predicates and a simple chaining
utility for filtering job DataFrames.
Modules¶
__init__.py:filter_jobsorchestrator and re-exports.chain.py:FilterChainfor AND/OR predicate composition.predicates.py:by_*predicate helpers (location, salary, skills, keywords).
Public API reference¶
filter_jobs(...) -> pd.DataFrame¶
filter_jobs(
df: pd.DataFrame,
*,
cities: list[str] | None = None,
regions: list[str] | None = None,
countries: list[str] | None = None,
remote_only: bool = False,
min_salary: float | None = None,
max_salary: float | None = None,
currency: str | None = "USD",
required_skills: list[str] | None = None,
excluded_skills: list[str] | None = None,
include_keywords: list[str] | None = None,
exclude_keywords: list[str] | None = None,
keyword_columns: list[str] | None = None,
required_fields: list[str] | None = None,
plugin_filters: list[str] | None = None,
plugin_filter_kwargs: dict[str, dict[str, object]] | None = None,
plugin_filter_mode: str = "and",
) -> pd.DataFrame
Builds a FilterChain in AND mode with by_location, by_salary,
by_skills, by_keywords, and by_completeness. If plugin_filters are
provided, registered filter plugins are applied after the built-in chain.
FilterChain¶
FilterChain(mode: str = "and"): Create a chain in "and" or "or" mode.add(predicate: Predicate, **kwargs: object) -> FilterChain: Add a predicate.apply(df: pd.DataFrame) -> pd.DataFrame: Apply all steps and return filtered rows.
Predicate is any callable that returns a pd.Series mask for a DataFrame.
Predicate helpers¶
by_location(...) -> pd.Series: Filters bycity,region,country, andremote_flag.by_salary(...) -> pd.Series: Filters bysalary_min/salary_maxand currency.by_skills(...) -> pd.Series: Filters by required/excluded skills.by_keywords(...) -> pd.Series: Filters by include/exclude terms in columns.by_completeness(...) -> pd.Series: Filters by required field presence.
Usage examples¶
import honestroles as hr
df = hr.read_parquet("jobs_current.parquet")
df = hr.clean_jobs(df)
df = hr.filter_jobs(
df,
countries=["US"],
regions=["California"],
remote_only=True,
min_salary=120_000,
include_keywords=["python", "data"],
)
from honestroles.filter import FilterChain, by_location, by_keywords
chain = FilterChain(mode="and")
chain.add(by_location, countries=["CA"])
chain.add(by_keywords, include=["machine learning"])
filtered = chain.apply(df)
Design notes¶
FilterChainalways resets index after filtering.- Predicates are defensive: if required columns are missing, they default to allowing all rows rather than raising.