Filter

honestroles.filter provides composable predicates and a simple chaining utility for filtering job DataFrames.

Modules

  • __init__.py: filter_jobs orchestrator and re-exports.
  • chain.py: FilterChain for AND/OR predicate composition.
  • predicates.py: by_* predicate helpers (location, salary, skills, keywords).

Public API reference

filter_jobs(...) -> pd.DataFrame

filter_jobs(
    df: pd.DataFrame,
    *,
    cities: list[str] | None = None,
    regions: list[str] | None = None,
    countries: list[str] | None = None,
    remote_only: bool = False,
    min_salary: float | None = None,
    max_salary: float | None = None,
    currency: str | None = "USD",
    required_skills: list[str] | None = None,
    excluded_skills: list[str] | None = None,
    include_keywords: list[str] | None = None,
    exclude_keywords: list[str] | None = None,
    keyword_columns: list[str] | None = None,
    required_fields: list[str] | None = None,
    plugin_filters: list[str] | None = None,
    plugin_filter_kwargs: dict[str, dict[str, object]] | None = None,
    plugin_filter_mode: str = "and",
) -> pd.DataFrame

Builds a FilterChain in AND mode from by_location, by_salary, by_skills, by_keywords, and by_completeness, so a row is kept only if it passes every built-in predicate. If plugin_filters is provided, the registered filter plugins are applied after the built-in chain.

FilterChain

  • FilterChain(mode: str = "and"): Create a chain in "and" or "or" mode.
  • add(predicate: Predicate, **kwargs: object) -> FilterChain: Add a predicate.
  • apply(df: pd.DataFrame) -> pd.DataFrame: Apply all steps and return filtered rows.

A Predicate is any callable that takes a DataFrame and returns a pd.Series mask marking the rows to keep.
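To make the AND/OR composition concrete, here is a minimal stand-in sketch built on plain pandas. It is not the library's implementation: TinyChain, title_contains, and salary_at_least are hypothetical names, and the title and salary_min columns are assumed purely for illustration.

```python
from functools import reduce
from typing import Callable

import pandas as pd

# A predicate takes a DataFrame (plus keyword arguments) and returns a boolean mask.
Predicate = Callable[..., pd.Series]


def title_contains(df: pd.DataFrame, *, term: str) -> pd.Series:
    """Hypothetical predicate: True where `title` contains `term`, ignoring case."""
    return df["title"].str.contains(term, case=False, regex=False, na=False)


def salary_at_least(df: pd.DataFrame, *, minimum: float) -> pd.Series:
    """Hypothetical predicate: True where `salary_min` is at least `minimum`."""
    return df["salary_min"].ge(minimum)


class TinyChain:
    """Illustrative stand-in for FilterChain; an assumption, not the library's code."""

    def __init__(self, mode: str = "and") -> None:
        if mode not in {"and", "or"}:
            raise ValueError(f"unsupported mode: {mode!r}")
        self.mode = mode
        self.steps: list[tuple[Predicate, dict[str, object]]] = []

    def add(self, predicate: Predicate, **kwargs: object) -> "TinyChain":
        self.steps.append((predicate, kwargs))
        return self  # returning self allows chained add() calls

    def apply(self, df: pd.DataFrame) -> pd.DataFrame:
        if not self.steps:
            return df.reset_index(drop=True)
        # Evaluate every predicate against the same input frame.
        masks = [pred(df, **kwargs) for pred, kwargs in self.steps]
        if self.mode == "and":
            mask = reduce(lambda a, b: a & b, masks)
        else:
            mask = reduce(lambda a, b: a | b, masks)
        return df[mask].reset_index(drop=True)


jobs = pd.DataFrame(
    {
        "title": ["Data Engineer", "Python Developer", "Office Manager"],
        "salary_min": [130_000.0, 90_000.0, 150_000.0],
    }
)

and_chain = TinyChain("and")
and_chain.add(title_contains, term="data").add(salary_at_least, minimum=120_000)
and_rows = and_chain.apply(jobs)  # rows passing both predicates

or_chain = TinyChain("or")
or_chain.add(title_contains, term="data").add(salary_at_least, minimum=120_000)
or_rows = or_chain.apply(jobs)  # rows passing at least one predicate
```

In this sketch the AND chain keeps only "Data Engineer", while the OR chain also keeps "Office Manager" because it passes the salary predicate alone.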

Predicate helpers

  • by_location(...) -> pd.Series: Filters by city, region, country, and remote_flag.
  • by_salary(...) -> pd.Series: Filters by salary_min/salary_max and currency.
  • by_skills(...) -> pd.Series: Filters by required/excluded skills.
  • by_keywords(...) -> pd.Series: Filters by include/exclude terms in columns.
  • by_completeness(...) -> pd.Series: Filters by required field presence.
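As a rough illustration of what a "required field presence" mask can look like, the sketch below uses plain pandas. The helper name completeness_mask is hypothetical, and treating only null values as missing (and ignoring fields that are not columns) is an assumption; the library's by_completeness may apply different rules.

```python
import pandas as pd


def completeness_mask(df: pd.DataFrame, required_fields: list[str]) -> pd.Series:
    """Hypothetical helper: True where every listed field is non-null.

    Fields that are not columns of `df` are ignored (an assumption).
    """
    present = [col for col in required_fields if col in df.columns]
    if not present:
        # Nothing to check, so every row passes.
        return pd.Series(True, index=df.index)
    return df[present].notna().all(axis=1)


jobs = pd.DataFrame(
    {
        "title": ["Analyst", None, "Engineer"],
        "company": ["Acme", "Globex", None],
    }
)

mask = completeness_mask(jobs, ["title", "company"])
complete_rows = jobs[mask]  # only the first row has both fields filled in
```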

Usage examples

import honestroles as hr

df = hr.read_parquet("jobs_current.parquet")
df = hr.clean_jobs(df)
df = hr.filter_jobs(
    df,
    countries=["US"],
    regions=["California"],
    remote_only=True,
    min_salary=120_000,
    include_keywords=["python", "data"],
)

from honestroles.filter import FilterChain, by_location, by_keywords

chain = FilterChain(mode="and")
chain.add(by_location, countries=["CA"])
chain.add(by_keywords, include=["machine learning"])

filtered = chain.apply(df)

Design notes

  • FilterChain always resets the index after filtering.
  • Predicates are defensive: if required columns are missing, they default to allowing all rows rather than raising.
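The two notes above can be sketched with plain pandas. The predicate by_min_salary is a hypothetical example written for this illustration, not one of the library's helpers, and the salary_min column is assumed.

```python
import pandas as pd


def by_min_salary(df: pd.DataFrame, *, minimum: float) -> pd.Series:
    """Hypothetical defensive predicate: allow every row if `salary_min` is absent."""
    if "salary_min" not in df.columns:
        # Missing column: return an all-True mask instead of raising KeyError.
        return pd.Series(True, index=df.index)
    return df["salary_min"].ge(minimum)


with_salary = pd.DataFrame(
    {
        "title": ["A", "B", "C"],
        "salary_min": [50_000.0, 150_000.0, 200_000.0],
    }
)
without_salary = with_salary.drop(columns=["salary_min"])

# Boolean indexing keeps the original index labels of the surviving rows.
kept = with_salary[by_min_salary(with_salary, minimum=100_000)]

# Resetting the index renumbers the surviving rows from 0.
kept_reset = kept.reset_index(drop=True)

# With the column missing, the predicate lets every row through.
passthrough = without_salary[by_min_salary(without_salary, minimum=100_000)]
```

One consequence of the defensive default is that a misspelled or absent column silently disables a filter, so it can be worth checking the expected columns before filtering.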