Label
Label¶
honestroles.label adds derived labels to job postings using heuristic rules
and optional LLM classification.
Modules¶
__init__.py:label_jobsorchestrator and re-exports.heuristic.py: rule-based seniority, role category, and tech stack labeling.llm.py: LLM-based label classification using Ollama.
Public API reference¶
label_jobs(df: pd.DataFrame, *, use_llm: bool = False, model: str = "llama3", labels: list[str] | None = None, column: str = DESCRIPTION_TEXT, ollama_url: str = "http://localhost:11434", batch_size: int = 8, plugin_labelers: list[str] | None = None, plugin_labeler_kwargs: dict[str, dict[str, object]] | None = None) -> pd.DataFrame¶
Runs heuristic labeling (label_seniority, label_role_category,
label_tech_stack). If use_llm=True, appends label_with_llm using the
provided LLM args (model, labels, column, ollama_url, batch_size).
If plugin_labelers are provided, registered label
plugins are applied after built-in labeling.
label_seniority(df: pd.DataFrame, *, title_column: str = TITLE) -> pd.DataFrame¶
Adds a seniority column based on job title patterns
(intern, junior, mid, senior, staff, principal, lead, director,
vp, c_level).
label_role_category(df: pd.DataFrame, *, title_column: str = TITLE, description_column: str = DESCRIPTION_TEXT) -> pd.DataFrame¶
Adds a role_category column by scanning title and description keywords
(engineering, data, design, product, marketing, sales, operations,
finance, hr, legal, support).
label_tech_stack(df: pd.DataFrame, *, skills_column: str = SKILLS, description_column: str = DESCRIPTION_TEXT) -> pd.DataFrame¶
Adds a tech_stack column containing a sorted list of detected terms
from the configured _TECH_TERMS list.
label_with_llm(...) -> pd.DataFrame¶
label_with_llm(
df: pd.DataFrame,
*,
model: str = "llama3",
labels: list[str] | None = None,
column: str = DESCRIPTION_TEXT,
ollama_url: str = "http://localhost:11434",
batch_size: int = 8,
) -> pd.DataFrame
Uses OllamaClient to classify descriptions and writes llm_labels as a list.
If the column is missing or Ollama is unavailable, returns the input DataFrame.
Usage examples¶
import honestroles as hr
df = hr.read_parquet("jobs_current.parquet")
df = hr.clean_jobs(df)
df = hr.label_jobs(df, use_llm=False)
from honestroles.label import label_with_llm
df = label_with_llm(
df,
model="llama3",
labels=["engineering", "data", "product"],
ollama_url="http://localhost:11434",
)
Design notes¶
- Heuristic labeling uses predefined regex patterns and keyword sets; tune them
in
heuristic.pyif your dataset has domain-specific terms. DEFAULT_LABELSinllm.pyis used when no label list is provided.