stacksats.eda
Public EDA helpers for canonical merged_metrics parquet analysis.
MergedMetricsDataset(parquet_path: Path, _lazyframe: pl.LazyFrame, catalog: MetricCatalog)
dataclass
Lazy-first wrapper over canonical merged_metrics parquet data.
available_metrics() -> list[str]
Return sorted metric names present in the current filtered dataset.
collect() -> pl.DataFrame
Collect the current lazy dataset into an eager frame.
filter_dates(start: str | dt.date | dt.datetime | None = None, end: str | dt.date | dt.datetime | None = None) -> MergedMetricsDataset
Return a dataset filtered to the inclusive date window.
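As an illustration of what an inclusive window over mixed `str | dt.date | dt.datetime | None` bounds could mean, here is a minimal standard-library sketch. It is not the library's implementation: the helper names `to_date` and `in_window` are hypothetical, and it assumes string bounds are ISO `YYYY-MM-DD` dates and that a `None` bound leaves that side open.

```python
import datetime as dt


def to_date(value):
    """Normalize a str, date, or datetime to a date; None passes through."""
    if value is None:
        return None
    if isinstance(value, dt.datetime):  # check datetime first: it subclasses date
        return value.date()
    if isinstance(value, dt.date):
        return value
    return dt.date.fromisoformat(value)  # assumption: ISO "YYYY-MM-DD" strings


def in_window(day, start=None, end=None):
    """True when start <= day <= end; a None bound is treated as open."""
    start, end = to_date(start), to_date(end)
    return (start is None or day >= start) and (end is None or day <= end)


days = [dt.date(2024, 1, d) for d in range(1, 6)]
# Both endpoints are kept because the window is inclusive.
kept = [d for d in days if in_window(d, "2024-01-02", dt.datetime(2024, 1, 4, 12, 0))]
```

Here `kept` holds 2, 3, and 4 January 2024: both boundary days survive the filter.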
filter_metrics(*, metrics: Sequence[str] | None = None, prefixes: Sequence[str] | None = None, regex: str | None = None, families: Sequence[str] | None = None, categories: Sequence[str] | None = None) -> MergedMetricsDataset
Return a dataset filtered by unioned metric selectors.
filter_search(query: str) -> MergedMetricsDataset
Filter the dataset using catalog text search.
head(n: int = 10) -> pl.DataFrame
Collect the first n rows from the current filtered dataset.
lazy() -> pl.LazyFrame
Return the underlying lazy Polars frame.
metric_counts() -> pl.DataFrame
Return per-metric row counts for the current filtered dataset.
metric_coverage(metrics: Sequence[str] | None = None) -> pl.DataFrame
Return observed coverage within the current dataset window.
metric_series(metric: str, *, error_if_empty: bool = False) -> pl.DataFrame
Return one metric as a sorted long-format eager frame.
pivot_wide(metrics: Sequence[str] | None = None, *, fill_null: float | None = None) -> pl.DataFrame
Pivot the current dataset to a day-indexed wide frame.
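To make the long-to-wide reshaping concrete, the following standard-library sketch pivots `(day, metric, value)` tuples into one record per day. It only illustrates the idea and does not reflect how the library does it: the function name `pivot_rows`, the tuple layout, the `"day"` key, and the metric names `a` and `b` are assumptions, and `fill_null` is modeled as a value substituted for missing cells.

```python
import datetime as dt


def pivot_rows(rows, fill_null=None):
    """Pivot (day, metric, value) rows into one dict per day, sorted by day."""
    metrics = sorted({metric for _, metric, _ in rows})
    by_day = {}
    for day, metric, value in rows:
        by_day.setdefault(day, {})[metric] = value
    wide = []
    for day in sorted(by_day):
        record = {"day": day}
        for metric in metrics:
            # Days with no observation for a metric get the fill value.
            record[metric] = by_day[day].get(metric, fill_null)
        wide.append(record)
    return wide


d1, d2 = dt.date(2024, 1, 1), dt.date(2024, 1, 2)
long_rows = [(d1, "a", 1.0), (d1, "b", 2.0), (d2, "a", 3.0)]
wide = pivot_rows(long_rows, fill_null=0.0)
```

With `fill_null=None`, the missing `b` value on the second day would stay `None` instead of becoming `0.0`.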
sample(n: int = 10, *, seed: int | None = 0) -> pl.DataFrame
Collect a small sample from the current filtered dataset.
This samples by row index so only the selected rows are materialized.
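The idea of sampling by row index can be sketched with the standard library: draw indices first, then materialize only those rows. This is an illustration under assumptions, not the library's code; `sample_indices` is a hypothetical helper, and seeding with a fixed integer is assumed to make the draw repeatable.

```python
import random


def sample_indices(total_rows, n=10, seed=0):
    """Pick up to n distinct row indices; a fixed seed gives a repeatable draw."""
    rng = random.Random(seed)  # seed=None falls back to a nondeterministic seed
    k = min(n, total_rows)
    return sorted(rng.sample(range(total_rows), k))


rows = [f"row-{i}" for i in range(100)]
indices = sample_indices(len(rows), n=3, seed=0)
# Only the selected rows are touched; the rest are never read.
picked = [rows[i] for i in indices]
```

Choosing indices before reading rows is what keeps the cost proportional to the sample size rather than to the dataset size.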
summary() -> dict[str, object]
Return high-level dataset summary values.
MetricCatalog(_frame: pl.DataFrame)
dataclass
Catalog wrapper for merged-metrics metadata.
categories() -> list[str]
Return sorted unique access categories.
coverage(metrics: Sequence[str] | None = None) -> pl.DataFrame
Return coverage metadata for the selected metrics.
describe_metric(metric: str) -> dict[str, object]
Return one metric's metadata as a structured dictionary.
families() -> list[str]
Return sorted unique family names.
filter(*, metrics: Sequence[str] | None = None, prefixes: Sequence[str] | None = None, regex: str | None = None, families: Sequence[str] | None = None, categories: Sequence[str] | None = None) -> pl.DataFrame
Filter catalog rows using union semantics across selectors.
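Union semantics means a row is kept when any supplied selector matches it, rather than requiring all of them to match. The sketch below shows that behavior for three of the selectors using only the standard library. It is a hypothetical illustration: `select_metrics` and the metric names are invented for the example, and the choice to return everything when no selector is given is an assumption.

```python
import re


def select_metrics(names, *, metrics=None, prefixes=None, regex=None):
    """Keep names matched by ANY supplied selector (union semantics)."""
    if metrics is None and prefixes is None and regex is None:
        return sorted(names)  # assumption: no selectors means no filtering
    wanted = set(metrics or ())
    prefix_tuple = tuple(prefixes or ())
    pattern = re.compile(regex) if regex is not None else None
    selected = []
    for name in names:
        if (
            name in wanted
            or (prefix_tuple and name.startswith(prefix_tuple))
            or (pattern is not None and pattern.search(name))
        ):
            selected.append(name)
    return sorted(selected)


# Hypothetical metric names, used only for illustration.
names = ["mvrv", "price_usd", "supply_total", "supply_active", "hash_rate"]
picked = select_metrics(names, metrics=["mvrv"], prefixes=["supply_"], regex=r"rate$")
```

Each selector contributes its own matches, so adding a selector can only widen the result, never narrow it.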
frame() -> pl.DataFrame
Return a defensive copy of the catalog frame.
metrics() -> list[str]
Return sorted unique metric names.
search(text: str) -> pl.DataFrame
Search common catalog text fields with case-insensitive substring matching.
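Case-insensitive substring matching across several text fields can be sketched as follows. This is an illustration only: `search_catalog`, the field names `metric`, `family`, and `description`, and the sample records are assumptions, since the reference does not list which fields are searched.

```python
def search_catalog(records, text, fields=("metric", "family", "description")):
    """Return records where any listed field contains text, ignoring case."""
    needle = text.casefold()
    return [
        record
        for record in records
        if any(needle in str(record.get(field, "")).casefold() for field in fields)
    ]


# Hypothetical catalog rows, used only for illustration.
catalog = [
    {"metric": "hash_rate", "family": "mining", "description": "Network hash rate"},
    {"metric": "price_usd", "family": "market", "description": "Spot price in USD"},
]
hits = search_catalog(catalog, "HASH")
```

Folding both the query and the field text to the same case is what makes `"HASH"` match `"hash_rate"`.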
suggest_metrics(query: str, *, limit: int = 10) -> pl.DataFrame
Return ranked metric suggestions for a likely intended query.
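The reference does not say how suggestions are ranked. As one plausible stand-in, the sketch below ranks candidates by string similarity with the standard library's `difflib.get_close_matches`; the helper name `suggest`, the cutoff value, and the metric names are all assumptions and may differ from what the library does.

```python
import difflib


def suggest(query, candidates, limit=10, cutoff=0.6):
    """Rank candidates by similarity to query, best match first."""
    return difflib.get_close_matches(query, candidates, n=limit, cutoff=cutoff)


# Hypothetical metric names, used only for illustration.
known = ["hash_rate", "price_usd", "mvrv"]
suggestions = suggest("hashrate", known, limit=2)
```

A helper like this is useful for recovering from near-miss names, such as a missing underscore, before calling a method that expects an exact metric name.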
summary() -> dict[str, object]
Return high-level catalog summary values.
load_metric_catalog() -> MetricCatalog
cached
Load the packaged merged-metrics catalog.
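The `cached` label suggests that repeated calls reuse the first result instead of reloading. The reference does not say which caching mechanism is used; the sketch below shows the general behavior with `functools.lru_cache` as an assumed stand-in, using a hypothetical loader and placeholder data.

```python
from functools import lru_cache

load_count = 0


@lru_cache(maxsize=None)
def load_catalog_rows():
    """Pretend to do an expensive load; the body runs only on the first call."""
    global load_count
    load_count += 1
    return ("example_metric_a", "example_metric_b")  # placeholder data


first = load_catalog_rows()
second = load_catalog_rows()  # served from the cache, no second load
```

Because a cached loader hands back the same object each time, callers should avoid mutating the result in place.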
open_merged_metrics(parquet_path: str | Path | None = None) -> MergedMetricsDataset
Open the canonical merged_metrics parquet as a lazy-first dataset.
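Putting the pieces together, a typical exploration might chain the filters and collect only at the end. The sketch below uses only the signatures listed on this page; the wrapper function `explore` is hypothetical, the ISO date strings are an assumed input format, and running it requires `stacksats` to be installed with the merged_metrics parquet available.

```python
def explore(parquet_path=None):
    """Open the dataset, narrow it lazily, and collect only at the end."""
    # Imported here so that defining the function does not require stacksats.
    from stacksats.eda import open_merged_metrics

    dataset = open_merged_metrics(parquet_path)
    # Assumption: date strings are ISO formatted; the window is inclusive.
    recent = dataset.filter_dates(start="2024-01-01", end="2024-12-31")
    names = recent.available_metrics()
    # Use the first few available names rather than guessing metric names.
    subset = recent.filter_metrics(metrics=names[:3])
    # summary() returns a dict and head() an eager Polars DataFrame.
    return subset.summary(), subset.head(5)
```

Since each filter returns a new `MergedMetricsDataset`, the original `dataset` stays usable for other queries.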