stacksats.eda

Public EDA helpers for canonical merged_metrics parquet analysis.

MergedMetricsDataset(parquet_path: Path, _lazyframe: pl.LazyFrame, catalog: MetricCatalog) dataclass

Lazy-first wrapper over canonical merged_metrics parquet data.

available_metrics() -> list[str]

Return sorted metric names present in the current filtered dataset.

collect() -> pl.DataFrame

Collect the current lazy dataset into an eager frame.

filter_dates(start: str | dt.date | dt.datetime | None = None, end: str | dt.date | dt.datetime | None = None) -> MergedMetricsDataset

Return a dataset filtered to the inclusive date window.
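
To make "inclusive" concrete, here is a minimal pure-Python sketch of an inclusive date window with optional bounds. It does not call stacksats or Polars. `in_window` and the sample dates are hypothetical, and treating a `None` bound as open-ended is an assumption based on the default arguments above, not confirmed behavior.

```python
import datetime as dt

def in_window(day, start=None, end=None):
    """Return True when day falls in [start, end]; both ends are inclusive."""
    # Assumption: a bound of None means "no limit on that side".
    if start is not None and day < start:
        return False
    if end is not None and day > end:
        return False
    return True

# Five consecutive hypothetical days.
days = [dt.date(2024, 1, d) for d in range(1, 6)]

# Both boundary days (Jan 2 and Jan 4) are kept.
kept = [d for d in days
        if in_window(d, start=dt.date(2024, 1, 2), end=dt.date(2024, 1, 4))]
```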

filter_metrics(*, metrics: Sequence[str] | None = None, prefixes: Sequence[str] | None = None, regex: str | None = None, families: Sequence[str] | None = None, categories: Sequence[str] | None = None) -> MergedMetricsDataset

Return a dataset filtered by unioned metric selectors.

Filter the dataset using catalog text search.

head(n: int = 10) -> pl.DataFrame

Collect the first n rows from the current filtered dataset.

lazy() -> pl.LazyFrame

Return the underlying lazy Polars frame.

metric_counts() -> pl.DataFrame

Return per-metric row counts for the current filtered dataset.

metric_coverage(metrics: Sequence[str] | None = None) -> pl.DataFrame

Return observed coverage within the current dataset window.

metric_series(metric: str, *, error_if_empty: bool = False) -> pl.DataFrame

Return one metric as a sorted long-format eager frame.

pivot_wide(metrics: Sequence[str] | None = None, *, fill_null: float | None = None) -> pl.DataFrame

Pivot the current dataset to a day-indexed wide frame.
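
As an illustration of what a day-indexed wide pivot does, the sketch below reshapes long-format rows into one record per day in plain Python. The `(day, metric, value)` layout, the metric names, and the helper `pivot_long_to_wide` are all assumptions made for the example; the real method works on Polars frames, and its column names are not stated in this reference.

```python
def pivot_long_to_wide(rows, fill_null=None):
    """Pivot (day, metric, value) tuples into one dict per day."""
    # Every metric becomes a column, even on days where it is missing.
    metrics = sorted({metric for _, metric, _ in rows})
    by_day = {}
    for day, metric, value in rows:
        by_day.setdefault(day, {})[metric] = value
    return [
        {"day": day, **{m: by_day[day].get(m, fill_null) for m in metrics}}
        for day in sorted(by_day)
    ]

# Hypothetical long-format input: hash_rate is missing on the second day.
long_rows = [
    ("2024-01-01", "price_usd", 42000.0),
    ("2024-01-01", "hash_rate", 500.0),
    ("2024-01-02", "price_usd", 43000.0),
]
wide = pivot_long_to_wide(long_rows, fill_null=0.0)
```

With `fill_null=None`, the missing cell would stay `None` instead of being replaced.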

sample(n: int = 10, *, seed: int | None = 0) -> pl.DataFrame

Collect a small sample from the current filtered dataset.

This samples by row index so only the selected rows are materialized.
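
The idea of sampling by row index can be sketched without any dataframe library: draw the indices first, then fetch only those rows. The helper `sample_indices` below is hypothetical, and it uses the standard-library `random` module as a stand-in; how the library actually draws indices is not documented here.

```python
import random

def sample_indices(n_rows, n=10, seed=0):
    """Pick up to n distinct row indices; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)   # seed=None falls back to system entropy
    k = min(n, n_rows)          # never ask for more rows than exist
    return sorted(rng.sample(range(n_rows), k))

# Only these five row positions would need to be read.
idx = sample_indices(1_000, n=5, seed=0)
```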

summary() -> dict[str, object]

Return high-level dataset summary values.

MetricCatalog(_frame: pl.DataFrame) dataclass

Catalog wrapper for merged-metrics metadata.

categories() -> list[str]

Return sorted unique access categories.

coverage(metrics: Sequence[str] | None = None) -> pl.DataFrame

Return coverage metadata for the selected metrics.

describe_metric(metric: str) -> dict[str, object]

Return one metric's metadata as a structured dictionary.

families() -> list[str]

Return sorted unique family names.

filter(*, metrics: Sequence[str] | None = None, prefixes: Sequence[str] | None = None, regex: str | None = None, families: Sequence[str] | None = None, categories: Sequence[str] | None = None) -> pl.DataFrame

Filter catalog rows using union semantics across selectors.
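
Union semantics means a row is kept if it matches at least one of the supplied selectors, not all of them. The pure-Python sketch below shows that rule for three of the selectors. The helper `matches_any`, the metric names, the use of `re.search` for the regex selector, and the "no selectors keeps everything" behavior are assumptions for illustration only.

```python
import re

def matches_any(name, *, metrics=None, prefixes=None, regex=None):
    """Return True if name satisfies at least one supplied selector."""
    checks = []
    if metrics is not None:
        checks.append(name in set(metrics))
    if prefixes is not None:
        checks.append(any(name.startswith(p) for p in prefixes))
    if regex is not None:
        # Assumption: the pattern may match anywhere in the name.
        checks.append(re.search(regex, name) is not None)
    # Assumption: with no selectors at all, nothing is filtered out.
    return any(checks) if checks else True

# Hypothetical metric names.
names = ["price_usd", "mvrv", "hash_rate", "supply_total"]

# "mvrv" matches by exact name, "supply_total" by prefix; both are kept.
selected = [n for n in names
            if matches_any(n, metrics=["mvrv"], prefixes=["supply_"])]
```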

frame() -> pl.DataFrame

Return a defensive copy of the catalog frame.

metrics() -> list[str]

Return sorted unique metric names.

search(text: str) -> pl.DataFrame

Search common catalog text fields with case-insensitive substring matching.
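
Case-insensitive substring matching can be sketched in plain Python as follows. The field names `metric` and `description`, the sample rows, and the helper `search_rows` are hypothetical; this reference does not list which catalog fields are searched.

```python
def search_rows(rows, text, fields=("metric", "description")):
    """Keep rows where text occurs in any listed field, ignoring case."""
    needle = text.casefold()
    return [
        row for row in rows
        if any(needle in str(row.get(f, "")).casefold() for f in fields)
    ]

# Hypothetical catalog rows.
rows = [
    {"metric": "price_usd", "description": "Daily closing Price in USD"},
    {"metric": "hash_rate", "description": "Estimated network hash rate"},
]

# The upper-case query still matches the lower-case metric name.
hits = search_rows(rows, "PRICE")
```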

suggest_metrics(query: str, *, limit: int = 10) -> pl.DataFrame

Return ranked metric suggestions for a likely intended query.
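
One common way to rank suggestions for a mistyped name is fuzzy string matching. The sketch below uses the standard-library `difflib.get_close_matches` as a stand-in. The ranking method actually used by `suggest_metrics` is not described in this reference, and the helper `suggest` and the metric names are hypothetical.

```python
import difflib

def suggest(query, known, limit=10):
    """Return up to limit known names, most similar to query first."""
    # cutoff=0.6 is difflib's default similarity threshold.
    return difflib.get_close_matches(query, known, n=limit, cutoff=0.6)

# Hypothetical metric names and a query with a trailing typo.
known = ["price_usd", "hash_rate", "supply_total"]
suggestions = suggest("price_usdd", known, limit=3)
```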

summary() -> dict[str, object]

Return high-level catalog summary values.

load_metric_catalog() -> MetricCatalog cached

Load the packaged merged-metrics catalog.

open_merged_metrics(parquet_path: str | Path | None = None) -> MergedMetricsDataset

Open the canonical merged_metrics parquet as a lazy-first dataset.
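
A typical exploration flow might chain the calls documented above. The sketch below uses only signatures listed on this page. The wrapper name `explore_window`, the date values, and the assumption that ISO-formatted strings are accepted by `filter_dates` are not confirmed here. The import sits inside the function so the snippet can be defined without the package or the parquet file being present.

```python
def explore_window(parquet_path=None, start="2020-01-01", end="2020-12-31"):
    """Open the dataset, narrow it to a date window, and peek at the result."""
    # Imported lazily: requires the stacksats package to be installed.
    from stacksats.eda import open_merged_metrics

    dataset = open_merged_metrics(parquet_path)
    # Assumption: ISO date strings are valid for start and end.
    windowed = dataset.filter_dates(start=start, end=end)

    # available_metrics() returns a list[str]; head() returns a pl.DataFrame.
    return windowed.available_metrics(), windowed.head(5)
```

Because the dataset is described as lazy-first, the date filter can be applied before any rows are collected, and only the `head(5)` call brings rows into memory as an eager frame.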