Skip to content

Merged Metrics Parquet Schema

This page defines the canonical StackSats dataset contract for the Google Drive merged_metrics*.parquet file.

Read these pages in order:

  1. Merged Metrics Data Guide
  2. Merged Metrics Parquet Schema
  3. Merged Metrics Taxonomy

Use Merged Metrics Data Guide for the user-facing answer to what data is available, and Merged Metrics Taxonomy for naming structure and namespace registries.

Canonical file link:

Profiled file in this repository:

  • merged_metrics_2026-03-15_04-29-57.parquet

Physical schema

This is the stable fact-table contract. Each row is one daily numeric observation for one metric key.

Column Polars dtype Required Nulls Meaning
day_utc Date yes no UTC calendar day for the metric observation.
metric String yes no Metric key/name (long-format keyspace).
value Float64 yes no Numeric metric value for (day_utc, metric).

Interpretation notes:

  • The long-format parquet is the source-of-truth dataset for StackSats.
  • Users access many daily derived BTC time series through metric keys in metric.
  • This parquet does not contain raw transaction rows, raw block rows, raw address ledgers, or intraday event data.
  • Semantic interpretation lives in data/brk_merged_metrics_catalog.json and Merged Metrics Taxonomy, not in extra parquet columns.

Dataset profile (current canonical snapshot)

These values were computed directly from merged_metrics_2026-03-15_04-29-57.parquet:

Property Value
Total rows 236,259,020
Distinct days 6,274
Distinct metrics 41,407
Day range 2009-01-03 to 2026-03-13
Null counts day_utc=0, metric=0, value=0
Finite check all value entries are finite in this snapshot

Runtime projection contract

StackSats runtime entrypoints (load_data, StrategyRunner, CLI strategy lifecycle commands) consume a normalized BRK-wide parquet with canonical columns such as date, price_usd, and optional strategy features.

That runtime parquet is a derived artifact from this canonical long-format dataset.

Minimum projection used by built-in strategy audit tooling:

  • market_cap
  • supply_btc
  • mvrv
  • adjusted_sopr
  • adjusted_sopr_7d_ema
  • realized_cap_growth_rate
  • market_cap_growth_rate

With:

  • price_usd = market_cap / supply_btc
  • day_utc renamed to date

Coverage of runtime-critical metrics (snapshot)

Metric Rows Min day Max day
market_cap 6,274 2009-01-03 2026-03-13
supply_btc 6,274 2009-01-03 2026-03-13
mvrv 6,273 2009-01-09 2026-03-13
adjusted_sopr 5,689 2010-08-16 2026-03-13
adjusted_sopr_7d_ema 5,689 2010-08-16 2026-03-13
realized_cap_growth_rate 5,324 2011-08-16 2026-03-13
market_cap_growth_rate 5,324 2011-08-16 2026-03-13

Validation query (reproducible)

python - <<'PY'
import polars as pl
from pathlib import Path

p = Path("merged_metrics_2026-03-15_04-29-57.parquet")
lf = pl.scan_parquet(p)

print(lf.collect_schema())
print(
    lf.select(
        pl.len().alias("rows"),
        pl.col("day_utc").n_unique().alias("unique_days"),
        pl.col("metric").n_unique().alias("unique_metrics"),
        pl.col("day_utc").min().alias("min_day"),
        pl.col("day_utc").max().alias("max_day"),
        pl.col("day_utc").null_count().alias("null_day_utc"),
        pl.col("metric").null_count().alias("null_metric"),
        pl.col("value").null_count().alias("null_value"),
    ).collect()
)
PY