BRK Data Source (Bitcoin Research Kit Project + Canonical Merged Metrics + Runtime Projection)

StackSats supports the Bitcoin Research Kit (BRK) project as the upstream source ecosystem for its canonical data workflow. Official BRK project links:

Support boundary:

  • StackSats supports BRK-derived canonical data workflows and documents BRK as the upstream project.
  • StackSats remains a Python package with its own stable public API and CLI surface.
  • StackSats does not promise Rust crate compatibility, crate re-exports, or subcrate-by-subcrate feature parity.

Within StackSats docs, "BRK" refers to three related but distinct things:

  • the upstream Bitcoin Research Kit (BRK) project
  • the canonical BRK long-format merged_metrics*.parquet source dataset consumed by StackSats workflows
  • the StackSats runtime-compatible derived BRK-wide parquet built from that canonical source dataset

The StackSats-consumed canonical BRK source dataset is the long-format Google Drive parquet: merged_metrics*.parquet.

Canonical BRK source artifact link:

Canonical schema page:

Snapshot scale

The current canonical snapshot is large enough to support long-horizon strategy research, not just toy examples:

  • 236,259,020 rows
  • 6,274 daily observations
  • 41,407 distinct metric keys
  • 284 top-level metric families
  • coverage from 2009-01-03 to 2026-03-13
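The row count implies the long-format table is sparse: not every metric key has a value for every day. A quick sanity check on the numbers quoted above:

```python
# Sanity arithmetic on the snapshot stats quoted above (numbers from this page).
rows = 236_259_020
days = 6_274
metric_keys = 41_407

# If every metric had a value for every day, the long table would hold:
dense_rows = days * metric_keys  # 259,787,518

# Actual rows are fewer, so coverage is sparse across (day, metric) pairs.
fill_ratio = rows / dense_rows
print(f"dense upper bound: {dense_rows:,}")
print(f"fill ratio: {fill_ratio:.1%}")
```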

Canonical merged_metrics schema

The canonical parquet has exactly:

  • day_utc (Date)
  • metric (String)
  • value (Float64)

This canonical BRK parquet is the source-of-truth dataset for StackSats documentation and data workflows. The physical long-format schema, the user-facing access guide, and the semantic metric taxonomy are documented separately, so new users can first understand what data they can access before diving into naming details and projection mechanics.

Recommended reading order:

  1. Merged Metrics Data Guide
  2. Merged Metrics Parquet Schema
  3. Merged Metrics Taxonomy

Runtime ingestion contract

Runtime APIs are strict and deterministic:

  • runtime env var: STACKSATS_ANALYTICS_PARQUET
  • managed default path when env var is unset: ~/.stacksats/data/bitcoin_analytics.parquet
  • legacy local fallback path: ./bitcoin_analytics.parquet
  • runtime does not auto-download data
  • runtime parquet ingestion is lazy-first (scan_parquet) and only collects once the eager execution boundary needs a concrete frame
  • framework loaders retain pre-start history by default for feature warmup; scoring windows still respect requested start/end bounds
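The path resolution order above can be sketched as follows. The helper name and dict-based env injection are illustrative, not the StackSats API:

```python
from pathlib import Path


def resolve_analytics_parquet(env: dict[str, str]) -> Path:
    # 1. explicit override via the documented env var
    override = env.get("STACKSATS_ANALYTICS_PARQUET")
    if override:
        return Path(override)
    # 2. managed default written by `stacksats data prepare`
    managed = Path.home() / ".stacksats" / "data" / "bitcoin_analytics.parquet"
    if managed.exists():
        return managed
    # 3. legacy local fallback; no auto-download happens at any step
    return Path("./bitcoin_analytics.parquet")


print(resolve_analytics_parquet({"STACKSATS_ANALYTICS_PARQUET": "/tmp/x.parquet"}))
```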

Runtime expects a StackSats runtime-compatible BRK-wide parquet (for example columns like date, price_usd, mvrv, and optional overlay features).

Framework-owned feature materialization is also lazy-first. Providers compose Polars LazyFrame pipelines, and the runner/registry collects once, after joining the observed feature set, for eager strategy execution.

That BRK-wide parquet is a StackSats-managed derived artifact from canonical BRK merged_metrics. For direct exploration of the canonical long-format parquet, use the public stacksats.eda API instead of the runtime loader path.

Current minimal projection for built-in strategy audit tooling:

  • market_cap
  • supply_btc
  • mvrv
  • adjusted_sopr
  • adjusted_sopr_7d_ema
  • realized_cap_growth_rate
  • market_cap_growth_rate

With:

  • price_usd = market_cap / supply_btc
  • rename day_utc to date

Derive runtime parquet from canonical merged_metrics

python - <<'PY'
import polars as pl
from pathlib import Path

src = Path("merged_metrics_2026-03-15_04-29-57.parquet")
dst = Path("bitcoin_analytics.parquet")

# Minimal metric projection consumed by built-in strategy audit tooling.
metrics = [
    "market_cap",
    "supply_btc",
    "mvrv",
    "adjusted_sopr",
    "adjusted_sopr_7d_ema",
    "realized_cap_growth_rate",
    "market_cap_growth_rate",
]

(
    pl.scan_parquet(src)  # lazy scan: only the selected metrics are materialized
    .filter(pl.col("metric").is_in(metrics))
    .select("day_utc", "metric", "value")
    .collect()
    # long -> wide: one column per metric, one row per day
    .pivot(values="value", index="day_utc", on="metric")
    # derived spot price from market cap and circulating supply
    .with_columns((pl.col("market_cap") / pl.col("supply_btc")).alias("price_usd"))
    .rename({"day_utc": "date"})
    .select(
        "date",
        "price_usd",
        "mvrv",
        "adjusted_sopr",
        "adjusted_sopr_7d_ema",
        "realized_cap_growth_rate",
        "market_cap_growth_rate",
    )
    # drop rows where the derived price is non-finite or non-positive
    .filter(pl.col("price_usd").is_finite() & (pl.col("price_usd") > 0))
    .write_parquet(dst)
)
print(f"wrote {dst.resolve()}")
PY

export STACKSATS_ANALYTICS_PARQUET=$(pwd)/bitcoin_analytics.parquet

Canonical Source of Truth

The data manifest (stacksats/assets/brk_data_manifest.json) defines, for both the parquet and schema artifacts:

  • name
  • file_id
  • sha256
  • size_bytes
  • version

It also tracks:

  • gdrive_folder_url
  • updated_at_utc
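A hypothetical manifest payload illustrating these fields. Only the field names come from this page; the nesting layout and every value below are placeholders, not the real artifact metadata:

```json
{
  "gdrive_folder_url": "https://drive.google.com/drive/folders/<FOLDER_ID>",
  "updated_at_utc": "2026-03-15T04:29:57Z",
  "artifacts": {
    "parquet": {
      "name": "merged_metrics_2026-03-15_04-29-57.parquet",
      "file_id": "<DRIVE_FILE_ID>",
      "sha256": "<64-hex-digest>",
      "size_bytes": 0,
      "version": "2026-03-15"
    },
    "schema": {
      "name": "<SCHEMA_MARKDOWN_NAME>",
      "file_id": "<DRIVE_FILE_ID>",
      "sha256": "<64-hex-digest>",
      "size_bytes": 0,
      "version": "2026-03-15"
    }
  }
}
```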

Fetch + Verify Workflow

Recommended commands:

stacksats data fetch
stacksats data prepare
stacksats data doctor

Default behavior:

  • downloads canonical source parquet to ~/.stacksats/data/brk/
  • writes the packaged schema markdown beside it
  • stacksats data prepare writes runtime bitcoin_analytics.parquet at ~/.stacksats/data/bitcoin_analytics.parquet
  • verifies sha256 and exact file size from manifest
  • fails closed on missing metadata, hash mismatch, size mismatch, or partial download
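The fail-closed check described above can be sketched as follows: compare the exact byte size and sha256 digest of a downloaded file against the manifest fields, and raise on any mismatch. Function and variable names are illustrative, not the StackSats API:

```python
import hashlib
import tempfile
from pathlib import Path


def verify_artifact(path: Path, expected_sha256: str, expected_size: int) -> None:
    data = path.read_bytes()
    # Exact size check catches truncated/partial downloads cheaply.
    if len(data) != expected_size:
        raise ValueError(f"size mismatch: {len(data)} != {expected_size}")
    # Content hash check catches corruption and wrong-file errors.
    digest = hashlib.sha256(data).hexdigest()
    if digest != expected_sha256:
        raise ValueError(f"sha256 mismatch: {digest} != {expected_sha256}")


# Demo against a throwaway file:
with tempfile.NamedTemporaryFile(delete=False) as fh:
    fh.write(b"demo")
artifact = Path(fh.name)
verify_artifact(artifact, hashlib.sha256(b"demo").hexdigest(), 4)  # passes silently
```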

Legacy script wrapper remains available:

venv/bin/python scripts/fetch_brk_data.py --target-dir ~/.stacksats/data/brk

Refreshing Data Metadata (Maintainers)

When Drive artifacts are refreshed:

  1. update file_id, sha256, size_bytes, version, updated_at_utc in stacksats/assets/brk_data_manifest.json
  2. mirror the same manifest payload to data/brk_data_manifest.json
  3. run venv/bin/python scripts/fetch_brk_data.py --target-dir . --overwrite
  4. update Merged Metrics Parquet Schema when canonical schema/profile changes
  5. verify docs/tests pass

Do not add network fetches to runtime providers. Keep downloads script-only to preserve deterministic runtime behavior.