BRK Data Source (Bitcoin Research Kit Project + Canonical Merged Metrics + Runtime Projection)¶
StackSats supports the Bitcoin Research Kit (BRK) project as the upstream source ecosystem for its canonical data workflow. Official BRK project links are listed in the Canonical Source of Truth section below.
Support boundary:
- StackSats supports BRK-derived canonical data workflows and documents BRK as the upstream project.
- StackSats remains a Python package with its own stable public API and CLI surface.
- StackSats does not promise Rust crate compatibility, crate re-exports, or subcrate-by-subcrate feature parity.
Within StackSats docs, "BRK" refers to three related but distinct things:
- the upstream Bitcoin Research Kit (BRK) project
- the canonical BRK long-format `merged_metrics*.parquet` source dataset consumed by StackSats workflows
- the StackSats runtime-compatible derived BRK-wide parquet built from that canonical source dataset
The StackSats-consumed canonical BRK source dataset is the long-format Google Drive parquet `merged_metrics*.parquet`.
The canonical source artifact link is listed in the Canonical Source of Truth section below; the schema is documented on the Merged Metrics Parquet Schema page.
Snapshot scale (current canonical repo snapshot)¶
The current canonical snapshot is large enough to support long-horizon strategy research, not just toy examples:
- 236,259,020 rows
- 6,274 daily observations
- 41,407 distinct metric keys
- 284 top-level metric families
- coverage from 2009-01-03 to 2026-03-13
Canonical merged_metrics schema¶
The canonical parquet has exactly these columns:

- `day_utc` (Date)
- `metric` (String)
- `value` (Float64)
This canonical BRK parquet is the source-of-truth dataset for StackSats documentation and data workflow. The physical long-format schema, user-facing access guide, and semantic metric taxonomy are documented separately so new users can first understand what data they can access before diving into naming details and projection mechanics.
Recommended reading order:
Runtime ingestion contract¶
Runtime APIs are strict and deterministic:
- runtime env var: `STACKSATS_ANALYTICS_PARQUET`
- managed default path when env var is unset: `~/.stacksats/data/bitcoin_analytics.parquet`
- legacy local fallback path: `./bitcoin_analytics.parquet`
- runtime does not auto-download data
- runtime parquet ingestion is lazy-first (`scan_parquet`) and only collects once the eager execution boundary needs a concrete frame
- framework loaders retain pre-start history by default for feature warmup; scoring windows still respect requested start/end bounds
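The path resolution order above can be sketched as follows. `resolve_analytics_parquet` is a hypothetical helper, and the exact precedence between the managed default and the legacy fallback is an assumption here, not the StackSats implementation:

```python
import os
from pathlib import Path

MANAGED_DEFAULT = Path.home() / ".stacksats" / "data" / "bitcoin_analytics.parquet"
LEGACY_FALLBACK = Path("./bitcoin_analytics.parquet")

def resolve_analytics_parquet() -> Path:
    """Resolve the runtime parquet path deterministically:
    env var first, then managed default, then legacy local fallback.
    No network access and no auto-download at any step."""
    env_path = os.environ.get("STACKSATS_ANALYTICS_PARQUET")
    if env_path:
        return Path(env_path)
    if MANAGED_DEFAULT.exists():
        return MANAGED_DEFAULT
    return LEGACY_FALLBACK
```

Because resolution is pure path lookup, runtime behavior stays deterministic regardless of network state.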
Runtime expects a StackSats runtime-compatible BRK-wide parquet (for example, columns like `date`, `price_usd`, `mvrv`, and optional overlay features).
Framework-owned feature materialization is also lazy-first. Providers compose
Polars LazyFrame pipelines and the runner/registry collect once after joining
the observed feature set for eager strategy execution.
That BRK-wide parquet is a StackSats-managed artifact derived from the canonical BRK merged_metrics dataset.
For direct exploration of the canonical long-format parquet, use the public `stacksats.eda` API instead of the runtime loader path.
Current minimal projection for built-in strategy audit tooling:
- `market_cap`
- `supply_btc`
- `mvrv`
- `adjusted_sopr`
- `adjusted_sopr_7d_ema`
- `realized_cap_growth_rate`
- `market_cap_growth_rate`

With:

- `price_usd = market_cap / supply_btc`
- rename `day_utc` to `date`
Derive runtime parquet from canonical merged_metrics¶
```bash
python - <<'PY'
import polars as pl
from pathlib import Path

src = Path("merged_metrics_2026-03-15_04-29-57.parquet")
dst = Path("bitcoin_analytics.parquet")

# Minimal metric projection for built-in strategy audit tooling.
metrics = [
    "market_cap",
    "supply_btc",
    "mvrv",
    "adjusted_sopr",
    "adjusted_sopr_7d_ema",
    "realized_cap_growth_rate",
    "market_cap_growth_rate",
]

(
    pl.scan_parquet(src)
    .filter(pl.col("metric").is_in(metrics))
    .select("day_utc", "metric", "value")
    .collect()
    # Pivot long format (day_utc, metric, value) to one column per metric.
    .pivot(values="value", index="day_utc", on="metric")
    # Derive price_usd and rename day_utc to date for the runtime schema.
    .with_columns((pl.col("market_cap") / pl.col("supply_btc")).alias("price_usd"))
    .rename({"day_utc": "date"})
    .select(
        "date",
        "price_usd",
        "mvrv",
        "adjusted_sopr",
        "adjusted_sopr_7d_ema",
        "realized_cap_growth_rate",
        "market_cap_growth_rate",
    )
    # Drop rows where the derived price is non-finite or non-positive.
    .filter(pl.col("price_usd").is_finite() & (pl.col("price_usd") > 0))
    .write_parquet(dst)
)
print(f"wrote {dst.resolve()}")
PY
```
```bash
export STACKSATS_ANALYTICS_PARQUET="$(pwd)/bitcoin_analytics.parquet"
```
Canonical Source of Truth¶
- Google Drive parquet: https://drive.google.com/file/d/1jKRRU7l9kOMdGI_hIJGg02X3jWTMPJsw/view?usp=sharing
- BRK project: https://github.com/bitcoinresearchkit/brk
- BRK crate: https://crates.io/crates/brk
- BRK rustdoc: https://docs.rs/crate/brk/latest
- Packaged manifest used by `stacksats data fetch`: `stacksats/assets/brk_data_manifest.json`
- Repo mirror for docs and legacy script usage: `data/brk_data_manifest.json`
The manifest defines, for both parquet and schema artifacts:
- `name`
- `file_id`
- `sha256`
- `size_bytes`
- `version`
It also tracks:
- `gdrive_folder_url`
- `updated_at_utc`
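A hypothetical manifest payload with the fields above might be validated like this. The nesting under an `artifacts` key and every value shown are illustrative placeholders; the real manifest layout may differ:

```python
# Hypothetical manifest payload; file_id/sha256/size values are placeholders.
manifest = {
    "gdrive_folder_url": "https://drive.google.com/drive/folders/...",
    "updated_at_utc": "2026-03-15T00:00:00Z",
    "artifacts": [
        {
            "name": "merged_metrics.parquet",
            "file_id": "PLACEHOLDER_FILE_ID",
            "sha256": "0" * 64,
            "size_bytes": 123,
            "version": "2026-03-15",
        }
    ],
}

# Fail closed if any required per-artifact field is missing.
REQUIRED = {"name", "file_id", "sha256", "size_bytes", "version"}
for artifact in manifest["artifacts"]:
    missing = REQUIRED - artifact.keys()
    assert not missing, f"manifest entry missing fields: {missing}"
```

Validating required fields up front is what lets the fetch workflow fail closed on missing metadata rather than downloading an unverifiable artifact.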
Fetch + Verify Workflow¶
Recommended commands: `stacksats data fetch` followed by `stacksats data prepare`.
Default behavior:
- downloads canonical source parquet to `~/.stacksats/data/brk/`
- writes the packaged schema markdown beside it
- `stacksats data prepare` writes runtime `bitcoin_analytics.parquet` at `~/.stacksats/data/bitcoin_analytics.parquet`
- verifies `sha256` and exact file size from manifest
- fails closed on missing metadata, hash mismatch, size mismatch, or partial download
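The fail-closed verification can be sketched as follows. `verify_artifact` is a hypothetical helper mirroring the documented checks, not the packaged implementation:

```python
import hashlib
from pathlib import Path

def verify_artifact(path: Path, expected_sha256: str, expected_size: int) -> None:
    """Fail closed: raise on a missing file, size mismatch, or hash mismatch.
    A partial download surfaces as a size and/or hash mismatch."""
    if not path.is_file():
        raise FileNotFoundError(path)
    actual_size = path.stat().st_size
    if actual_size != expected_size:
        raise ValueError(f"size mismatch: {actual_size} != {expected_size}")
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    if digest != expected_sha256:
        raise ValueError("sha256 mismatch")
```

Checking the cheap exact-size comparison before hashing rejects truncated downloads without reading the whole file through sha256.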
The legacy script wrapper remains available: `scripts/fetch_brk_data.py`.
Refreshing Data Metadata (Maintainers)¶
When Drive artifacts are refreshed:
- update `file_id`, `sha256`, `size_bytes`, `version`, and `updated_at_utc` in `stacksats/assets/brk_data_manifest.json`
- mirror the same manifest payload to `data/brk_data_manifest.json`
- run `venv/bin/python scripts/fetch_brk_data.py --target-dir . --overwrite`
- update the Merged Metrics Parquet Schema page when the canonical schema/profile changes
- verify docs/tests pass
Do not add network fetches to runtime providers. Keep downloads script-only to preserve deterministic runtime behavior.