Data Contract

The backtest harness needs enough information to replay the same model surface consistently across folds and across repos.

Minimum Required Inputs

For a runnable pilot manifest, the repo needs:

  • a weekly source table
  • a date column
  • a KPI / response column
  • the locked model formula
  • priors
  • boundaries
  • repo targets and repo paths
  • fit settings and seed policy

If recommendation stability is in scope, the repo also needs:

  • media spend history or equivalent channel-spend history
  • a declared recommendation contract
  • a channel map from spend inputs to model terms / allocation variables
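
The two input lists above can be checked mechanically before a pilot run. The following is a minimal sketch, not the harness's actual validator: every key name (source_table, kpi_column, channel_map, and so on) is a hypothetical label for the corresponding bullet, and the real manifest schema may use different names or nesting.

```python
# Hypothetical key names, one per bullet in the lists above.
# The real pilot manifest may name or nest these differently.
REQUIRED_KEYS = {
    "source_table", "date_column", "kpi_column", "formula", "priors",
    "boundaries", "repo_targets", "repo_paths", "fit_settings", "seed_policy",
}
# Only needed when recommendation stability is in scope.
RECOMMENDATION_KEYS = {"spend_history", "recommendation_contract", "channel_map"}


def missing_manifest_keys(manifest, recommendation_in_scope=False):
    """Return the sorted names of required keys that are absent or empty."""
    required = set(REQUIRED_KEYS)
    if recommendation_in_scope:
        required |= RECOMMENDATION_KEYS
    return sorted(k for k in required if manifest.get(k) in (None, "", [], {}))
```

Calling missing_manifest_keys(manifest, recommendation_in_scope=True) on a loaded manifest dict returns an empty list when every input is declared, which makes it easy to fail fast before any fold is fitted.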

Why The Source Table Matters

The formulas in scope include lagged and rolling terms, and a matrix engineered on the full sample may embed post-cutoff information in those terms. Fold inputs must therefore be rebuilt from source data at each cutoff rather than sliced from a full-sample engineered matrix.
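
As an illustration of rebuilding at the cutoff, here is a minimal sketch that truncates the source rows first and only then engineers the features. The function name, the (week_date, spend) row shape, and the choice of a one-week lag and a three-week trailing mean are all assumptions made for the example; the locked model formula defines the actual terms.

```python
from datetime import date, timedelta


def build_fold_features(rows, cutoff, lag=1, window=3):
    """Rebuild lagged and trailing-rolling features from rows on or before cutoff.

    rows: list of (week_date, spend) tuples. Illustrative shape only.
    """
    # Truncate the source first, so no post-cutoff row can enter a lag or window.
    history = sorted((d, x) for d, x in rows if d <= cutoff)
    values = [x for _, x in history]
    features = []
    for i, (d, x) in enumerate(history):
        # Lagged value is undefined until enough history exists.
        lagged = values[i - lag] if i >= lag else None
        # Trailing mean over the current week and the (window - 1) weeks before it.
        if i + 1 >= window:
            rolling = sum(values[i + 1 - window : i + 1]) / window
        else:
            rolling = None
        features.append(
            {"date": d, "spend": x, "spend_lag": lagged, "spend_roll_mean": rolling}
        )
    return features


# Usage: six weeks of source data, fold cut off after the fourth week.
source = [(date(2024, 1, 1) + timedelta(weeks=i), float(i + 1)) for i in range(6)]
fold = build_fold_features(source, cutoff=date(2024, 1, 1) + timedelta(weeks=3))
```

Engineering after truncation keeps the rule simple to audit: whatever transform the formula asks for, it can only ever see rows that existed at the cutoff.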

Tracked Data Packages

The repo keeps smaller GitHub-friendly replication packages under ../data/:

  • _st: active engineering pilot
  • _ov: reserve candidate
  • _os: retained stress fixture

Large reviewed bundles under data_review/ are kept local only.

Current Active Pilot

The active engineering pilot is _st.

The active manifest currently lives in the local planning layer at .planning/research/pilot_manifest.yaml.

Because .planning/ is local-only in the current repo setup, colleagues working from GitHub alone should rely on the tracked replication data, README, docs, and report rather than the local planning layer.