Results And Artifacts

Backtest outputs are written under run-scoped result trees so filtered reruns do not overwrite earlier summaries.

Result Tree Shape

Typical layout:

results.../
  <dataset_id>/
    <comparison_label>/
      run_id=.../
        experiment_manifest.yaml
        fold_manifest.csv
        summary/
          run_status.csv
          holdout_scores.csv
          holdout_summary.csv
          parameter_stability_summary.csv
          recommendation_stability_summary.csv
        repo_target=<repo>/
          fold_id=01/
            run_manifest.yaml
            run_status.json
            fit_payload.rds
            prediction_payload.rds
            holdout_scores.csv
            recommendations.csv
          stability/
            parameter_drift.csv
            parameter_drift_summary.csv
            recommendation_drift.csv
            recommendation_drift_summary.csv
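To illustrate the tree shape, here is a minimal sketch of how a per-fold directory could be assembled from its parts and how the key=value segments (run_id=, repo_target=, fold_id=) could be read back. The helper names fold_dir and parse_partition are hypothetical and are not part of the project. The results root is a parameter because its name is elided above. Zero-padding fold ids to two digits is assumed from the fold_id=01 example.

```python
from pathlib import Path


def fold_dir(results_root: Path, dataset_id: str, comparison_label: str,
             run_id: str, repo: str, fold: int) -> Path:
    """Hypothetical helper: build the per-fold directory of a run-scoped tree.

    Assumes fold ids are zero-padded to two digits, as in fold_id=01.
    """
    run_dir = results_root / dataset_id / comparison_label / f"run_id={run_id}"
    return run_dir / f"repo_target={repo}" / f"fold_id={fold:02d}"


def parse_partition(path: Path) -> dict[str, str]:
    """Hypothetical helper: collect key=value path segments into a dict."""
    parts: dict[str, str] = {}
    for segment in path.parts:
        key, sep, value = segment.partition("=")
        if sep:  # only segments that contain "=" are partition keys
            parts[key] = value
    return parts
```

Because every run lives under its own run_id=... directory, a filtered rerun writes to a new sibling directory rather than into an earlier run's summary/ folder.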

Most Important Summary Files

  • summary/run_status.csv: fold-by-fold execution state.

  • summary/holdout_summary.csv: repo-level forward holdout comparison.

  • summary/parameter_stability_summary.csv: repo-level summary of parameter drift between adjacent refits.

  • summary/recommendation_stability_summary.csv: repo-level recommendation stability summary, computed on the current provisional shared recommendation surface.
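A quick way to confirm that a run produced these key summaries is to check for them under the run_id=... directory. The sketch below is an illustrative assumption, not a project tool. The function name missing_summaries is hypothetical, and it only tests whether each file exists. It does not inspect file contents.

```python
from pathlib import Path

# Key summary files listed above, relative to a run_id=... directory.
KEY_SUMMARY_FILES = (
    "summary/run_status.csv",
    "summary/holdout_summary.csv",
    "summary/parameter_stability_summary.csv",
    "summary/recommendation_stability_summary.csv",
)


def missing_summaries(run_dir: Path) -> list[str]:
    """Hypothetical helper: return key summary files absent from run_dir."""
    return [rel for rel in KEY_SUMMARY_FILES if not (run_dir / rel).is_file()]
```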

Current Worked Example

The active _st engineering batch is:

results_engineering_m1_st_full/_st/engineering_m1_st_scale_false/
run_id=20260407T211743.943118Z__all-repos__all-folds__live/

In this example, the holdout and parameter-stability summaries are identical across repos. A recommendation-stability summary is also present, but it should still be treated as provisional.
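The run_id in the example appears to consist of a UTC start timestamp with microseconds, followed by "__"-separated fields. Those fields seem to give the repo filter, the fold filter, and a mode. That reading is an assumption drawn only from this one example. The sketch below splits a run_id on that basis. RunId and parse_run_id are hypothetical names.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class RunId(NamedTuple):
    started_at: datetime
    repo_scope: str  # assumed meaning of field 2, e.g. "all-repos"
    fold_scope: str  # assumed meaning of field 3, e.g. "all-folds"
    mode: str        # assumed meaning of field 4, e.g. "live"


def parse_run_id(run_id: str) -> RunId:
    """Hypothetical helper: split a run_id into its assumed components."""
    stamp, repo_scope, fold_scope, mode = run_id.split("__")
    # Timestamp format inferred from 20260407T211743.943118Z (UTC, microseconds).
    started = datetime.strptime(stamp, "%Y%m%dT%H%M%S.%fZ").replace(tzinfo=timezone.utc)
    return RunId(started, repo_scope, fold_scope, mode)
```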