Every forecast of record is scored against what actually happened — automatically, nightly, no take-backs. Day one: zero resolved. Scoring begins as floor votes land; never-voted bills resolve when the Congress ends.
| Target | Chamber | Resolved | Score | Note |
|---|---|---|---|---|
| reach_floor | house | 0 | — | |
| reach_floor | senate | 0 | — | |
| pass_chamber | house | 0 | — | |
| pass_chamber | senate | 0 | — | ← the row to watch |
| enacted | house | 0 | — | |
| enacted | senate | 0 | — | |
| whip_count | house | 0 | — | interval coverage, not Brier |
| whip_count | senate | 0 | — | interval coverage, not Brier |
Staged forecasts. P(becomes law) = P(reaches a floor vote) × P(passes if voted) × P(survives the other chamber and the President), each stage a separately calibrated model trained on Congresses 110–118 (~935k bill snapshots, 2.4M member votes). Whip counts come from a member-level Monte Carlo simulation with empirically calibrated correlated shocks. House procedure is never assumed: suspension and rule branches are simulated and pooled by their historical likelihood.
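A minimal Python sketch of how the staged product and a pooled whip simulation could fit together. Everything here is an illustrative assumption rather than the system's actual code: the function names are hypothetical, correlation is induced by one shared logit-scale shock per draw (one simple way to correlate votes; the calibrated shock structure is not described above), and the `(weight, p_yea, shock_sd)` branch format is invented for the example.

```python
import math
import random


def enactment_probability(p_floor, p_pass_if_voted, p_survive):
    # Staged chain: each factor is conditional on the previous stage succeeding.
    return p_floor * p_pass_if_voted * p_survive


def simulate_whip_counts(p_yea, shock_sd, n_sims, rng):
    # Hypothetical shock model: one shared logit shock per draw moves every member
    # in the same direction, which correlates their votes within that draw.
    logits = [math.log(p / (1.0 - p)) for p in p_yea]  # requires 0 < p < 1
    counts = []
    for _ in range(n_sims):
        shock = rng.gauss(0.0, shock_sd)
        yeas = sum(rng.random() < 1.0 / (1.0 + math.exp(-(z + shock))) for z in logits)
        counts.append(yeas)
    return counts


def pooled_whip_counts(branches, n_sims, seed=0):
    # branches: list of (historical_weight, p_yea_list, shock_sd), one per procedural
    # path (e.g. suspension vs. a special rule). Each draw first picks a path by weight.
    rng = random.Random(seed)
    weights = [w for w, _, _ in branches]
    picks = rng.choices(range(len(branches)), weights=weights, k=n_sims)
    draws = []
    for b in picks:
        _, p_yea, shock_sd = branches[b]
        draws.append((b, simulate_whip_counts(p_yea, shock_sd, 1, rng)[0]))
    return draws


def pooled_pass_probability(draws, thresholds):
    # Each path keeps its own bar, given here as a fixed yea count per branch
    # (i.e. this sketch assumes full attendance).
    return sum(yeas >= thresholds[b] for b, yeas in draws) / len(draws)
```

For example, with 435 members voting, `thresholds=[290, 218]` would encode a two-thirds bar on a suspension branch and a simple majority on a rule branch, so the pooled pass probability reflects both procedures in proportion to their weights.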
Scoring. The forecast of record is the last one published before the date of the resolving event. Recorded votes resolve pass and whip targets; passage by voice vote or unanimous consent resolves pass targets; bills never brought to a vote resolve to 0 at sine die adjournment. Binary targets are scored by Brier score (lower is better; a constant 50% forecast scores 0.25, the coin-flip reference); whip counts are scored by interval coverage. Calibration (expected calibration error, ECE) is reported per target and chamber.
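The scoring rules can be sketched in a few lines of Python. This is an illustration under assumptions, not the production scorer: the function names are hypothetical, "before" is read as strictly earlier than the event date, and ECE uses ten equal-width bins, a common convention that the text above does not specify.

```python
from datetime import date


def forecast_of_record(forecasts, event_date):
    # forecasts: (published_on, probability) pairs; keep the latest one strictly
    # before the resolving event's date, or None if nothing qualifies.
    eligible = [f for f in forecasts if f[0] < event_date]
    return max(eligible, key=lambda f: f[0])[1] if eligible else None


def brier(probs, outcomes):
    # Mean squared error between forecast probabilities and 0/1 outcomes.
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def interval_coverage(intervals, actuals):
    # Share of realized whip counts that fell inside their (lo, hi) interval.
    hits = sum(lo <= a <= hi for (lo, hi), a in zip(intervals, actuals))
    return hits / len(actuals)


def expected_calibration_error(probs, outcomes, n_bins=10):
    # Equal-width bins (an assumption); ECE = sum over bins of
    # (share of forecasts in bin) * |mean forecast - observed frequency|.
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            rate = sum(y for _, y in b) / len(b)
            ece += len(b) / n * abs(mean_p - rate)
    return ece
```

A constant 0.5 forecast scores 0.25 whichever way the outcome goes, which is why 0.25 serves as the coin-flip reference.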
Before anything shipped, every model passed a rolling-origin backtest (train through Congress T−1, test on T) over five held-out Congresses (2015–2024, two majority flips). Member-vote Brier skill ranged from +0.66 to +0.76 against the base rate, with pooled Senate ECE of 0.028; whip 90% intervals covered 87%–97% of outcomes on frozen parameters; and the composite enactment probability scored skill of +0.09 to +0.13 with ECE ≤ 0.004 on every fold.
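A sketch of the rolling-origin split and of Brier skill relative to a reference forecast, skill = 1 − BS/BS_ref. Assumptions: the helper names are hypothetical, Congresses are plain integers, and the reference is the test fold's own base rate. The text says only "the base rate", so a training-period base rate would be an equally plausible reading.

```python
def rolling_origin_folds(congresses, n_test):
    # For each of the last n_test Congresses T, train on every Congress before T.
    start = len(congresses) - n_test
    return [(congresses[:i], congresses[i]) for i in range(start, len(congresses))]


def brier_skill(probs, outcomes):
    # Skill versus always forecasting the (test-fold) base rate: 1 - BS / BS_ref.
    n = len(outcomes)
    base = sum(outcomes) / n
    bs = sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / n
    bs_ref = sum((base - y) ** 2 for y in outcomes) / n
    if bs_ref == 0:
        raise ValueError("skill is undefined when every outcome is identical")
    return 1.0 - bs / bs_ref
```

With Congresses 110–118 and five test folds, this yields test Congresses 114 through 118, the first trained on 110–113. On the skill scale, 0 means no better than the base rate and 1 is a perfect forecast, so a member-vote skill of +0.70 corresponds to a Brier score 30% of the base-rate reference.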