535.fyi
Calibrated odds on what Congress actually does.
backtest-calibrated · live scoring as votes land · track record

Track record

Every forecast of record is scored against what actually happened — automatically, nightly, no take-backs. Day one: zero resolved. Scoring begins as floor votes land; never-voted bills resolve when the Congress ends.

Target        Chamber  Resolved  Brier
reach_floor   house    0         n/a
reach_floor   senate   0         n/a
pass_chamber  house    0         n/a
pass_chamber  senate   0         n/a   ← the row to watch
enacted       house    0         n/a
enacted       senate   0         n/a
whip_count    house    0         n/a
whip_count    senate   0         n/a

Senate conditional forecasts are flagged as moderate confidence: in backtests their per-Congress calibration drifts (ECE 0.06–0.09) around a well-calibrated long-run average (pooled ECE 0.028). The table above is where we, and you, can watch whether live Senate forecasts stay calibrated.

Methodology

Staged forecasts. P(becomes law) = P(reaches a floor vote) × P(passes if voted) × P(survives the other chamber and the President), each stage a separately calibrated model trained on Congresses 110–118 (~935k bill-snapshots, 2.4M member-votes). Whip counts come from a member-level Monte-Carlo with empirically calibrated correlated shocks; House procedure is never assumed — suspension and rule branches are simulated and pooled by their historical likelihood.
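A minimal sketch of the staged product and a correlated whip-count simulation, assuming a simple shock structure: one chamber-wide shock plus one shock per party, both added on the logit scale. The function names, the two-level shock model, and the branch weights are hypothetical illustrations, not 535.fyi's actual code. The sketch also simplifies by reusing the same simulated counts for both House procedural branches and changing only the passage threshold (two-thirds of those voting for suspension, a strict majority under a rule).

```python
import math
import random


def p_becomes_law(p_floor: float, p_pass_if_voted: float, p_survives: float) -> float:
    """Chain the three stage probabilities (floor vote, passage if voted,
    other chamber plus President) into one enactment probability."""
    return p_floor * p_pass_if_voted * p_survives


def _logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def _expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def simulate_yea_counts(member_probs, parties, sigma_common, sigma_party,
                        n_sims, seed=0):
    """Monte-Carlo yea counts for one recorded vote.

    Each draw adds a chamber-wide shock and a per-party shock on the logit
    scale (hypothetical shock model), so members move together instead of
    voting independently. member_probs must lie strictly between 0 and 1.
    """
    rng = random.Random(seed)
    base = [_logit(p) for p in member_probs]
    party_names = sorted(set(parties))
    counts = []
    for _ in range(n_sims):
        common = rng.gauss(0.0, sigma_common)
        party_shock = {name: rng.gauss(0.0, sigma_party) for name in party_names}
        yeas = sum(
            rng.random() < _expit(b + common + party_shock[party])
            for b, party in zip(base, parties)
        )
        counts.append(yeas)
    return counts


def pooled_pass_prob(counts, n_voting, branch_weights):
    """Pool P(pass) over House procedural branches, weighted by each
    branch's historical frequency. Suspension needs two-thirds of those
    voting; a vote under a rule needs a strict majority.
    Simplification: both branches reuse the same simulated counts."""
    rules = {
        "suspension": lambda yeas: 3 * yeas >= 2 * n_voting,
        "rule": lambda yeas: 2 * yeas > n_voting,
    }
    total_weight = sum(branch_weights.values())
    pooled = 0.0
    for branch, weight in branch_weights.items():
        passes = rules[branch]
        share = sum(1 for c in counts if passes(c)) / len(counts)
        pooled += (weight / total_weight) * share
    return pooled
```

For example, `pooled_pass_prob([6, 6, 6, 6], 10, {"suspension": 0.6, "rule": 0.4})` returns 0.4: six yeas out of ten clears a majority but not two-thirds, so only the rule branch contributes. Pooling over branches, rather than picking one, is what keeps the procedure from being assumed.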

Scoring. The forecast of record is the last one published before the resolving event's date. Recorded votes resolve pass/whip targets; voice and unanimous-consent passage resolves pass targets; bills never voted resolve to 0 at sine die. Binary targets score Brier (lower is better; 0.25 = coin flip); whip counts score interval coverage. Calibration (ECE) is reported per target and chamber.
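The scoring rules above can be sketched in a few functions. This is an illustration under stated assumptions, not the site's scorer: the function names are hypothetical, "before the resolving event's date" is read as strictly earlier, and ECE uses ten equal-width probability bins, a common choice that the text does not specify.

```python
from datetime import date
from typing import List, Optional, Sequence, Tuple


def forecast_of_record(forecasts: List[Tuple[date, float]], event_date: date) -> Optional[float]:
    """Probability from the last forecast published strictly before the
    resolving event's date, or None if no forecast qualifies."""
    eligible = [(published, prob) for published, prob in forecasts if published < event_date]
    if not eligible:
        return None
    return max(eligible, key=lambda item: item[0])[1]


def brier(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between probabilities and 0/1 outcomes.
    A constant 50% forecast scores 0.25."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def ece(probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10) -> float:
    """Expected calibration error over equal-width bins (assumed binning):
    bin-size-weighted mean of |mean forecast - observed frequency|."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    error = 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        freq = sum(y for _, y in members) / len(members)
        error += len(members) / len(probs) * abs(mean_p - freq)
    return error


def interval_coverage(intervals: Sequence[Tuple[int, int]], actuals: Sequence[int]) -> float:
    """Share of actual whip counts that fell inside their [low, high] interval."""
    hits = sum(low <= a <= high for (low, high), a in zip(intervals, actuals))
    return hits / len(actuals)
```

With forecasts published on June 1, 5, and 10 and a vote on June 10, `forecast_of_record` returns the June 5 probability; a forecast published on the day of the vote never counts.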

Before anything shipped, every model passed a rolling-origin backtest (train through Congress T−1, test on T) on five held-out Congresses spanning 2015–2024, including two majority flips. Member-vote forecasts scored Brier skill of +0.66 to +0.76 versus the base rate, with pooled Senate ECE of 0.028; whip 90% intervals covered 87%–97% of actual counts on frozen parameters; and the composite enactment probability scored skill of +0.09 to +0.13 with ECE ≤ 0.004 on every fold.
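A minimal sketch of the rolling-origin loop and Brier skill score. Assumptions to flag: `fit` is a hypothetical stand-in for whatever trains a stage model, each row is reduced to a single toy feature plus a 0/1 outcome, and the reference "base rate" forecast is taken from the training Congresses only. The text does not say which base rate is used.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[float, int]               # (hypothetical feature, 0/1 outcome)
Predictor = Callable[[float], float]  # feature -> probability


def brier(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def brier_skill(probs: Sequence[float], outcomes: Sequence[int], base_rate: float) -> float:
    """1 - Brier(model) / Brier(constant base-rate forecast).
    Positive means better than the base rate; undefined if the reference is 0."""
    reference = brier([base_rate] * len(outcomes), outcomes)
    return 1.0 - brier(probs, outcomes) / reference


def rolling_origin(data: Dict[int, List[Row]], test_congresses: Sequence[int],
                   fit: Callable[[List[Row]], Predictor]) -> Dict[int, float]:
    """For each held-out Congress T, train on every Congress before T and
    score on T. The base rate also comes from training rows only, so no
    fold sees its own outcomes."""
    skill: Dict[int, float] = {}
    for t in test_congresses:
        train = [row for congress, rows in data.items() if congress < t for row in rows]
        predict = fit(train)
        base_rate = sum(y for _, y in train) / len(train)
        probs = [predict(x) for x, _ in data[t]]
        outcomes = [y for _, y in data[t]]
        skill[t] = brier_skill(probs, outcomes, base_rate)
    return skill
```

With a toy `fit` that ignores its training rows and returns the feature as the probability, a fold whose test rows are `[(0.2, 0), (0.9, 1)]` against a 0.5 base rate scores Brier 0.025 versus 0.25, a skill of +0.9.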

Model versions: member-vote-1.0.0-t117d118-20260606 + passage-1.0.0-t117d118-20260606. Forecast run predict-20260606T193507Z.