CERES


Accuracy Metrics

0 predictions · first grades expected Aug–Oct 2026

Brier Score

Pending

Target < 0.10 ⏳ Expected Aug–Oct 2026

SI Coverage (90%)

Pending

Target > 88% ⏳ Expected Aug–Oct 2026

Brier Skill Score

Pending

Target > 0 ⏳ Expected Aug–Oct 2026

Total Predictions

Target ≥ 43/week ✗ Missed

Note: 0 predictions issued; the first T+90 grading windows opened June 2026. Observed IPC/FEWS NET classifications are published on a 2–4 month lag, so the earliest grades land Aug–Oct 2026. Grading is automated and pre-registered: the Brier decomposition is computed as soon as ≥10 outcomes are published, with no manual intervention.
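A minimal sketch of how a gated Brier decomposition could look, assuming Murphy's reliability / resolution / uncertainty decomposition and a hypothetical grouping of forecasts by value rounded to one decimal place. The function name, the rounding rule, and the return shape are illustrative assumptions, not the CERES implementation; only the threshold of 10 published outcomes comes from the note above.

```python
from collections import defaultdict

MIN_OUTCOMES = 10  # gate stated in the note above


def brier_decomposition(forecasts, outcomes, min_outcomes=MIN_OUTCOMES):
    """Return Murphy's decomposition terms, or None while the gate is closed.

    forecasts: probabilities in [0, 1]; outcomes: 0/1 observed events.
    """
    n = len(outcomes)
    if len(forecasts) != n:
        raise ValueError("forecasts and outcomes must have the same length")
    if n < min_outcomes:
        return None  # too few published outcomes to grade

    base_rate = sum(outcomes) / n

    # Hypothetical grouping: forecasts rounded to one decimal place.
    groups = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        groups[round(p, 1)].append(o)

    # Reliability: weighted squared gap between forecast and observed rate.
    reliability = sum(
        len(g) * (p - sum(g) / len(g)) ** 2 for p, g in groups.items()
    ) / n
    # Resolution: weighted squared gap between observed rate and base rate.
    resolution = sum(
        len(g) * (sum(g) / len(g) - base_rate) ** 2 for g in groups.values()
    ) / n
    uncertainty = base_rate * (1 - base_rate)

    return {
        "reliability": reliability,
        "resolution": resolution,
        "uncertainty": uncertainty,
        # Equals the directly computed Brier score only when every forecast
        # coincides with its rounded group value; otherwise an approximation.
        "brier": reliability - resolution + uncertainty,
    }
```

Lower reliability and higher resolution are better; the uncertainty term depends only on the observed base rate.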

Pending Verification

Predictions Awaiting Grading Window


Public Prediction Ledger

Graded Predictions: Forward Validation


FIRST GRADES EXPECTED AUG–OCT 2026

The first T+90 grading windows opened June 2026. Grades are written automatically once IPC/FEWS NET publish the observed Current-Situation classification for each region, which lags the target window by 2–4 months. This ledger updates every Monday and nothing is graded against projected data.

Calibration · Awaiting Prospective Data

87 IPC Records · 31 Countries · 4 Back-validation Cases

The model was initialised against 87 IPC transition records (2011–2023, 31 countries). Back-validation uses 4 cases, restricted to those with complete data.

Reliability Diagram · Predicted vs. Observed Probability

Perfect calibration lies on the diagonal. Points above = underconfident; below = overconfident.

Well-calibrated (±10%)

Outside tolerance

Perfect calibration

Calibration by Predicted Probability Bin

Grey = ideal calibration · Amber = CERES observed rate · (n) = predictions in bin
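The per-bin comparison behind a reliability diagram can be sketched as follows. This is an illustration under stated assumptions: ten equal-width bins, the ±10% tolerance read as an absolute gap of 0.10 between mean forecast and observed rate, and a hypothetical function name and row layout. The labels follow the legend above: an observed rate above the diagonal is underconfident, below is overconfident.

```python
def reliability_table(forecasts, outcomes, n_bins=10, tolerance=0.10):
    """Summarise calibration per predicted-probability bin.

    forecasts: probabilities in [0, 1]; outcomes: 0/1 observed events.
    Returns one row per non-empty bin.
    """
    rows = []
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are half-open [lo, hi); the last bin also accepts p == 1.0.
        members = [
            (p, o)
            for p, o in zip(forecasts, outcomes)
            if lo <= p < hi or (b == n_bins - 1 and p == 1.0)
        ]
        if not members:
            continue
        n = len(members)
        mean_forecast = sum(p for p, _ in members) / n
        observed_rate = sum(o for _, o in members) / n
        gap = observed_rate - mean_forecast
        if abs(gap) <= tolerance:
            label = "well-calibrated"
        elif gap > 0:
            label = "underconfident"  # point lies above the diagonal
        else:
            label = "overconfident"   # point lies below the diagonal
        rows.append({
            "bin": (lo, hi),
            "n": n,
            "mean_forecast": mean_forecast,
            "observed_rate": observed_rate,
            "label": label,
        })
    return rows
```

Bins with few predictions give noisy observed rates, which is one reason to display (n) alongside each bin.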

Validation Dataset Breakdown

IPC transition records	87 country-seasons
Countries represented	31
Time period	2011–2023
Phase 4–5 events	18
Back-validation cases	4 (data-complete only)
Perturbation draws	n=2,000 per prediction
Interval type	Input-perturbation 90%
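One way an input-perturbation 90% interval with 2,000 draws could be produced is sketched below. Everything model-specific here is an assumption for illustration: `toy_model` is a hypothetical stand-in for the forecast model, the perturbation is independent Gaussian noise with an arbitrary `noise_sd`, and the interval is taken as the 5th to 95th nearest-rank percentiles of the perturbed outputs. The source states only the draw count and the interval level.

```python
import math
import random


def toy_model(inputs):
    """Hypothetical stand-in for the forecast model: logistic of the input sum."""
    return 1.0 / (1.0 + math.exp(-sum(inputs)))


def sensitivity_interval(model, inputs, n_draws=2000, noise_sd=0.1,
                         level=0.90, seed=0):
    """Return (lower, upper) bounds of the model output under input noise."""
    rng = random.Random(seed)  # seeded so the interval is reproducible
    draws = sorted(
        model([x + rng.gauss(0.0, noise_sd) for x in inputs])
        for _ in range(n_draws)
    )
    tail = (1.0 - level) / 2.0
    # Nearest-rank percentiles on the sorted draws (approximate quantiles).
    lower = draws[int(tail * n_draws)]
    upper = draws[min(int((1.0 - tail) * n_draws), n_draws - 1)]
    return lower, upper
```

An interval built this way reflects sensitivity to input noise only; it does not account for model-structure error, which is why its empirical coverage has to be checked against outcomes.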

Pre-Registered Calibration Protocol

What We Commit to Measuring

Table 1 from the CERES preprint. These metrics were pre-registered before any prospective outcome data was collected. No metrics will be selectively reported: all graded predictions remain permanently visible. Minimum sample sizes are fixed; targets cannot be revised retroactively.

Metric	Definition	Min. N	Target date	Status
Brier Score	Mean (P̂₃ − O₃)²	100 predictions	Jun 2026	⏳ Pending
Brier Skill Score	1 − BS / BS_climatology	100 predictions	Jun 2026	⏳ Pending
TIER-1 Precision	True TIER-1 / all TIER-1 issued	30 TIER-1 alerts	Sep 2026	⏳ Pending
TIER-1 Recall	True TIER-1 / all Phase 4+ events	10 Phase 4+ events	Sep 2026	⏳ Pending
Sensitivity interval coverage	Fraction outcomes in 90% interval	200 predictions	Sep 2026	⏳ Pending
CRPS (ordered categorical)	Full distribution vs IPC phase	500 predictions	Mar 2027	⏳ Pending
Reliability diagram	Forecast prob. vs empirical frequency	500 predictions	Mar 2027	⏳ Pending

Pre-registered in Pedersen (2026), Table 1. Protocol locked prior to accumulation of prospective outcome data.
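The definitions in the table can be written out as a short sketch. Several readings here are assumptions rather than facts from the source: P̂₃ is taken as a forecast probability and O₃ as the matching 0/1 observed outcome; the climatology reference is a constant base-rate forecast (defaulting to the sample base rate); a TIER-1 alert counts as true when a Phase 4+ event was observed; intervals and observed values are on the same scale; and the ordered-categorical CRPS is computed as a ranked probability score over IPC phases. All function names are hypothetical.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


def brier_skill_score(probs, outcomes, base_rate=None):
    """1 - BS / BS_climatology, with a constant base-rate reference forecast."""
    if base_rate is None:
        base_rate = sum(outcomes) / len(outcomes)  # assumption: sample base rate
    bs_ref = brier_score([base_rate] * len(outcomes), outcomes)
    if bs_ref == 0:
        raise ValueError("reference Brier score is zero; skill is undefined")
    return 1.0 - brier_score(probs, outcomes) / bs_ref


def interval_coverage(intervals, observed):
    """Fraction of observed values falling inside their (lower, upper) interval."""
    hits = sum(1 for (lo, hi), y in zip(intervals, observed) if lo <= y <= hi)
    return hits / len(observed)


def tier1_precision_recall(alerts, phase4_plus):
    """Precision = true alerts / alerts issued; recall = true alerts / events."""
    true_alerts = sum(1 for a, e in zip(alerts, phase4_plus) if a and e)
    return true_alerts / sum(alerts), true_alerts / sum(phase4_plus)


def ranked_probability_score(phase_probs, observed_phase):
    """Sum of squared gaps between cumulative forecast and cumulative outcome.

    phase_probs: probabilities for phases 1..K in order, summing to 1.
    observed_phase: 1-based index of the observed phase.
    """
    score, cumulative = 0.0, 0.0
    for k, p in enumerate(phase_probs[:-1], start=1):
        cumulative += p
        observed_cdf = 1.0 if observed_phase <= k else 0.0
        score += (cumulative - observed_cdf) ** 2
    return score
```

A Brier Skill Score above 0 means the forecast beats the base-rate reference; a value of 1 would be a perfect forecast.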

The CERES Transparency Commitment

Every prediction CERES issues is permanently recorded in this ledger with a timestamp, probability estimate, 90% sensitivity interval, and T+90 day grading date. We do not remove predictions that prove incorrect. We analyse and publish the reasons for forecast errors. The accuracy record here is the complete record: there is no curated subset. This is the foundation of institutional trust.