Technical Documentation

How CERES Works

A complete description of the pipeline, modelling approach, calibration methodology, and tier classification system used to generate 90-day probabilistic famine forecasts.

§ 1 — Overview

The 90-Day Lead Time Problem

Existing humanitarian early warning systems, including FEWS NET and IPC cadres, provide effective lead times of 30–45 days before a food crisis reaches emergency thresholds. Pre-positioning food aid, mobilising logistics, and securing emergency funding through multilateral mechanisms require a minimum of 60–90 days.

CERES is designed to close this gap. It produces falsifiable, probabilistic 90-day forecasts of acute food insecurity, expressed as P(IPC Phase 3+) — the probability that a monitored region will reach crisis-level hunger within 90 days — with explicit calibrated confidence intervals.

Operational Scope

CERES predictions are research outputs intended to augment — not replace — field-based IPC assessments and expert humanitarian judgement. All forecasts carry explicit uncertainty quantification and should be interpreted alongside ground-truth verification.

The system ingests eight open data streams covering rainfall, vegetation, conflict, food access, market prices, and displacement. These are synthesised by the Hierarchical Grounding Engine (HGE) into ranked driver hypotheses, which feed a calibrated logistic model producing probabilistic risk scores at Admin1 resolution across 121 administrative units in 15 countries.

§ 2 — Pipeline Architecture

Six-Stage Processing Pipeline

Each pipeline run proceeds through six sequential stages. The run identifier (e.g. CERES-20260228-160603) is recorded with every prediction, enabling complete reproducibility and audit.
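The example identifier appears to encode the run's start date and time as CERES-YYYYMMDD-HHMMSS. Generating and parsing such identifiers can be sketched as follows. The field layout and the use of UTC are assumptions inferred from the example, not a documented format, and the function names are hypothetical.

```python
from datetime import datetime, timezone

# Assumed layout: CERES-<YYYYMMDD>-<HHMMSS>, with the run start time in UTC.
RUN_ID_FORMAT = "CERES-%Y%m%d-%H%M%S"


def make_run_id(started_at: datetime) -> str:
    """Format a timezone-aware run start time as e.g. CERES-20260228-160603."""
    return started_at.astimezone(timezone.utc).strftime(RUN_ID_FORMAT)


def parse_run_id(run_id: str) -> datetime:
    """Recover the (assumed UTC) run start time from an identifier."""
    return datetime.strptime(run_id, RUN_ID_FORMAT).replace(tzinfo=timezone.utc)
```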

Stage 1 · Signal Ingestion

Eight data adapters ingest satellite, conflict, market, and displacement streams. Each adapter normalises its source to a shared 0.25° spatial grid (~28 km cells) and an ISO-week temporal cadence. Raw data are cached together with source provenance and retrieval timestamp.

Stage 2 · Stress Scoring

Per-Admin1 composite stress scores are computed as a weighted sum across six sub-scores: drought stress, vegetation anomaly, conflict intensity, food access, IPC phase, and market price deviation. Weights are learned from the retrospective validation set (2022–2025).

Stage 3 · HGE (Hierarchical Grounding Engine)

Elevated signals are clustered into ranked driver hypotheses. Each hypothesis identifies a primary causal mechanism (e.g. conflict-driven market failure), supporting evidence records, and a confidence weight. Up to three hypotheses are generated per region per run.

Stage 4 · Probabilistic Forecast

A logistic regression model converts composite stress scores into P(IPC Phase 3+) at a 90-day horizon. Bootstrap resampling (n=2,000) generates calibrated 90% confidence intervals. Both the point estimate and the full CI are reported for every prediction.

Stage 5 · Tier Classification

Predictions are assigned to one of three alert tiers based on probability thresholds. Tier assignment triggers downstream alerting and determines the urgency framing in intelligence reports.

Stage 6 · Grading & Calibration

At T+90 days, each prediction is graded against the published IPC outcome for that region-month. Brier scores, CI coverage, and precision/recall metrics are updated continuously. Calibration failures trigger model review.
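Stages 1 and 2 can be sketched as follows. This is a minimal illustration, not the CERES implementation: it assumes a global latitude/longitude grid indexed from (-90°, -180°), sub-scores already normalised to [0, 1], and hypothetical sub-score keys and weights. The learned weights are not given in this document.

```python
import math
from datetime import date

GRID_DEG = 0.25  # shared grid resolution from the text (~28 km at the equator)


def grid_cell(lat: float, lon: float, res: float = GRID_DEG) -> tuple[int, int]:
    """Stage 1: (row, col) of the grid cell containing a point, counted from (-90, -180)."""
    row = math.floor((lat + 90.0) / res)
    col = math.floor((lon + 180.0) / res)
    # Points exactly on the north pole or the antimeridian go in the last cell.
    return min(row, round(180 / res) - 1), min(col, round(360 / res) - 1)


def iso_week_key(d: date) -> str:
    """Stage 1: label an observation with its ISO year and week, e.g. '2026-W09'."""
    iso_year, iso_week, _ = d.isocalendar()
    return f"{iso_year}-W{iso_week:02d}"


# Stage 2: hypothetical keys for the six sub-scores named in the text.
SUB_SCORES = (
    "drought_stress",
    "vegetation_anomaly",
    "conflict_intensity",
    "food_access",
    "ipc_phase",
    "market_price_deviation",
)

# Illustrative weights only (they sum to 1); the learned weights are not published here.
EXAMPLE_WEIGHTS = dict(zip(SUB_SCORES, (0.25, 0.15, 0.25, 0.15, 0.10, 0.10)))


def composite_stress(sub_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Stage 2: weighted sum of the six per-Admin1 sub-scores."""
    missing = [k for k in SUB_SCORES if k not in sub_scores or k not in weights]
    if missing:
        raise ValueError(f"missing sub-score or weight: {missing}")
    return sum(weights[k] * sub_scores[k] for k in SUB_SCORES)
```

Keying every observation by grid cell and ISO week is what allows streams with different native resolutions and release schedules to be joined before scoring.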

§ 3 — Hierarchical Grounding Engine

The HGE: From Signals to Hypotheses

The Hierarchical Grounding Engine (HGE) is the core intelligence layer that distinguishes CERES from threshold-based early warning systems. Rather than flagging when a single indicator crosses a threshold, HGE synthesises multi-source signal convergence into causal hypotheses — ranked, evidenced explanations of why risk is elevated.

Signal Convergence Detection

HGE monitors for simultaneous stress elevation across independent data streams. When two or more signals from different domains (e.g. CHIRPS rainfall deficit + ACLED conflict escalation + WFP VAM food access deterioration) converge on the same Admin1 region in the same time window, this constitutes a convergence event — a materially stronger signal than any single indicator in isolation.
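Convergence detection can be sketched as a group-and-count over elevated signals. This sketch assumes that "the same time window" means the same ISO week and that each elevated signal has already been tagged with a domain label; the `ElevatedSignal` type and the domain names are hypothetical. The rule itself is the one stated above: two or more distinct domains elevated in one Admin1 region.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ElevatedSignal:
    region: str   # Admin1 identifier
    week: str     # ISO week label, e.g. "2026-W09"
    source: str   # e.g. "CHIRPS", "ACLED", "WFP VAM"
    domain: str   # hypothetical domain label, e.g. "climate", "conflict", "food_access"


def convergence_events(signals, min_domains=2):
    """Return {(region, week): signals} for every region-week where elevated
    signals span at least `min_domains` distinct domains."""
    groups = defaultdict(list)
    for s in signals:
        groups[(s.region, s.week)].append(s)
    return {
        key: group
        for key, group in groups.items()
        if len({s.domain for s in group}) >= min_domains
    }
```

Counting distinct domains rather than distinct sources keeps two correlated products from the same domain (for example CHIRPS and MODIS NDVI, both climate) from registering as a convergence on their own.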

Hypothesis Taxonomy

Each convergence event is classified into one of four primary causal archetypes:

Archetype | Primary signals | Typical regions
Conflict-driven | ACLED, FEWS NET, UNHCR | Sudan, Somalia, Yemen, South Sudan
Climate-driven | CHIRPS, MODIS NDVI, FAO GIEWS | Sahel, Horn (off-conflict seasons)
Economic/market | WFP VAM, FEWS NET prices | Urban centres, import-dependent regions
Multi-causal | All streams | Active conflict zones with drought overlay

Evidence Records

Every hypothesis is grounded in structured evidence records — individual signal observations that either support or contradict the hypothesis. Each record specifies: source, variable name, observed value, baseline threshold, deviation direction, and a binary support/contradict verdict.

Design Principle

HGE never produces a prediction without an auditable evidence chain. Every probability estimate has a traceable hypothesis. Every hypothesis has traceable evidence records. This is a deliberate design constraint — it is what makes CERES predictions defensible to institutional reviewers.
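The evidence-record fields and the cap of three hypotheses per region can be represented with plain data classes. This is a sketch under stated assumptions: the class and field names are hypothetical, hypotheses are assumed to be ranked by their confidence weight, and any hypothesis without evidence records is discarded as one reading of the design principle above.

```python
from dataclasses import dataclass, field
from typing import Literal


@dataclass(frozen=True)
class EvidenceRecord:
    """One signal observation, mirroring the fields listed in the text."""
    source: str                            # e.g. "ACLED"
    variable: str                          # variable name (hypothetical example: "events_30d")
    observed: float                        # observed value
    baseline_threshold: float              # baseline threshold it is compared against
    direction: Literal["above", "below"]   # deviation direction relative to the baseline
    supports: bool                         # binary verdict: True = supports, False = contradicts


@dataclass
class Hypothesis:
    mechanism: str                         # e.g. "conflict-driven market failure"
    confidence: float                      # confidence weight
    evidence: list[EvidenceRecord] = field(default_factory=list)


def top_hypotheses(hypotheses: list[Hypothesis], k: int = 3) -> list[Hypothesis]:
    """Keep only evidenced hypotheses, then return up to k ranked by confidence weight."""
    grounded = [h for h in hypotheses if h.evidence]   # no evidence chain, no hypothesis
    return sorted(grounded, key=lambda h: h.confidence, reverse=True)[:k]
```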

§ 4 — Probabilistic Model

Logistic Model & Confidence Intervals

CERES uses a calibrated logistic regression model to convert composite stress scores into IPC Phase 3+ exceedance probabilities at a 90-day horizon. The choice of logistic regression is deliberate: it is well-understood, natively probabilistic, and produces outputs that are straightforwardly interpretable by non-technical reviewers.

Core Model
P(IPC 3+ | X, t+90) = σ(β₀ + β₁·CSS + β₂·conflict + β₃·NDVI_anomaly + β₄·rainfall_SPI + β₅·IPC_current)

where σ is the logistic function, CSS is the composite stress score,
and coefficients β are estimated on the 2022–2025 retrospective validation set.
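Evaluating the core model for one region is a single weighted sum passed through σ. The sketch below uses hypothetical coefficients purely for illustration. The fitted β values, the negative signs chosen for NDVI_anomaly and rainfall_SPI (so that drier, less green conditions raise risk), and the scaling of each input are all assumptions.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function σ(z) = 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical coefficients β₀ … β₅ for illustration; the fitted values are not published here.
BETA = (-4.0, 3.0, 0.8, -0.6, -0.5, 0.7)


def p_ipc3_plus(css, conflict, ndvi_anomaly, rainfall_spi, ipc_current, beta=BETA):
    """P(IPC 3+ | X, t+90) = σ(β₀ + β₁·CSS + β₂·conflict + β₃·NDVI_anomaly
    + β₄·rainfall_SPI + β₅·IPC_current)."""
    b0, b1, b2, b3, b4, b5 = beta
    z = (b0 + b1 * css + b2 * conflict + b3 * ndvi_anomaly
         + b4 * rainfall_spi + b5 * ipc_current)
    return sigmoid(z)
```

Because σ is monotonic, each coefficient's sign directly states whether its input raises or lowers the predicted probability, which is part of what makes the model readable for non-technical reviewers.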

Bootstrap Confidence Intervals

Point estimates alone are insufficient for humanitarian decision-making. CERES generates 90% confidence intervals via non-parametric bootstrap resampling with n=2,000 replications. Because the model is refitted on each resampled dataset, the spread of the resulting predictions reflects both uncertainty in the fitted parameters and variability in the underlying data.

CI Construction
CI₉₀ = [P̂₅, P̂₉₅] where P̂ₖ is the k-th percentile of the bootstrap distribution
n_bootstrap = 2,000 · Empirical coverage = 91.2% (target: ≥88%)
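The percentile construction can be sketched with a simplified model. Assumptions for this sketch: region-months are resampled with replacement, the model is reduced to a single predictor (CSS) fitted by Newton-Raphson with a small ridge penalty for numerical stability, and the percentile indices use a simple floor/ceiling rule. The actual CERES fitting routine and percentile convention are not specified here, and the function names are hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_1d(xs, ys, lam=1e-3, max_iter=50, tol=1e-9):
    """Fit σ(b0 + b1·x) by Newton-Raphson with a small ridge penalty `lam`."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0, g1 = -lam * b0, -lam * b1          # gradient of the penalised log-likelihood
        h00, h01, h11 = lam, 0.0, lam          # XᵀWX + λI (negative Hessian)
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det        # Newton step: (XᵀWX + λI)⁻¹ · gradient
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1


def bootstrap_ci(xs, ys, x_new, n_boot=2000, level=0.90, seed=0):
    """Point estimate and percentile bootstrap CI for P(y=1 | x_new)."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]      # resample cases with replacement
        b0, b1 = fit_logistic_1d([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(sigmoid(b0 + b1 * x_new))
    preds.sort()
    alpha = (1.0 - level) / 2.0                          # 0.05 in each tail for a 90% interval
    lo = preds[math.floor(alpha * (n_boot - 1))]
    hi = preds[math.ceil((1.0 - alpha) * (n_boot - 1))]
    b0, b1 = fit_logistic_1d(xs, ys)
    return sigmoid(b0 + b1 * x_new), (lo, hi)
```

With the defaults `n_boot=2000` and `level=0.90`, the returned interval corresponds to [P̂₅, P̂₉₅]; the fixed `seed` makes a run reproducible.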

§ 5 — Tier Classification

Alert Tier Definitions

Predictions are assigned to one of three alert tiers based on the point estimate of P(IPC Phase 3+). Tier thresholds are calibrated to IPC phase transition probabilities estimated from the validation dataset.

Tier I · Critical
> 90%

IPC Phase 4–5 (Emergency or Famine) probable within 90 days. Immediate humanitarian pre-positioning recommended.

Tier II · Warning
70–90%

IPC Phase 3 (Crisis) likely within 90 days. Enhanced monitoring and contingency planning indicated.

Tier III · Watch
50–70%

Elevated risk of IPC Phase 3 deterioration. Situational monitoring and early preparedness recommended.
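The stated ranges share their endpoints (70% and 90%), so any implementation has to choose a boundary convention. The sketch below assumes Tier I requires strictly more than 90%, each lower bound is inclusive, and predictions below 50% receive no tier; these choices, and the function name, are assumptions rather than documented rules.

```python
from typing import Optional


def assign_tier(p: float) -> Optional[str]:
    """Map a point estimate of P(IPC Phase 3+) to an alert tier, or None below 50%."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability out of range: {p}")
    if p > 0.90:
        return "I"      # Critical
    if p >= 0.70:
        return "II"     # Warning
    if p >= 0.50:
        return "III"    # Watch
    return None         # below the Watch threshold: no alert tier
```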

Important

Tier I classification does not constitute a famine declaration. Only the IPC Global Platform, through its established cadre process and field verification, has the mandate to declare famine (IPC Phase 5). CERES Tier I indicates a probability of reaching Phase 3 or above.

§ 6 — Validation & Calibration

Model Performance

CERES is validated against 847 region-months of published IPC outcomes spanning six countries and three famine-grade events between 2022 and 2025. Four performance targets are set and continuously tracked.

Brier Score
0.087

Target <0.10 ✓ — Lower is better: 0 is a perfect forecast, and a constant 50% forecast scores 0.25. Comparable to well-calibrated probabilistic weather forecasting.

CI Coverage (90%)
91.2%

Target >88% ✓ — Empirical proportion of true outcomes within stated 90% CI.

Tier-I Precision
0.84

Target >0.80 ✓ — Of Tier-I alerts issued, 84% correctly preceded IPC Phase 3+ outcomes.

Tier-I Recall
0.91

Target >0.85 ✓ — Of IPC Phase 3+ events that occurred, 91% were preceded by a Tier-I alert.
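The four tracked metrics reduce to short computations over graded predictions. A minimal sketch, assuming outcomes are coded 1 when IPC Phase 3+ was observed and 0 otherwise, and that a Tier-I alert means a point estimate above 90%. The document does not say what "true outcome" a 90% CI is compared against for a binary event, so `interval_coverage` takes the comparison targets as an input. All function names are hypothetical.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def interval_coverage(intervals, targets):
    """Fraction of targets falling inside their [lo, hi] interval."""
    hits = sum(1 for (lo, hi), t in zip(intervals, targets) if lo <= t <= hi)
    return hits / len(targets)


def tier1_precision_recall(probs, outcomes, threshold=0.90):
    """Precision and recall of Tier-I alerts (p > threshold) against observed IPC 3+ outcomes."""
    alerts = [p > threshold for p in probs]
    tp = sum(1 for a, o in zip(alerts, outcomes) if a and o == 1)
    fp = sum(1 for a, o in zip(alerts, outcomes) if a and o == 0)
    fn = sum(1 for a, o in zip(alerts, outcomes) if not a and o == 1)
    precision = tp / (tp + fp) if tp + fp else math.nan
    recall = tp / (tp + fn) if tp + fn else math.nan
    return precision, recall
```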

§ 7 — Limitations

Known Limitations & Constraints

Data Latency

Several input streams (IPC, FAO GIEWS) are updated only monthly or twice yearly. Between updates, predictions rely on interpolated or lagged data, which may not capture rapidly deteriorating situations driven by sudden shocks such as conflict escalation or flash flooding.

Admin1 Resolution

Predictions are generated at Admin1 (provincial) level. Intra-provincial heterogeneity — particularly in large regions like Oromia (Ethiopia) or Jonglei (South Sudan) — may be significant. Admin1 classifications mask sub-national variation that field assessments would capture.

Model Transferability

The logistic model is trained on six countries in the Horn of Africa and Arabian Peninsula. Performance in geographically or structurally distinct contexts (South Asia, Central America) has not been validated and should not be assumed.

Conflict Dynamics

ACLED conflict data captures reported events with variable reporting lag. In active conflict zones, the most acute areas may be the least reported. CERES may systematically under-estimate risk in media-dark conflict environments.

Transparency Commitment

This limitations section is intended to be exhaustive. CERES is an open system. Reviewers, funders, and operational partners are encouraged to scrutinise these constraints and communicate additional concerns to the Northflow research team.

§ 8 — Citation

How to Cite CERES

If you reference CERES predictions or methodology in published work, please use the following citation format. An arXiv pre-print describing the full methodology and validation dataset is forthcoming.

Preferred Citation
Northflow Technologies (2026). CERES: Calibrated Early-warning & Risk Evaluation System — A Probabilistic Famine Forecasting System. Technical Report. Northflow Technologies. https://ceres.northflow.no/methodology