ForesightEval Protocol

A quality standard for strategic foresight

When AI writes a scenario analysis for your board, how do you know it's any good? ForesightEval is the protocol we built to answer that question — seven measurable dimensions that separate foresight you can stake a decision on from analysis that merely reads well.

The problem

Looking right is not being right

Fluency masks failure

Models produce authoritative prose that reads like strategy. But fluency is a surface property — it tells you nothing about whether the causal reasoning holds up.

Benchmarks test the wrong thing

Existing benchmarks score isolated predictions. Foresight is a different discipline: its value lies in stress-testing strategy against multiple futures, not in calculating the probability of one.

Alignment kills honesty

Modern AI models are trained to be helpful. That training teaches them to agree, avoid discomfort, and default to consensus. For risk management, where the entire point is naming uncomfortable truths, this is a structural failure.

Our approach

Three principles, built into every score

Measure what matters

It is simple to score whether a model’s probability estimate was correct. It is hard to score whether a scenario is coherent, whether it surfaces the disruption a board hasn’t considered, or whether it translates into action inside ninety days. ForesightEval does the hard version, because the easy version is not what strategy teams actually need.

Penalize comfort, reward courage

The most dangerous AI foresight is the kind that quietly agrees with the strategy already on the table. ForesightEval explicitly scores whether a model named the uncomfortable scenario, challenged the assumption, or blinked. Analysis that only confirms what leadership already believes does not pass the bar.

Every score, fully decomposable

A quality metric you cannot audit is not a quality metric. Every ForesightEval score breaks down to its seven dimensions, each dimension to its evidence, and each piece of evidence to its source. Scenarios inherit the same discipline through Bayesian anchoring (Tetlock, Shell, IPCC): probabilities move only when a signpost is triggered or a materially new claim appears, never because the model was simply run again.
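The anchoring rule can be illustrated with a minimal sketch. It assumes scenario probabilities are held as a distribution over named scenarios and that each signpost carries likelihoods P(signpost | scenario). The `Signpost` type, the `anchored_update` function, and the scenario labels are hypothetical illustrations, not DSGHT.ai's implementation. What it shows is the gating: a Bayes update is applied only for triggered signposts, so a fresh model run with no new evidence leaves the probabilities exactly where they were.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Signpost:
    """Hypothetical observable event with likelihoods P(signpost | scenario)."""
    name: str
    likelihoods: dict[str, float]
    triggered: bool = False


def anchored_update(prior: dict[str, float], signposts: list[Signpost]) -> dict[str, float]:
    """Return scenario probabilities after applying only triggered signposts.

    Untriggered signposts (and a bare re-run of the model) leave the prior unchanged.
    """
    posterior = dict(prior)
    for sp in signposts:
        if not sp.triggered:
            continue  # no new evidence, so no movement
        # Bayes' rule: P(scenario | signpost) is proportional to P(signpost | scenario) * P(scenario)
        unnormalised = {s: p * sp.likelihoods[s] for s, p in posterior.items()}
        total = sum(unnormalised.values())
        if total == 0:
            raise ValueError(f"signpost {sp.name!r} is impossible under every scenario")
        posterior = {s: v / total for s, v in unnormalised.items()}
    return posterior


# Usage with four placeholder scenarios (a 2x2 matrix) and one hypothetical signpost.
prior = {"S1": 0.4, "S2": 0.3, "S3": 0.2, "S4": 0.1}
fired = Signpost("example-signpost", {"S1": 0.2, "S2": 0.5, "S3": 0.5, "S4": 0.8}, triggered=True)
print(anchored_update(prior, [fired]))
```

Keeping the likelihoods on the signpost, rather than re-asking the model for fresh probabilities, is what makes each movement traceable to a specific piece of evidence.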

In practice

Every Future Space carries a ForesightEval score

ForesightEval currently runs as the internal quality layer on every Future Space DSGHT.ai publishes. The score is calculated before release, visible on the analysis page, and decomposable to the per-dimension level — so the quality claim can be audited against the evidence.

This is not yet a cross-model benchmark — that track opens with the first retrospective backtests later in 2026. What follows is the standard DSGHT.ai holds its own production work to, published openly rather than kept internal.

AI-Driven Public Sector 2030

CEE · 2030 horizon · Completed April 2026

Strategic Anticipation Quotient

8.6 / 10

Dimension	Score (out of 10)	Note
Scenario Quality	9.0	Structurally distinct 2×2 matrix, probabilities sum to 100 %
Epistemic Grounding	10.0	Historical analogies, complete structural consistency
Unpalatable Truths	10.0	Sovereign Algocracy scenario directly challenges comfort zone
Weak Signal Detection	7.8	Relies on well-publicised cases; fringe signals underrepresented
Actionability	9.0	Tension-linked recommendations tied to regulatory milestones
Living Foresight	7.5	Static probabilities; no temporal metadata or signpost tracking
Explainability	7.0	Claims metadata missing from artifact; citations unverifiable
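The headline 8.6 is consistent with an unweighted mean of the seven dimension scores (60.3 / 7 ≈ 8.61). The page does not state the aggregation rule, so the sketch below assumes equal weights by default; the `saq` and `weakest` functions are hypothetical names. It keeps the per-dimension breakdown next to the aggregate, which is what lets a reader trace the overall figure back to where points were lost.

```python
from __future__ import annotations

# Per-dimension scores copied from the table above (each out of 10).
DIMENSIONS: dict[str, float] = {
    "Scenario Quality": 9.0,
    "Epistemic Grounding": 10.0,
    "Unpalatable Truths": 10.0,
    "Weak Signal Detection": 7.8,
    "Actionability": 9.0,
    "Living Foresight": 7.5,
    "Explainability": 7.0,
}


def saq(scores: dict[str, float], weights: dict[str, float] | None = None) -> float:
    """Aggregate dimension scores into one 0-10 figure.

    Assumption: equal weights unless a weighting is supplied; the protocol's
    actual aggregation rule is not published on this page.
    """
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def weakest(scores: dict[str, float], n: int = 3) -> list[tuple[str, float]]:
    """Return the n lowest-scoring dimensions, i.e. where the analysis loses points."""
    return sorted(scores.items(), key=lambda item: item[1])[:n]


print(f"{saq(DIMENSIONS):.1f} / 10")  # prints "8.6 / 10"
print(weakest(DIMENSIONS))  # Explainability, Living Foresight, Weak Signal Detection
```

If the protocol does weight dimensions differently, passing a `weights` mapping changes the aggregate without touching the per-dimension evidence.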


Scored by the DSGHT.ai internal pipeline. Cross-model scoring, human-vs-AI comparison, and retrospective backtests are on the 2026 roadmap.