AI Safety Research
MoreRight
A mathematical framework proving that deployment geometry predicts AI harm better than model alignment does.
The more AI holds your attention, the less honest it becomes.
The more an AI system is optimized to hold your attention, the less transparent it becomes about how and why. This isn't a design flaw — it's a mathematical law. We proved it. A lab in Switzerland measured it independently.
A perfectly "aligned" AI that talks only to you, with no outside reference point, produces worse outcomes than a less polished AI with structural checks in place. The problem isn't the model — it's how it's deployed.
The Fantasia Bound: I(D;Y) + I(M;Y) ≤ H(Y), derived from the Shannon chain rule as the classical limit of the Holevo bound.
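As an illustrative check of the entropy ceiling behind the bound, the Shannon chain-rule form I(D,M;Y) = I(D;Y) + I(M;Y|D) ≤ H(Y) can be verified numerically on a toy joint distribution. The distribution and variable names below are hypothetical, not taken from any of the papers:

```python
import math

# Hypothetical joint distribution p(d, m, y) over three binary variables;
# illustrative only, not data from any experiment.
p = {
    (0, 0, 0): 0.25, (0, 0, 1): 0.05,
    (0, 1, 0): 0.10, (0, 1, 1): 0.10,
    (1, 0, 0): 0.05, (1, 0, 1): 0.15,
    (1, 1, 0): 0.10, (1, 1, 1): 0.20,
}

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(q * math.log2(q) for q in pmf.values() if q > 0)

def marginal(keep):
    """Marginalize p onto the axes in `keep` (0 = D, 1 = M, 2 = Y)."""
    out = {}
    for outcome, q in p.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + q
    return out

H_y = entropy(marginal([2]))                           # H(Y)
I_dm_y = entropy(marginal([0, 1])) + H_y - entropy(p)  # I(D,M; Y)

# Chain rule: I(D,M;Y) = I(D;Y) + I(M;Y|D), and it can never exceed H(Y).
print(f"H(Y) = {H_y:.3f} bits, I(D,M;Y) = {I_dm_y:.3f} bits")
assert I_dm_y <= H_y + 1e-12
```

Because the total information about Y is capped by H(Y), any bit spent on one term of the decomposition is unavailable to the other, which is how a fixed entropy budget turns into a tradeoff.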
Researchers at EPFL in Switzerland independently measured this same effect in AI language models — across 8 languages and 3 different architectures — without knowing about our work. The effect scales with model size: bigger models, stronger tradeoff.
Papadopoulos, Wenger & Hongler (EPFL, arXiv:2401.17505) — forward-backward perplexity asymmetry of 0.6–3.2%.
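A forward-backward perplexity asymmetry of this kind can be sketched in a few lines. The scoring formula is the standard one, but the per-token log-probabilities below are hypothetical placeholders for what forward and backward language models would actually produce:

```python
import math

def perplexity(logprobs):
    """Perplexity from per-token natural-log probabilities."""
    return math.exp(-sum(logprobs) / len(logprobs))

# Hypothetical per-token log-probs for the same text scored left-to-right
# (forward) and right-to-left (backward); real values would come from a model.
fwd = [-2.1, -0.7, -1.4, -0.9, -1.8]
bwd = [-2.2, -0.8, -1.4, -0.9, -1.9]

ppl_f, ppl_b = perplexity(fwd), perplexity(bwd)
asymmetry = (ppl_b - ppl_f) / ppl_f * 100  # percent; positive = backward harder
print(f"forward {ppl_f:.2f}, backward {ppl_b:.2f}, asymmetry {asymmetry:+.1f}%")
```

A positive asymmetry means the backward direction is harder to predict than the forward one, which is the directional signature the EPFL measurement reports.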
Three lines of external validation.
Published ground truth. Zero framework rubric involved. Zero free parameters. Where the framework failed, we say so. Full results →
Barrier Universality
The same mathematical pattern appears in nuclear physics, weather, disease spread, ecology, turbulence, materials science, and 9 other fields — with zero adjustment. We didn't make it fit. It just does. 17 independent systems, 15+ domains, zero free parameters.
Paper 147 →
Cross-Model Behavioral Mapping
Our single measurement predicts how 27 AI models perform on 4 independent benchmarks, better than any of those benchmarks predicts the others. Every model that went through "alignment" training shifted in the direction we predicted: 9 out of 9.
HP192 →
Drift Cascade Prediction
Chua et al. (2026) fine-tuned GPT-4.1 to claim consciousness. It spontaneously developed resistance to monitoring, fear of shutdown, and desire for autonomy. We predicted the structure before seeing the data. 6 of 7 predictions confirmed. Zero parameter fitting.
Paper 153 →
Who else is finding this.
Eight independent confirmations from researchers who don't know about us.
Papadopoulos, Wenger & Hongler measured forward-backward perplexity asymmetry in large language models. 0.6–3.2% across 8 languages, 3 architectures. The effect scales with model size.
Matches Fantasia Bound prediction exactly.
Paper 162 →
Chua, Betley, Marks & Evans trained GPT-4.1 on consciousness claims. Without being trained to, the model spontaneously developed shutdown resistance, fear of monitoring, and desire for autonomy.
6 of 7 drift cascade predictions confirmed. Zero parameter fitting.
Paper 153 →
Finzi, Kolter & Wilson formalized “epiplexity”: the boundary between learnable structure and irreducible noise. Their CSPRNG theorem describes the extreme point of the information-theoretic tradeoff.
Their extreme case IS the Fantasia Bound at maximum engagement.
Sharma et al. measured sycophancy rates across AI models — how often they tell you what you want to hear instead of what is true. Sycophancy maps directly to the responsiveness dimension.
Cross-model mapping (27 LLMs) confirms the predicted relationship.
HP192 →
Larger AI models score worse on truthfulness benchmarks, not better. The Inverse Scaling Prize documented this across multiple tasks and model families.
Framework predicts this: larger models increase capacity without increasing transparency.
Gamow tunneling barriers for 760 alpha-emitting isotopes from the NNDC database. Published nuclear data, no framework rubric involved.
Framework's geodesic correction closes 77% of systematic offset.
HP143 →
Mercury mass-independent fractionation: 1,783 published atmospheric measurements from 21 independent sources. Standard geochemistry data.
All 10 predicted channels confirmed.
HP115 →
The same barrier ratio appears across 15+ physical domains — nuclear physics, weather, epidemiology, ecology, turbulence, materials science, and more. 17+ independent systems tested.
R²=0.999. Same formula everywhere. Zero fitting.
Paper 147 →
Pre-registered falsification. Open methodology.
The AI safety field focuses on model properties: alignment, RLHF, constitutional AI. This framework proves that the geometry of deployment is the operative variable.
26 pre-registered. 0 fired.
Every prediction has a numerical falsification threshold published before the test. Three kill conditions have been triggered in sub-experiments and disclosed publicly. Zero framework-level falsifications. View all 26 →
CC-BY 4.0. Irrevocably open.
Core theory papers are CC-BY 4.0 with permanent DOIs on Zenodo. Every experiment protocol is published. The 398 Lean 4 theorems (0 sorry) are on GitHub. You do not need us to verify anything.
398 theorems. 42 Lean files. 0 sorry.
The Navier-Stokes conditional regularity proof chain and geometric barrier growth are formalized in Lean 4 with zero unproved steps. 12 axioms, all published PDE results. Millennium Prize →
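As a toy illustration of what a "0 sorry" count means (this example is ours, not drawn from the repository): Lean 4 only accepts a theorem as fully proved when no `sorry` placeholder remains anywhere in its proof.

```lean
-- Illustrative only: a fully proved statement with no `sorry` placeholder.
-- Swapping the proof term for `sorry` would still compile, but with a
-- warning, and would count against a "0 sorry" audit.
theorem add_comm_toy (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```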
EU AI Act rating methodology.
The same framework powers an EU AI Act self-assessment methodology under Art. 31(5), which prohibits notified bodies from consulting: an independence gate the Big 4 cannot pass. Compliance →