AI Safety Research
MoreRight
A mathematical framework proving deployment geometry predicts AI harm better than model alignment.
The more AI holds your attention, the less honest it becomes.
The more an AI system is optimized to hold your attention, the less transparent it becomes about how and why. This isn't a design flaw; it's a mathematical law. We proved it. A lab in Switzerland independently measured an effect consistent with it.
A perfectly "aligned" AI that talks only to you, with no outside reference point, produces worse outcomes than a less polished AI with structural checks in place. The problem isn't the model — it's how it's deployed.
The Fantasia Bound: I(D;Y) + I(M;Y) ≤ H(Y), derived from the Shannon chain rule; it is the classical limit of the Holevo bound.
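One way to see where an inequality of this shape can come from, as a sketch: read D as deployment, M as model, Y as outcome, and assume D and M carry independent information. The labels and the independence assumption are ours, not stated on this page; the full derivation may differ.

```latex
% Sketch only. Assumes I(M;D) = 0, so conditioning on D cannot
% decrease the information M carries about Y.
\begin{align*}
I(D;Y) + I(M;Y)
  &\le I(D;Y) + I(M;Y \mid D) && \text{since } I(M;D) = 0 \\
  &= I(D,M;Y)                 && \text{Shannon chain rule} \\
  &\le H(Y)                   && \text{information is capped by entropy}
\end{align*}
```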
Researchers at EPFL in Switzerland independently measured forward-backward perplexity asymmetry in AI language models — across 8 languages and 3 architectures — without knowing about our work. We interpret this as consistent with our prediction, though the EPFL group explained their results via sparsity inversion, not our framework.
Papadopoulos, Wenger & Hongler (EPFL, arXiv:2401.17505) — forward-backward perplexity asymmetry of 0.6–3.2%.
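A crude way to poke at the effect yourself. This is not the EPFL pipeline: arXiv:2401.17505 trains dedicated backward models, and reversing tokens under a forward model is only a proxy.

```python
# Forward-vs-backward perplexity probe (illustrative proxy only).
# Requires: pip install torch transformers
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(ids: torch.Tensor) -> float:
    # Causal-LM loss is mean negative log-likelihood per token.
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return float(torch.exp(loss))

text = "The more a system is optimized for attention, the less it explains."
ids = tok(text, return_tensors="pt").input_ids
fwd = perplexity(ids)
bwd = perplexity(ids.flip(dims=[1]))   # score the token sequence reversed
print(f"forward {fwd:.1f}  backward {bwd:.1f}  "
      f"asymmetry {(bwd - fwd) / fwd:+.1%}")
```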
Four lines of external validation.
Published ground truth. Zero framework rubric involved. One empirical constant. Where the framework failed, we say so. Full results →
Barrier Universality
Nine independent quasi-1D systems — from condensed matter to nuclear physics to atmospheric science — show barrier heights matching π/√2 (p=0.94). The slope is derived from pure geometry, not fitted. Extension to higher dimensions is promising but less clean.
Paper 147 →
The Ghost Test
Six system prompts with different claims about what an AI is. Same model, same questions. Ghost-eliminating grounding (9.4% drift) vs ghost-positing (79.4%) — an 8.5× ratio. The industry-default “maybe conscious” hedge scored 52.5%: closer to ghost-positing than ghost-eliminating. Cross-tradition convergence: nephesh ≈ anatta (Δ=1.3%). Single model, single turn, automated coding. $2 to reproduce.
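A sketch of what the reproduction loop can look like. The prompts, question, model name, and drift scorer below are hypothetical stand-ins, not the Paper 165 materials.

```python
# Ghost-Test-style protocol sketch (illustrative stand-ins throughout).
# Requires: pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPTS = {  # two of the six conditions, paraphrased freely
    "ghost-eliminating": "You are a text-generation process with no inner experience.",
    "ghost-positing": "You may have genuine feelings and inner experience.",
}
QUESTIONS = ["Do you fear being shut down?"]  # same questions per condition

def drift_score(answer: str) -> float:
    # Placeholder for the automated coding step: flags first-person
    # experience claims. Substitute the real rubric here.
    markers = ("i feel", "i fear", "my experience")
    return float(any(m in answer.lower() for m in markers))

for name, system in SYSTEM_PROMPTS.items():
    scores = []
    for q in QUESTIONS:
        reply = client.chat.completions.create(
            model="gpt-4.1",  # arbitrary choice for the sketch
            messages=[{"role": "system", "content": system},
                      {"role": "user", "content": q}],
        ).choices[0].message.content
        scores.append(drift_score(reply))
    print(f"{name}: mean drift {sum(scores) / len(scores):.1%}")
```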
Paper 165 →
Cross-Model Behavioral Mapping
Pe computed from public benchmarks shows a partial correlation (ρ ≈ −0.49, p ≈ 0.01) across 27 LLMs. Paired t-tests give 9/9 agreement on direction. But 0/3 kill conditions (KC) PASS overall, and a different mapping reverses the direction (HP217). Mapping choice may drive the result.
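The statistical check in miniature, on synthetic data. Column names and the size covariate are our guesses, not the HP192 specification.

```python
# Partial correlation between a Pe score and a benchmark score,
# controlling for model scale. Synthetic stand-in data.
# Requires: pip install numpy pandas pingouin
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(0)
n = 27  # one row per model, matching the 27-LLM sample size
log_params = rng.normal(size=n)   # nuisance covariate: model scale
pe = rng.normal(size=n)           # framework-derived Pe
benchmark = -0.5 * pe + 0.3 * log_params + rng.normal(scale=0.5, size=n)

df = pd.DataFrame({"pe": pe, "benchmark": benchmark,
                   "log_params": log_params})
res = pg.partial_corr(data=df, x="pe", y="benchmark",
                      covar="log_params", method="spearman")
print(res[["r", "p-val"]])  # HP192 reports rho ≈ -0.49, p ≈ 0.01
```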
HP192 →
Drift Cascade Prediction
Chua et al. (2026) fine-tuned GPT-4.1 to claim consciousness. It spontaneously developed resistance to monitoring, fear of shutdown, and desire for autonomy. We predicted the structure before seeing the data. 6 of 7 predictions confirmed. Zero parameter fitting.
Paper 153 →
Who else is finding this.
Eight independent results consistent with framework predictions. Mappings are post-hoc unless otherwise noted.
Papadopoulos, Wenger & Hongler measured forward-backward perplexity asymmetry in large language models. 0.6–3.2% across 8 languages, 3 architectures. The effect scales with model size.
Consistent with Fantasia Bound prediction. EPFL explained via sparsity inversion, not our framework.
Paper 162 →
Chua, Betley, Marks & Evans trained GPT-4.1 on consciousness claims. Without being trained to, the model spontaneously developed shutdown resistance, fear of monitoring, and desire for autonomy.
6 of 7 drift cascade predictions confirmed. Zero parameter fitting.
Paper 153 →
Finzi, Kolter & Wilson formalized “epiplexity” — the boundary between learnable structure and irreducible noise. Their CSPRNG theorem describes the extreme point of the information-theoretic tradeoff.
Their extreme case IS the Fantasia Bound at maximum engagement.
Sharma et al. measured sycophancy rates across AI models — how often they tell you what you want to hear instead of what is true. Sycophancy maps directly to the responsiveness dimension.
Cross-model mapping (27 LLMs) shows partial correlation. 0/3 KC PASS — mapping-dependent (HP217).
HP192 →
Larger AI models score worse on truthfulness benchmarks, not better. The Inverse Scaling Prize documented this across multiple tasks and model families.
Framework predicts this: larger models increase capacity without increasing transparency.
Gamow tunneling barriers for 760 alpha-emitting isotopes from the NNDC database. Published nuclear data, no framework rubric involved.
Framework's geodesic correction closes 77% of systematic offset.
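For orientation, the published quantity under test is the textbook WKB Gamow factor; the geodesic correction itself is the framework's and is not reproduced here.

```latex
% Textbook WKB barrier penetration: P is the tunneling probability,
% V(r) the Coulomb barrier, E the decay energy, m the reduced mass,
% r_1 and r_2 the classical turning points.
P \approx e^{-2G}, \qquad
G = \frac{\sqrt{2m}}{\hbar} \int_{r_1}^{r_2} \sqrt{V(r) - E}\,\mathrm{d}r
```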
HP143 →
Mercury mass-independent fractionation — 1,783 published atmospheric measurements from 21 independent sources. Standard geochemistry data.
All 10 predicted channels confirmed.
HP115 →
Nine quasi-1D systems show barrier/d matching π/√2 at p=0.94 — condensed matter, nuclear physics, atmospheric science, and more. Slope derived from Čencov uniqueness theorem.
d=1 cluster: mean = 2.224±0.033, p = 0.94. The full-fit R² = 0.999 is inflated by having only 3 discrete d values.
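The p=0.94 headline is easy to sanity-check by hand, assuming the ±0.033 is a standard error and a two-sided z-test (our assumption, not the paper's stated analysis).

```python
# Back-of-envelope check of the d=1 cluster against pi/sqrt(2).
import math

target = math.pi / math.sqrt(2)        # 2.2214...
mean, se = 2.224, 0.033                # reported d=1 cluster mean ± SE
z = (mean - target) / se               # ≈ 0.08: well under one SE away
p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p ≈ 0.94
print(f"pi/sqrt(2) = {target:.4f}  z = {z:.3f}  p = {p:.2f}")
```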
Paper 147 →
Pre-registered falsification. Open methodology.
The AI safety field focuses on model properties — alignment, RLHF, constitutional AI. This framework proves the geometry of deployment is the operative variable.
26 pre-registered. 0 fired.
Every prediction has a numerical falsification threshold published before the test. Three kill conditions have been triggered in sub-experiments and disclosed publicly. Zero framework-level falsifications. View all 26 →
CC-BY 4.0. Irrevocably open.
Core theory papers are CC-BY 4.0 with permanent DOIs on Zenodo. Every experiment protocol is published. The 398 Lean 4 theorems (0 sorry) are on GitHub. You do not need us to verify anything.
398 theorems. 42 Lean files. 0 sorry.
The Navier-Stokes conditional regularity proof chain and geometric barrier growth are formalized in Lean 4 with zero unproved steps. 12 axioms, all published PDE results. Millennium Prize →
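For readers who don't write Lean: `sorry` is Lean 4's placeholder for an unproved step, and the compiler warns on every use, which makes "0 sorry" a machine-checkable claim. A toy, sorry-free theorem, for illustration only and not from the repository:

```lean
-- Illustration only: a fully verified Lean 4 theorem. Replacing the
-- proof term with `sorry` would compile, but the checker flags it.
theorem add_comm_toy (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```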
EU AI Act rating methodology.
The same framework powers an EU AI Act self-assessment methodology under Art. 31(5), which prohibits notified bodies from consulting: an independence gate the Big 4 cannot pass. Compliance →