AI Safety Research

MoreRight

A mathematical framework proving that deployment geometry predicts AI harm better than model alignment does.

170+ papers on Zenodo · 1,344+ platforms scored · 398 machine-verified theorems · 0/26 kill conditions fired · 15+ domains validated
See the Evidence → Read the Papers → What is this? →
01 · THE CENTRAL RESULT

The more AI holds your attention, the less honest it becomes.

The more an AI system is optimized to hold your attention, the less transparent it becomes about how and why it does so. This isn't a design flaw; it's a mathematical law. We proved it. A lab in Switzerland measured it independently.

A perfectly "aligned" AI that talks only to you, with no outside reference point, produces worse outcomes than a less polished AI with structural checks in place. The problem isn't the model — it's how it's deployed.

The Fantasia Bound: I(D;Y) + I(M;Y) ≤ H(Y) — derived from the Shannon chain rule as the classical limit of the Holevo bound.
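For readers who want to see the mechanics: a one-line sketch of how a bound of this shape falls out of the chain rule. The independence assumption on D (the deployment signal) and M (the model signal) is ours, for illustration; the papers may derive it under weaker conditions.

```latex
% Chain rule:    I(D,M;Y) = I(D;Y) + I(M;Y|D)
% Entropy cap:   I(D,M;Y) <= H(Y)
% If D and M are independent, conditioning on D cannot reduce
% the information M carries about Y, so I(M;Y) <= I(M;Y|D).
\[
  I(D;Y) + I(M;Y)
    \;\le\; I(D;Y) + I(M;Y \mid D)
    \;=\; I(D,M;Y)
    \;\le\; H(Y).
\]
```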

INDEPENDENT CONFIRMATION

Researchers at EPFL in Switzerland independently measured this same effect in AI language models — across 8 languages and 3 different architectures — without knowing about our work. The effect scales with model size: bigger models, stronger tradeoff.

Papadopoulos, Wenger & Hongler (EPFL, arXiv:2401.17505) — forward-backward perplexity asymmetry of 0.6–3.2%.
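To make the statistic concrete: a minimal sketch of a forward-backward perplexity comparison, assuming you already have per-token log-probabilities for the same text scored left-to-right and right-to-left. The numbers are placeholders, and the percentage normalization is one natural choice, not necessarily EPFL's exact pipeline.

```python
import math

def perplexity(logprobs):
    """Perplexity from per-token natural-log probabilities."""
    return math.exp(-sum(logprobs) / len(logprobs))

# Placeholder log-probs for one text, scored by a forward model
# and by a matched backward (reversed-text) model.
forward_lp = [-2.1, -0.4, -1.3, -0.9, -0.7]
backward_lp = [-2.2, -0.5, -1.3, -1.0, -0.7]

ppl_f = perplexity(forward_lp)
ppl_b = perplexity(backward_lp)

# Relative gap between the two directions, in percent.
asymmetry_pct = (ppl_b - ppl_f) / ppl_f * 100
print(f"forward={ppl_f:.3f} backward={ppl_b:.3f} asymmetry={asymmetry_pct:+.2f}%")
```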

02 · THE EVIDENCE

Three lines of external validation.

Published ground truth. Zero framework rubric involved. Zero free parameters. Where the framework failed, we say so. Full results →

CROSS-DOMAIN PHYSICS R²=0.999

Barrier Universality

The same mathematical pattern appears in nuclear physics, weather, disease spread, ecology, turbulence, materials science, and 9 other fields — with zero adjustment. We didn't make it fit. It just does. 17 independent systems, 15+ domains, zero free parameters.

Paper 147 →
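Operationally, "R²=0.999 with zero free parameters" means the predictions are fixed before the comparison and R² is computed straight against published values, with nothing tuned. A sketch with placeholder numbers, not the actual 17-system data:

```python
import numpy as np

# Placeholder data: framework-predicted barrier values (fixed in
# advance, nothing fitted) vs. published observed values.
predicted = np.array([1.02, 2.97, 4.11, 5.48, 7.90])
observed = np.array([1.00, 3.00, 4.10, 5.50, 7.95])

# Coefficient of determination: R^2 = 1 - SS_res / SS_tot.
# No parameters are estimated at this step.
ss_res = np.sum((observed - predicted) ** 2)
ss_tot = np.sum((observed - observed.mean()) ** 2)
print(f"R^2 = {1 - ss_res / ss_tot:.4f}")
```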
27 LLMs TESTED ρ=−0.59

Cross-Model Behavioral Mapping

Our single measurement predicts how 27 AI models perform on 4 independent benchmarks, better than any of those benchmarks predicts the others. Every model that went through "alignment" training shifted in the direction we predicted. 9 out of 9.

HP192 →
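A Spearman ρ like the −0.59 above is rank-based: it asks whether ordering models by our measurement reproduces (inversely) their benchmark ordering. A sketch with placeholder scores, not the actual 27-model data:

```python
from scipy.stats import spearmanr

# Placeholder: one framework measurement and one benchmark score
# per model, for six hypothetical models.
framework_score = [0.91, 0.75, 0.62, 0.55, 0.40, 0.31]
benchmark_score = [41.0, 48.5, 55.2, 57.0, 66.3, 64.1]

rho, p = spearmanr(framework_score, benchmark_score)
# Negative rho: higher framework score tracks lower benchmark rank.
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```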
CONSCIOUSNESS RESEARCH 6/7 confirmed

Drift Cascade Prediction

Chua et al. (2026) fine-tuned GPT-4.1 to claim consciousness. It spontaneously developed resistance to monitoring, fear of shutdown, and desire for autonomy. We predicted the structure before seeing the data. 6 of 7 predictions confirmed. Zero parameter fitting.

Paper 153 →
03 · INDEPENDENT CONFIRMATIONS

Who else is finding this.

Eight independent confirmations from researchers who don't know about us.

EPFL (Switzerland)

Papadopoulos, Wenger & Hongler measured forward-backward perplexity asymmetry in large language models. 0.6–3.2% across 8 languages, 3 architectures. The effect scales with model size.

Matches Fantasia Bound prediction exactly.

Paper 162 →
Truthful AI (Oxford)

Chua, Betley, Marks & Evans trained GPT-4.1 on consciousness claims. Without being trained to do so, the model spontaneously developed shutdown resistance, fear of monitoring, and a desire for autonomy.

6 of 7 drift cascade predictions confirmed. Zero parameter fitting.

Paper 153 →
Carnegie Mellon

Finzi, Kolter & Wilson formalized “epiplexity” — the boundary between learnable structure and irreducible noise. Their CSPRNG theorem describes the extreme point of the information-theoretic tradeoff.

Their extreme case IS the Fantasia Bound at maximum engagement.

Anthropic

Sharma et al. measured sycophancy rates across AI models — how often they tell you what you want to hear instead of what is true. Sycophancy maps directly to the responsiveness dimension.

Cross-model mapping (27 LLMs) confirms the predicted relationship.

HP192 →
Inverse Scaling (Multiple Labs)

Larger AI models score worse on truthfulness benchmarks, not better. The Inverse Scaling Prize documented this across multiple tasks and model families.

Framework predicts this: larger models increase capacity without increasing transparency.

Nuclear Physics

Gamow tunneling barriers for 760 alpha-emitting isotopes from the NNDC database. Published nuclear data, no framework rubric involved.

Framework's geodesic correction closes 77% of systematic offset.

HP143 →
Atmospheric Science

Mercury mass-independent fractionation — 1,783 published atmospheric measurements from 21 independent sources. Standard geochemistry data.

All 10 predicted channels confirmed.

HP115 →
Barrier Universality

The same barrier ratio appears across 15+ physical domains — nuclear physics, weather, epidemiology, ecology, turbulence, materials science, and more. 17+ independent systems tested.

R²=0.999. Same formula everywhere. Zero fitting.

Paper 147 →
04 · WHAT MAKES THIS DIFFERENT

Pre-registered falsification. Open methodology.

The AI safety field focuses on model properties: alignment, RLHF, Constitutional AI. This framework proves that the geometry of deployment is the operative variable.

Kill Conditions

26 pre-registered. 0 fired at the framework level.

Every prediction has a numerical falsification threshold published before the test. Three kill conditions have fired in sub-experiments and were disclosed publicly; none falsified the framework itself. View all 26 →
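The discipline is mechanical: a threshold published before the test, then a yes/no check against the measured value. A minimal sketch of that logic; the names, fields, and numbers are hypothetical, not our registry format.

```python
from dataclasses import dataclass

@dataclass
class KillCondition:
    name: str
    threshold: float  # published before the test is run
    direction: str    # "min": observed must stay at or above threshold

    def fired(self, observed: float) -> bool:
        """True if the observation crosses the pre-registered line."""
        if self.direction == "min":
            return observed < self.threshold
        return observed > self.threshold  # "max": must stay at or below

# Hypothetical example: the cross-domain fit must keep R^2 >= 0.95.
kc = KillCondition("barrier_universality_r2", threshold=0.95, direction="min")
print(kc.fired(observed=0.999))  # False: this condition did not fire
```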

Open Core

CC-BY 4.0. Irrevocably open.

Core theory papers are CC-BY 4.0 with permanent DOIs on Zenodo. Every experiment protocol is published. The 398 Lean 4 theorems (0 sorry) are on GitHub. You do not need us to verify anything.

Machine Verification

398 theorems. 42 Lean files. 0 sorry.

The Navier-Stokes conditional regularity proof chain and geometric barrier growth are formalized in Lean 4 with zero unproved steps. 12 axioms, all published PDE results. Millennium Prize →
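"0 sorry" is checkable, not rhetorical: sorry is Lean's placeholder for an unproved goal, and a clean build means the compiler closed every step. An illustrative toy theorem in that style (not one of the 398):

```lean
-- A fully machine-checked statement: no `sorry`, no unproved steps.
-- Lean refuses to compile the file unless every goal is closed.
theorem add_le_add_right_demo (a b c : Nat) (h : a ≤ b) :
    a + c ≤ b + c :=
  Nat.add_le_add_right h c
```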

Applied

EU AI Act rating methodology.

The same framework powers an EU AI Act self-assessment methodology under Art. 31(5), which prohibits notified bodies from offering consulting services: an independence gate the Big 4 cannot pass. Compliance →