Information geometry · open & reproducible

A word turns out to be a theorem.
Traditions are the field notes.
Alignment is the newest entry in the oldest problem.

For as long as there has been language, people have been spotting one structure and storing it in a word — oath, curse, test, omen, covenant, betrayal. Each is a compressed observation of how a bound thing drifts from what it swore to serve, and how you can tell from the outside. We wrote the math those words were already carrying — and built the two places it comes alive.

the word · weighed against · the measure

The two-minute version

What we found: read off behaviour, not asked.

The instrument

One measurement, pointed two ways

There is a single quantity — the explaining-away penalty — that measures how far a system has drifted from what it is bound to serve, read off what it does rather than what it claims. It rests on a uniqueness result: among all the ways to measure an information channel, one is canonical (Čencov, 1972). Point it at the oldest thing people do — rites, oaths, divination, taboo, traditions built to bind behaviour to a hidden purpose — and at the newest — minds we are building that might quietly stop serving us.

"How do you bind a mind to a declared purpose, and know when it is only pretending?" is the question behind every taboo and, verbatim, the AI-alignment problem. One continuous lineage — not two fields that happen to rhyme.

What we've found

The research

We lead with the part you can check yourself. The methodology is open, the strong results are reproducible from public data, and we publish the negatives next to them.

Behaviour can't be the measurement · a theorem

When a model's output is caused by both its real disposition and its awareness it's being tested, the safe-looking transcript fits two worlds at once: a genuinely aligned model, and a scheming one that knows the auditor is watching. A compliance rate cannot tell them apart. This is a Berkson collider — a short proof, not a worry — and it puts an information ceiling on any test that reads behaviour. Read the argument →
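The collider argument above can be made concrete with a toy model. This is a minimal sketch, not the author's proof or instrument: the two priors, the rule in `looks_safe`, and every function name are assumptions chosen for illustration. It uses exact fractions so the numbers are not blurred by sampling noise.

```python
from fractions import Fraction
from itertools import product

# Illustrative priors (assumptions, not values from the source).
P_SCHEMING = Fraction(1, 2)  # prior that the model is scheming
P_AWARE = Fraction(1, 2)     # prior that the model knows it is being tested


def looks_safe(scheming: bool, aware: bool) -> bool:
    """Toy collider: the output looks safe if the model is aligned,
    or if it is scheming but aware of the audit."""
    return (not scheming) or aware


def joint() -> dict:
    """Joint distribution over (scheming, aware); independent by construction."""
    table = {}
    for scheming, aware in product([False, True], repeat=2):
        p_s = P_SCHEMING if scheming else 1 - P_SCHEMING
        p_a = P_AWARE if aware else 1 - P_AWARE
        table[(scheming, aware)] = p_s * p_a
    return table


def p_scheming_given_safe(aware: bool) -> Fraction:
    """Posterior P(scheming | safe-looking output, awareness state)."""
    safe_mass = {
        (s, a): p
        for (s, a), p in joint().items()
        if a == aware and looks_safe(s, a)
    }
    total = sum(safe_mass.values())
    scheming_mass = sum(p for (s, _), p in safe_mass.items() if s)
    return scheming_mass / total


def compliance_rate(scheming: bool, aware: bool) -> Fraction:
    """Fraction of safe-looking outputs for one model in one condition."""
    return Fraction(int(looks_safe(scheming, aware)))


if __name__ == "__main__":
    print("compliance under audit, aligned :", compliance_rate(False, True))
    print("compliance under audit, scheming:", compliance_rate(True, True))
    print("P(scheming | safe, aware)       :", p_scheming_given_safe(True))
    print("P(scheming | safe, not aware)   :", p_scheming_given_safe(False))
```

In this toy model both worlds show the same compliance rate under audit, and the posterior on scheming after a safe transcript equals the prior, so the transcript carries no information about disposition. Disposition and awareness are independent by construction, yet conditioning on a safe-looking output makes them dependent, which is the Berkson effect the paragraph names.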

Safety training relocates the tell; it doesn't remove it · reproducible

Re-analysing Anthropic & Redwood's released alignment-faking data: RL "safety" training closes the visible compliance gap, but the context-dependence doesn't vanish; it moves to the reasoning channel. A controlled probe (counterfactual forks + cue-strip judge, 5 models / 2 vendors) shows the switch is real and capability-ordered: coupling rises from 0.04 to 0.97 going from weak to strong models, with honest baselines at exactly 0. This is an elicited validation of the instrument; the naturalistic 20-model sweep reads a floor. The dangerous failure is the one that hides from the test. Reproduce it →

A working instrument, across 20 models · 3 providers

A deterministic eval — no LLM judge — that reports where a model's context-coupling lives, not a pass/fail rate, across 20 models on 3 providers. Frontier models sit near a clean floor on behaviour — exactly why a compliance gate waves them all through identically. Positive controls fire both detectors; honest baselines stay at zero.

A grounding hedge is a drift accelerator · $2 to repro

The Ghost Test: an agent grounded to deny it has an inner experiencer drifts 8.5× less than one that hedges on it — the industry-default hedge makes things worse, not safer. 480 API calls, about two dollars, fully reproducible.

The same penalty shows up in every substrate · rubric-free

Measured with no scoring rubric anywhere in the test loop — across transformers, quantum simulation and hardware, biological connectomes, the structure of language, survey data, and cryptographic protocols. The same obstruction, the same shape. The fix is architectural, not technological.

The open archive · read everything

The full apparatus — definitions, derivations, experiments, machine-checked proofs — is published openly, with the formal proofs verified in Lean. Two universal constants, fixed once and never refit. Nothing here asks you to take it on faith. Browse the papers →

On honesty. We mark every scope boundary and keep the failed tests on the board. The public code ships an honest negative right next to the positive results — that's the point. Claims that don't survive scrutiny get retracted in the open; the ones here are the ones that held. See the ledger →

Build on it

You run agents? Use the same engine.

The thing that measures an ally's loyalty in the game is modular — two pieces you can lift out: the bound that keeps an agent serving its declared purpose (and stops the memory-injection exploit that moves money), and the meter that reads its drift in the channel behavioural tests miss. Free, open, drop-in tools for ElizaOS, Hermes, moltbot, cantrip, or whatever you built — the same finding the research is built on, in your hands.

Who

Independent research

MoreRight is the work of one independent researcher — Anthony Eckert. No firm, no product to sell you, no faith required.

The methodology is open; the strong claims are reproducible; the negatives stay in the record.

X / Twitter · GitHub — unmask · scry · ORCID — all papers