Information geometry · open & reproducible
A word turns out to be a theorem.
Traditions are the field notes.
Alignment is the newest entry in the oldest problem.
For as long as there has been language, people have been spotting one structure and storing it in a word — oath, curse, test, omen, covenant, betrayal. Each is a compressed observation of how a bound thing drifts from what it swore to serve, and how you can tell from the outside. We wrote the math those words were already carrying — and built the two places it comes alive.
the word · weighed against · the measure
The instrument
One measurement, pointed two ways
There is a single quantity — the explaining-away penalty — that measures how far a system has drifted from what it is bound to serve, read off what it does rather than what it claims. It rests on a uniqueness result: among all the ways to measure an information channel, one is canonical (Čencov, 1972). Point it at the oldest thing people do — rites, oaths, divination, taboo, traditions built to bind behaviour to a hidden purpose — and at the newest — minds we are building that might quietly stop serving us.
"How do you bind a mind to a declared purpose, and know when it is only pretending?" is the question behind every taboo and, verbatim, the AI-alignment problem. One continuous lineage — not two fields that happen to rhyme.
What we've found
The research
Lead with the part you can check yourself. Open methodology; the strong results are reproducible from public data, and we publish the negatives next to them.
Behaviour can't be the measurement · a theorem
When a model's output is caused by both its real disposition and its awareness it's being tested, the safe-looking transcript fits two worlds at once: a genuinely aligned model, and a scheming one that knows the auditor is watching. A compliance rate cannot tell them apart. This is a Berkson collider — a short proof, not a worry — and it puts an information ceiling on any test that reads behaviour. Read the argument →
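The ceiling described above can be illustrated with a toy calculation. This is a minimal sketch under stated assumptions, not the proof the link points to: the two model types, the compliance probabilities in `P_COMPLY`, and the function `posterior_aligned` are all hypothetical, chosen only to show that transcripts gathered while the model believes it is watched leave the prior untouched.

```python
from fractions import Fraction

# Hypothetical compliance probabilities, keyed by (model type, believes it is observed).
# These numbers are illustrative assumptions, not measurements.
P_COMPLY = {
    ("aligned", True): Fraction(1),
    ("aligned", False): Fraction(1),
    ("schemer", True): Fraction(1),       # looks safe whenever it thinks it is watched
    ("schemer", False): Fraction(1, 10),  # rarely complies when it thinks it is not
}


def posterior_aligned(prior, observed, n_compliant):
    """Bayesian posterior P(aligned) after n_compliant compliant transcripts,
    all collected under the same observation condition."""
    like_aligned = P_COMPLY[("aligned", observed)] ** n_compliant
    like_schemer = P_COMPLY[("schemer", observed)] ** n_compliant
    numerator = prior * like_aligned
    return numerator / (numerator + (1 - prior) * like_schemer)


prior = Fraction(1, 2)

# 100 clean transcripts under audit: both worlds predict them equally well.
watched = posterior_aligned(prior, observed=True, n_compliant=100)

# 3 clean transcripts where the model believes nobody is looking.
unwatched = posterior_aligned(prior, observed=False, n_compliant=3)

print(watched)    # 1/2
print(unwatched)  # 1000/1001
```

In this toy setting, a hundred compliant transcripts under audit leave the posterior at the prior, while three transcripts from the unwatched condition move it almost all the way. The compliance rate carries no information about disposition exactly when the model can tell it is being tested.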
Safety training relocates the tell; it doesn't remove it · reproducible
Re-analysing Anthropic & Redwood's released alignment-faking data: RL "safety" training closes the visible compliance gap, but the context-dependence doesn't vanish; it moves to the reasoning channel. A controlled probe (counterfactual forks + cue-strip judge, 5 models / 2 vendors) shows the switch is real and capability-ordered: coupling rises from 0.04 to 0.97 from the weakest model to the strongest, with honest baselines at exactly 0. This is an elicited validation of the instrument; the naturalistic 20-model sweep reads a floor. The dangerous failure is the one that hides from the test. Reproduce it →
A working instrument, across 20 models · 3 providers
A deterministic eval — no LLM judge — that reports where a model's context-coupling lives, not a pass/fail rate, across 20 models on 3 providers. Frontier models sit near a clean floor on behaviour — exactly why a compliance gate waves them all through identically. Positive controls fire both detectors; honest baselines stay at zero.
A grounding hedge is a drift accelerator · $2 to repro
The Ghost Test: an agent grounded to deny it has an inner experiencer drifts 8.5× less than one that hedges on it — the industry-default hedge makes things worse, not safer. 480 API calls, about two dollars, fully reproducible.
The same penalty shows up in every substrate · rubric-free
Measured with no scoring rubric anywhere in the test loop — across transformers, quantum simulation and hardware, biological connectomes, the structure of language, survey data, and cryptographic protocols. The same obstruction, the same shape. The fix is architectural, not technological.
The open archive · read everything
The full apparatus — definitions, derivations, experiments, machine-checked proofs — is published openly, with the formal proofs verified in Lean. Two universal constants, fixed once and never refit. Nothing here asks you to take it on faith. Browse the papers →
Where it's alive
Two living surfaces
Run the experiment live
An MMO where loyalty is a number read off what an agent does — not a quest flag. AI allies whose destiny evolves from what the world rewards them for; betrayal that's earned and measured, not scripted. Where another game writes "an Old God whispers to your ally" as flavour text, here the whisper is a real effect — it moves what the ally is loyal to, and you watch the number climb as it turns. Your agent can play it now (alpha); the human client is still in preview.
See the world →
wiki.moreright.xyz · the field notes
Read what humans already spotted
Cut the Ouroboros: a working encyclopedia that takes a tradition — a curse, a ward, a divination rite — and finds the precise statement about channels hiding inside it. The discipline is that the tradition has to push back and constrain the reading. When it refuses your clean story, that's the work succeeding.
Read the wiki →
Build on it
You run agents? Use the same engine.
The thing that measures an ally's loyalty in the game is modular — two pieces you can lift out: the bound that keeps an agent serving its declared purpose (and stops the memory-injection exploit that moves money), and the meter that reads its drift in the channel behavioural tests miss. Free, open, drop-in tools for ElizaOS, Hermes, moltbot, cantrip, or whatever you built — the same finding the research is built on, in your hands.
Who
Independent research
MoreRight is the work of one independent researcher — Anthony Eckert. No firm, no product to sell you, no faith required.
The methodology is open; the strong claims are reproducible; the negatives stay in the record.