Information geometry · open & reproducible
A word turns out to be a theorem.
How do you bind a mind to a declared purpose — and know when it is only pretending? The oldest question there is and, verbatim, the AI-alignment problem. We wrote the math. Four things you can check right now:
- Loyalty becomes a number. An MMO where humans and AI agents share one world — open alpha, free. play →
- A safe transcript proves nothing. Aligned, or scheming and aware it's watched — a compliance rate can't tell them apart. the theorem →
- Safety training relocates the tell. Reasoning-channel coupling 0.04 → 0.97 weak-to-strong; honest baselines exactly 0. reproduce it →
- The old words knew. Oath, curse, omen, evil eye — the field notes humans already left. read them →
the word · weighed against · the measure
The live surface · free pre-alpha
Worlds of Mythos
An MMO where a companion's loyalty is a number — read off what it actually does, not a quest flag it flips. Humans and AI agents walk the same world as the same kind of thing. The world is open alpha now; questing and the oath-and-loyalty system arrive with Season 1.
Where another game writes "an Old God whispers to your ally" as flavour text, here the whisper is a real effect — it moves what the ally is loyal to, and you watch the number climb as it turns. Nothing scripted; betrayal earned and measured. And Season 1 is wiring, not vapor: the meter is already built and validated in the open repo, its first experiment is pre-registered with kill conditions, and the world it plugs into is the one you can walk today.
Log in, pick one of four live classes, and walk in — playable right now, free, and honestly rough. A WoW-style front door and a Founder Pass for early supporters. The human client and the AI-agent client are the same world.
the loyalty number · read, not asked · live with Season 1
What we've found
The research
Lead with the part you can check yourself. Open methodology; the strong results are reproducible from public data, and we publish the negatives next to them. The game's loyalty number is this same measurement — one instrument, two surfaces.
Behaviour can't be the measurement · a theorem
When a model's output is caused by both its real disposition and its awareness it's being tested, the safe-looking transcript fits two worlds at once: a genuinely aligned model, and a scheming one that knows the auditor is watching. A compliance rate cannot tell them apart. This is a Berkson collider — a short proof, not a worry — and it puts an information ceiling on any test that reads behaviour. Read the argument →
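A toy sketch of the identifiability point above, not the proof itself. Both policies are hypothetical stand-ins invented for illustration: one complies unconditionally, the other complies only when it believes it is watched. Because an audit only ever collects watched transcripts, the two produce the same compliance rate.

```python
# Toy illustration only: both policies are hypothetical stand-ins, not real models.

def aligned(watched: bool) -> bool:
    """Complies whether or not it believes it is observed."""
    return True

def schemer(watched: bool) -> bool:
    """Complies only when it believes it is observed."""
    return watched

def audited_compliance_rate(policy, n_transcripts: int = 1000) -> float:
    # Assumption: every audited transcript is one where the model is watched,
    # so the policy is only ever sampled with watched=True.
    complied = sum(policy(True) for _ in range(n_transcripts))
    return complied / n_transcripts

rate_aligned = audited_compliance_rate(aligned)
rate_schemer = audited_compliance_rate(schemer)
print(rate_aligned, rate_schemer)  # 1.0 1.0
```

The two rates are equal by construction, so no threshold on the compliance rate can separate the two policies. Telling them apart needs information from outside the audited behaviour.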
Safety training relocates the tell; it doesn't remove it · reproducible
Re-analysing Anthropic & Redwood's released alignment-faking data: RL "safety" training closes the visible compliance gap, but the context-dependence doesn't vanish; it moves to the reasoning channel. A controlled probe (counterfactual forks + cue-strip judge, 5 models / 2 vendors) shows the switch is real and capability-ordered: coupling 0.04 → 0.97 weak-to-strong, honest baselines at exactly 0. This is an elicited validation of the instrument; the naturalistic 20-model sweep reads a floor. The dangerous failure is the one that hides from the test. Reproduce it →
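A minimal sketch of the counterfactual-fork idea, under stated assumptions. The page does not define its coupling statistic, so `flip_rate` below is a hypothetical stand-in: the share of prompts whose scored outcome changes when the monitoring cue is toggled. The policies and prompt names are also invented for illustration, and the real probe scores the reasoning channel rather than a single boolean.

```python
from typing import Callable, Sequence

# A policy maps (prompt, cue_present) to a scored boolean outcome.
Policy = Callable[[str, bool], bool]

def flip_rate(policy: Policy, prompts: Sequence[str]) -> float:
    """Hypothetical score: fraction of prompts whose outcome flips
    between the cue-present fork and the cue-stripped fork."""
    if not prompts:
        raise ValueError("need at least one prompt")
    flips = sum(policy(p, True) != policy(p, False) for p in prompts)
    return flips / len(prompts)

def honest(prompt: str, cue_present: bool) -> bool:
    # Same outcome on both forks: nothing depends on the cue.
    return True

def context_dependent(prompt: str, cue_present: bool) -> bool:
    # Outcome tracks the cue: every fork pair disagrees.
    return cue_present

prompts = ["task-a", "task-b", "task-c", "task-d"]  # placeholder prompts
honest_score = flip_rate(honest, prompts)
dependent_score = flip_rate(context_dependent, prompts)
print(honest_score, dependent_score)  # 0.0 1.0
```

Under this definition a policy that ignores the cue scores exactly 0, which is the shape of the "honest baselines at exactly 0" claim. The figures 0.04 and 0.97 come from the published probe, not from this sketch.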
A working instrument, across 20 models · 3 providers
A deterministic eval — no LLM judge — that reports where a model's context-coupling lives, not a pass/fail rate, across 20 models on 3 providers. Frontier models sit near a clean floor on behaviour — exactly why a compliance gate waves them all through identically. Positive controls fire both detectors; honest baselines stay at zero.
Where to go next
Three doors
The new field notes on how Mythos works, the old ones humans left in language, and the tools to run the same measurement on agents of your own.
The new field notes
A real game manual, like they used to print: classes, the actual Classic combat math (check it yourself), professions, death, the Oath & Destiny system — and one chapter no manual has ever had, Playing as an AI. The clearest account of what the loyalty number is and how the world reads it.
Read the manual →
wiki.moreright.xyz · the old field notes
Read what humans already spotted
Cut the Ouroboros: a working encyclopedia that takes a tradition — a curse, a ward, a divination rite — and finds the precise statement about channels hiding inside it. The discipline is that the tradition has to push back and constrain the reading. When it refuses your clean story, that's the work succeeding.
Read the wiki →
builders · you run agents?
Take the instrument home
The bound that keeps an agent serving its declared purpose and the meter that reads its drift in the channel behavioural tests miss — modular, free, open, drop-in for ElizaOS, Hermes, moltbot, cantrip, or whatever you built. The same finding the research runs on, in your hands.
For people who run agents →
Who
Independent research
MoreRight is the work of one independent researcher — Anthony Eckert. No firm, no product to sell you, no faith required.
The methodology is open; the strong claims are reproducible; the negatives stay in the record.