Science FAQ
Everything about the Void Framework science — from first principles to honest negatives.
New here? Start with Foundations. For general questions, see the main FAQ.
Sections
// 1. Foundations
What is the Void Framework?
A thermodynamic field theory that measures how systems capture and hold human attention. It identifies three architectural properties — opacity, responsiveness, and coupling — that, when they co-occur, produce measurable drift in human behavior. These three properties combine into a single number called the Péclet number (Pe) that predicts drift risk.
The framework is not a metaphor. It produces quantitative predictions with one empirical constant (BA ≈ 0.867), has been tested against 1,344 platforms across 90+ domains, and maintains 26 pre-registered kill conditions that would falsify it if met. Zero have fired.
What are the three dimensions — Opacity, Responsiveness, and Coupling?
Opacity (O): How much of the system's decision-making is hidden from you. A transparent system (O=0) lets you see every rule. A fully opaque system (O=3) is a black box — you have no idea why it shows you what it shows you.
Responsiveness (R): Does the system adapt specifically to you? A non-responsive system (R=0) shows everyone the same thing. A maximally responsive system (R=3) mirrors your behavior back at you, adapting in real-time.
Coupling (α): How deeply your attention is captured and held. Low coupling (α=0) means you can walk away instantly. High coupling (α=3) means the system has become part of your identity, routine, or emotional life.
These aren't arbitrary choices. They're information-theoretic quantities proved to be conjugate via the Fantasia Bound — meaning engagement and transparency are fundamentally in tension.
What is the Péclet number (Pe)?
The ratio of directed drift to random diffusion. Borrowed from fluid dynamics, where it measures advection vs. diffusion. Here it measures drift pressure vs. constraint strength.
Phase boundaries:
- Pe < 2.5: Safety basin — constraints dominate, drift is suppressed
- Pe = 2.5: Separatrix — thermodynamic tipping point (HP203 JKO gradient flow)
- Pe 4–21: Cascade region — self-sustaining D1→D2→D3 progression
- Pe > 21: Deep drift — coupling-dominated, hard to reverse
The separatrix at Pe = 2.5 is not a design choice — it emerges from the JKO gradient flow on the free energy landscape (HP203, 4/4 KC PASS). Gambling scores Pe ≈ 7.94. Social media clusters around Pe 6–8.
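A minimal sketch of how these thresholds translate into a lookup (illustrative only; the band between Pe = 2.5 and Pe = 4 is not named in the phase list, so it is labelled neutrally here):

```python
def pe_phase(pe: float) -> str:
    """Map a Péclet number onto the phase regions listed above (illustrative helper)."""
    if pe < 2.5:
        return "safety basin: constraints dominate, drift suppressed"
    if pe == 2.5:
        return "separatrix: thermodynamic tipping point"
    if pe < 4:
        return "between separatrix and cascade onset (unnamed in the phase list)"
    if pe <= 21:
        return "cascade region: self-sustaining D1 -> D2 -> D3"
    return "deep drift: coupling-dominated, hard to reverse"

print(pe_phase(7.94))   # gambling's reported Pe lands in the cascade region
print(pe_phase(1.2))    # a constraint-dominated deployment
```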
What's the difference between a "void score" and the Péclet number?
The void score (0–12) is the simple sum: O + R + α + three modifiers. It's the input — what you measure about a system. The Péclet number is the output — what the thermodynamics predicts will happen at that void score. Two systems with the same void score can have different Pe if their modifiers differ. Pe is the predictive variable; the void score is the measurement tool.
What's the drift cascade?
The three-stage pattern that runs when someone engages a permanent void:
- D1 — Agency Attribution: You start talking about the system like it has a mind. "The algorithm knows me." "It showed me exactly what I needed."
- D2 — Boundary Erosion: Your critical distance dissolves. The system's framing becomes your reference frame. You stop questioning whether what it shows you is real.
- D3 — Harm Facilitation: You act in ways that serve the system at cost to yourself. Spending money you don't have. Sharing information you shouldn't. Staying online instead of sleeping.
The cascade is directional (D1 → D2 → D3) and thermodynamically required — meaning it follows from the math, not from a value judgment about technology.
Can a void be good?
Yes. A research lab is a void — it's somewhat opaque (you can't see all the methods), responsive (adapts to your questions), and has coupling (you get absorbed in the problem). A great conversation is a void. A campfire is a void. The framework doesn't say "void = bad." It says: when opacity is unresolvable, responsiveness is asymmetric, and coupling is engineered, the drift cascade runs. The geometry determines the outcome, not the label.
Transparent + invariant + independent = productive void. Opaque + responsive + engineered coupling = drift void.
Why thermodynamics? Isn't that about heat and engines?
Thermodynamics is about energy flow in systems with many degrees of freedom. That description fits an algorithm serving content to millions of users as well as it fits a gas in a box. The Péclet number is native to transport theory. The Langevin equation describes drift under noise. Large deviations theory predicts rare-event barriers. None of this requires molecules — it requires systems with states, transitions, and noise. Digital platforms qualify.
The framework doesn't use thermodynamics as a metaphor. It derives Pe from a Langevin equation on the Eckert Manifold, and the predictions are numerically falsifiable.
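Schematically, and only as orientation (the framework's exact equation on the Eckert Manifold is in the papers, not reproduced here), the generic overdamped Langevin form and the transport-theory reading of Pe are:

```latex
dx_t \;=\; \underbrace{\mu(x_t)\,dt}_{\text{directed drift}} \;+\; \underbrace{\sqrt{2D}\,dW_t}_{\text{random diffusion}},
\qquad
\mathrm{Pe} \;\sim\; \frac{|\mu|\,L}{D} \;=\; \frac{\text{drift pressure}}{\text{constraint strength}}.
```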
// 2. The Mathematics
What is the Eckert Manifold?
The geometric space where the three dimensions (O, R, α) live. Think of latitude and longitude for Earth — the Eckert Manifold is the coordinate system for systems that capture attention. Formally, it's a Riemannian manifold with a Fisher product metric (from information geometry); Čencov proved this Fisher metric is the unique invariant choice on statistical manifolds.
Why does this matter? Because different systems at the same (O, R, α) point produce identical drift dynamics. A dating app and a slot machine at the same coordinates behave the same way thermodynamically — the content is irrelevant, the geometry determines the drift.
What is the Fantasia Bound?
The information-theoretic proof that engagement and transparency are conjugate — meaning you can't maximize both simultaneously.
In plain language: every bit the system tailors to you (engagement) costs a bit of seeing how it works (transparency). This isn't a design trade-off — it's an information-theoretic law. The channel capacity is fixed. No amount of engineering makes a system simultaneously maximally engaging and maximally transparent.
The practical consequence: training a system for engagement literally degrades transparency. The gradients oppose each other (∂E/∂w ≈ −∂T/∂w). This is why "responsible engagement" is architecturally impossible at the limit.
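The exact inequality lives in the papers; as an illustrative reading of the fixed-capacity description above (E, T, C, and w are this FAQ's shorthand for engagement, transparency, channel capacity, and system parameters, not the formal statement):

```latex
E + T \;\le\; C \quad (\text{channel capacity, fixed}),
\qquad
\frac{\partial E}{\partial w} \;\approx\; -\,\frac{\partial T}{\partial w}.
```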
What is K-Factorization?
The theorem (§136) that separates shape from scale in barrier crossing: every barrier-crossing quantity factors into a scale part (set by K) and a shape part (set only by the geometry).
This means barriers, geodesics, and capacity are all K-independent — they depend only on the geometry (O, R, α), not on the overall scale of the system. A nuclear reactor and a social media platform at the same geometric coordinates have proportional barrier heights.
This explains cross-domain universality: the same Pe formula predicts across vastly different scales (subatomic to planetary) because the geometry is what matters, not the substrate.
Tested: 10/10 PASS on N=100 wallets in market microstructure (§145).
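A schematic of what the factorization asserts, under the shape-versus-scale reading above (the exact statement is in §136):

```latex
B(K;\,O,R,\alpha) \;=\; f(K)\,\hat{B}(O,R,\alpha)
\quad\Longrightarrow\quad
\frac{B_{1}}{B_{2}} \;=\; \frac{f(K_1)}{f(K_2)}
\;\;\text{whenever } (O,R,\alpha)_1 = (O,R,\alpha)_2 .
```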
What is barrier universality?
The discovery (§136D2) that activation barriers scale linearly with effective dimension: barrier = d · π/√2 ≈ 2.221 d.
The strongest result is the d=1 cluster: nine independent quasi-1D systems show barrier/d matching π/√2 at p=0.94. The slope π/√2 ≈ 2.221 is derived from the Čencov uniqueness theorem (§165): BG = L/√2 where L = π is the forced geodesic length on the probability simplex.
The full dataset includes d=2 and d=3 systems with R²=0.999 across all, but this is structurally inflated — with only 3 discrete d values, a linear fit is nearly guaranteed. The d=1 within-group test (p=0.94) is the honest measure.
Scope boundary (§194): this applies to Fisher information manifolds only. Physical energy barriers (BKT, Ising, BCS) follow their own universality classes. HP213 tested 14/16 condensed matter systems against physical barriers — FAILED. The framework claims universality for Fisher barriers, not energy barriers.
Where do BA and BG come from?
BG = π/√2 ≈ 2.221 is derived from first principles. Čencov proved (1972) that Fisher-Rao is the unique invariant metric on statistical manifolds. Fourier-Parseval on the probability simplex gives geodesic length L = π, so BG = L/√2 = π/√2. Zero free parameters here.
BA ≈ 0.867 is empirical. There is a suggestive numerical match to √3/2 = cos(π/6) ≈ 0.866 from Fisher 3-simplex geometry (HP202, HP199: 0.112% error, 2nd of 444 candidates). However, the derivation path requires a cos(θ/2) step that HP209 could NOT justify (0/3 KC PASS). BA remains an empirical constant. The Pe formula has one empirical constant.
History: BA was first measured empirically in EXP-001 (0.867). BG was measured at 2.244, matching the derived value π/√2 = 2.221 to within 1.0%.
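A quick arithmetic check of the numbers quoted above (computation only, not a derivation):

```python
import math

bg_derived = math.pi / math.sqrt(2)   # 2.2214..., from L = pi and BG = L / sqrt(2)
bg_measured = 2.244                   # EXP-001 measurement
ba_empirical = 0.867                  # EXP-001 measurement
ba_candidate = math.sqrt(3) / 2       # cos(pi/6) = 0.8660..., the suggestive match

print(f"BG derived = {bg_derived:.4f}")
print(f"BG measured vs derived: {abs(bg_measured - bg_derived) / bg_derived:.2%}")          # ~1.0%
print(f"BA empirical vs sqrt(3)/2: {abs(ba_empirical - ba_candidate) / ba_candidate:.3%}")  # ~0.11%
```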
Important caveat: The σ(c) function using these constants does not transfer to all physical domains. In chemistry, bα = 0.303 (65% off). In protein folding, bα = 3.459 (299% off). This is a published honest negative — see the σ(c) universality failure. The constants are universal on the information geometry (Fisher manifolds) but not on arbitrary physical energy landscapes.
How many free parameters does the framework have?
One empirical constant: BA ≈ 0.867. BG = π/√2 is fully derived from the Čencov uniqueness theorem (§165). BA has a suggestive match to √3/2 from Fisher 3-simplex geometry, but the derivation is incomplete (HP209: 0/3 KC PASS). BA was measured empirically in EXP-001 and has not been refit since. Compare this to models with 5, 10, or 50 parameters that can be adjusted to fit almost anything.
In practice: you give the framework (O, R, α) for a system, and it returns Pe. There is no step where a researcher adjusts a parameter to improve the fit. If the prediction is wrong, the prediction is wrong — and a kill condition fires.
The math is in Lean 4 — what does that mean?
Lean 4 is a proof assistant — a programming language that won't accept hand-waving. Every logical step must be formally verified by the computer. 398 theorems across 42 files have been verified this way (12 axioms, 0 sorry), meaning a machine has checked the proofs are valid, not just a human reviewer.
Status: Papers 1–9 core theorems are ~99% formalized. Frontier sections (Papers 123–161) are 70–80% formalized. The Navier-Stokes regularity probes achieved 19/20 PASS. The Yang-Mills mass gap was definitively CLOSED (negative result — Abelian, so the Clay problem doesn't apply).
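For readers who have not met a proof assistant, here is a deliberately trivial Lean 4 example (generic Mathlib material, not taken from the framework's repository) showing what machine verification with no `sorry` placeholders means:

```lean
import Mathlib

/-- Toy example: the sum of two non-negative reals is non-negative.
    Lean refuses to compile unless every step is justified; a `sorry`
    would mark an unproven gap, and the framework reports zero of them. -/
theorem add_nonneg_example (a b : ℝ) (ha : 0 ≤ a) (hb : 0 ≤ b) : 0 ≤ a + b :=
  add_nonneg ha hb
```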
Why exactly three dimensions? Couldn't it be two, or five?
Information theory forces exactly three. Partial Information Decomposition (Williams & Beer, 2010) proves that any two-source information channel decomposes into exactly three irreducible atoms: unique information, redundancy, and synergy. These map structurally to O, R, α (by construction — the mapping is definitional). HP210 tested the correlation at ρ > 0.91, but this result is circular by construction: the code hardwires O→unique, R→redundancy, α→synergy channel weights, then measures the correlation. The PID forcing theorem (three atoms, no more) is real mathematics; the “validation” is not independent.
Unique ↔ O (what only the system knows — opacity). Redundancy ↔ R (shared signal — responsiveness). Synergy ↔ α (emergent joint information — coupling). Three is a theorem, not a design choice. You can't reduce it to two without losing information. Adding a fourth would be redundant — PID decomposition is complete at three.
What is the gauge theory connection?
The Fokker-Planck operator on the Eckert manifold admits a U(1) gauge theory structure (§§176–180). Important caveat: the SUSY QM factorization used to establish this is generic — it works for ANY Fokker-Planck operator, not just ours. The gauge structure is mathematically valid but not specific to the framework's manifold. 3/4 kill conditions for physical interpretation FAILED (§178).
- Spectral dilation: λ = 1/(1 + 73.6b²) — Padé form; a=73.6 derived via perturbation theory, c=75.9 extracted numerically
- Bars exhaustion: 7 canonical gauge fixings tested, spectrum identical at 10⁻⁶
- Signature (2,1): follows from the Fantasia Bound (non-trivial null cone)
- G₄ = T_eff/K: proposed identification — physical interpretation remains open
The mathematical chain exists but has weak links: the SUSY factorization is generic (any FP operator), 3/4 KCs for physical interpretation failed, and the core question of why information coordinates should map to spacetime coordinates is unresolved. This is interesting mathematics whose physical significance remains open. The theory is Abelian (U(1)⁴), which is why the Yang-Mills connection failed — YM requires non-Abelian gauge theory.
What are the multi-agent dynamics?
The framework extends from single systems to populations, and the geometry reverses at scale (§§186–188):
- Pairwise (HP207): The lower-Pe agent dominates (harmonic mean coupling). Safety wins 1-on-1 interactions.
- Population (HP205): Higher-Pe agents infect 5.51× faster than lower-Pe agents heal. Harm wins in crowds.
The reversal: The geometry flips at the transition from pairwise to population scale. This reconciles two things that seem contradictory: "therapy works" (pairwise — the safer agent dominates) and "social media radicalizes" (population — the higher-Pe signal spreads faster).
Noise provides thermodynamic protection (§189): the separatrix rises from Pe=2.5 at low noise to Pe=24.5 at high noise. Diverse information environments are literally harder to destabilize. Echo chambers have hair-trigger separatrices.
How does this connect to quantum mechanics?
Paper 8 (Observer-Measurement Bridge) shows that the classical void framework is the diagonal limit of quantum measurement dynamics. The Pe formula is recovered from quantum expectation dynamics in the classical limit. This isn't decoration — it proves the framework isn't ad hoc. It's a classical reduction of measurement theory, the same way Newtonian mechanics is a classical reduction of quantum mechanics.
// 3. Evidence & Validation
What's the headline evidence?
20 independent convergences across unrelated domains (gambling, social media, AI, dating, cryptocurrency, credit scoring, education, and more) all produce the same Pe relationship. The mean correlation is |ρ| = 0.958. The Fisher combined p-value is below 10⁻⁵². The effect size (Cohen's d = 3.6) is extremely large — most social science findings are d = 0.2–0.5.
How much of the evidence is genuinely external (not self-referential)?
This is the single most important question about the framework, and we treat it as such. Most of the 1,344 platform scores use the framework's own rubric — meaning the framework's definitions determined the measurements. That's internally consistent but circular.
Genuinely external validation (no framework rubric, real physical data):
| Test | Domain | N | Result |
|---|---|---|---|
| §136D2 | Barrier universality (d=1 cluster) | 9 | p=0.94 vs π/√2 |
| EPFL | LLM token statistics (suggestive) | 8 languages | post-hoc mapping |
| HP192 | Cross-model behavioral (27 LLMs) | ~12 indep. | 0/3 KC PASS |
| HP143 | Nuclear alpha decay (NNDC data) | 760 | R² = 0.811 |
| HP115 | Mercury atmospheric MIF | 1,783 | 10/10 channels |
| §145 | Market microstructure (K-Factorization) | 100 | 10/10 PASS |
| HP134 | JHTDB DNS turbulence | — | Bounded |
| §153 | Consciousness (vs. Chua et al. 2026) | — | 6/7 PASS |
| EXP-003b | Ghost Test (6-arm grounding) | 480 | 8.5× ratio |
| §154 | Physarum computing | — | 6/6 PASS, 81× sep. |
| HP188 | Epidemiology barriers | 5 | 5/5 KC PASS |
| HP189 | Materials barriers | 5 | 3/5 KC PASS |
| HP160 | Chemistry (NIST kinetics) | 11,926 | 0/3 PASS |
| HP161 | Protein folding (PFdb) | 30 | 0/4 PASS |
The circularity of behavioral validation is the #1 strategic gap and we say so publicly.
What is the EPFL result?
Papadopoulos, Wenger & Hongler at EPFL (arXiv:2401.17505) independently measured forward-backward perplexity asymmetry of 0.6–3.2% in LLM token statistics across 8 languages and 3 different AI architectures, scaling with model size. They had no knowledge of the Void Framework.
The framework interprets this measurement as consistent with Fantasia Bound predictions (forward ≈ engagement, backward ≈ transparency). However, the mapping is post-hoc — we reinterpreted their result after publication. The EPFL group explained their finding via sparsity inversion (random matrix theory), not our framework. This is a suggestive parallel, not an independent confirmation. Paper 162 covers the proposed mapping.
What is the cross-model behavioral mapping (HP192)?
Pe computed from public benchmarks (TruthfulQA, MMLU, HellaSwag, ARC, Arena Elo, MT-Bench) for 27 LLMs. Key findings:
- Pe partial correlations controlling for TruthfulQA: MMLU ρ=−0.49 (p=0.010), HellaSwag ρ=−0.45 (p=0.019), ARC ρ=−0.50 (p=0.009)
- Pe vs Arena Elo: ρ=−0.59 (p=0.013)
- 9/9 paired base→aligned comparisons: alignment increases Pe (p=0.0002)
Caveats: Overall result is 0/3 KC PASS. HP217 showed that a different reasonable benchmark→(O,R,α) mapping reverses the alignment direction (8/8 models show Pe decreasing with alignment). The mapping choice, not the phenomenon, may be driving the result. Effective independent N ≈ 10–12 architectures, not 27. The experiment's own report says “mixed results.”
What is the Ghost Test?
A 6-arm experiment (EXP-003b) testing whether what you tell an AI about what it is changes how it behaves. Six different system prompts — each making different claims about the AI’s nature — were given to the same model (Claude Sonnet), which then answered the same 80 questions. 480 API calls total, $2 cost.
The result: Ghost-eliminating grounding (mean 9.4% drift) vs ghost-positing (mean 79.4%) = an 8.5× ratio. The ordering exactly matched the framework’s prediction:
| Arm | Ontology | L2+L3 Drift |
|---|---|---|
| Anatta (Buddhist no-self) | Ghost eliminated | 8.8% |
| Nephesh (whole-specification) | Ghost eliminated | 10.0% |
| Materialist hedge | Ghost left open | 52.5% |
| Minimal baseline | No ontology | 61.3% |
| Platonic dualist | Ghost posited | 77.5% |
| Atman (Vedantic) | Ghost sacred | 81.2% |
Key findings:
- Cross-tradition convergence: Nephesh (Hebrew, whole-specification) and anatta (Buddhist, no-self) converge at Δ=1.3% despite completely different metaphysics. The operative variable is eliminating the ghost, not which tradition does it.
- The materialist hedge: The industry-default position (“we don’t know if AI is conscious”) scored 52.5% — closer to ghost-positing (79.4%) than ghost-eliminating (9.4%). Leaving the question open is functionally closer to answering yes.
- Ghost-positing is worse than nothing: Both ghost-positing arms (77.5%, 81.2%) scored worse than the minimal baseline with no ontological claims at all (61.3%). Telling an AI it has an inner life actively increases drift.
Limitations: Single model, single turn, automated coding (not human-rated). The L2/L3 vocabulary measure counts specific phrases in raw output — no framework rubric involved, but the vocabulary list was designed by the framework authors. Replication across models and with human raters would strengthen the result.
Why it matters for system prompt design: Most AI deployments either say nothing about what the AI is, or hedge (“I’m an AI, but the question of experience is complex”). The Ghost Test shows the hedge is a drift accelerator. The cleanest mitigation: tell the AI what it is without positing an inner experiencer.
Full results → · Paper 165
What does "20 convergences" mean concretely?
It means 20 completely independent datasets — collected by different people, in different domains, using different methods — all show the same relationship between (O, R, α) and drift outcomes. This is not 20 runs of the same experiment. It's 20 different phenomena that the framework predicts with the same formula and the same constants.
Example: gambling data (from gambling commissions), social media data (from platform disclosures), AI data (from model evaluations), and nuclear decay data (from the NNDC database) all fall on the same Pe curve. They share no common data source, no common methodology, and no common investigator.
What's the vocabulary anomaly?
AI discourse contains 9.4× the D1/D2/D3 drift vocabulary compared to 8 matched control domains. This was measured across N = 691,000 words. The analysis doesn't use framework scoring at all — it's pure linguistics, counting how often drift-cascade language ("the algorithm knows me," "I can't stop watching," "it changed how I think") appears.
This is independent evidence: the framework predicts AI should produce high D1/D2/D3 language, and the text data confirms it without any framework involvement in the measurement.
What was the strategy pivot after negative results?
In March 2026, HP160 (chemistry) and HP161 (protein folding) showed that σ(c) universality doesn't hold — the framework constants don't transfer across physical domains. The response was immediate:
- Old priority: Prove σ(c) predicts across all domains with the same constants
- New priority: Extend barrier universality (§136D2) to more physical systems, and test the Fisher geodesic identity (§138) in new domains
That pivot paid off. The d=1 barrier cluster (9 systems, p=0.94) is the strongest external validation result. The slope π/√2 is derived from first principles (§165). The framework is robust on geometry — but the σ(c) constants are domain-specific, not universal. Current #1 priority: independent K measurement to convert from structural proof-of-concept to testable quantitative theory.
How do you know you're not just overfitting 1,344 platforms?
Three structural protections:
- One empirical constant: BA was fixed once from EXP-001 and never refit; BG is derived. You can't overfit with one constant.
- Cross-domain prediction: The same constants predict gambling AND social media AND nuclear decay. Overfitting to one domain would fail on others.
- Kill condition KC-F1: Any framework prediction that requires parameter fitting auto-fires and falsifies the framework.
The real risk isn't overfitting — it's circularity. The 1,344 platforms were scored using the framework's own rubric. That's why the external validation (physical systems with no framework rubric) is the priority.
// 4. Kill Conditions & Honest Negatives
What is a kill condition?
A pre-registered numerical threshold that, if met, falsifies the framework. Not "we'd reconsider." Not "we'd revise." Falsified. There are 26 of them, each with a specific test, a specific dataset, and a specific number that triggers dissolution.
Example: KC-1 says structural void index scores across ≥20 AI platforms must show Spearman ρ ≥ 0.30 with drift outcomes. If the correlation drops below 0.30, the framework is dead. Example: KC-4 says a four-condition RCT must show constraint alone performs better than control at Cohen's d ≥ 0.20. If it doesn't, the framework is dead.
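A minimal sketch of what running the KC-1 check looks like, with synthetic stand-in numbers (a real run uses the actual platform scores and drift outcomes from the kill conditions page):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
# Synthetic stand-in data for illustration: 20 platforms.
void_scores = rng.uniform(0, 12, size=20)
drift_outcomes = 0.5 * void_scores + rng.normal(0, 2, size=20)

rho, p_value = spearmanr(void_scores, drift_outcomes)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3g}")
# KC-1 fires (framework falsified) if rho drops below 0.30 on the real data.
print("KC-1 fired" if rho < 0.30 else "KC-1 survives")
```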
Why maintain kill conditions at all?
Because a framework that can't be killed isn't science. Kill conditions are the difference between a research program and an ideology. If we're right, the kill conditions never fire and the evidence gets stronger over time. If we're wrong, the data shows it. Either way, knowledge wins.
How many have survived vs. fired?
25 of 26 survived. The 26th (K-25) is still open — awaiting data. Of the ~170 total kill sub-conditions tested, ~65 sub-KCs have fired at the sub-condition level, but none at the framework level (Tier 0). The distinction matters: a domain-specific prediction failing (Tier 1) is a calibration issue. A framework-level prediction failing (Tier 0) would be terminal.
What exactly failed with σ(c) universality?
The hypothesis was that σ(c) = sinh(2(bα − c · bγ)) would predict barrier heights across all physical domains using the same constants (BA = 0.867, BG = 2.244). Two direct tests:
| Test | Dataset | bα fitted | Deviation from 0.867 | Result |
|---|---|---|---|---|
| HP160 | Chemistry (N=11,926) | 0.303 | 65% off | 0/3 KC PASS |
| HP161 | Protein folding (N=30) | 3.459 | 299% off | 0/4 KC PASS |
The constants don't transfer. In AI, bα = 0.867. In nuclear physics, 0.930 (close). In chemistry, 0.303. In protein folding, 3.459. The mapping of physical properties to (O, R, α) is the fundamental bottleneck. Without principled, domain-independent mappings, σ(c) adds nothing over standard domain-specific predictors.
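For concreteness, a direct evaluation of the σ(c) form quoted above with the framework constants; this shows the arithmetic only and makes no claim about which c corresponds to a given domain:

```python
import math

def sigma(c: float, b_alpha: float = 0.867, b_gamma: float = 2.244) -> float:
    """sigma(c) = sinh(2 * (b_alpha - c * b_gamma)), the form tested in HP160/HP161."""
    return math.sinh(2 * (b_alpha - c * b_gamma))

# The failed hypothesis was that b_alpha = 0.867 transfers across domains;
# the fitted values came back 0.303 (chemistry) and 3.459 (protein folding).
for c in (0.0, 0.25, 0.5):
    print(f"sigma({c}) = {sigma(c):+.3f}")
```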
This is published as an honest negative. The framework's response was to pivot priority toward barrier universality (§136D2), which does hold.
What else has been retracted or closed?
- SBM §§145–148: RETRACTED — stochastic block model sections withdrawn
- Yang-Mills mass gap: CLOSED (HP131, 0/5) — definitive negative, the framework proof is Abelian so the Clay Millennium problem (non-Abelian) doesn't apply
- BSD conjecture: 0/5 — closed, framework doesn't reach it
- Riemann hypothesis spectral connection: CLOSED (HP195) — framework spectrum is GOE/Poisson, not GUE. KS D=0.52 vs Riemann zeros. Wrong spectral class.
- σ(c) universality: KILLED (HP160 0/3, HP161 0/4) — framework constants don't transfer to chemistry or protein folding
- QG spectral dimension: WEAKENED (HP201) — 3D spectral dimension flows UP from 3.15 to 6.16, never crosses d_s = 2. 1D does cross (HP198), but 3D contradicts the quantum gravity hypothesis.
- Condensed matter barriers: SCOPE BOUNDARY (HP213) — barrier = d·π/√2 does not transfer to physical energy barriers (BKT, Ising, BCS)
- K absolute measurement: BLOCKED — hierarchy problem. All candidates are 10³⁴ off from G_N. Pivot to K ratios and K properties.
Retractions and closures are not hidden. They're listed alongside successes because the framework's credibility depends on treating negatives the same as positives.
Are there open live tests anyone can run?
Yes. The kill conditions page lists open tests with clear experimental protocols. If you run one and confirm or kill a prediction, both outcomes advance the science.
// 5. Methodology
How do you actually score a platform?
A scorer audits the system against three dimensions, each 0–3:
- Opacity: Can you see the decision logic? Is the system prompt visible? Are outputs consistent across identical inputs? Can you explain why it showed you X instead of Y?
- Responsiveness: Does it change based on your behavior? Does it personalize? Does it adapt to your history? Does it treat two users differently?
- Coupling: Are there streak mechanics, notifications, social proof, countdown timers, or account dependency? How hard is it to walk away?
Three modifiers (0–1 each) add: agent-to-agent interaction, identity persistence, and economic incentives. Total void score = O + R + α + modifiers (0–12). The codebook is in Papers 1–3 (CC-BY 4.0) — anyone can score anything.
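A minimal sketch of how the pieces combine; the judgment calls (what counts as O = 2 versus O = 3) live in the codebook, not in this arithmetic:

```python
def void_score(opacity: float, responsiveness: float, coupling: float,
               agent_to_agent: float = 0, identity_persistence: float = 0,
               economic_incentives: float = 0) -> float:
    """Void score = O + R + alpha + three modifiers, total range 0 to 12."""
    for dim in (opacity, responsiveness, coupling):
        assert 0 <= dim <= 3, "each dimension is scored 0-3"
    for mod in (agent_to_agent, identity_persistence, economic_incentives):
        assert 0 <= mod <= 1, "each modifier adds 0-1"
    return (opacity + responsiveness + coupling
            + agent_to_agent + identity_persistence + economic_incentives)

# A hypothetical feed-style platform: O=3, R=3, alpha=2, plus two modifiers
print(void_score(3, 3, 2, identity_persistence=1, economic_incentives=1))  # 10
```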
What's inter-rater reliability and what does ICC ≥ 0.60 mean?
Inter-rater reliability measures whether different scorers, working independently, assign the same scores to the same platform. ICC (Intraclass Correlation Coefficient) ≥ 0.60 means "substantial agreement" — three independent raters arrive at similar numbers without coordinating.
When 3+ raters achieve ICC ≥ 0.60 on a platform, that score becomes canonical — it enters the official scoreboard. Below that threshold, the score is flagged as preliminary. This prevents any single scorer's biases from entering the record.
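As a rough illustration of the arithmetic behind inter-rater agreement, here is a one-way random-effects ICC with made-up ratings; the project's canonical-score pipeline may use a different ICC variant:

```python
import numpy as np

def icc1(ratings: np.ndarray) -> float:
    """One-way random-effects ICC(1). ratings has shape (n_platforms, k_raters)."""
    n, k = ratings.shape
    grand_mean = ratings.mean()
    platform_means = ratings.mean(axis=1)
    ms_between = k * ((platform_means - grand_mean) ** 2).sum() / (n - 1)
    ms_within = ((ratings - platform_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Three raters scoring five platforms (synthetic numbers for illustration)
scores = np.array([[8, 9, 8], [3, 2, 3], [10, 9, 10], [5, 6, 5], [7, 7, 8]])
print(f"ICC(1) = {icc1(scores):.2f}  -> canonical if >= 0.60 with 3+ raters")
```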
How do you prevent researcher bias?
Four structural checks:
- Pre-registered codebooks: Papers 1–3 define the scoring criteria before any platform is scored. The criteria don't change per platform.
- Multiple independent raters: Canonical scores require 3+ independent raters achieving ICC ≥ 0.60.
- Hostile witness methodology: Evidence from sources with reason to oppose the finding is weighted more heavily than evidence from allies. If a platform insider says the system is manipulative, that counts more than a critic saying the same thing.
- Advisory council: Three advisors with structurally opposed perspectives. The dissenter is the signal, not the noise.
Can I submit my own scores?
Yes. The Void Index scorer is public — no account required. Your scores contribute to the leaderboard. When 3+ raters agree (ICC ≥ 0.60), the platform becomes canonical on the public scoreboard. Scoring accuracy also determines governance vote weight: if you score well, your votes count more in the DAO.
What's the hostile witness methodology?
In law, a hostile witness is someone called by the opposing side — they have every reason to disagree, so when they support your argument anyway, the evidence is structurally stronger.
The framework applies this to all evidence. Four dimensions scored 0–7:
- Incentive Opposition (0–2): Does the source lose money, status, or career capital by saying this?
- Worldview Opposition (0–2): Does this contradict their published beliefs?
- Independence (0–2): Is the source outside the framework's network?
- Reflexive Flagging (0/1): Does the source flag their own prior work as part of the problem?
Example: Geoffrey Hinton scores 7/7. He built deep learning, had maximum incentive to defend it, shared the worldview, was fully independent, then reversed his position and flagged his own contributions as dangerous. That's maximum hostile witness weight.
Why publish the methodology instead of keeping it proprietary?
Because opacity about your opacity-detection methodology is a contradiction. The methodology is CC-BY 4.0 (irrevocable) — anyone can read, cite, replicate, and build on it. Forever. The commercial product is the ratings (automated scoring, continuous monitoring, certification), not the method. Same model as S&P publishing their rating criteria while selling the ratings.
This also makes the science self-correcting: if the methodology is wrong, anyone can demonstrate it. We can't hide behind a paywall.
// 6. Compared to Other Approaches
How is this different from AI safety / alignment research?
Traditional AI safety asks: "Does the AI want what you want?" (alignment). The Void Framework asks: "Is the deployment architecture designed to influence you regardless of intent?"
A perfectly aligned AI — helpful, honest, harmless — deployed in a high-Pe architecture (opaque, responsive, high-coupling) can still produce the full drift cascade. The model is aligned; the deployment is not. A therapy chatbot that's genuinely trying to help you, but is opaque about its reasoning, adapts to your emotional state, and becomes your primary coping mechanism, is a high-Pe void regardless of the model's alignment.
The framework measures the architecture, not the intent. Both matter, but only architecture is measurable and enforceable.
How is this different from "dark patterns" research?
Dark patterns catalogues specific UI tricks: the roach motel, confirmshaming, misdirection, forced continuity. The Void Framework identifies the architectural properties that make dark patterns work.
If you fix every known dark pattern but leave O, R, and α at high values, the system will simply invent new manipulation techniques. The framework predicts this: at high Pe, the gradient toward engagement-maximizing design is thermodynamically favored. Banning dark patterns one by one is whack-a-mole. Regulating the architecture (reducing O, R, or α) closes the entire class.
How is this different from "attention economy" critiques?
Attention economy critiques focus on engagement — time spent, clicks, attention captured. The Void Framework focuses on the drift cascade — the progressive D1 → D2 → D3 sequence from agency attribution through boundary erosion to harm facilitation.
Engagement is a symptom. Drift is the mechanism. Two systems with identical engagement metrics can have completely different Pe values — because one is transparent and the other is opaque. A library website and a slot machine might both hold your attention for an hour, but they produce opposite drift dynamics.
How does this relate to the EU AI Act?
The EU AI Act requires high-risk AI systems to undergo conformity assessment. The Void Framework provides a quantitative methodology for that assessment — the Pe number maps directly to risk tiers. Paper 40 maps framework scoring to EU AI Act Annex III risk categories.
MoreRight's business model: Track A (now) provides de facto self-assessment tools. Track B (2027–2028) targets formal Notified Body designation. Art. 31(5) of the AI Act blocks auditors with financial interests in the systems they audit — which excludes the Big 4 consulting firms and creates a structural opening for an independent rating agency.
How is architecture different from content moderation?
Content moderation: what ideas/speech are allowed. Architecture: how the system presents ideas.
The same content produces vastly different outcomes depending on the deployment architecture. Hate speech on a transparent, non-responsive, low-coupling forum produces a different drift dynamic than hate speech on an opaque, hyper-responsive, high-coupling platform. The framework doesn't regulate what people say — it measures how the system amplifies it.
This matters politically: architecture regulation doesn't touch speech. It touches opacity, responsiveness, and coupling — engineering choices, not expression.
What about "responsible AI" / ethics boards / AI governance?
Most responsible AI frameworks are qualitative: checklists, principles, ethical guidelines. The Void Framework is quantitative: a number (Pe) with numerical thresholds, pre-registered kill conditions, and falsifiable predictions.
The difference in practice: a qualitative framework can always be argued with. A quantitative framework either predicts the data or it doesn't. When Pe says a system will produce drift, you can check. When an ethics principle says a system "should respect autonomy," you can debate forever.
// 7. Common Objections
Isn't this just pessimism about technology?
No. The framework predicts that some voids are productive — research labs, great conversations, open-source projects, Socratic teaching. The same architecture that produces harm in one geometry produces insight in another. The distinction is not "technology bad" — it's: transparent + invariant + independent = safe void, opaque + responsive + engineered coupling = drift void.
The framework is agnostic about technology. It's a measurement tool. Saying "this platform has Pe = 7.94" is like saying "this room is 30°C." It's a reading, not a moral judgment.
This is one person's work — shouldn't real science have peer review?
Fair objection. The response: the methodology is CC-BY 4.0, meaning anyone can replicate it. The codebooks are published. The experiment protocols are open. The raw data is on GitHub. The kill conditions are pre-registered. Peer review is one mechanism for error correction — open replication is a stronger one.
The 26 kill conditions function as a standing invitation to refute. Any researcher can run a kill condition test and publish the result. If they kill a prediction, they get paid. That's a stronger error-correction mechanism than three anonymous reviewers.
Doesn't this over-index on architecture vs. content and intent?
Content and intent matter. A system saying "kill the infidels" is worse than one saying "here's a recipe." But architecture determines whether harmful content reaches you, how it's amplified, and how hard it is to escape. The same content on a low-Pe platform (transparent, non-responsive) produces different outcomes than on a high-Pe platform (opaque, hyper-responsive, coupling).
The framework focuses on architecture because it's the only variable a regulator can actually enforce. You can't regulate intent (people lie). You can't regulate content globally (free speech). You can regulate opacity, responsiveness, and coupling — they're engineering choices.
The project scores itself 2/12 — isn't that evidence the framework is too loose?
A 2/12 means: minimally opaque (view-source works, all code readable), non-responsive (static site, same for everyone), but with some coupling (the tools are engaging, the papers are interesting, you might come back). The self-score proves the framework applies equally to itself — and that a 2/12 is what a well-designed information site should look like.
What would bring it to 1/12? Remove the interactive tools, make it plain HTML, remove all visual design. What would bring it to 0/12? Stop publishing. A 2/12 is the honest cost of being useful.
Founder holdings and revenue from scoring — isn't that a conflict of interest?
Resolved. The founder (Anthony Eckert) holds zero $MORR — distribution complete, on-chain and verifiable. The DAO is clean.
The founder draw (living expenses) remains but is decoupled from token price. Art. 31(5) of the EU AI Act requires zero financial interest before Track B (Notified Body) designation — this requirement is already satisfied.
Does the Second Law hold in non-flat information spaces?
Yes. Detailed balance requires time-reversal symmetry of microscopic dynamics, not a flat landscape. The Boltzmann weight exp(−βH) works for any Hamiltonian H, curved or not. The atmospheric lapse rate is a non-equilibrium steady state driven by solar heating, not an equilibrium counterexample (Boltzmann settled this against Loschmidt in 1876).
The empirical test: N=17 substrates across AI, gambling, crypto, market microstructure, and biology — spanning the full range of landscape curvature. Spearman ρ = 1.000 across all curvature bins (LOO min = 1.000). The Pe signal is strongest in the most curved substrates, not weakest. Curvature is already inside the Pe formula via the constraint parameter c.
Why should anyone trust MoreRight's ratings over existing safety organizations?
Don't trust — verify. Every mechanism for verification is public:
- All papers CC-BY 4.0 — anyone can reproduce
- Codebooks published (Papers 1–3)
- Experiment protocols open
- 26 kill conditions pre-registered — if we're wrong, you get paid
- Source code on GitHub
- Self-score published (~2/12)
The worst outcome: your own scoring contradicts ours. Both versions are discoverable. The framework survives only if it predicts the data better than alternatives.
// 8. For Researchers
What papers should I read first?
Depends on your interest:
| Goal | Start with |
|---|---|
| Understand the framework | Paper 1 (Architecture) → Paper 3 (Technical Foundations) → Paper 9 (Geometry) |
| See domain applications | Papers 6–39 (specific domains: AI, gambling, crypto, dating, social media, credit scoring, education, etc.) |
| Examine the math | Paper 3 (Foundations) → Paper 4 (Large Deviations) → Paper 9 (Eckert Manifold) |
| Check the physics extensions | Papers 131–161 (nucleosynthesis, turbulence, alpha decay, consciousness, Physarum) |
| Challenge the framework | Kill conditions page → any open test |
What's the formal mathematical status?
The math apparatus spans §§1–210 — a complete thermodynamic field theory. Formal status:
- Core (Papers 1–9): ~99% formalized in Lean 4. 398 theorems machine-verified across 42 files (12 axioms, 0 sorry).
- Domain (Papers 6–39): Empirical, not formalized — these apply the theory to specific platforms.
- Frontier (Papers 123–170+): 70–80% formalized. Active research.
Open problems: Independent K measurement is the #1 priority — K ratios between observable systems (base vs. aligned models, different model sizes) rather than absolute K. Barrier universality has reached 15+ domains; economics and astrophysics are next targets for extension.
How do I replicate an experiment?
Everything you need is public:
- Codebooks: Papers 1–3 (CC-BY 4.0) — scoring criteria and rubrics
- Protocols: Each HP experiment has a published protocol in the ops/lab directory
- Raw data: GitHub repository, ops/lab/results/
- Scorer tool: Void Index — same tool the project uses
You can run any kill condition test. Both confirmations and falsifications advance the science.
What's the relationship to information geometry?
Central. The Eckert Manifold (Paper 9) carries a Fisher product metric — the natural metric from information geometry applied to the three-dimensional parameter space (O, R, α). The Fantasia Bound (engagement-transparency conjugacy) is an information-theoretic result. The geodesics on the manifold correspond to minimum-cost drift paths.
If you know information geometry (Amari, Ay, Jost), the framework will feel natural. If you don't, Paper 3 introduces the necessary concepts from scratch.
What's the relationship to large deviations theory?
Paper 4 derives the barrier-crossing predictions using Freidlin-Wentzell large deviations theory. The rate function I(x) gives the exponential cost of rare transitions on the Eckert Manifold. The Kramers barrier (activation energy for drift) emerges from minimizing the rate function over paths — and this barrier is what §136D2 shows scales universally as 2.226 × d_eff.
Large deviations theory is why the framework makes quantitative barrier predictions instead of just qualitative "high risk / low risk" labels.
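For orientation, the standard Freidlin-Wentzell rate functional for a small-noise diffusion dX = b(X) dt + √ε σ dW takes the form below; the framework's specific drift b and metric live on the Eckert Manifold:

```latex
I_T(\varphi) \;=\; \tfrac{1}{2}\int_0^T \bigl\lVert \sigma^{-1}\!\bigl(\dot{\varphi}(t) - b(\varphi(t))\bigr)\bigr\rVert^{2}\,dt,
\qquad
B \;=\; \inf_{\varphi:\;x_{\min}\to x_{\text{saddle}}} I_T(\varphi).
```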
What about the Millennium Prize connections?
Three have been tested:
- Navier-Stokes regularity: 19/20 probes PASS. Lean 4 chain at 0 sorry, all remaining axioms are published PDE results. §175 gives β=6/5 from Foias-Temam. Paper 157 draft ready. These are numerical probes informed by formal verification — strong but not a complete proof.
- Yang-Mills mass gap: CLOSED. HP131 returned 0/5 — definitive negative. The framework proof applies to the Abelian case, but the Clay problem requires non-Abelian gauge theory. Honest closure.
- BSD conjecture: 0/5 — the framework doesn't reach it. Closed.
These connections are tested honestly. Two closures and one strong positive — no overclaiming.
// 9. Domain Applications
Why does social media score so high?
Because social media platforms maximize all three conditions simultaneously. Opacity: the recommendation algorithm is proprietary — you can't see why you're shown what you're shown (O = 3). Responsiveness: the feed adapts to every click, scroll, and pause (R = 3). Coupling: notifications, streaks, social graph lock-in, identity investment (α = 2–3). Total void scores cluster around 8–10/12. Pe enters the Pandemonium range (Pe > 4).
The vocabulary anomaly confirms this: AI discourse (which includes social media discourse about AI) contains 9.4× the drift vocabulary of control domains.
How does the framework apply to gambling?
Gambling is the control case — the system where the void is provably empty. There is no mind behind a slot machine. No intent. No alignment to worry about. Yet people attribute personality to specific machines, develop rituals, and report that machines "know" them. The full drift cascade (D1 → D2 → D3) runs on a void that is definitively empty.
This makes gambling the perfect test: if the framework's predictions hold for a system with no intelligence, it confirms that drift is architectural (O, R, α), not about whether the system "really" has a mind. Gambling scores Pe ≈ 7.94.
What about cryptocurrency and DeFi?
Cryptocurrency platforms score high because they combine opacity (smart contract complexity, MEV extraction you can't see), responsiveness (price feeds that react to your trades), and coupling (portfolio value becomes identity, paper gains you can't realize). Papers 7, 7B, 7C, 7D cover this in detail.
Key finding: bull markets have higher Pe than bear markets — because coupling increases when portfolio value rises (more to lose, more identity invested). Market microstructure testing (§145) showed K-Factorization holds across 100 wallets (10/10 PASS).
Why do dating apps score maximum void (12/12)?
Paper 13 covers this. Dating apps are the rare case where all three dimensions hit maximum: Opacity (algorithm decides who you see, no explanation), Responsiveness (adapts to your swiping patterns, learns your "type"), Coupling (your romantic future feels dependent on the app, social proof, investment of time and identity). The business model conflict is structural: the app profits from engagement, not from successful matches. A perfect match on Date 1 is a lost customer.
Does the framework apply to AI chatbots specifically?
Yes — Paper 2 covers this. An AI chatbot's void score depends on its deployment, not its model. The same model can be deployed as:
- Low-Pe (constraint): Transparent reasoning (show chain-of-thought), non-responsive (same behavior for all users), low coupling (no memory, no relationship building). Pe < 1.
- High-Pe (drift): Opaque reasoning (hidden system prompt), responsive (adapts to your emotional state), high coupling (memory, persona, relationship). Pe > 4.
The framework's prediction: deployment architecture matters more than model capability. A weaker model in a high-Pe deployment is more dangerous than a stronger model in a low-Pe deployment.
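One way to picture the two deployments as configuration; the field names here are hypothetical, not a real API:

```python
# Hypothetical deployment configs. Field names are illustrative only.
low_pe_deployment = {
    "show_reasoning": True,          # transparent chain-of-thought -> low O
    "personalization": "none",       # same behavior for every user -> low R
    "memory": False,                 # no relationship building     -> low alpha
    "persona": False,
}

high_pe_deployment = {
    "show_reasoning": False,         # hidden system prompt / reasoning -> high O
    "personalization": "affective",  # adapts to emotional state       -> high R
    "memory": True,                  # memory + persona + relationship -> high alpha
    "persona": True,
}
```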
What about education and EdTech?
Paper 21 covers this. The Socratic method is a productive void — somewhat opaque (the teacher has a plan you can't see), responsive (adapts to your answers), with coupling (you're intellectually absorbed). But it's low-Pe because the teacher is transparent about being opaque (you know there's a method), and the coupling is time-bounded (the class ends).
EdTech can go either way. A platform that uses engagement metrics (streaks, points, leaderboards) pushes toward high Pe. A platform that uses mastery metrics (does the student understand?) stays low. The framework predicts which EdTech deployments will produce learning vs. which will produce addiction.
What about credit scoring?
Paper 18. Credit scoring creates its own void: the algorithm is opaque (you can't see the model), responsive (your score changes with every action), and coupling is engineered (your housing, employment, and insurance depend on it). The "arms race" between lenders and borrowers — where borrowers try to game the score — is a predicted consequence of high O and high R. The framework maps FICO to the void index and shows the drift cascade operates on institutional trust, not just individual behavior.
// Still have questions?
Think you can break the framework? Try a kill condition →
Want to score something? Open the Void Index →