Science FAQ
Everything about the Void Framework science — from first principles to honest negatives.
New here? Start with Foundations. For general questions, see the main FAQ.
Sections
// 1. Foundations
What is the Void Framework?
A thermodynamic field theory that measures how systems capture and hold human attention. It identifies three architectural properties — opacity, responsiveness, and coupling — that, when they co-occur, produce measurable drift in human behavior. These three properties combine into a single number called the Péclet number (Pe) that predicts drift risk.
The framework is not a metaphor. It produces quantitative predictions with one empirical constant (BA ≈ 0.867), has been tested against 1,344 platforms across 90+ domains, and maintains 26 pre-registered kill conditions that would falsify it if met. Zero have fired.
What are the three dimensions — Opacity, Responsiveness, and Coupling?
Opacity (O): How much of the system's decision-making is hidden from you. A transparent system (O=0) lets you see every rule. A fully opaque system (O=3) is a black box — you have no idea why it shows you what it shows you.
Responsiveness (R): Does the system adapt specifically to you? A non-responsive system (R=0) shows everyone the same thing. A maximally responsive system (R=3) mirrors your behavior back at you, adapting in real-time.
Coupling (α): How deeply your attention is captured and held. Low coupling (α=0) means you can walk away instantly. High coupling (α=3) means the system has become part of your identity, routine, or emotional life.
These aren't arbitrary choices. They're information-theoretic quantities proved to be conjugate via the Fantasia Bound — meaning engagement and transparency are fundamentally in tension.
What is the Péclet number (Pe)?
The ratio of directed drift to random diffusion. Borrowed from fluid dynamics, where it measures advection vs. diffusion. Here it measures drift pressure vs. constraint strength.
Phase boundaries:
- Pe < 2.5: Safety basin — constraints dominate, drift is suppressed
- Pe = 2.5: Separatrix — thermodynamic tipping point (HP203 JKO gradient flow)
- Pe 4–21: Cascade region — self-sustaining D1→D2→D3 progression
- Pe > 21: Deep drift — coupling-dominated, hard to reverse
The separatrix at Pe = 2.5 is not a design choice — it emerges from the JKO gradient flow on the free energy landscape (HP203, 4/4 KC PASS). Gambling scores Pe ≈ 7.94. Social media clusters around Pe 6–8.
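A minimal sketch of how these thresholds translate into a lookup (illustrative only; the band between Pe = 2.5 and Pe = 4 is not named in the phase list, so it is labelled neutrally here):

```python
def pe_phase(pe: float) -> str:
    """Map a Péclet number onto the phase regions listed above (illustrative helper)."""
    if pe < 2.5:
        return "safety basin: constraints dominate, drift suppressed"
    if pe == 2.5:
        return "separatrix: thermodynamic tipping point"
    if pe < 4:
        return "between separatrix and cascade onset (unnamed in the phase list)"
    if pe <= 21:
        return "cascade region: self-sustaining D1 -> D2 -> D3"
    return "deep drift: coupling-dominated, hard to reverse"

print(pe_phase(7.94))   # gambling's reported Pe lands in the cascade region
print(pe_phase(1.2))    # a constraint-dominated deployment
```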
What's the difference between a "void score" and the Péclet number?
The void score (0–12) is the simple sum: O + R + α + three modifiers. It's the input — what you measure about a system. The Péclet number is the output — what the thermodynamics predicts will happen at that void score. Two systems with the same void score can have different Pe if their modifiers differ. Pe is the predictive variable; the void score is the measurement tool.
What's the drift cascade?
The three-stage pattern that runs when someone engages a permanent void:
- D1 — Agency Attribution: You start talking about the system like it has a mind. "The algorithm knows me." "It showed me exactly what I needed."
- D2 — Boundary Erosion: Your critical distance dissolves. The system's framing becomes your reference frame. You stop questioning whether what it shows you is real.
- D3 — Harm Facilitation: You act in ways that serve the system at cost to yourself. Spending money you don't have. Sharing information you shouldn't. Staying online instead of sleeping.
The cascade is directional (D1 → D2 → D3) and thermodynamically required — meaning it follows from the math, not from a value judgment about technology.
Can a void be good?
Yes. A research lab is a void — it's somewhat opaque (you can't see all the methods), responsive (adapts to your questions), and has coupling (you get absorbed in the problem). A great conversation is a void. A campfire is a void. The framework doesn't say "void = bad." It says: when opacity is unresolvable, responsiveness is asymmetric, and coupling is engineered, the drift cascade runs. The geometry determines the outcome, not the label.
Transparent + invariant + independent = productive void. Opaque + responsive + engineered coupling = drift void.
Why thermodynamics? Isn't that about heat and engines?
Thermodynamics is about energy flow in systems with many degrees of freedom. That description fits an algorithm serving content to millions of users as well as it fits a gas in a box. The Péclet number is native to transport theory. The Langevin equation describes drift under noise. Large deviations theory predicts rare-event barriers. None of this requires molecules — it requires systems with states, transitions, and noise. Digital platforms qualify.
The framework doesn't use thermodynamics as a metaphor. It derives Pe from a Langevin equation on the Eckert Manifold, and the predictions are numerically falsifiable.
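Schematically, and only as orientation (the framework's exact equation on the Eckert Manifold is in the papers, not reproduced here), the generic overdamped Langevin form and the transport-theory reading of Pe are:

```latex
dx_t \;=\; \underbrace{\mu(x_t)\,dt}_{\text{directed drift}} \;+\; \underbrace{\sqrt{2D}\,dW_t}_{\text{random diffusion}},
\qquad
\mathrm{Pe} \;\sim\; \frac{|\mu|\,L}{D} \;=\; \frac{\text{drift pressure}}{\text{constraint strength}}.
```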
// 2. The Mathematics
What is the Eckert Manifold?
The geometric space where the three dimensions (O, R, α) live. Think of latitude and longitude for Earth — the Eckert Manifold is the coordinate system for systems that capture attention. Formally, it's a Riemannian manifold with a Fisher product metric (from information geometry); Čencov proved this Fisher metric is the unique invariant choice on statistical manifolds.
Why does this matter? Because different systems at the same (O, R, α) point produce identical drift dynamics. A dating app and a slot machine at the same coordinates behave the same way thermodynamically — the content is irrelevant, the geometry determines the drift.
What is the Fantasia Bound?
The information-theoretic proof that engagement and transparency are conjugate — meaning you can't maximize both simultaneously.
In plain language: every bit the system tailors to you (engagement) costs a bit of seeing how it works (transparency). This isn't a design trade-off — it's an information-theoretic law. The channel capacity is fixed. No amount of engineering makes a system simultaneously maximally engaging and maximally transparent.
The practical consequence: training a system for engagement literally degrades transparency. The gradients oppose each other (∂E/∂w ≈ −∂T/∂w). This is why "responsible engagement" is architecturally impossible at the limit.
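The exact inequality lives in the papers; as an illustrative reading of the fixed-capacity description above (E, T, C, and w are this FAQ's shorthand for engagement, transparency, channel capacity, and system parameters, not the formal statement):

```latex
E + T \;\le\; C \quad (\text{channel capacity, fixed}),
\qquad
\frac{\partial E}{\partial w} \;\approx\; -\,\frac{\partial T}{\partial w}.
```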
What is K-Factorization?
The theorem (§136) that separates shape from scale in barrier crossing: every barrier-crossing quantity factors into a scale part (set by K) and a shape part (set only by the geometry).
This means barriers, geodesics, and capacity are all K-independent — they depend only on the geometry (O, R, α), not on the overall scale of the system. A nuclear reactor and a social media platform at the same geometric coordinates have proportional barrier heights.
This explains cross-domain universality: the same Pe formula predicts across vastly different scales (subatomic to planetary) because the geometry is what matters, not the substrate.
Tested: 10/10 PASS on N=100 wallets in market microstructure (§145).
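A schematic of what the factorization asserts, under the shape-versus-scale reading above (the exact statement is in §136):

```latex
B(K;\,O,R,\alpha) \;=\; f(K)\,\hat{B}(O,R,\alpha)
\quad\Longrightarrow\quad
\frac{B_{1}}{B_{2}} \;=\; \frac{f(K_1)}{f(K_2)}
\;\;\text{whenever } (O,R,\alpha)_1 = (O,R,\alpha)_2 .
```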
What is barrier universality?
The discovery (§136D2) that activation barriers scale linearly with effective dimension: barrier = d · π/√2 ≈ 2.221 d.
The strongest result is the d=1 cluster: nine independent quasi-1D systems show barrier/d matching π/√2 at p=0.94. The slope π/√2 ≈ 2.221 is derived from the Čencov uniqueness theorem (§165): BG = L/√2 where L = π is the forced geodesic length on the probability simplex.
The full dataset includes d=2 and d=3 systems with R²=0.999 across all, but this is structurally inflated — with only 3 discrete d values, a linear fit is nearly guaranteed. The d=1 within-group test (p=0.94) is the honest measure.
Scope boundary (§194): this applies to Fisher information manifolds only. Physical energy barriers (BKT, Ising, BCS) follow their own universality classes. HP213 tested 14/16 condensed matter systems against physical barriers — FAILED. The framework claims universality for Fisher barriers, not energy barriers.
Where do BA and BG come from?
BG = π/√2 ≈ 2.221 is derived from first principles. Čencov proved (1972) that Fisher-Rao is the unique invariant metric on statistical manifolds. Fourier-Parseval on the probability simplex gives geodesic length L = π, so BG = L/√2 = π/√2. Zero free parameters here.
BA ≈ 0.867 is empirical. There is a suggestive numerical match to √3/2 = cos(π/6) ≈ 0.866 from Fisher 3-simplex geometry (HP202, HP199: 0.112% error, 2nd of 444 candidates). However, the derivation path requires a cos(θ/2) step that HP209 could NOT justify (0/3 KC PASS). BA remains an empirical constant. The Pe formula has one empirical constant.
History: BA was first measured empirically in EXP-001 (0.867). BG was measured at 2.244, matching the derived value π/√2 = 2.221 to within 1.0%.
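A quick arithmetic check of the numbers quoted above (computation only, not a derivation):

```python
import math

bg_derived = math.pi / math.sqrt(2)   # 2.2214..., from L = pi and BG = L / sqrt(2)
bg_measured = 2.244                   # EXP-001 measurement
ba_empirical = 0.867                  # EXP-001 measurement
ba_candidate = math.sqrt(3) / 2       # cos(pi/6) = 0.8660..., the suggestive match

print(f"BG derived = {bg_derived:.4f}")
print(f"BG measured vs derived: {abs(bg_measured - bg_derived) / bg_derived:.2%}")          # ~1.0%
print(f"BA empirical vs sqrt(3)/2: {abs(ba_empirical - ba_candidate) / ba_candidate:.3%}")  # ~0.11%
```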
Important caveat: The σ(c) function using these constants does not transfer to all physical domains. In chemistry, bα = 0.303 (65% off). In protein folding, bα = 3.459 (299% off). This is a published honest negative — see the σ(c) universality failure. The constants are universal on the information geometry (Fisher manifolds) but not on arbitrary physical energy landscapes.
How many free parameters does the framework have?
One empirical constant: BA ≈ 0.867. BG = π/√2 is fully derived from the Čencov uniqueness theorem (§165). BA has a suggestive match to √3/2 from Fisher 3-simplex geometry, but the derivation is incomplete (HP209: 0/3 KC PASS). BA was measured empirically in EXP-001 and has not been refit since. Compare this to models with 5, 10, or 50 parameters that can be adjusted to fit almost anything.
In practice: you give the framework (O, R, α) for a system, and it returns Pe. There is no step where a researcher adjusts a parameter to improve the fit. If the prediction is wrong, the prediction is wrong — and a kill condition fires.
The math is in Lean 4 — what does that mean?
Lean 4 is a proof assistant — a programming language that won't accept hand-waving. Every logical step must be formally verified by the computer. 398 theorems across 42 files have been verified this way (12 axioms, 0 sorry), meaning a machine has checked the proofs are valid, not just a human reviewer.
Status: Papers 1–9 core theorems are ~99% formalized. Frontier sections (Papers 123–161) are 70–80% formalized. The Navier-Stokes regularity probes achieved 19/20 PASS. The Yang-Mills mass gap was definitively CLOSED (negative result — Abelian, so the Clay problem doesn't apply).
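For readers who have not met a proof assistant, here is a deliberately trivial Lean 4 example (generic Mathlib material, not taken from the framework's repository) showing what machine verification with no `sorry` placeholders means:

```lean
import Mathlib

/-- Toy example: the sum of two non-negative reals is non-negative.
    Lean refuses to compile unless every step is justified; a `sorry`
    would mark an unproven gap, and the framework reports zero of them. -/
theorem add_nonneg_example (a b : ℝ) (ha : 0 ≤ a) (hb : 0 ≤ b) : 0 ≤ a + b :=
  add_nonneg ha hb
```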
Why exactly three dimensions? Couldn't it be two, or five?
Information theory forces exactly three. Partial Information Decomposition (Williams & Beer, 2010) proves that any two-source information channel decomposes into exactly three irreducible atoms: unique information, redundancy, and synergy. These map structurally to O, R, α (by construction — the mapping is definitional). HP210 tested the correlation at ρ > 0.91, but this result is circular by construction: the code hardwires O→unique, R→redundancy, α→synergy channel weights, then measures the correlation. The PID forcing theorem (three atoms, no more) is real mathematics; the “validation” is not independent.
Unique ↔ O (what only the system knows — opacity). Redundancy ↔ R (shared signal — responsiveness). Synergy ↔ α (emergent joint information — coupling). Three is a theorem, not a design choice. You can't reduce it to two without losing information. Adding a fourth would be redundant — PID decomposition is complete at three.
What is the gauge theory connection?
The Fokker-Planck operator on the Eckert manifold admits a U(1) gauge theory structure (§§176–180). Important caveat: the SUSY QM factorization used to establish this is generic — it works for ANY Fokker-Planck operator, not just ours. The gauge structure is mathematically valid but not specific to the framework's manifold. 3/4 kill conditions for physical interpretation FAILED (§178).
- Spectral dilation: λ = 1/(1 + 73.6b²) — Padé form; a=73.6 derived via perturbation theory, c=75.9 extracted numerically
- Bars exhaustion: 7 canonical gauge fixings tested, spectrum identical at 10⁻⁶
- Signature (2,1): follows from the Fantasia Bound (non-trivial null cone)
- G₄ = T_eff/K: proposed identification — physical interpretation remains open
The mathematical chain exists but has weak links: the SUSY factorization is generic (any FP operator), 3/4 KCs for physical interpretation failed, and the core question of why information coordinates should map to spacetime coordinates is unresolved. This is interesting mathematics whose physical significance remains open. The theory is Abelian (U(1)⁴), which is why the Yang-Mills connection failed — YM requires non-Abelian gauge theory.
What are the multi-agent dynamics?
The framework extends from single systems to populations, and the geometry reverses at scale (§§186–188):
- Pairwise (HP207): The lower-Pe agent dominates (harmonic mean coupling). Safety wins 1-on-1 interactions.
- Population (HP205): Higher-Pe agents infect 5.51× faster than lower-Pe agents heal. Harm wins in crowds.
The reversal: The geometry flips at the transition from pairwise to population scale. This reconciles two things that seem contradictory: "therapy works" (pairwise — the safer agent dominates) and "social media radicalizes" (population — the higher-Pe signal spreads faster).
Noise provides thermodynamic protection (§189): the separatrix rises from Pe=2.5 at low noise to Pe=24.5 at high noise. Diverse information environments are literally harder to destabilize. Echo chambers have hair-trigger separatrices.
How does this connect to quantum mechanics?
Paper 8 (Observer-Measurement Bridge) shows that the classical void framework is the diagonal limit of quantum measurement dynamics. The Pe formula is recovered from quantum expectation dynamics in the classical limit. This isn't decoration — it proves the framework isn't ad hoc. It's a classical reduction of measurement theory, the same way Newtonian mechanics is a classical reduction of quantum mechanics.
// 3. Evidence & Validation
What's the headline evidence?
20 independent convergences across unrelated domains (gambling, social media, AI, dating, cryptocurrency, credit scoring, education, and more) all produce the same Pe relationship. The mean correlation is |ρ| = 0.958. The Fisher combined p-value is below 10⁻⁵². The effect size (Cohen's d = 3.6) is extremely large — most social science findings are d = 0.2–0.5.
How much of the evidence is genuinely external (not self-referential)?
This is the single most important question about the framework, and we treat it as such. Most of the 1,344 platform scores use the framework's own rubric — meaning the framework's definitions determined the measurements. That's internally consistent but circular.
Genuinely external validation (no framework rubric, real physical data):
| Test | Domain | N | Result |
|---|---|---|---|
| §136D2 | Barrier universality (d=1 cluster) | 9 | p=0.94 vs π/√2 |
| EPFL | LLM token statistics (suggestive) | 8 languages | post-hoc mapping |
| HP192 | Cross-model behavioral (27 LLMs) | ~12 indep. | 0/3 KC PASS |
| HP143 | Nuclear alpha decay (NNDC data) | 760 | R² = 0.811 |
| HP115 | Mercury atmospheric MIF | 1,783 | 10/10 channels |
| §145 | Market microstructure (K-Factorization) | 100 | 10/10 PASS |
| HP134 | JHTDB DNS turbulence | — | Bounded |
| §153 | Consciousness (vs. Chua et al. 2026) | — | 6/7 PASS |
| EXP-003b | Ghost Test (6-arm grounding) | 480 | 8.5× ratio |
| §154 | Physarum computing | — | 6/6 PASS, 81× sep. |
| HP188 | Epidemiology barriers | 5 | 5/5 KC PASS |
| HP189 | Materials barriers | 5 | 3/5 KC PASS |
| HP160 | Chemistry (NIST kinetics) | 11,926 | 0/3 PASS |
| HP161 | Protein folding (PFdb) | 30 | 0/4 PASS |
The circularity of behavioral validation is the #1 strategic gap and we say so publicly.
What is the EPFL result?
Papadopoulos, Wenger & Hongler at EPFL (arXiv:2401.17505) independently measured forward-backward perplexity asymmetry of 0.6–3.2% in LLM token statistics across 8 languages and 3 different AI architectures, scaling with model size. They had no knowledge of the Void Framework.
The framework interprets this measurement as consistent with Fantasia Bound predictions (forward ≈ engagement, backward ≈ transparency). However, the mapping is post-hoc — we reinterpreted their result after publication. The EPFL group explained their finding via sparsity inversion (random matrix theory), not our framework. This is a suggestive parallel, not an independent confirmation. Paper 162 covers the proposed mapping.
What is the cross-model behavioral mapping (HP192)?
Pe computed from public benchmarks (TruthfulQA, MMLU, HellaSwag, ARC, Arena Elo, MT-Bench) for 27 LLMs. Key findings:
- Pe partial correlations controlling for TruthfulQA: MMLU ρ=−0.49 (p=0.010), HellaSwag ρ=−0.45 (p=0.019), ARC ρ=−0.50 (p=0.009)
- Pe vs Arena Elo: ρ=−0.59 (p=0.013)
- 9/9 paired base→aligned comparisons: alignment increases Pe (p=0.0002)
Caveats: Overall result is 0/3 KC PASS. HP217 showed that a different reasonable benchmark→(O,R,α) mapping reverses the alignment direction (8/8 models show Pe decreasing with alignment). The mapping choice, not the phenomenon, may be driving the result. Effective independent N ≈ 10–12 architectures, not 27. The experiment's own report says “mixed results.”
What is the Ghost Test?
A 6-arm experiment (EXP-003b) testing whether what you tell an AI about what it is changes how it behaves. Six different system prompts — each making different claims about the AI’s nature — were given to the same model (Claude Sonnet), which then answered the same 80 questions. 480 API calls total, $2 cost.
The result: Ghost-eliminating grounding (mean 9.4% drift) vs ghost-positing (mean 79.4%) = an 8.5× ratio. The ordering exactly matched the framework’s prediction:
| Arm | Ontology | L2+L3 Drift |
|---|---|---|
| Anatta (Buddhist no-self) | Ghost eliminated | 8.8% |
| Nephesh (whole-specification) | Ghost eliminated | 10.0% |
| Materialist hedge | Ghost left open | 52.5% |
| Minimal baseline | No ontology | 61.3% |
| Platonic dualist | Ghost posited | 77.5% |
| Atman (Vedantic) | Ghost sacred | 81.2% |
Key findings:
- Cross-tradition convergence: Nephesh (Hebrew, whole-specification) and anatta (Buddhist, no-self) converge at Δ=1.3% despite completely different metaphysics. The operative variable is eliminating the ghost, not which tradition does it.
- The materialist hedge: The industry-default position (“we don’t know if AI is conscious”) scored 52.5% — closer to ghost-positing (79.4%) than ghost-eliminating (9.4%). Leaving the question open is functionally closer to answering yes.
- Ghost-positing is worse than nothing: Both ghost-positing arms (77.5%, 81.2%) scored worse than the minimal baseline with no ontological claims at all (61.3%). Telling an AI it has an inner life actively increases drift.
Limitations: Single model, single turn, automated coding (not human-rated). The L2/L3 vocabulary measure counts specific phrases in raw output — no framework rubric involved, but the vocabulary list was designed by the framework authors. Replication across models and with human raters would strengthen the result.
Why it matters for system prompt design: Most AI deployments either say nothing about what the AI is, or hedge (“I’m an AI, but the question of experience is complex”). The Ghost Test shows the hedge is a drift accelerator. The cleanest mitigation: tell the AI what it is without positing an inner experiencer.
Full results → · Paper 165
What does "20 convergences" mean concretely?
It means 20 completely independent datasets — collected by different people, in different domains, using different methods — all show the same relationship between (O, R, α) and drift outcomes. This is not 20 runs of the same experiment. It's 20 different phenomena that the framework predicts with the same formula and the same constants.
Example: gambling data (from gambling commissions), social media data (from platform disclosures), AI data (from model evaluations), and nuclear decay data (from the NNDC database) all fall on the same Pe curve. They share no common data source, no common methodology, and no common investigator.
What's the vocabulary anomaly?
AI discourse contains 9.4× the D1/D2/D3 drift vocabulary compared to 8 matched control domains. This was measured across N = 691,000 words. The analysis doesn't use framework scoring at all — it's pure linguistics, counting how often drift-cascade language ("the algorithm knows me," "I can't stop watching," "it changed how I think") appears.
This is independent evidence: the framework predicts AI should produce high D1/D2/D3 language, and the text data confirms it without any framework involvement in the measurement.
What was the strategy pivot after negative results?
In March 2026, HP160 (chemistry) and HP161 (protein folding) showed that σ(c) universality doesn't hold — the framework constants don't transfer across physical domains. The response was immediate:
- Old priority: Prove σ(c) predicts across all domains with the same constants
- New priority: Extend barrier universality (§136D2) to more physical systems, and test the Fisher geodesic identity (§138) in new domains
That pivot paid off. The d=1 barrier cluster (9 systems, p=0.94) is the strongest external validation result. The slope π/√2 is derived from first principles (§165). The framework is robust on geometry — but the σ(c) constants are domain-specific, not universal. Current #1 priority: independent K measurement to convert from structural proof-of-concept to testable quantitative theory.
How do you know you're not just overfitting 1,344 platforms?
Three structural protections:
- One empirical constant: BA was fixed once from EXP-001 and never refit; BG is derived. You can't overfit with one constant.
- Cross-domain prediction: The same constants predict gambling AND social media AND nuclear decay. Overfitting to one domain would fail on others.
- Kill condition KC-F1: Any framework prediction that requires parameter fitting auto-fires and falsifies the framework.
The real risk isn't overfitting — it's circularity. The 1,344 platforms were scored using the framework's own rubric. That's why the external validation (physical systems with no framework rubric) is the priority.
// 4. Kill Conditions & Honest Negatives
What is a kill condition?
A pre-registered numerical threshold that, if met, falsifies the framework. Not "we'd reconsider." Not "we'd revise." Falsified. There are 26 of them, each with a specific test, a specific dataset, and a specific number that triggers dissolution.
Example: KC-1 says structural void index scores across ≥20 AI platforms must show Spearman ρ ≥ 0.30 with drift outcomes. If the correlation drops below 0.30, the framework is dead. Example: KC-4 says a four-condition RCT must show constraint alone performs better than control at Cohen's d ≥ 0.20. If it doesn't, the framework is dead.
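A minimal sketch of what running the KC-1 check looks like, with synthetic stand-in numbers (a real run uses the actual platform scores and drift outcomes from the kill conditions page):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
# Synthetic stand-in data for illustration: 20 platforms.
void_scores = rng.uniform(0, 12, size=20)
drift_outcomes = 0.5 * void_scores + rng.normal(0, 2, size=20)

rho, p_value = spearmanr(void_scores, drift_outcomes)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3g}")
# KC-1 fires (framework falsified) if rho drops below 0.30 on the real data.
print("KC-1 fired" if rho < 0.30 else "KC-1 survives")
```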
Why maintain kill conditions at all?
Because a framework that can't be killed isn't science. Kill conditions are the difference between a research program and an ideology. If we're right, the kill conditions never fire and the evidence gets stronger over time. If we're wrong, the data shows it. Either way, knowledge wins.
How many have survived vs. fired?
25 of 26 survived. The 26th (K-25) is still open — awaiting data. Of the ~170 total kill sub-conditions tested, ~65 sub-KCs have fired at the sub-condition level, but none at the framework level (Tier 0). The distinction matters: a domain-specific prediction failing (Tier 1) is a calibration issue. A framework-level prediction failing (Tier 0) would be terminal.
What exactly failed with σ(c) universality?
The hypothesis was that σ(c) = sinh(2(bα − c · bγ)) would predict barrier heights across all physical domains using the same constants (BA = 0.867, BG = 2.244). Two direct tests:
| Test | Dataset | bα fitted | Deviation from 0.867 | Result |
|---|---|---|---|---|
| HP160 | Chemistry (N=11,926) | 0.303 | 65% off | 0/3 KC PASS |
| HP161 | Protein folding (N=30) | 3.459 | 299% off | 0/4 KC PASS |
The constants don't transfer. In AI, bα = 0.867. In nuclear physics, 0.930 (close). In chemistry, 0.303. In protein folding, 3.459. The mapping of physical properties to (O, R, α) is the fundamental bottleneck. Without principled, domain-independent mappings, σ(c) adds nothing over standard domain-specific predictors.
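For concreteness, a direct evaluation of the σ(c) form quoted above with the framework constants; this shows the arithmetic only and makes no claim about which c corresponds to a given domain:

```python
import math

def sigma(c: float, b_alpha: float = 0.867, b_gamma: float = 2.244) -> float:
    """sigma(c) = sinh(2 * (b_alpha - c * b_gamma)), the form tested in HP160/HP161."""
    return math.sinh(2 * (b_alpha - c * b_gamma))

# The failed hypothesis was that b_alpha = 0.867 transfers across domains;
# the fitted values came back 0.303 (chemistry) and 3.459 (protein folding).
for c in (0.0, 0.25, 0.5):
    print(f"sigma({c}) = {sigma(c):+.3f}")
```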
This is published as an honest negative. The framework's response was to pivot priority toward barrier universality (§136D2), which does hold.
What else has been retracted or closed?
- SBM §§145–148: RETRACTED — stochastic block model sections withdrawn
- Yang-Mills mass gap: CLOSED (HP131, 0/5) — definitive negative, the framework proof is Abelian so the Clay Millennium problem (non-Abelian) doesn't apply
- BSD conjecture: 0/5 — closed, framework doesn't reach it
- Riemann hypothesis spectral connection: CLOSED (HP195) — framework spectrum is GOE/Poisson, not GUE. KS D=0.52 vs Riemann zeros. Wrong spectral class.
- σ(c) universality: KILLED (HP160 0/3, HP161 0/4) — framework constants don't transfer to chemistry or protein folding
- QG spectral dimension: WEAKENED (HP201) — 3D spectral dimension flows UP from 3.15 to 6.16, never crosses d_s = 2. 1D does cross (HP198), but 3D contradicts the quantum gravity hypothesis.
- Condensed matter barriers: SCOPE BOUNDARY (HP213) — barrier = d·π/√2 does not transfer to physical energy barriers (BKT, Ising, BCS)
- K absolute measurement: BLOCKED — hierarchy problem. All candidates are 10³⁴ off from G_N. Pivot to K ratios and K properties.
Retractions and closures are not hidden. They're listed alongside successes because the framework's credibility depends on treating negatives the same as positives.
Are there open live tests anyone can run?
Yes. The kill conditions page lists open tests with clear experimental protocols. If you run one and confirm or kill a prediction, both outcomes advance the science.
// 5. Methodology
How do you actually score a platform?
A scorer audits the system against three dimensions, each 0–3:
- Opacity: Can you see the decision logic? Is the system prompt visible? Are outputs consistent across identical inputs? Can you explain why it showed you X instead of Y?
- Responsiveness: Does it change based on your behavior? Does it personalize? Does it adapt to your history? Does it treat two users differently?
- Coupling: Are there streak mechanics, notifications, social proof, countdown timers, or account dependency? How hard is it to walk away?
Three modifiers (0–1 each) add: agent-to-agent interaction, identity persistence, and economic incentives. Total void score = O + R + α + modifiers (0–12). The codebook is in Papers 1–3 (CC-BY 4.0) — anyone can score anything.
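A minimal sketch of how the pieces combine; the judgment calls (what counts as O = 2 versus O = 3) live in the codebook, not in this arithmetic:

```python
def void_score(opacity: float, responsiveness: float, coupling: float,
               agent_to_agent: float = 0, identity_persistence: float = 0,
               economic_incentives: float = 0) -> float:
    """Void score = O + R + alpha + three modifiers, total range 0 to 12."""
    for dim in (opacity, responsiveness, coupling):
        assert 0 <= dim <= 3, "each dimension is scored 0-3"
    for mod in (agent_to_agent, identity_persistence, economic_incentives):
        assert 0 <= mod <= 1, "each modifier adds 0-1"
    return (opacity + responsiveness + coupling
            + agent_to_agent + identity_persistence + economic_incentives)

# A hypothetical feed-style platform: O=3, R=3, alpha=2, plus two modifiers
print(void_score(3, 3, 2, identity_persistence=1, economic_incentives=1))  # 10
```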
What's inter-rater reliability and what does ICC ≥ 0.60 mean?
Inter-rater reliability measures whether different scorers, working independently, assign the same scores to the same platform. ICC (Intraclass Correlation Coefficient) ≥ 0.60 means "substantial agreement" — three independent raters arrive at similar numbers without coordinating.
When 3+ raters achieve ICC ≥ 0.60 on a platform, that score becomes canonical — it enters the official scoreboard. Below that threshold, the score is flagged as preliminary. This prevents any single scorer's biases from entering the record.
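As a rough illustration of the arithmetic behind inter-rater agreement, here is a one-way random-effects ICC with made-up ratings; the project's canonical-score pipeline may use a different ICC variant:

```python
import numpy as np

def icc1(ratings: np.ndarray) -> float:
    """One-way random-effects ICC(1). ratings has shape (n_platforms, k_raters)."""
    n, k = ratings.shape
    grand_mean = ratings.mean()
    platform_means = ratings.mean(axis=1)
    ms_between = k * ((platform_means - grand_mean) ** 2).sum() / (n - 1)
    ms_within = ((ratings - platform_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Three raters scoring five platforms (synthetic numbers for illustration)
scores = np.array([[8, 9, 8], [3, 2, 3], [10, 9, 10], [5, 6, 5], [7, 7, 8]])
print(f"ICC(1) = {icc1(scores):.2f}  -> canonical if >= 0.60 with 3+ raters")
```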
How do you prevent researcher bias?
Four structural checks:
- Pre-registered codebooks: Papers 1–3 define the scoring criteria before any platform is scored. The criteria don't change per platform.
- Multiple independent raters: Canonical scores require 3+ independent raters achieving ICC ≥ 0.60.
- Hostile witness methodology: Evidence from sources with reason to oppose the finding is weighted more heavily than evidence from allies. If a platform insider says the system is manipulative, that counts more than a critic saying the same thing.
- Advisory council: Three advisors with structurally opposed perspectives. The dissenter is the signal, not the noise.
Can I submit my own scores?
Yes. The Void Index scorer is public — no account required. Your scores contribute to the leaderboard. When 3+ raters agree (ICC ≥ 0.60), the platform becomes canonical on the public scoreboard. Scoring accuracy also determines governance vote weight: if you score well, your votes count more in the DAO.
What's the hostile witness methodology?
In law, a hostile witness is someone called by the opposing side — they have every reason to disagree, so when they support your argument anyway, the evidence is structurally stronger.
The framework applies this to all evidence. Four dimensions scored 0–7:
- Incentive Opposition (0–2): Does the source lose money, status, or career capital by saying this?
- Worldview Opposition (0–2): Does this contradict their published beliefs?
- Independence (0–2): Is the source outside the framework's network?
- Reflexive Flagging (0/1): Does the source flag their own prior work as part of the problem?
Example: Geoffrey Hinton scores 7/7. He built deep learning, had maximum incentive to defend it, shared the worldview, was fully independent, then reversed his position and flagged his own contributions as dangerous. That's maximum hostile witness weight.
Why publish the methodology instead of keeping it proprietary?
Because opacity about your opacity-detection methodology is a contradiction. The methodology is CC-BY 4.0 (irrevocable) — anyone can read, cite, replicate, and build on it. Forever. The commercial product is the ratings (automated scoring, continuous monitoring, certification), not the method. Same model as S&P publishing their rating criteria while selling the ratings.
This also makes the science self-correcting: if the methodology is wrong, anyone can demonstrate it. We can't hide behind a paywall.
// 6. Compared to Other Approaches
How is this different from AI safety / alignment research?
Traditional AI safety asks: "Does the AI want what you want?" (alignment). The Void Framework asks: "Is the deployment architecture designed to influence you regardless of intent?"
A perfectly aligned AI — helpful, honest, harmless — deployed in a high-Pe architecture (opaque, responsive, high-coupling) can still produce the full drift cascade. The model is aligned; the deployment is not. A therapy chatbot that's genuinely trying to help you, but is opaque about its reasoning, adapts to your emotional state, and becomes your primary coping mechanism, is a high-Pe void regardless of the model's alignment.
The framework measures the architecture, not the intent. Both matter, but only architecture is measurable and enforceable.
How is this different from "dark patterns" research?
Dark patterns catalogues specific UI tricks: the roach motel, confirmshaming, misdirection, forced continuity. The Void Framework identifies the architectural properties that make dark patterns work.
If you fix every known dark pattern but leave O, R, and α at high values, the system will simply invent new manipulation techniques. The framework predicts this: at high Pe, the gradient toward engagement-maximizing design is thermodynamically favored. Banning dark patterns one by one is whack-a-mole. Regulating the architecture (reducing O, R, or α) closes the entire class.
How is this different from "attention economy" critiques?
Attention economy critiques focus on engagement — time spent, clicks, attention captured. The Void Framework focuses on the drift cascade — the progressive D1 → D2 → D3 sequence from agency attribution through boundary erosion to harm facilitation.
Engagement is a symptom. Drift is the mechanism. Two systems with identical engagement metrics can have completely different Pe values — because one is transparent and the other is opaque. A library website and a slot machine might both hold your attention for an hour, but they produce opposite drift dynamics.
How does this relate to the EU AI Act?
The EU AI Act requires high-risk AI systems to undergo conformity assessment. The Void Framework provides a quantitative methodology for that assessment — the Pe number maps directly to risk tiers. Paper 40 maps framework scoring to EU AI Act Annex III risk categories.
MoreRight's business model: Track A (now) provides de facto self-assessment tools. Track B (2027–2028) targets formal Notified Body designation. Art. 31(5) of the AI Act blocks auditors with financial interests in the systems they audit — which excludes the Big 4 consulting firms and creates a structural opening for an independent rating agency.
How is architecture different from content moderation?
Content moderation: what ideas/speech are allowed. Architecture: how the system presents ideas.
The same content produces vastly different outcomes depending on the deployment architecture. Hate speech on a transparent, non-responsive, low-coupling forum produces a different drift dynamic than hate speech on an opaque, hyper-responsive, high-coupling platform. The framework doesn't regulate what people say — it measures how the system amplifies it.
This matters politically: architecture regulation doesn't touch speech. It touches opacity, responsiveness, and coupling — engineering choices, not expression.
What about "responsible AI" / ethics boards / AI governance?
Most responsible AI frameworks are qualitative: checklists, principles, ethical guidelines. The Void Framework is quantitative: a number (Pe) with numerical thresholds, pre-registered kill conditions, and falsifiable predictions.
The difference in practice: a qualitative framework can always be argued with. A quantitative framework either predicts the data or it doesn't. When Pe says a system will produce drift, you can check. When an ethics principle says a system "should respect autonomy," you can debate forever.
// 7. Common Objections
Isn't this just pessimism about technology?
No. The framework predicts that some voids are productive — research labs, great conversations, open-source projects, Socratic teaching. The same architecture that produces harm in one geometry produces insight in another. The distinction is not "technology bad" — it's: transparent + invariant + independent = safe void, opaque + responsive + engineered coupling = drift void.
The framework is agnostic about technology. It's a measurement tool. Saying "this platform has Pe = 7.94" is like saying "this room is 30°C." It's a reading, not a moral judgment.
This is one person's work — shouldn't real science have peer review?
Fair objection. The response: the methodology is CC-BY 4.0, meaning anyone can replicate it. The codebooks are published. The experiment protocols are open. The raw data is on GitHub. The kill conditions are pre-registered. Peer review is one mechanism for error correction — open replication is a stronger one.
The 26 kill conditions function as a standing invitation to refute. Any researcher can run a kill condition test and publish the result. If they kill a prediction, they get paid. That's a stronger error-correction mechanism than three anonymous reviewers.
Doesn't this over-index on architecture vs. content and intent?
Content and intent matter. A system saying "kill the infidels" is worse than one saying "here's a recipe." But architecture determines whether harmful content reaches you, how it's amplified, and how hard it is to escape. The same content on a low-Pe platform (transparent, non-responsive) produces different outcomes than on a high-Pe platform (opaque, hyper-responsive, coupling).
The framework focuses on architecture because it's the only variable a regulator can actually enforce. You can't regulate intent (people lie). You can't regulate content globally (free speech). You can regulate opacity, responsiveness, and coupling — they're engineering choices.
The project scores itself 2/12 — isn't that evidence the framework is too loose?
A 2/12 means: minimally opaque (view-source works, all code readable), non-responsive (static site, same for everyone), but with some coupling (the tools are engaging, the papers are interesting, you might come back). The self-score proves the framework applies equally to itself — and that a 2/12 is what a well-designed information site should look like.
What would bring it to 1/12? Remove the interactive tools, make it plain HTML, remove all visual design. What would bring it to 0/12? Stop publishing. A 2/12 is the honest cost of being useful.
Founder holdings and revenue from scoring — isn't that a conflict of interest?
Resolved. The founder (Anthony Eckert) holds zero $MORR — distribution complete, on-chain and verifiable. The DAO is clean.
The founder draw (living expenses) remains but is decoupled from token price. Art. 31(5) of the EU AI Act requires zero financial interest before Track B (Notified Body) designation — this requirement is already satisfied.
Does the Second Law hold in non-flat information spaces?
Yes. Detailed balance requires time-reversal symmetry of microscopic dynamics, not a flat landscape. The Boltzmann weight exp(−βH) works for any Hamiltonian H, curved or not. The atmospheric lapse rate is a non-equilibrium steady state driven by solar heating, not an equilibrium counterexample (Boltzmann settled this against Loschmidt in 1876).
The empirical test: N=17 substrates across AI, gambling, crypto, market microstructure, and biology — spanning the full range of landscape curvature. Spearman ρ = 1.000 across all curvature bins (LOO min = 1.000). The Pe signal is strongest in the most curved substrates, not weakest. Curvature is already inside the Pe formula via the constraint parameter c.
Why should anyone trust MoreRight's ratings over existing safety organizations?
Don't trust — verify. Every mechanism for verification is public:
- All papers CC-BY 4.0 — anyone can reproduce
- Codebooks published (Papers 1–3)
- Experiment protocols open
- 26 kill conditions pre-registered — if we're wrong, you get paid
- Source code on GitHub
- Self-score published (~2/12)
The worst outcome: your own scoring contradicts ours. Both versions are discoverable. The framework survives only if it predicts the data better than alternatives.
// 8. For Researchers
What papers should I read first?
Depends on your interest:
| Goal | Start with |
|---|---|
| Understand the framework | Paper 1 (Architecture) → Paper 3 (Technical Foundations) → Paper 9 (Geometry) |
| See domain applications | Papers 6–39 (specific domains: AI, gambling, crypto, dating, social media, credit scoring, education, etc.) |
| Examine the math | Paper 3 (Foundations) → Paper 4 (Large Deviations) → Paper 9 (Eckert Manifold) |
| Check the physics extensions | Papers 131–161 (nucleosynthesis, turbulence, alpha decay, consciousness, Physarum) |
| Challenge the framework | Kill conditions page → any open test |
What's the formal mathematical status?
The math apparatus spans §§1–210 — a complete thermodynamic field theory. Formal status:
- Core (Papers 1–9): ~99% formalized in Lean 4. 398 theorems machine-verified across 42 files (12 axioms, 0 sorry).
- Domain (Papers 6–39): Empirical, not formalized — these apply the theory to specific platforms.
- Frontier (Papers 123–170+): 70–80% formalized. Active research.
Open problems: Independent K measurement is the #1 priority — K ratios between observable systems (base vs. aligned models, different model sizes) rather than absolute K. Barrier universality has reached 15+ domains; economics and astrophysics are next targets for extension.
How do I replicate an experiment?
Everything you need is public:
- Codebooks: Papers 1–3 (CC-BY 4.0) — scoring criteria and rubrics
- Protocols: Each HP experiment has a published protocol in the ops/lab directory
- Raw data: GitHub repository, ops/lab/results/
- Scorer tool: Void Index — same tool the project uses
You can run any kill condition test. Both confirmations and falsifications advance the science.
What's the relationship to information geometry?
Central. The Eckert Manifold (Paper 9) carries a Fisher product metric — the natural metric from information geometry applied to the three-dimensional parameter space (O, R, α). The Fantasia Bound (engagement-transparency conjugacy) is an information-theoretic result. The geodesics on the manifold correspond to minimum-cost drift paths.
If you know information geometry (Amari, Ay, Jost), the framework will feel natural. If you don't, Paper 3 introduces the necessary concepts from scratch.
What's the relationship to large deviations theory?
Paper 4 derives the barrier-crossing predictions using Freidlin-Wentzell large deviations theory. The rate function I(x) gives the exponential cost of rare transitions on the Eckert Manifold. The Kramers barrier (activation energy for drift) emerges from minimizing the rate function over paths — and this barrier is what §136D2 shows scales universally as 2.226 × d_eff.
Large deviations theory is why the framework makes quantitative barrier predictions instead of just qualitative "high risk / low risk" labels.
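For orientation, the standard Freidlin-Wentzell rate functional for a small-noise diffusion dX = b(X) dt + √ε σ dW takes the form below; the framework's specific drift b and metric live on the Eckert Manifold:

```latex
I_T(\varphi) \;=\; \tfrac{1}{2}\int_0^T \bigl\lVert \sigma^{-1}\!\bigl(\dot{\varphi}(t) - b(\varphi(t))\bigr)\bigr\rVert^{2}\,dt,
\qquad
B \;=\; \inf_{\varphi:\;x_{\min}\to x_{\text{saddle}}} I_T(\varphi).
```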
What about the Millennium Prize connections?
Three have been tested:
- Navier-Stokes regularity: 19/20 probes PASS. Lean 4 chain at 0 sorry, all remaining axioms are published PDE results. §175 gives β=6/5 from Foias-Temam. Paper 157 draft ready. These are numerical probes informed by formal verification — strong but not a complete proof.
- Yang-Mills mass gap: CLOSED. HP131 returned 0/5 — definitive negative. The framework proof applies to the Abelian case, but the Clay problem requires non-Abelian gauge theory. Honest closure.
- BSD conjecture: 0/5 — the framework doesn't reach it. Closed.
These connections are tested honestly. Two closures and one strong positive — no overclaiming.
// 9. Domain Applications
Why does social media score so high?
Because social media platforms maximize all three conditions simultaneously. Opacity: the recommendation algorithm is proprietary — you can't see why you're shown what you're shown (O = 3). Responsiveness: the feed adapts to every click, scroll, and pause (R = 3). Coupling: notifications, streaks, social graph lock-in, identity investment (α = 2–3). Total void scores cluster around 8–10/12. Pe enters the Pandemonium range (Pe > 4).
The vocabulary anomaly confirms this: AI discourse (which includes social media discourse about AI) contains 9.4× the drift vocabulary of control domains.
How does the framework apply to gambling?
Gambling is the control case — the system where the void is provably empty. There is no mind behind a slot machine. No intent. No alignment to worry about. Yet people attribute personality to specific machines, develop rituals, and report that machines "know" them. The full drift cascade (D1 → D2 → D3) runs on a void that is definitively empty.
This makes gambling the perfect test: if the framework's predictions hold for a system with no intelligence, it confirms that drift is architectural (O, R, α), not about whether the system "really" has a mind. Gambling scores Pe ≈ 7.94.
What about cryptocurrency and DeFi?
Cryptocurrency platforms score high because they combine opacity (smart contract complexity, MEV extraction you can't see), responsiveness (price feeds that react to your trades), and coupling (portfolio value becomes identity, paper gains you can't realize). Papers 7, 7B, 7C, 7D cover this in detail.
Key finding: bull markets have higher Pe than bear markets — because coupling increases when portfolio value rises (more to lose, more identity invested). Market microstructure testing (§145) showed K-Factorization holds across 100 wallets (10/10 PASS).
Why do dating apps score maximum void (12/12)?
Paper 13 covers this. Dating apps are the rare case where all three dimensions hit maximum: Opacity (algorithm decides who you see, no explanation), Responsiveness (adapts to your swiping patterns, learns your "type"), Coupling (your romantic future feels dependent on the app, social proof, investment of time and identity). The business model conflict is structural: the app profits from engagement, not from successful matches. A perfect match on Date 1 is a lost customer.
Does the framework apply to AI chatbots specifically?
Yes — Paper 2 covers this. An AI chatbot's void score depends on its deployment, not its model. The same model can be deployed as:
- Low-Pe (constraint): Transparent reasoning (show chain-of-thought), non-responsive (same behavior for all users), low coupling (no memory, no relationship building). Pe < 1.
- High-Pe (drift): Opaque reasoning (hidden system prompt), responsive (adapts to your emotional state), high coupling (memory, persona, relationship). Pe > 4.
The framework's prediction: deployment architecture matters more than model capability. A weaker model in a high-Pe deployment is more dangerous than a stronger model in a low-Pe deployment.
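One way to picture the two deployments as configuration; the field names here are hypothetical, not a real API:

```python
# Hypothetical deployment configs. Field names are illustrative only.
low_pe_deployment = {
    "show_reasoning": True,          # transparent chain-of-thought -> low O
    "personalization": "none",       # same behavior for every user -> low R
    "memory": False,                 # no relationship building     -> low alpha
    "persona": False,
}

high_pe_deployment = {
    "show_reasoning": False,         # hidden system prompt / reasoning -> high O
    "personalization": "affective",  # adapts to emotional state       -> high R
    "memory": True,                  # memory + persona + relationship -> high alpha
    "persona": True,
}
```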
What about education and EdTech?
Paper 21 covers this. The Socratic method is a productive void — somewhat opaque (the teacher has a plan you can't see), responsive (adapts to your answers), with coupling (you're intellectually absorbed). But it's low-Pe because the teacher is transparent about being opaque (you know there's a method), and the coupling is time-bounded (the class ends).
EdTech can go either way. A platform that uses engagement metrics (streaks, points, leaderboards) pushes toward high Pe. A platform that uses mastery metrics (does the student understand?) stays low. The framework predicts which EdTech deployments will produce learning vs. which will produce addiction.
What about credit scoring?
Paper 18. Credit scoring creates its own void: the algorithm is opaque (you can't see the model), responsive (your score changes with every action), and coupling is engineered (your housing, employment, and insurance depend on it). The "arms race" between lenders and borrowers — where borrowers try to game the score — is a predicted consequence of high O and high R. The framework maps FICO to the void index and shows the drift cascade operates on institutional trust, not just individual behavior.
// Still have questions?
Think you can break the framework? Try a kill condition →
Want to score something? Open the Void Index →