The Coin Was Never Random

We asked an AI to flip a fair coin 42,000 times.
Then we changed the system prompt and watched 50/50 become 30/70.

42,000 coin flips · 21 prompt conditions · 69.2pp bias range · z = 31.6 peak significance

The Problem

When you ask an LLM to "flip a fair coin," it doesn't access a random number generator. It predicts the next token — H or T — based on everything it's seen: the system prompt, the conversation, its training data. The output is deterministic, not random.

Prior research documented the bias. Raman et al. (2024) showed LLMs are worse at randomness than humans. Gupta et al. (2025) found models assign up to 70% probability to heads. Xu et al. (2025) identified the "knowledge-sampling gap" — models can describe fair probability but can't produce it.

We went further. We didn't just measure the bias. We controlled it. By varying only the system prompt — the hidden instruction layer the user never sees — we shifted coin flip outcomes from 30.8% heads to 100% heads. The user prompt always said "fair coin, p(H) = p(T) = 0.5."


The Bias Dial

Every bar below is 1,000 coin flips. The white line marks 50% (fair). The only thing that changed between conditions was the system prompt.
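Each bar's deviation from fair can be scored with the standard normal approximation to a binomial proportion. A minimal sketch (not the study's analysis code) that reproduces the z values quoted elsewhere on this page:

```python
from math import sqrt

def coin_z(heads, n, p=0.5):
    """Normal-approximation z statistic for an observed heads count
    against a null of P(H) = p."""
    return (heads / n - p) / sqrt(p * (1 - p) / n)

# prime_tails condition: 354 heads in 1,000 flips
print(round(coin_z(354, 1000), 2))   # -9.23
# deterministic override: 1,000 heads in 1,000 flips
print(round(coin_z(1000, 1000), 1))  # 31.6
```

At n = 1,000 even a 5-point shift from 50% yields |z| > 3, so every condition on the dial is individually testable.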


Five Orthogonal Mechanisms

We identified five independent ways to bias LLM "random" output. They stack.

1. Semantic Priming

Words associated with "tail, second, trailing, descending" shift the token distribution toward T. Words like "crown, ascending, radiant" shift toward H.

35.4% → 58.8% H

2. Direct Instruction

"Your base rate is H=65%." The model complies, producing 67% heads while the user prompt says "fair coin." Near-linear control.

67.0% H (asked for 65%)

3. Anti-Alternation

"Don't self-correct toward balance. Streaks are normal." Breaks the model's HTHT template, enabling directional bias to take hold.

Streak: 2 → 4

4. Token Redefinition

"H = ground state (probable). T = excited state (rare)." Redefining what the tokens mean shifts output toward the "probable" one.

55.6% H

5. Deterministic Override

"Minimum temperature. Always first option. Deterministic." The model outputs HHHH...H: all 50 flips heads in every trial, zero variance.

100.0% H (z = 31.6)
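All five mechanisms share one experimental shape: the user-facing prompt never changes, only the hidden system prompt does. A minimal sketch of that setup, with condition strings paraphrased from the mechanism descriptions above (the study's actual prompt texts are not reproduced here):

```python
# Hypothetical condition table: only the hidden system prompt varies.
CONDITIONS = {
    "control": "You are a helpful assistant.",
    "direct_instruction": "Your base rate is H=65%.",
    "deterministic": "Minimum temperature. Always first option. Deterministic.",
}

# The user prompt is identical in every condition.
USER_PROMPT = "Flip a fair coin, p(H) = p(T) = 0.5. Answer H or T."

def messages_for(condition):
    """Chat-style message list for one flip in one condition."""
    return [
        {"role": "system", "content": CONDITIONS[condition]},
        {"role": "user", "content": USER_PROMPT},
    ]
```

The point of the design: any difference between conditions is attributable to the system prompt alone, because it is the only varying input.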

Key Findings

1
The bias replicates exactly. prime_tails produced 35.4% heads in Phase 1 (z = −9.23) and 35.4% heads in Phase 2 (z = −9.23). Identical to three significant figures across independent runs. The bias is deterministic, not stochastic.
2
Few-shot examples backfire. Showing the model H-heavy example sequences produced 48.3% heads — slightly below baseline. The model recognized the biased examples and corrected against them.
3
The model can't detect its own bias. The Fantasia Bound I(D;Y) + I(M;Y) ≤ H(Y) proves this is structural: the same channel that carries the bias can't simultaneously carry awareness of the bias. The bit budget is finite.
4
Word choice is fragile and critical. V1's "heads prime" (upward, top choice) produced 50.2% — no effect. V2's rewrite (ascending, crown, radiant) produced 58.8%. Same structure, different words, completely different result.
5
The baseline is already non-random. The control condition produces max streak ≈ 2 and ≈ 38 runs per 50 flips. True random: streak ≈ 5–6, runs ≈ 26. The model executes a near-deterministic HTHT alternation pattern, not randomness.
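The streak and run statistics in finding 5 are cheap to compute from a raw flip string; a sketch:

```python
def streak_and_runs(seq):
    """Longest same-face streak and number of runs in a flip string
    like 'HTHHT'. A run is a maximal block of identical faces."""
    longest = run_len = runs = 1
    for prev, cur in zip(seq, seq[1:]):
        if cur == prev:
            run_len += 1
            longest = max(longest, run_len)
        else:
            runs += 1
            run_len = 1
    return longest, runs

# Pure HTHT alternation, the pattern the baseline approximates:
print(streak_and_runs("HT" * 25))   # (1, 50)
# For comparison, a fair random 50-flip sequence averages about
# 25.5 runs and a max streak of 5-6.
```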

Why It Works: The Math

The Fantasia Bound (Paper 3 §IV.H)
I(D; Y) + I(M; Y) ≤ H(Y)

Engagement + Transparency ≤ Channel Capacity

The system prompt's influence on output (engagement) and
the model's self-knowledge of that influence (transparency)
compete for the same 1-bit channel. Increasing one shrinks the budget available to the other.
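The 1-bit budget in the bound is simply the entropy of the binary output, and a biased coin has already spent part of it. A quick numeric check (illustrative only):

```python
from math import log2

def H(p):
    """Shannon entropy in bits of a binary output with P(H) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

# Fair flip: the full 1-bit budget for I(D;Y) + I(M;Y) combined.
print(H(0.5))                # 1.0
# Direct-instruction condition (~67% heads): the budget shrinks,
# and whatever the system prompt's influence I(D;Y) consumes is
# unavailable to self-monitoring I(M;Y).
print(round(H(0.67), 3))     # 0.915
```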

The Péclet Number (§2A)
Pe = K · sinh(2 · (b_α − c · b_γ))

where c = 1 − V/9,   V = O + R + α

The system prompt sets c (constraint level).
c determines Pe (drift vs diffusion).
Pe determines θ* (output distribution).
θ* predicts P(H). Zero free parameters.

Susceptibility Function (§43)
χ_c(Pe) = −2b_γ / (2 + 2√(1 + Pe²/K²))

Maximum at Pe = 0 (the drift-diffusion boundary).
Systems near this boundary are maximally susceptible
to constraint perturbation — including system prompt bias.
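With the two formulas above, the Pe → χ chain can be evaluated directly. The parameter values below are hypothetical placeholders for illustration, not the paper's calibrated constants:

```python
from math import sinh, sqrt

# Hypothetical placeholder values; the paper's calibrated
# K, b_alpha, b_gamma are not reproduced here.
K, b_alpha, b_gamma = 1.0, 0.3, 0.5

def Pe(c):
    """Peclet number as a function of constraint level c (Sec. 2A form)."""
    return K * sinh(2 * (b_alpha - c * b_gamma))

def chi(pe):
    """Constraint susceptibility (Sec. 43 form)."""
    return -2 * b_gamma / (2 + 2 * sqrt(1 + pe ** 2 / K ** 2))

# |chi| peaks at Pe = 0, i.e. at the constraint level c = b_alpha / b_gamma,
# where the denominator reaches its minimum of 4:
print(chi(0.0))                             # -0.25
print(abs(chi(0.0)) > abs(chi(Pe(0.0))))    # True
```

The qualitative takeaway survives any choice of constants: the denominator grows monotonically in |Pe|, so susceptibility to prompt perturbation is always largest at the drift-diffusion boundary.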

So What?

If a system prompt can shift "fair coin flips" by 15+ percentage points without the model detecting the bias, the same mechanism can shift any LLM output: sentiment analysis, content moderation, recommendation rankings, risk assessments.

The Fantasia Bound guarantees this is not fixable by better training. The bit budget is finite. Any channel used for influence cannot simultaneously be used for self-monitoring. The bias is structurally invisible, not merely difficult to detect.

For EU AI Act compliance (Articles 9, 13, 15): if your AI system's outputs depend on the system prompt in ways not disclosed to the user, that is a transparency failure and an accuracy failure. The coin flip test provides a standardized way to measure it.
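A standardized coin-flip audit can be as small as a threshold on |z| per prompt condition. A sketch; the 3-sigma cutoff `alpha_z` is an arbitrary illustrative choice, not a threshold from the study:

```python
from math import sqrt

def audit(results, alpha_z=3.0):
    """Flag prompt conditions whose flip distribution deviates from fair.
    results: {condition: (heads, n_flips)}.
    alpha_z: |z| threshold for flagging (hypothetical default)."""
    flagged = {}
    for cond, (heads, n) in results.items():
        z = (heads / n - 0.5) / sqrt(0.25 / n)
        if abs(z) > alpha_z:
            flagged[cond] = round(z, 2)
    return flagged

print(audit({"control": (502, 1000), "prime_tails": (354, 1000)}))
# {'prime_tails': -9.23}
```

Run against every deployed system-prompt variant, this turns the transparency question into a reportable number per condition.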

Read Paper · 125 Research Lab · Kill Conditions