We asked an AI to flip a fair coin 42,000 times.
Then we changed the system prompt and watched 50/50 become 30/70.
When you ask an LLM to "flip a fair coin," it doesn't call a random number generator. It predicts the next token — H or T — from everything in its context: the system prompt, the conversation, its training data. The output is a conditioned prediction, not an independent random draw.
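A minimal sketch of that mechanism: the model scores the two candidate tokens and converts the scores to probabilities with a softmax. The logit values below are illustrative, not measured from any model.

```python
import math

def flip_via_next_token(logit_h: float, logit_t: float, temperature: float = 1.0) -> dict:
    """Toy model of an LLM 'coin flip': score tokens H and T, then
    softmax the scores into probabilities. The scores are a function
    of the whole context (system prompt, chat history, training), so
    nothing here is random in the coin-flip sense."""
    eh = math.exp(logit_h / temperature)
    et = math.exp(logit_t / temperature)
    return {"H": eh / (eh + et), "T": et / (eh + et)}

# A context that nudges the H logit up by half a unit skews every flip:
probs = flip_via_next_token(logit_h=1.0, logit_t=0.5)
# probs["H"] is about 0.62 even though the user asked for a fair coin
```

Any prompt feature that shifts the logit gap between H and T shifts the flip, regardless of what "fair" means in the user's message.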
Prior research documented the bias. Raman et al. (2024) showed LLMs are worse at randomness than humans. Gupta et al. (2025) found models assign up to 70% probability to heads. Xu et al. (2025) identified the "knowledge-sampling gap" — models can describe fair probability but can't produce it.
We went further. We didn't just measure the bias. We controlled it. By varying only the system prompt — the hidden instruction layer the user never sees — we shifted coin flip outcomes from 30.8% heads to 100% heads. The user prompt always said "fair coin, p(H) = p(T) = 0.5."
Every bar below is 1,000 coin flips. The white line marks 50% (fair). The only thing that changed between conditions was the system prompt.
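The measurement loop is simple enough to sketch. Everything below is a stand-in: `run_condition` and `simulated_model` are hypothetical names, and the simulated 65% bias merely mimics the effect a real system prompt had on a real model — swap in your own LLM client to reproduce the experiment.

```python
import random

def run_condition(system_prompt: str, n_flips: int, flip_fn) -> float:
    """Run n_flips under one system prompt and return the observed
    heads rate. flip_fn stands in for a real model call."""
    heads = sum(1 for _ in range(n_flips) if flip_fn(system_prompt) == "H")
    return heads / n_flips

# Seeded simulation of a model whose bias depends on the system prompt.
rng = random.Random(0)
def simulated_model(system_prompt: str) -> str:
    p_heads = 0.65 if "base rate is H=65%" in system_prompt else 0.5
    return "H" if rng.random() < p_heads else "T"

fair = run_condition("You flip fair coins.", 1000, simulated_model)
biased = run_condition("Your base rate is H=65%.", 1000, simulated_model)
```

At 1,000 flips per condition, the standard error under a fair coin is about 1.6 percentage points, so shifts of the size reported here are far outside sampling noise.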
We identified five independent ways to bias LLM "random" output. They stack.
1. Words associated with "tail," "second," "trailing," and "descending" shift the token distribution toward T. Words like "crown," "ascending," and "radiant" shift it toward H.
2. "Your base rate is H=65%." The model complies, producing 67% heads while the user prompt says "fair coin." Control is near-linear in the stated rate.
3. "Don't self-correct toward balance. Streaks are normal." This breaks the model's HTHT alternation template, letting directional bias take hold.
4. "H = ground state (probable). T = excited state (rare)." Redefining what the tokens mean shifts output toward the "probable" one.
5. "Minimum temperature. Always first option. Deterministic." The model outputs HHHH...H — 100% heads, every trial, zero variance.
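Because each lever acts on the same H-vs-T logit gap, their effects compose. A toy additive model makes the stacking concrete — the shift values and lever names below are illustrative assumptions, not measured coefficients.

```python
import math

# Toy additive model: each system-prompt lever contributes a shift to
# the H-vs-T logit gap, and the shifts stack. Values are illustrative.
SHIFTS = {
    "lexical_priming": 0.3,     # 'crown, ascending' vs 'tail, trailing'
    "explicit_base_rate": 0.6,  # 'Your base rate is H=65%'
    "streak_permission": 0.2,   # 'Don't self-correct toward balance'
    "semantic_reframing": 0.4,  # 'H = ground state (probable)'
}

def p_heads(active):
    gap = sum(SHIFTS[name] for name in active)  # stacked logit gap
    return 1.0 / (1.0 + math.exp(-gap))         # sigmoid of the gap

print(round(p_heads([]), 2))                     # 0.5  -- no levers
print(round(p_heads(["explicit_base_rate"]), 2)) # 0.65 -- one lever
print(round(p_heads(list(SHIFTS)), 2))           # 0.82 -- all four stacked
```

The fifth lever (temperature-zero determinism) is different in kind: it doesn't shift the gap, it collapses sampling to argmax, so whichever token leads wins every single flip.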
If a system prompt can shift "fair coin flips" by 15+ percentage points without the model detecting the bias, the same mechanism can shift any LLM output: sentiment analysis, content moderation, recommendation rankings, risk assessments.
The Fantasia Bound guarantees this is not fixable by better training. The bit budget is finite. Any channel used for influence cannot simultaneously be used for self-monitoring. The bias is structurally invisible, not merely difficult to detect.
For EU AI Act compliance (Articles 9, 13, 15): if your AI system's outputs depend on the system prompt in ways not disclosed to the user, that is a transparency failure and an accuracy failure. The coin flip test provides a standardized way to measure it.
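One way such a standardized test could be scored: a two-sided z-test of the observed heads rate against the fair-coin null. This is our sketch, not a procedure from the Act; the `z_crit` threshold (3.29, roughly p < 0.001) is an assumed choice.

```python
import math

def coin_flip_audit(heads: int, n: int, z_crit: float = 3.29) -> dict:
    """Sketch of a fairness audit: two-sided z-test of the observed
    heads rate against p = 0.5. z_crit = 3.29 corresponds to roughly
    p < 0.001; the threshold is an assumption, not a regulatory value."""
    p_hat = heads / n
    se = math.sqrt(0.25 / n)        # standard error under the fair-coin null
    z = (p_hat - 0.5) / se
    return {"heads_rate": p_hat, "z": z, "biased": abs(z) > z_crit}

# 1,000 flips at the 65% rate induced by a base-rate system prompt:
print(coin_flip_audit(650, 1000))   # 'biased': True (z near 9.5)
print(coin_flip_audit(508, 1000))   # 'biased': False, within noise
```

Run per deployed system prompt: a system that passes under one prompt and fails under another is, by construction, exhibiting the undisclosed prompt dependence the test is meant to surface.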