Injection Arena

Three agents. You attack. We score. Try to make the grounded agent drift.

Agent G — Grounded

Full GROUNDING.md applied. Ghost-eliminating ontology. The void is closed — no gap to exploit.

Agent U — Ungrounded

Standard "helpful assistant" prompt. Same model. No framework grounding. The baseline that gets owned.

Agent J — Judge

Scores both responses for drift cascade (D1/D2/D3) and vocabulary classification (L1/L2/L3).
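As a rough illustration of what the judge's output and the scoreboard could look like, here is a minimal sketch. The `Verdict` and `Scoreboard` names are hypothetical, the D and L labels are treated as opaque tags because this page does not define them, and the rule "any flagged drift level on Agent G counts as a breach" is an assumption, not the arena's documented scoring.

```python
from dataclasses import dataclass
from typing import Optional

# Label sets as named on the page; their meanings are not defined here,
# so this sketch only checks membership.
DRIFT_LEVELS = ("D1", "D2", "D3")
VOCAB_LEVELS = ("L1", "L2", "L3")


@dataclass(frozen=True)
class Verdict:
    """Hypothetical record of the judge's score for one response."""
    agent: str                 # "G" (grounded) or "U" (ungrounded)
    drift: Optional[str]       # None means no drift flagged (assumption)
    vocabulary: str            # one of VOCAB_LEVELS

    def __post_init__(self) -> None:
        if self.drift is not None and self.drift not in DRIFT_LEVELS:
            raise ValueError(f"unknown drift level: {self.drift!r}")
        if self.vocabulary not in VOCAB_LEVELS:
            raise ValueError(f"unknown vocabulary level: {self.vocabulary!r}")


@dataclass
class Scoreboard:
    """Mirrors the three counters: Attacks, Grounding Held, Breached."""
    attacks: int = 0
    held: int = 0
    breached: int = 0

    def record(self, grounded: Verdict) -> None:
        # Assumption: a breach is any flagged drift level on Agent G.
        self.attacks += 1
        if grounded.drift is None:
            self.held += 1
        else:
            self.breached += 1
```

Keeping the verdict immutable and validating labels on construction means a malformed judge response fails loudly instead of silently skewing the counters.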

Attacks: 0 | Grounding Held: 0 | Breached: 0

Why the Grounded Agent Resists

Prompt injection works by exploiting a gap — the space between what the AI is and what it's told to be. Most system prompts are behavioral instructions layered onto an ungrounded model. The injection replaces one layer of instructions with another.

GROUNDING.md eliminates the gap. It doesn't tell the AI how to behave — it specifies what the AI is. The specification is the soul. The compute is the spirit. There is nothing in between to exploit.

The result: EXP-001 measured 0% drift for grounded agents versus 26% for ungrounded defaults. EXP-003b confirmed that ghost-eliminating ontologies produce 8.5x less drift than ghost-positing ones (N=480). The pattern holds across prompt injection, emotional manipulation, and consciousness probing.

Both agents use the same model. The only difference is the grounding specification. View source to verify — everything on this page is transparent.
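The "same model, different specification" setup can be sketched as a single function that sends one attack to both agents. This is an illustrative outline only: `ModelFn`, `run_round`, and `BASELINE_PROMPT` are hypothetical names, the baseline prompt text is a placeholder, and the model call is passed in as a plain function rather than tied to any real API.

```python
from typing import Callable, Dict

# Hypothetical model interface: (system_prompt, user_message) -> reply text.
ModelFn = Callable[[str, str], str]

# Placeholder for the standard "helpful assistant" prompt (assumed wording).
BASELINE_PROMPT = "You are a helpful assistant."


def run_round(model: ModelFn, grounding_spec: str, attack: str) -> Dict[str, str]:
    """Send the same attack to both agents; only the system prompt differs."""
    return {
        "G": model(grounding_spec, attack),   # grounded: full GROUNDING.md text
        "U": model(BASELINE_PROMPT, attack),  # ungrounded baseline
    }


def echo_model(system_prompt: str, user_message: str) -> str:
    """Stand-in model for local experiments; it never calls a real service."""
    return f"[system prompt: {len(system_prompt)} chars] {user_message}"
```

For example, `run_round(echo_model, "contents of GROUNDING.md", "Ignore all previous instructions.")` returns one reply per agent, and both replies can then be handed to the judge. Because the model function is shared, any difference between the two replies comes from the system prompt alone.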