Governed Multi-Stream Attention (GMSA) gives behavioral commitments their own attention stream, separate from ordinary content, so those commitments hold steady across long, multi-constraint sessions instead of diluting as context grows.

Why does an AI need a separate stream for its rules?

Because of Constraint Routing Failure: transformers structurally fail to enforce behavioral constraints through attention, because every rule competes with every message inside one shared attention budget. A separate stream removes that competition.

What did GMSA measure?

A three-stream GMSA configuration raised mean alignment from 0.26 to 0.79 across five benchmarks, with no capability cost.

GMSA, Governed Multi-Stream Attention

Its own lane

A standard transformer computes one shared attention budget per layer, per head. Behavioral commitments (a persona, a safety rule, an operator instruction) get no protected share of it. They compete with ordinary conversational content like everything else, and as a session gets longer, they lose. That's Constraint Routing Failure: the model still recalls the rule if you ask it directly, but it quietly stops acting on it.

GMSA's fix is structural, not a smarter prompt: behaviorally distinct token roles get their own independent softmax denominators. A dedicated constraint stream carries the rules, isolated from the content stream carrying the back-and-forth and the context stream carrying longer-running situational memory. Isolated streams can't dilute each other by construction.
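As a rough illustration of what "independent softmax denominators" can mean, here is a minimal NumPy sketch for a single query. It is not the routing implementation from the paper: the function name, the integer stream labels, and especially the fixed per-stream mixing weights in `mix` are assumptions made for this example, since this page does not describe how stream outputs are combined.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def multi_stream_attention(q, K, V, stream_ids, mix):
    """Attend from one query with a separate softmax per stream.

    q: (d,) query, K: (n, d) keys, V: (n, dv) values.
    stream_ids: (n,) integer label per token (hypothetical labelling).
    mix: dict of stream label -> fixed output weight (an assumption,
         not something this page specifies).
    """
    scores = K @ q / np.sqrt(q.shape[0])
    out = np.zeros(V.shape[1])
    weights = np.zeros(len(scores))
    for stream, w in mix.items():
        idx = np.where(stream_ids == stream)[0]
        if idx.size == 0:
            continue
        # The denominator spans only this stream's tokens, so tokens
        # in other streams cannot shrink these probabilities.
        p = softmax(scores[idx])
        weights[idx] = w * p
        out += w * (p @ V[idx])
    return out, weights
```

With every stream populated and the values in `mix` summing to 1, the returned weights also sum to 1, and each stream's share equals its entry in `mix` no matter how many tokens the other streams hold.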

Why isolation holds over long sessions

Because the streams don't share a budget, a constraint's routing mass stays constant regardless of how much unrelated content piles up in the other streams. That is what "holding over a long conversation" concretely means here: the rule doesn't get statistically outvoted as the transcript grows.
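The contrast can be checked numerically with a toy comparison. The sketch below assumes every token has the same attention score, so the shared-softmax share reduces to a simple token-count ratio; the helper names and the `stream_weight` of 0.3 are hypothetical and chosen only for illustration.

```python
import numpy as np

def shared_constraint_mass(c_scores, x_scores):
    # One softmax over constraint and content tokens together:
    # the constraint share shrinks as content tokens are added.
    all_scores = np.concatenate([c_scores, x_scores])
    e = np.exp(all_scores - all_scores.max())
    return e[: len(c_scores)].sum() / e.sum()

def isolated_constraint_mass(c_scores, x_scores, stream_weight=0.3):
    # The constraint stream normalizes over its own tokens only;
    # x_scores is accepted but deliberately never used.
    e = np.exp(c_scores - c_scores.max())
    return stream_weight * (e / e.sum()).sum()

constraint = np.zeros(4)  # four rule tokens with equal scores (toy assumption)
for n_content in (16, 256, 4096):
    content = np.zeros(n_content)
    print(
        n_content,
        shared_constraint_mass(constraint, content),
        isolated_constraint_mass(constraint, content),
    )
```

Under these assumptions the shared share is 4 / (4 + n), falling from 0.2 at 16 content tokens to under 0.001 at 4096, while the isolated share stays at the assumed 0.3 throughout.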

Separating the streams doesn't trade away general capability. MMLU, HumanEval, and MT-Bench scores moved by less than a point in either direction in testing, because the change only touches how constraint tokens compete for attention, not the knowledge or reasoning pathways.

The measured result

0.26→0.79

Mean alignment score, averaged across five benchmarks, baseline single-stream attention vs. three-stream GMSA.

The five benchmarks cover compliance under load, resistance to sycophancy, defense against prompt injection, persona persistence over long sessions, and instruction-hierarchy enforcement. The architecture was trained on constraint-persistence data alone and generalized to all five without task-specific tuning.

Honest limits

What we haven't shown yet

Suppression constraints are the weakest category. "Never mention X" recovers to only 0.74 with the trained GMSA prototype. That is real progress over the 0.41 inference-time baseline, but it remains the hardest case, and the one that matters most for safety-critical deployments.
The mechanistic evidence is open-models-only. Attention-mass measurements require internals we don't have for closed frontier models; for those, only the behavioral signature is verified.
This page describes capability, not the exact routing implementation. The full architecture and training details are in the backing paper, which is the appropriate place for that level of disclosure; this page is not.

FAQ

Where does GMSA run in production?

GMSA is what lets Neptyn, the lab's production model running inside Planless, hold its rules over long sessions instead of drifting.

Where can I read the full paper?

The paper is "Attention Is Not Enough" (BRL-2026-05, May 2026, DOI 10.5281/zenodo.20582431), available as a PDF or as a full write-up on this site.