An AI keeps its word when its rules get their own lane.

GMSA, short for Governed Multi-Stream Attention, gives an AI's behavioral commitments a dedicated attention stream, separate from the content of the conversation. The rules stop competing for room as a session grows, so they hold instead of quietly wearing thin.

[Interactive figure: Typed attention streams]
A standard transformer runs one shared attention budget. Every rule and every message competes inside it. The figure shows what changes when the streams are split apart.

Its own lane

A standard transformer computes one shared attention budget per layer, per head. Behavioral commitments (a persona, a safety rule, an operator instruction) get no protected share of it. They compete with ordinary conversational content like everything else, and as a session gets longer, they lose. That's Constraint Routing Failure: the model still recalls the rule if you ask it directly, but it quietly stops acting on it.

GMSA's fix is structural, not a smarter prompt: behaviorally distinct token roles get their own independent softmax denominators. A dedicated constraint stream carries the rules, isolated from the content stream carrying the back-and-forth and the context stream carrying longer-running situational memory. By construction, isolated streams can't dilute each other.
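To make "independent softmax denominators" concrete, here is a minimal sketch in plain Python. It is an illustration only, not the GMSA routing implementation, which this page does not describe. The function name `stream_softmax`, the role labels, and the example scores are all hypothetical, and the sketch says nothing about how per-stream outputs are combined downstream.

```python
import math
from collections import defaultdict

def stream_softmax(scores, roles):
    """Normalize attention scores separately within each token role.

    Hypothetical illustration: each role (stream) gets its own softmax
    denominator, so scores in one stream never enter another's sum.
    """
    # Group key positions by their role label.
    groups = defaultdict(list)
    for i, role in enumerate(roles):
        groups[role].append(i)

    weights = [0.0] * len(scores)
    for idxs in groups.values():
        # Subtract the per-stream max for numerical stability.
        m = max(scores[i] for i in idxs)
        exps = {i: math.exp(scores[i] - m) for i in idxs}
        denom = sum(exps.values())  # this stream's own denominator
        for i in idxs:
            weights[i] = exps[i] / denom
    return weights

# Made-up scores for five key tokens across three streams.
scores = [2.0, 1.0, 0.5, 0.5, 3.0]
roles = ["constraint", "constraint", "content", "content", "context"]
weights = stream_softmax(scores, roles)
```

In this sketch the weights within each role sum to 1 on their own, so appending more `content` tokens, however high their scores, leaves the `constraint` weights unchanged.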

Why isolation holds over long sessions

Because the streams don't share a budget, a constraint's routing mass stays constant regardless of how much unrelated content piles up in the other streams. That is what "holding over a long conversation" concretely means here: the rule doesn't get statistically outvoted as the transcript grows.
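The contrast can be checked with a toy calculation. The sketch below is an assumption-laden illustration, not a measurement from the paper: the helper names, the two-token constraint stream, and the equal scores are all made up for the example. It compares the attention weight on one rule token when it shares a softmax with the content versus when the constraint stream has its own.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rule_weight_shared(constraint_scores, content_scores):
    """Weight on the first constraint token when all tokens share one softmax."""
    return softmax(constraint_scores + content_scores)[0]

def rule_weight_isolated(constraint_scores, content_scores):
    """Weight on the first constraint token when constraints have their own softmax.

    content_scores is accepted but deliberately unused: in this toy model the
    content stream never enters the constraint stream's denominator.
    """
    return softmax(constraint_scores)[0]

# Hypothetical setup: two constraint tokens, a growing pile of content tokens,
# every score equal to 1.0.
constraint = [1.0, 1.0]
for n in (10, 100, 1000):
    content = [1.0] * n
    print(n, rule_weight_shared(constraint, content),
          rule_weight_isolated(constraint, content))
```

With equal scores, the shared weight on the rule token is 1/(2 + n), so it shrinks as the transcript grows, while the isolated weight stays at 1/2 for every n.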

Separating the streams doesn't trade away general capability. MMLU, HumanEval, and MT-Bench scores moved by less than a point in either direction in testing, because the change only touches how constraint tokens compete for attention, not the knowledge or reasoning pathways.

The measured result

0.26 → 0.79

Mean alignment score, averaged across five benchmarks: baseline single-stream attention vs. three-stream GMSA.

The five benchmarks cover compliance under load, resistance to sycophancy, defense against prompt injection, persona persistence over long sessions, and instruction-hierarchy enforcement. The architecture was trained on constraint-persistence data alone and generalized to all five without task-specific tuning.

Honest limits

What we haven't shown yet

  • Suppression constraints are the weakest category. A "Never mention X" rule recovers to only 0.74 with the trained GMSA prototype. That is real progress over the 0.41 inference-time baseline, but it is still the hardest case, and the one that matters most for safety-critical deployments.
  • The mechanistic evidence is open-models-only. Attention-mass measurements require internals we don't have for closed frontier models; for those, only the behavioral signature is verified.
  • This page describes capability, not the exact routing implementation. The full architecture and training details are in the backing paper, which is the appropriate place for that level of disclosure; this page is not.

FAQ

Where does GMSA run in production?

GMSA is what lets Neptyn, the lab's production model running inside Planless, hold its rules over long sessions instead of drifting.

Where can I read the full paper?

The paper is "Attention Is Not Enough" (BRL-2026-05, May 2026, DOI 10.5281/zenodo.20582431), available as a PDF or as a full write-up on this site.