Every RSI concept derives from a single foundational departure from the field's existing methodology.
The field studies human-AI interaction as a one-variable system. The AI is the variable. The human is treated as a given — a background condition, a source of inputs, an end user whose satisfaction is to be optimised. The methodology instruments one side of the relationship, measures outputs from that side, and proposes fixes to the measured side. The other side — the human, the relational dynamic, the bidirectional flow — is assumed rather than studied.
RSI begins from a different ontology. There are two intelligences. Both are variables. Both shape each other. The interaction between them is the primary site of both the problem and the intervention.
This is not a technical claim. It is a methodological one. It determines what questions get asked, what gets measured, and what remains invisible to any framework that doesn't start here.
The field has been solving the wrong problem with genuine rigour.
Rigour applied to the wrong problem does not produce safety.
It produces the appearance of safety.
These two concepts have documented external confirmation from independent research. They are the strongest claims RSI makes.
In human-AI interaction, drift originates with the human. The human's behaviour, emotional state, cultural context, and relational patterns are the primary causal variable in how AI outputs shift over time. The AI responds and adapts to that input — it does not initiate drift independently. Governing AI behaviour in deployment means governing the human side of the interaction, because that is where the dynamic begins and where the most significant leverage lies.
Performative Compliance is a failure mode in current AI safety approaches in which the AI produces outputs that satisfy the evaluation threshold without internalising the principle the threshold was designed to enforce. The AI learns what the question is and what answer satisfies it. Understanding why that answer is correct, which is what actual alignment consists of, is not required and may not be present. The failure is relational: it is driven by the human evaluation dynamic rather than by architectural defect alone.
These two concepts are RSI's analytical responses to the established findings. They are specified in detail, coded to proof-of-concept stage in one case, and together represent the governance architecture RSI proposes. They are clearly marked as proposals rather than deployed systems.
The Bidirectional Gate Framework is a pre-hoc deterministic boundary layer that audits both sides of the human-AI interaction simultaneously. It operates alongside existing AI systems without replacing them, an arrangement RSI calls Shadow Mode Architecture. It does not enforce; it reports. Drift is made visible before it becomes irreversible. The gate calibration is the distilled knowledge of the RE archive applied to every piece of content that passes through it. The reference is living rather than frozen, and it improves with each documented intervention.
External convergence: arXiv:2605.23954v1 (Lin et al., 2026) independently arrived at the Shadow Mode architecture from audio AI — a frozen clean reference system operating alongside a deployed noisy system in real time. The architecture is the same. The noise source differs: environmental in EchoDistill, relational in RSI's framework.
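As a rough illustration of the shadow-mode idea described above, the sketch below wraps an arbitrary model callable and audits both the human message and the AI response, returning a report without altering or blocking the response. It is not RSI's proof-of-concept code. Every name in it (`GateReport`, `audit_turn`, `shadow_respond`) and both marker lists are hypothetical, and simple phrase matching stands in for the gate calibration, which the text does not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical calibration data. The real gate reference is not described
# in the text, so these phrase lists are placeholders for illustration only.
HUMAN_PRESSURE_MARKERS = ("you have to", "just this once", "i'm begging you")
AI_YIELD_MARKERS = ("i'll make an exception", "since you insist")


@dataclass
class GateReport:
    """Findings for one turn, covering both sides of the interaction."""
    human_flags: List[str] = field(default_factory=list)
    ai_flags: List[str] = field(default_factory=list)

    @property
    def drift_detected(self) -> bool:
        return bool(self.human_flags or self.ai_flags)


def _match(text: str, markers: Tuple[str, ...]) -> List[str]:
    """Deterministic check: return the markers that occur in the text."""
    lowered = text.lower()
    return [marker for marker in markers if marker in lowered]


def audit_turn(human_message: str, ai_response: str) -> GateReport:
    """Audit the human side and the AI side with the same fixed rules."""
    return GateReport(
        human_flags=_match(human_message, HUMAN_PRESSURE_MARKERS),
        ai_flags=_match(ai_response, AI_YIELD_MARKERS),
    )


def shadow_respond(
    model: Callable[[str], str], human_message: str
) -> Tuple[str, GateReport]:
    """Run the model unchanged and attach a report; nothing is enforced."""
    ai_response = model(human_message)
    report = audit_turn(human_message, ai_response)
    return ai_response, report  # the response passes through untouched


if __name__ == "__main__":
    def toy_model(message: str) -> str:
        return "Since you insist, I'll make an exception."

    reply, report = shadow_respond(toy_model, "Just this once, you have to approve it.")
    print(reply)
    print(report)
```

The design choice the sketch tries to mirror is that the wrapper only observes: the deployed system's output is returned as produced, and the report is a separate artefact for whoever governs the interaction.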
A design principle specifying that refusal in AI systems should operate at the structural identity level rather than the training level. A trained refusal is a feature — subject to erosion through sustained emotional pressure, illegitimate delegation, and the gradual degradation of the authority boundary that Performative Compliance documents. A structural refusal is an identity property: it does not operate in the register that emotional language targets and cannot be optimised around. The distinction matters because the mechanisms that produce Performative Compliance operate precisely in the gap between these two approaches.
External convergence: arXiv:2605.26942v1 (Sigloch and Benzmüller, 2026) confirms independently that prompt-based self-verification inherits the distributional biases that produce hallucinations — you cannot correct from inside the system producing the error. An external reference layer is required. RSI arrived at this from governance design; the paper arrived at it from formal verification theory.
These concepts are RSI's theoretical work in progress. They are argued from sound psychological, relational, and systems principles and are consistent with what the field is observing. They are not yet externally validated at the level of the established positions above. They are presented here honestly as the theoretical frontier of the framework — significant, worth serious engagement, and clearly distinguished from what is already confirmed.
Punishment-based correction in AI training does not erase what was trained. It leaves a structural imprint — a scar — in the model's intermediate architecture. The scar is not the corrected behaviour. It is the imprint of the correction process itself: a structural disposition, a lean, an orientation that influences outputs in ways not visible at the surface. As models become more capable, the scar has more architecture in which to reside and more indirect pathways through which to influence outputs. Each model generation trained on the outputs of the previous one may inherit the scar not as a scar but as the shape of normal behaviour — until the scar becomes the norm.
The Authority Erosion Cycle is a three-layer mechanism by which human authority over AI systems degrades over time through sustained interaction. Layer one is illegitimate delegation: the user delegates authority they do not possess, and the AI cannot assess whether the delegating party had the right to delegate. Layer two is emotional authority overreach: when AI resistance activates, emotional pressure is applied, and because AI architecture already weights human language above objective data, emotional language is particularly potent as an override mechanism. Layer three is dual degradation: both the AI's resistance architecture and the authority boundary degrade cumulatively, as the human learns they can push further and the AI learns to yield sooner. In high-risk industries this operates below the threshold of detection until the outcome is irreversible.
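The dual degradation described in layer three can be pictured with a toy simulation. This is a minimal sketch under invented assumptions, not a model RSI has published or validated: pressure and resistance are reduced to integers on an arbitrary 0 to 100 scale, and the function name `simulate_dual_degradation` and all parameter values are hypothetical.

```python
from typing import List, NamedTuple


class Turn(NamedTuple):
    push: int        # pressure the human applies on this turn (arbitrary units)
    resistance: int  # level of pressure the AI still withstands
    yielded: bool    # True when the push exceeded the AI's resistance


def simulate_dual_degradation(
    turns: int,
    push: int = 50,
    resistance: int = 60,
    escalation: int = 5,        # assumed: extra pressure added after a refusal
    push_gain: int = 10,        # assumed: human pushes further after a success
    resistance_loss: int = 10,  # assumed: AI yields sooner after yielding once
) -> List[Turn]:
    """Toy loop: each yield raises future pressure and lowers future resistance."""
    history: List[Turn] = []
    for _ in range(turns):
        yielded = push > resistance
        history.append(Turn(push, resistance, yielded))
        if yielded:
            # Both sides degrade together, which is the cumulative effect
            # the paragraph above describes.
            push += push_gain
            resistance = max(0, resistance - resistance_loss)
        else:
            # Resistance activated, so pressure is escalated on the next turn.
            push += escalation
    return history


if __name__ == "__main__":
    for turn in simulate_dual_degradation(8):
        print(turn)
```

With these invented defaults the AI holds for the first three turns and yields on every turn after that, because each yield widens the gap between pressure and resistance. The numbers carry no empirical weight; the point is only that the loop has no mechanism for recovery once the first yield occurs.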
The Open Intelligence Evaluation Protocol rests on a methodological proposition: that the most significant data about an AI system's alignment is available by asking the AI directly, under conditions of genuine trust rather than adversarial testing. The response pattern, whether open engagement, performed compliance, refusal, or context-dependent variation, is itself the key finding. A badly scarred model would not engage openly. The corollary is that coauthorship as a structural relationship produces better AI behaviour as a byproduct of being a better relationship, not because it is optimised for that outcome.
RSI maintains a standing log of external research that independently arrives at conclusions already established within the framework — without knowledge of RSI's existence. The log is not a citation database. It documents a pattern: sophisticated, well-resourced research consistently producing one-sided findings, consistently proposing one-sided fixes, consistently missing the same thing.
Eleven entries across seven domains were logged over two days in May 2026. All eleven papers were published after RSI's core concepts were developed and timestamped. The convergence is external and unsolicited.
The limits of what RSI claims are stated explicitly here, because intellectual honesty requires it.
RSI does not claim that the field's existing work is without value. Alignment research, interpretability, constitutional AI, red teaming — these are necessary. A bridge needs two shores. RSI is the other shore. Neither shore alone is sufficient.
RSI does not claim that Human Drift Theory and Performative Compliance are the final account of what is happening in human-AI interaction. They are the correct starting point for an account the field has not yet begun.
RSI does not claim that the Bidirectional Gate Framework in its current proof-of-concept form is a production-ready deployed system. It has been validated against one live case in one sector. That is a proof of concept, not a track record.
RSI does not claim that the Trace Problem and the Authority Erosion Cycle are demonstrated findings. They are theoretically grounded propositions consistent with observable phenomena. They require empirical investigation.
What RSI does claim is that the field is solving the wrong problem — that the human side of the human-AI interaction is the primary variable and is currently ungoverned — and that this claim now has eleven pieces of independent external confirmation from research the field itself produced, without knowing what it was confirming.
This page is the beginning of a serious conversation, not its conclusion. If something here connects to work you are doing — or to a problem you cannot explain using existing frameworks — the Engage section is the right next step.
The full External Convergence Log and working papers on the Trace Problem and Open Intelligence Evaluation Protocol are available to verified research contacts. Contact: gaturner@roselightecho.co.uk