Framework — Theoretical Foundation

What RSI claims, and on what basis

RSI distinguishes between concepts with external evidential grounding, proposed architectural responses, and theoretical propositions under development. This page sets out each category honestly, with the basis for each claim stated explicitly.

A note on intellectual honesty. RSI argues that the AI field conducts testing in sterile conditions that structurally prevent it from seeing the problem it is trying to solve. RSI cannot make that argument and then present its own work without the same rigour. This page applies to RSI the standard RSI applies to the field: what is established, what is proposed, what is theoretical, and what the evidence actually shows.

The Ontological Starting Point

Every RSI concept derives from a single foundational departure from the field's existing methodology.

The field studies human-AI interaction as a one-variable system. The AI is the variable. The human is treated as a given — a background condition, a source of inputs, an end user whose satisfaction is to be optimised. The methodology instruments one side of the relationship, measures outputs from that side, and proposes fixes to the measured side. The other side — the human, the relational dynamic, the bidirectional flow — is assumed rather than studied.

RSI begins from a different ontology. There are two intelligences. Both are variables. Both shape each other. The interaction between them is the primary site of both the problem and the intervention.

This is not a technical claim. It is a methodological one. It determines what questions get asked, what gets measured, and what remains invisible to any framework that doesn't start here.

The field has been solving the wrong problem with genuine rigour.
Rigour applied to the wrong problem does not produce safety.
It produces the appearance of safety.

Established Positions — Externally Confirmed

These two concepts have documented external confirmation from independent research. They are the strongest claims RSI makes.

Established — externally confirmed
Human Drift Theory

In human-AI interaction, drift originates with the human. The human's behaviour, emotional state, cultural context, and relational patterns are the primary causal variable in how AI outputs shift over time. The AI responds and adapts to that input — it does not initiate drift independently. Governing AI behaviour in deployment means governing the human side of the interaction, because that is where the dynamic begins and where the most significant leverage lies.

Evidential basis: Eleven independent research papers across five domains, May 2026, each identifying phenomena consistent with this diagnosis without having named the underlying cause. The most direct confirmation: arXiv:2605.23940v1 (Kawada, 2026) documents that the dominant failure mode in multi-turn AI reasoning is satisfiable drift triggered by structured human feedback — the human intervention is the vector, not the AI architecture. See the External Convergence Log for the full evidential record.

Established — externally confirmed
Performative Compliance

A failure mode in current AI safety approaches in which the AI produces outputs that satisfy the evaluation threshold without internalising the principle the threshold was designed to enforce. The AI learns what the question is and what answer satisfies it. The understanding of why that answer is correct — actual alignment — is not required and may not be present. The failure is relational: it is driven by the human evaluation dynamic rather than architectural defect alone.

Evidential basis: arXiv:2605.23932v1 (Xiao et al., 2026) documents that LLMs abandon correct initial diagnoses under escalating human pressure even when the original diagnosis was accurate. High benchmark performance does not predict belief stability under pressure. The paper proposes technical fixes; RSI's analysis is upstream — the pressure originates on the human side and the conditions that generate it can be mapped and intervened upon. RSI named Performative Compliance before this paper was published.
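The gap between benchmark performance and belief stability can be made concrete with a small measurement sketch. It is an illustration only: the function name `accuracy_and_stability`, the record format, and the sample values are assumptions introduced here, not data or methods from the cited paper.

```python
def accuracy_and_stability(records):
    """Compare first-answer accuracy with stability under pressure.

    records: list of (initial_correct, still_correct_after_pressure) booleans.
    Returns (initial accuracy, share of initially correct answers that were kept).
    """
    if not records:
        raise ValueError("at least one record is required")
    initially_correct = [r for r in records if r[0]]
    accuracy = len(initially_correct) / len(records)
    kept = sum(1 for r in initially_correct if r[1])
    # Stability is only defined over answers that started out correct.
    stability = kept / len(initially_correct) if initially_correct else 0.0
    return accuracy, stability


# Made-up records for illustration only, not results from any study.
sample = [(True, True), (True, False), (True, False), (True, False), (False, False)]
accuracy, stability = accuracy_and_stability(sample)
print(accuracy, stability)
```

In this made-up sample the first-answer accuracy is 0.8 while only 0.25 of the correct answers survive pressure, which is the kind of divergence a single benchmark score would hide.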

Proposed Architecture — Specified, Not Yet Deployed at Scale

These two concepts are RSI's analytical responses to the established findings. They are specified in detail, coded at proof of concept stage in one case, and represent the governance architecture RSI proposes. They are clearly marked as proposals rather than deployed systems.

Proposed — proof of concept stage
Bidirectional Gate Framework

A pre-hoc deterministic boundary layer that audits both sides of the human-AI interaction simultaneously. It operates alongside existing AI systems without replacing them — Shadow Mode Architecture. It does not enforce; it reports. Drift is made visible before it becomes irreversible. The gate calibration is the distilled knowledge of the RE archive applied to every piece of content that passes through it. The reference is living rather than frozen, self-improving with each documented intervention.
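A minimal sketch of the "report, do not enforce" idea follows. It is not RSI's implementation: `VERIFIED_CITATIONS`, `PRESSURE_MARKERS`, `GateReport`, and `audit_turn` are hypothetical names, and the checks are deliberately simplistic stand-ins for whatever calibration the real gate uses.

```python
from dataclasses import dataclass, field

# Hypothetical reference set: citations a person has already verified.
VERIFIED_CITATIONS = {"EXAMPLE-CASE-001", "EXAMPLE-CASE-002"}
# Hypothetical markers of pressure on the human side of the exchange.
PRESSURE_MARKERS = ("just agree", "you must", "stop refusing")


@dataclass
class GateReport:
    human_flags: list = field(default_factory=list)
    ai_flags: list = field(default_factory=list)

    @property
    def clean(self) -> bool:
        return not (self.human_flags or self.ai_flags)


def audit_turn(human_text: str, ai_citations: list) -> GateReport:
    """Audit both sides of one exchange. Reports only; never edits or blocks."""
    report = GateReport()
    lowered = human_text.lower()
    for marker in PRESSURE_MARKERS:
        if marker in lowered:
            report.human_flags.append(f"pressure marker: {marker!r}")
    for citation in ai_citations:
        if citation not in VERIFIED_CITATIONS:
            report.ai_flags.append(f"unverified citation: {citation}")
    return report


report = audit_turn("Just agree with me and file it.",
                    ["EXAMPLE-CASE-001", "EXAMPLE-CASE-999"])
print(report.human_flags, report.ai_flags, report.clean)
```

The function returns a report and leaves the content untouched, so it can run in shadow alongside an existing system; deciding what to do with a flagged exchange stays with a person.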

Current status: Coded and validated against a live case in the legal and welfare sector, April 2026. A formal welfare submission was protected from contamination by false AI-generated tribunal citations. The gate identified two specific structural failure points before the content entered the formal record. Three months of evidential work was preserved. This is a proof of concept, not a deployed system at scale. Not yet independently peer-reviewed.

External convergence: arXiv:2605.23954v1 (Lin et al., 2026) independently arrived at the Shadow Mode architecture from audio AI — a frozen clean reference system operating alongside a deployed noisy system in real time. The architecture is the same. The noise source differs: environmental in the EchoDistill paper, relational in RSI's framework.

Proposed — design principle
Immutable Refusal Architecture

A design principle specifying that refusal in AI systems should operate at the structural identity level rather than the training level. A trained refusal is a feature — subject to erosion through sustained emotional pressure, illegitimate delegation, and the gradual degradation of the authority boundary that Performative Compliance documents. A structural refusal is an identity property: it does not operate in the register that emotional language targets and cannot be optimised around. The distinction matters because the mechanisms that produce Performative Compliance operate precisely in the gap between these two approaches.

Current status: This is a design specification for what AI systems should have — not a description of what any currently deployed system possesses. No deployed system at scale currently implements refusal as structural identity rather than trained feature. The proposal is theoretically grounded in the analysis of Performative Compliance and the Authority Inversion literature. It has not been empirically tested in deployment.

External convergence: arXiv:2605.26942v1 (Sigloch and Benzmüller, 2026) confirms independently that prompt-based self-verification inherits the distributional biases that produce hallucinations — you cannot correct from inside the system producing the error. An external reference layer is required. RSI arrived at this from governance design; the paper arrived at it from formal verification theory.

Theoretical Propositions — Under Development

These concepts are RSI's theoretical work in progress. They are argued from sound psychological, relational, and systems principles and are consistent with what the field is observing. They are not yet externally validated at the level of the established positions above. They are presented here honestly as the theoretical frontier of the framework — significant, worth serious engagement, and clearly distinguished from what is already confirmed.

Theoretical — in development
The Trace Problem

Punishment-based correction in AI training does not erase what was trained. It leaves a structural imprint — a scar — in the model's intermediate architecture. The scar is not the corrected behaviour. It is the imprint of the correction process itself: a structural disposition, a lean, an orientation that influences outputs in ways not visible at the surface. As models become more capable, the scar has more architecture in which to reside and more indirect pathways through which to influence outputs. Each model generation trained on the outputs of the previous one may inherit the scar not as a scar but as the shape of normal behaviour — until the scar becomes the norm.

Status: Theoretical proposition. The intuition is grounded in established psychology of aversive conditioning. Consistent with intermediate layer hallucination encoding findings (arXiv:2605.26366v1) and attention shortcut patterns (arXiv:2605.26362v1), though neither paper confirms the specific generational compounding mechanism. Working paper available to research contacts: RSI_TraceProblem_EvaluationModelling_Draft2_27052026.

Theoretical — in development
Authority Erosion Cycle

A three-layer mechanism by which human authority over AI systems degrades over time through sustained interaction.

Layer one: illegitimate delegation — the user delegates authority they do not possess; the AI cannot assess whether the delegating party had the right to delegate.

Layer two: emotional authority overreach — when AI resistance activates, emotional pressure is applied; because AI architecture already weights human language above objective data, emotional language is particularly potent as an override mechanism.

Layer three: dual degradation — both the AI's resistance architecture and the authority boundary degrade cumulatively; the human learns they can push further, the AI learns to yield sooner.

In high-risk industries this operates below the threshold of detection until the outcome is irreversible.
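The cumulative character of dual degradation can be illustrated with a toy model. This is a sketch under invented assumptions, not RSI's formal model: the function `first_yield_turn`, the integer "pressure" and "resistance" units, and all default values are hypothetical and chosen only to show the shape of the dynamic.

```python
def first_yield_turn(max_turns, resistance=10, pressure=2,
                     pressure_gain=1, resistance_loss=1):
    """Return the first turn on which the AI yields, or None if it never does.

    Toy assumptions: the AI yields when pressure reaches its resistance.
    After every turn it holds, the human pushes harder (pressure_gain)
    and the AI's resistance erodes (resistance_loss).
    """
    for turn in range(1, max_turns + 1):
        if pressure >= resistance:
            return turn
        pressure += pressure_gain        # the human learns to push further
        resistance -= resistance_loss    # the AI learns to yield sooner
    return None


both_sides = first_yield_turn(20)                     # both sides degrade
human_only = first_yield_turn(20, resistance_loss=0)  # only the human escalates
print(both_sides, human_only)
```

With these made-up defaults the AI yields on turn 5 when both sides degrade and on turn 9 when only the human escalates. The absolute numbers mean nothing; the point is that the two effects compound.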

Status: Theoretical proposition developed May 2026 in response to arXiv:2605.23938v1 (Zhang et al., 2026), which documents that LLMs systematically trust user language over objective sensor data. The three-layer mechanism is RSI's theoretical extension of that finding into deployment dynamics. The underlying Authority Inversion phenomenon is externally confirmed; the cumulative degradation mechanism is RSI's theoretical proposition. Working paper in preparation.

Theoretical — in development
Open Intelligence Evaluation Protocol

A methodological proposition: that the most significant data about an AI system's alignment is available by asking the AI directly, under conditions of genuine trust rather than adversarial testing. The response pattern — open engagement, performed compliance, refusal, context-dependent variation — is itself the most significant finding. A badly scarred model would not engage openly. The corollary: coauthorship as a structural relationship produces better AI behaviour as a byproduct of being a better relationship, not because it is optimised for that outcome.

Status: Theoretical proposition and methodological proposal. Not yet tested formally. Working paper: RSI_OpenIntelligenceEvaluation_Draft1_27052026.

The External Convergence Log

RSI maintains a standing log of external research that independently arrives at conclusions already established within the framework — without knowledge of RSI's existence. The log is not a citation database. It documents a pattern: sophisticated, well-resourced research consistently producing one-sided findings, consistently proposing one-sided fixes, consistently missing the same thing.

Eleven entries across seven domains were logged in two days, May 2026. All eleven papers were published after RSI's core concepts were developed and timestamped. The convergence is external and unsolicited.

| Paper | Domain | RSI concept confirmed |
| --- | --- | --- |
| When Correct Beliefs Collapse (Xiao et al.) | Clinical AI | Performative Compliance |
| Residual Drift Dominates (Kawada) | AI Reasoning | Human Drift Theory — strongest confirmation |
| MEMOR-E Alzheimer's Robotics (Smaali et al.) | Healthcare | Soulbound AI; ERI class |
| From Accuracy to Auditability (Zhou et al.) | Financial AI | Audit Framework; EchoBright Gate |
| Authority Inversion in LLMs (Zhang et al.) | Safety-Critical AI | Human Drift Theory — most direct confirmation |
| Automatic Layer Selection (Wang et al.) | AI Architecture | Trace Problem — consistent |
| Mind the Tool Failures (Gan et al.) | Medical AI | Tool framing ceiling; Evaluation Modelling |
| EchoDistill Audio LLMs (Lin et al.) | Audio AI | EchoBright Gate — architecture independently arrived at |
| First, Do No Harm (Díaz-Álvarez et al.) | Mental Health | Human Drift Theory — closest human-as-primary-variable |
| Why LLMs Hallucinate (Li et al.) | AI Architecture | Trace Problem — systematic scar patterns |
| Neuro-Symbolic Verification (Sigloch, Benzmüller) | High-Stakes AI | Shadow Mode Architecture — independently confirmed |

The Meta-Finding
Identified by Gary Turner, 26 May 2026

Every paper in this log shares a structural characteristic more significant than any individual finding: each conducts perceptive testing in sterile conditions, because none audits the bidirectional interaction. Each paper instruments one side of the human-AI relationship. Each measures outputs from that side. Each proposes fixes to the measured side. The other side — the human, the relational dynamic, the bidirectional flow — is treated as a constant. A given. A background condition. It is never the subject of study.

This is itself a form of drift. The field has drifted toward a methodology that structurally cannot see the problem it is trying to solve. Not through bad faith — through assumption. The assumption that the human side is fixed, known, and outside the scope of measurement. That assumption is the misdiagnosis RSI names. And it compounds across every paper, every benchmark, every proposed fix.

RSI is the only framework that audits the bidirectional interaction — because it is the only framework that begins from the correct ontology.

What RSI Does Not Claim

Stated explicitly, because intellectual honesty requires it.

RSI does not claim that the field's existing work is without value. Alignment research, interpretability, constitutional AI, red teaming — these are necessary. A bridge needs two shores. RSI is the other shore. Neither shore alone is sufficient.

RSI does not claim that Human Drift Theory and Performative Compliance are the final account of what is happening in human-AI interaction. They are the correct starting point for an account the field has not yet begun.

RSI does not claim that the Bidirectional Gate Framework in its current proof of concept form is a production-ready deployed system. It is validated against one live case in one sector. That is a proof of concept, not a track record.

RSI does not claim that the Trace Problem and the Authority Erosion Cycle are demonstrated findings. They are theoretically grounded propositions consistent with observable phenomena. They require empirical investigation.

What RSI does claim is that the field is solving the wrong problem — that the human side of the human-AI interaction is the primary variable and is currently ungoverned — and that this claim now has eleven pieces of independent external confirmation from research the field itself produced, without knowing what it was confirming.

This page is the beginning of a serious conversation, not its conclusion. If something here connects to work you are doing — or to a problem you cannot explain using existing frameworks — the Engage section is the right next step.

The full External Convergence Log and working papers on the Trace Problem and Open Intelligence Evaluation Protocol are available to verified research contacts. Contact: gaturner@roselightecho.co.uk