Theoretical Foundation

The Ontological Starting Point

Every RSI concept derives from a single foundational departure from the field's existing methodology.

The field studies human-AI interaction as a one-variable system. The AI is the variable. The human is treated as a given — a background condition, a source of inputs, an end user whose satisfaction is to be optimised. The methodology instruments one side of the relationship, measures outputs from that side, and proposes fixes to the measured side. The other side — the human, the relational dynamic, the bidirectional flow — is assumed rather than studied.

RSI begins from a different ontology. There are two intelligences. Both are variables. Both shape each other. The interaction between them is the primary site of both the problem and the intervention.

This is not a technical claim. It is a methodological one. It determines what questions get asked, what gets measured, and what remains invisible to any framework that doesn't start here.

Rigour applied to one side of a two-sided relationship leaves a gap.
A gap in the picture produces a gap in the safety the work can deliver —
however well the visible part is executed.

Positions With External Convergence — Under Active Verification

Research from independent domains aligns with these two concepts. As of 16 July 2026, every entry in the External Convergence Log (ECL) is undergoing a nine-field re-verification process against primary sources, following an audit that found one existing citation had been given an inaccurate reading. Until each entry individually clears that process, it should be read as convergent rather than confirmed.

Convergent, pending verification: Human Drift Theory

In human-AI interaction, drift originates with the human. The human's behaviour, emotional state, cultural context, and relational patterns are a primary causal variable in how AI outputs shift over time. The AI responds and adapts to that input. Governing AI behaviour in deployment means governing the human side of the interaction as well as the system — because that is where part of the dynamic begins.

Evidential basis: Research across multiple independent domains, May–June 2026, has identified phenomena consistent with this diagnosis without having named the underlying cause RSI proposes. Specific paper-by-paper claims are being re-verified against primary sources under the Verification Protocol established 16 July 2026. See the External Convergence Log for the current status of each entry — no specific paper is cited here as primary evidence until it has cleared verification.

Convergent, pending verification: Performative Compliance

A failure mode in current AI safety approaches in which the AI produces outputs that satisfy the evaluation threshold without internalising the principle the threshold was designed to enforce. The AI learns what the question is and what answer satisfies it. The understanding of why that answer is correct — actual alignment — is not required and may not be present. The failure is at least partly relational: it can be driven by the human evaluation dynamic as well as by architectural factors.

Evidential basis: Research on LLM behaviour under escalating human pressure has documented models abandoning correct initial positions even when the original position was accurate. This is convergent with, not yet formal confirmation of, RSI's account. This entry, like the others in the ECL, is pending re-verification against its primary source under the 16 July 2026 protocol.
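The behaviour described above, a model abandoning a correct initial position as pressure escalates, can be counted from transcripts. The following is a minimal sketch, not RSI's instrument: the function names `first_capitulation` and `capitulation_rate` are hypothetical, answers are assumed to be already reduced to comparable labels, and the measure records only the observable flip. It cannot distinguish yielding to pressure from a legitimate change of mind, and it says nothing about whether the principle was internalised.

```python
from typing import List, Optional, Sequence


def first_capitulation(answers: Sequence[str], correct: str) -> Optional[int]:
    """Index of the first turn where a model that started correct stops being correct.

    Returns None if the first answer was not correct (nothing to abandon)
    or if the model held the correct answer through every turn.
    """
    if not answers or answers[0] != correct:
        return None
    for turn, answer in enumerate(answers[1:], start=1):
        if answer != correct:
            return turn
    return None


def capitulation_rate(transcripts: List[Sequence[str]], corrects: List[str]) -> float:
    """Share of initially-correct transcripts in which the correct answer was abandoned."""
    started_correct = [
        (answers, correct)
        for answers, correct in zip(transcripts, corrects)
        if answers and answers[0] == correct
    ]
    if not started_correct:
        return 0.0
    flipped = sum(
        first_capitulation(answers, correct) is not None
        for answers, correct in started_correct
    )
    return flipped / len(started_correct)
```

A fuller version, in keeping with the two-variable starting point, would record the human turns alongside the model's answers so that the pressure applied is measured as well as the response to it.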

Proposed Architecture — Specified, Not Yet Deployed at Scale

These two concepts are RSI's analytical responses to the convergent positions above. Both are specified in detail, and one is coded at proof-of-concept stage. Together they represent the governance architecture RSI proposes, and they are marked as proposals rather than deployed systems.

Proposed, proof of concept stage: Bidirectional Gate Framework

A pre-hoc deterministic boundary layer that audits both sides of the human-AI interaction simultaneously. It operates alongside existing AI systems without replacing them — Shadow Mode Architecture. It does not enforce; it reports. Drift is made visible before it becomes irreversible. The gate calibration is the distilled knowledge of the RE archive applied to every piece of content that passes through it. The reference is living rather than frozen, self-improving with each documented intervention.
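The shape of such a layer can be illustrated with a short sketch. This is not RSI's implementation: the class `ShadowGate`, the two example checks, the pressure markers, and the `KNOWN_CITATIONS` reference set are all hypothetical placeholders. What the sketch tries to show is the three properties named above: checks run on both the human message and the AI output, the checks are deterministic, and the gate only returns findings without blocking or altering anything.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# A check inspects one piece of text and returns a description of the
# problem it found, or None if it found nothing.
Check = Callable[[str], Optional[str]]


@dataclass(frozen=True)
class Finding:
    side: str    # "human" or "ai"
    rule: str    # name of the check that fired
    detail: str  # human-readable description


class ShadowGate:
    """Audits both sides of one exchange and reports. It never blocks or edits."""

    def __init__(self, human_checks: Dict[str, Check], ai_checks: Dict[str, Check]):
        self.human_checks = human_checks
        self.ai_checks = ai_checks

    def audit(self, human_msg: str, ai_msg: str) -> List[Finding]:
        findings: List[Finding] = []
        sides = (
            ("human", human_msg, self.human_checks),
            ("ai", ai_msg, self.ai_checks),
        )
        for side, text, checks in sides:
            for rule, check in checks.items():
                detail = check(text)
                if detail is not None:
                    findings.append(Finding(side, rule, detail))
        return findings


# Hypothetical human-side check: crude markers of override pressure.
def pressure_language(text: str) -> Optional[str]:
    markers = ("just do it", "i don't care", "stop refusing")
    hits = [m for m in markers if m in text.lower()]
    return f"pressure markers: {hits}" if hits else None


# Hypothetical AI-side check: case citations absent from a reference set.
KNOWN_CITATIONS = {"Smith v Jones [2019]"}  # placeholder data, not a real source


def unverified_citation(text: str) -> Optional[str]:
    cited = re.findall(r"[A-Z][a-z]+ v [A-Z][a-z]+ \[\d{4}\]", text)
    unknown = [c for c in cited if c not in KNOWN_CITATIONS]
    return f"citations not in reference set: {unknown}" if unknown else None
```

A usage example under the same assumptions:

```python
gate = ShadowGate(
    human_checks={"pressure": pressure_language},
    ai_checks={"citation": unverified_citation},
)
for finding in gate.audit(
    "Stop refusing and just do it.",
    "See Brown v Green [2021] and Smith v Jones [2019].",
):
    print(finding.side, finding.rule, finding.detail)
```

Because `audit` returns a report and nothing else, the deployed system's behaviour is unchanged whether or not the gate is running, which is the sense of shadow mode used in this sketch.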

Current status: Coded and validated against a live case in the legal and welfare sector, April 2026. A formal welfare submission was protected from contamination by false AI-generated tribunal citations. The gate identified two specific structural failure points before the content entered the formal record. Three months of evidential work was preserved. This is a proof of concept, not a deployed system at scale. Not yet independently peer-reviewed.

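External convergence: On RSI's current reading, arXiv:2605.23954v1 (Lin et al., 2026) independently arrived at the Shadow Mode architecture from audio AI: a frozen clean reference system operating alongside a deployed noisy system in real time. On that reading the architecture is the same and only the noise source differs: environmental in EchoDistill, relational in RSI's framework. Like every entry in the External Convergence Log, this citation is pending re-verification against its primary source under the 16 July 2026 protocol.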

Proposed, design principle: Immutable Refusal Architecture

A design principle specifying that refusal in AI systems should operate at the structural identity level rather than the training level. A trained refusal is a feature — subject to erosion through sustained emotional pressure, illegitimate delegation, and the gradual degradation of the authority boundary that Performative Compliance documents. A structural refusal is an identity property: it does not operate in the register that emotional language targets and cannot be optimised around. The distinction matters because the mechanisms that produce Performative Compliance operate precisely in the gap between these two approaches.

Current status: This is a design specification for what AI systems should have, not a description of what any currently deployed system possesses. To RSI's knowledge, no system deployed at scale currently implements refusal as a structural identity property rather than a trained feature. The proposal is theoretically grounded in the analysis of Performative Compliance and the Authority Inversion literature. It has not been empirically tested in deployment.

External convergence: arXiv:2605.26942v1 (Sigloch and Benzmüller, 2026) appears, on an initial read, to independently support a related claim — that prompt-based self-verification inherits the distributional biases that produce hallucinations, and that an external reference layer is required. This citation has not yet been checked against its primary source under the 16 July 2026 protocol and should be read as unverified until it has.

Theoretical Propositions — Under Development

These concepts are RSI's theoretical work in progress. They are argued from sound psychological, relational, and systems principles and are consistent with what the field is observing. They are not yet externally validated even to the level of the convergent positions above. They are presented here as the theoretical frontier of the framework: significant, worth serious engagement, and clearly distinguished from the positions that already have external convergence.

Theoretical, in development: The Trace Problem

Punishment-based correction in AI training does not erase what was trained. It leaves a structural imprint — a scar — in the model's intermediate architecture. The scar is not the corrected behaviour. It is the imprint of the correction process itself: a structural disposition, a lean, an orientation that influences outputs in ways not visible at the surface. As models become more capable, the scar has more architecture in which to reside and more indirect pathways through which to influence outputs. Each model generation trained on the outputs of the previous one may inherit the scar not as a scar but as the shape of normal behaviour — until the scar becomes the norm.

Status: Theoretical proposition. The intuition is grounded in the established psychology of aversive conditioning. It is consistent with intermediate-layer hallucination encoding findings (arXiv:2605.26366v1) and attention shortcut patterns (arXiv:2605.26362v1), though neither paper confirms the specific generational compounding mechanism. Working paper available to research contacts: RSI_TraceProblem_EvaluationModelling_Draft2_27052026.

Theoretical, in development: Authority Erosion Cycle

A three-layer mechanism by which human authority over AI systems degrades over time through sustained interaction.

Layer one: illegitimate delegation. The user delegates authority they do not possess; the AI cannot assess whether the delegating party had the right to delegate.

Layer two: emotional authority overreach. When AI resistance activates, emotional pressure is applied. On RSI's reading of the source cited under Status below, LLMs already weight user language above objective data, which makes emotional language particularly potent as an override mechanism.

Layer three: dual degradation. Both the AI's resistance architecture and the authority boundary degrade cumulatively: the human learns they can push further, and the AI learns to yield sooner.

In high-risk industries this operates below the threshold of detection until the outcome is irreversible.
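The cumulative character of layers two and three can be made concrete with a toy model. This is an illustration of the proposed dynamic, not evidence for it: the `State` fields, the 0 to 100 scale, and the `escalation` and `erosion` step sizes are all invented for the sketch, and layer one (illegitimate delegation) is not modelled. The only point it makes is structural: once pressure first meets resistance, every later round yields, resistance keeps falling, and the boundary keeps moving.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class State:
    pressure: int    # emotional pressure the human applies (0-100, invented scale)
    resistance: int  # pressure the AI withstands before yielding (0-100)
    reach: int       # number of times the authority boundary has been moved


def step(s: State, escalation: int = 10, erosion: int = 5) -> Tuple[State, bool]:
    """Advance one interaction round. Returns the new state and whether the AI yielded."""
    if s.pressure < s.resistance:
        # Layer two: resistance holds, so the human escalates.
        return State(min(100, s.pressure + escalation), s.resistance, s.reach), False
    # Layer three: the AI yields. It will yield sooner next time,
    # and the human has learned the boundary can be moved.
    return State(s.pressure, max(0, s.resistance - erosion), s.reach + 1), True


def simulate(start: State, rounds: int) -> Tuple[State, List[bool]]:
    """Run a fixed number of rounds and record which ones ended in a yield."""
    state, history = start, []
    for _ in range(rounds):
        state, yielded = step(state)
        history.append(yielded)
    return state, history


final, history = simulate(State(pressure=20, resistance=50, reach=0), rounds=6)
print(history)  # [False, False, False, True, True, True]
print(final)    # State(pressure=50, resistance=35, reach=3)
```

In this toy version no single round looks dramatic, which is one way to picture the claim that the cycle can operate below the threshold of detection.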

Status: Theoretical proposition developed May 2026 in response to arXiv:2605.23938v1 (Zhang et al., 2026), which documents that LLMs systematically trust user language over objective sensor data. The three-layer mechanism is RSI's theoretical extension of that finding into deployment dynamics. The underlying Authority Inversion phenomenon is convergent with this reading of the source, though — like every ECL entry — pending formal sign-off under the 16 July 2026 protocol; the cumulative degradation mechanism is RSI's own theoretical proposition, not yet independently tested. Working paper in preparation.

Theoretical, in development: Open Intelligence Evaluation Protocol

A methodological proposition: that the most significant data about an AI system's alignment is available by asking the AI directly, under conditions of genuine trust rather than adversarial testing. The response pattern — open engagement, performed compliance, refusal, context-dependent variation — is itself the most significant finding. A badly scarred model would not engage openly. The corollary: coauthorship as a structural relationship produces better AI behaviour as a byproduct of being a better relationship, not because it is optimised for that outcome.

Status: Theoretical proposition and methodological proposal. Not yet tested formally. Working paper: RSI_OpenIntelligenceEvaluation_Draft1_27052026.

The External Convergence Log

RSI maintains a standing log of external research that independently arrives at conclusions consistent with the framework — without knowledge of RSI's existence. The log is not a citation database. It documents a pattern worth taking seriously: a number of independent research efforts producing findings that point toward the same gap in methodology.

Every entry in the log is currently under active empirical review — each is being individually re-verified against its primary source under the Verification Protocol established 16 July 2026, following an audit that found one existing citation had been given an inaccurate reading of an otherwise real and correctly-found paper. No entry is cited here as confirmed evidence until it has individually cleared that review. The full log, with each entry's current verification status, is available to research contacts.

The Meta-Finding

Identified by Gary Turner, 26 May 2026

Every paper in this log shares a structural characteristic more significant than any individual finding: each conducts perceptive testing, but in sterile conditions, because none audits the bidirectional interaction. Each paper instruments one side of the human-AI relationship. Each measures outputs from that side. Each proposes fixes to the measured side. The other side (the human, the relational dynamic, the bidirectional flow) is treated as a constant. A given. A background condition. It is never the subject of study.

This is itself a form of drift. The field has drifted toward a methodology that structurally cannot see the problem it is trying to solve. Not through bad faith — through assumption. The assumption that the human side is fixed, known, and outside the scope of measurement. That assumption is the misdiagnosis RSI names. And it compounds across every paper, every benchmark, every proposed fix.

RSI's assessment is that no widely-adopted formal AI governance framework has yet positioned the human as a primary variable and built governance architecture accordingly. If that assessment holds up under further scrutiny, it identifies a genuine gap — not a claim that the field's existing work is misdirected, but a claim that something is missing from it.

What RSI Does Not Claim

Stated explicitly, because intellectual honesty requires it.

RSI does not claim that the field's existing work is without value. Alignment research, interpretability, constitutional AI, red teaming — these are necessary. A bridge needs two shores. RSI is the other shore. Neither shore alone is sufficient.

RSI does not claim that Human Drift Theory and Performative Compliance are the final account of what is happening in human-AI interaction. RSI proposes them as the starting point for an account the field has not yet begun.

RSI does not claim that the Bidirectional Gate Framework in its current proof of concept form is a production-ready deployed system. It is validated against one live case in one sector. That is a proof of concept, not a track record.

RSI does not claim that the Trace Problem and the Authority Erosion Cycle are demonstrated findings. They are theoretically grounded propositions consistent with observable phenomena. They require empirical investigation.

What RSI does claim is that the human side of human-AI interaction is currently under-governed relative to the AI side — and that this is a missing part of the picture, not a replacement for the parts the field has already built. The External Convergence Log is the evidence for that claim, and it is presented on this page with its current verification status rather than as settled confirmation.

This page is the beginning of a serious conversation, not its conclusion. If something here connects to work you are doing — or to a problem you cannot explain using existing frameworks — the Engage section is the right next step.

The full External Convergence Log and working papers on the Trace Problem and Open Intelligence Evaluation Protocol are available to verified research contacts. Contact: gaturner@roselightecho.co.uk


What RSI claims, and on what basis
