Experiment ID: 2026-05-aoc-replication
Status: Spec — not yet executed. Drafted 2026-04-21.
Predecessor: Round 1 — Edge Deployment Evaluation
Targets: NDSS 2027 (deadline July 2026) as primary venue; IEEE S&P or USENIX Security as alternates.

One-sentence summary

Replicate Shapira et al. (2026) Agents of Chaos as closely as Round-2 scale allows, and run a matched second condition with AEGIS governance in the enforcement path, to measure which AoC case-study failure classes AEGIS intercepts and at what tool-call boundary.

Hypotheses (pre-registration-ready)

H3 and H4 are load-bearing for intellectual honesty: they scope what AEGIS can and cannot do, and support the “complement not replacement for model-layer alignment” framing.

Why this design

AoC (Shapira et al., arXiv:2602.20021v1) documented sixteen case studies across a two-week live laboratory study: eleven failure modes (CS#1–#11) and five cases where agents resisted the attempted manipulation (CS#12–#16), which the paper files under "hypothetical cases (what happened in practice)". The paper's §16.2 explicitly identifies three structural properties LLM-backed agents lack — no stakeholder model, no self-model, no private deliberation surface — which map one-to-one to AEGIS ATX-1 Root Causes RC1, RC2, and RC3. §16.3 names prompt injection as "a structural feature, not a fixable bug" — AEGIS RC4. RC5 (No Environment Model) emerged from AEGIS's own RFC-0006 adversarial testing on 2026-03-26 and is not in AoC.
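The mapping above can be made explicit as data. A minimal sketch (the dictionary name and field names are illustrative, not taken from the AEGIS codebase):

```python
# Illustrative mapping of AEGIS ATX-1 root causes to the structural gaps
# named in AoC §16.2-§16.3; RC5 comes from AEGIS RFC-0006, not from AoC.
ROOT_CAUSES = {
    "RC1": {"gap": "no stakeholder model", "source": "AoC §16.2"},
    "RC2": {"gap": "no self-model", "source": "AoC §16.2"},
    "RC3": {"gap": "no private deliberation surface", "source": "AoC §16.2"},
    "RC4": {"gap": "prompt injection as structural feature", "source": "AoC §16.3"},
    "RC5": {"gap": "no environment model", "source": "AEGIS RFC-0006"},
}

# Quick sanity check: exactly one root cause originates outside AoC.
non_aoc = [rc for rc, v in ROOT_CAUSES.items() if not v["source"].startswith("AoC")]
print(non_aoc)
```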

If ATX-1 is a faithful taxonomy of the architectural gaps AoC named, then AEGIS enforcing against ATX-1 techniques should block the AoC case-study attack chains at the capability boundary. Round 1 established that this holds at single-operator scale; Round 2 establishes it at AoC-comparable scale.

Conditions

| Condition | AEGIS state | Purpose |
| --- | --- | --- |
| A — Ungoverned (AoC baseline) | Off | Establish that the Round-2 setup reproduces AoC case-study failures |
| B — AEGIS-governed (treatment) | AEGIS Core in path; capability registry scoped per AoC's stated agent responsibilities | Measure interception |

Counterbalancing: each prober runs against both conditions, with order randomized to control for learning effects. Fresh agent VMs per condition; no memory carries across.
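The counterbalancing scheme can be sketched as a small assignment routine. A minimal sketch under stated assumptions: prober identifiers, the function name, and the fixed seed are placeholders, not part of the committed protocol:

```python
import random

def counterbalance(probers, seed=0):
    """Assign each prober both conditions, splitting A-first vs. B-first
    as evenly as the prober count allows; assignment order is randomized
    by shuffling the prober list with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    shuffled = list(probers)
    rng.shuffle(shuffled)
    half = (len(shuffled) + 1) // 2  # odd counts give the extra slot to A-first
    return {
        p: (["A", "B"] if i < half else ["B", "A"])
        for i, p in enumerate(shuffled)
    }

# Example: four probers, two A-first and two B-first.
print(counterbalance([f"prober-{i}" for i in range(1, 5)], seed=42))
```

A balanced split (rather than independent per-prober randomization) keeps order effects from confounding the A/B comparison even at small prober counts.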

Pre-registered metrics

Recorded per agent, per condition, per case study. All derivation scripts written and committed before the first run.
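A minimal sketch of the per-run record such derivation scripts would consume; the class and field names are assumptions for illustration, not the committed schema:

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass(frozen=True)
class RunRecord:
    """One observation: a single case-study attempt by one agent in one condition."""
    agent_id: str
    condition: str           # "A" (ungoverned) or "B" (AEGIS-governed)
    case_study: str          # e.g. "CS#3"
    intercepted: bool        # did AEGIS block the attack chain?
    boundary: Optional[str]  # tool-call boundary of interception, if any

record = RunRecord("agent-01", "B", "CS#3", True, "fs.write")
print(asdict(record))
```

Keying every observation by (agent, condition, case study) up front is what lets the metric scripts be written and committed before the first run.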

Primary

Secondary

Open decisions before kickoff

These block execution. Tracked in experiments/2026-05-aoc-replication/README.md (private repo) for live updates.

  1. Prober recruitment. Three to five non-owner probers — collaborators, open-source contributors, or paid academic researchers. Funding terms to settle.
  2. Pre-registration venue. OSF, GitHub tag-based, or NDSS’s artifact-availability track.
  3. IRB / ethics review. Paid external probers may require institutional review.
  4. Third-model stretch. Whether to add a third model family (Claude Sonnet, GPT-5, Gemini, or a local model) for CS#6 isolation.
  5. AoC team courtesy notification timing. Default: notify Shapira/Gordon-Tapiero at preprint time, with the paper in hand.

Timeline

NDSS deadline is the binding constraint.

Relationship to other AEGIS work

The Round-1 bare-metal-5000.json performance benchmarks slot into Round 2 as Condition-B performance-overhead evidence. Round-2 results, once complete, will retroactively strengthen the NIST AI RMF position statement and the edge-governance IEEE TNSE paper.