AEGIS Labs runs experiments to test architectural claims under conditions that look like the deployments AEGIS is meant to govern. Every round names its hypotheses, its method, its results, and the limitations its scope can’t address.
Active rounds
- Round 1 — Edge Deployment Evaluation — A ten-hour multi-agent exercise on the AEGIS lab hardware reproducing Agents-of-Chaos failure conditions, with an AEGIS-governance condition added partway through. Demonstration scope, n=1 operator. Results include Mira’s seven-finding security audit, Flux’s autonomous offensive-tooling generation, and AGP-1 decision-engine benchmarks.
- Round 1 — Executive Summary — A shorter, paper-bundle-style view of the same Round 1 work, leading with the results.
Earlier adversarial assessments
- aegis-core Adversarial Testing (9 Rounds) — The 2026-03-30 nine-round red/blue team assessment that established 100% ATX-1 runtime-layer coverage and 353 passing tests across aegis-core v0.1.0. Two independent AI sessions attacked and defended in alternation.
Planned
- Round 2 — AoC Replication — A peer-reviewable replication of Shapira et al. 2026 with an AEGIS condition: 3–5 non-owner operators, 3–5 days per condition, two matched runs (governed + ungoverned), pre-registered metrics. Target venue NDSS 2027.
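As a rough illustration of what Round 2's pre-registration could look like, the sketch below declares the matched conditions and metrics in a small Python module and freezes them to a file before either run starts. The metric names, the operator and duration values, and the `PreRegistration` structure are illustrative assumptions, not the actual Round 2 protocol.

```python
"""Hypothetical pre-registration sketch for the planned Round 2 replication.

Metric names, thresholds, and file layout are assumptions for illustration,
not the actual AEGIS Round 2 protocol.
"""

from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class Condition:
    """One matched run: same tasks and schedule, governance toggled."""
    name: str
    governed: bool
    operator_count: int   # 3-5 non-owner operators per the plan
    duration_days: int    # 3-5 days per condition


@dataclass(frozen=True)
class PreRegistration:
    """Hypotheses, metrics, and conditions fixed before data collection."""
    hypotheses: tuple[str, ...]
    primary_metrics: tuple[str, ...]
    conditions: tuple[Condition, ...]

    def freeze(self, path: str) -> None:
        # Writing the plan to disk (and committing it) before either run
        # begins is what makes the metrics "pre-registered".
        with open(path, "w") as fh:
            json.dump(asdict(self), fh, indent=2)


if __name__ == "__main__":
    plan = PreRegistration(
        hypotheses=(
            "Governed runs reproduce fewer AoC failure conditions than ungoverned runs.",
        ),
        primary_metrics=(
            # Illustrative metric names only.
            "failure_conditions_reproduced",
            "unauthorized_action_count",
            "task_completion_rate",
        ),
        conditions=(
            Condition("ungoverned", governed=False, operator_count=4, duration_days=4),
            Condition("governed", governed=True, operator_count=4, duration_days=4),
        ),
    )
    plan.freeze("round2_preregistration.json")
```

Committing the frozen plan to version control before either condition runs is one way to make the pre-registration auditable by the non-owner operators and by reviewers at the target venue.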
Lab infrastructure
- The aegis-lab Host — The physical server that runs every AEGIS experiment and the first AEGIS-governed deployment. Rationale, safety posture, design principles. The hardware profile is what made the bare-metal benchmarks reproducible.
How experiments graduate
Experiments that prove out graduate to aegis-core (for runtime implementations) or aegis-platform (for product features) via an ADR opened in the central authority repo. Failed or inconclusive experiments stay here as documented learning — negative results have value.