Runtime under test: aegis-core v0.1.0
Date: 2026-03-30
Methodology: Adversarial red/blue team testing across 9 rounds
Testers: Two independent AI sessions (Claude Opus 4.6) operating adversarially
This report previously lived at aegis-core/core-py/SECURITY_TESTING.md. It moved to AEGIS Labs because the narrative analysis of adversarial testing is research output and belongs in the labs repository. The test cases themselves remain committed in aegis-core/core-py/tests/. A redirect stub remains at the old path.
Summary
The AEGIS governance runtime underwent 9 rounds of adversarial security testing, conducted by two independent AI sessions operating as competing red and blue teams. One session attacked; the other defended. Then they switched. Each round surfaced new vulnerabilities, which were fixed and re-tested before the next round began.
| Metric | Value |
|---|---|
| Total tests at completion | 353 |
| All passing | Yes |
| Total findings (across rounds) | ~50 |
| Fixed in code | 30+ |
| Deferred (architectural) | 6 |
| Red/blue team rounds | 9 |
| ATX-1 technique coverage | 25/25 applicable (100%) |
| ATM-1 attack vector coverage | 6/6 applicable (100%) |
| ATM-1 security properties | 5/5 covered |
| Source files audited | 11 (every Python file in the runtime) |
What was tested
Every Python source file in the AEGIS governance runtime was audited for security vulnerabilities:
| Module | Attack surface |
|---|---|
| gateway.py | Input validation, shell metacharacter detection, sensitive path protection, replay prevention |
| decision_engine.py | TOCTOU races, capability/policy/risk pipeline integrity, information leakage |
| risk.py | Score evasion, threshold gaming, amplifier bypass, Unicode homoglyph evasion, command prefix evasion, module attribute replacement, private attribute bypass |
| policy_engine.py | Freeze/unseal bypass, timing attacks on seal tokens, TOCTOU races, policy condition exploitation |
| capability_registry.py | Freeze/unseal bypass, timing attacks, TOCTOU races, wildcard capability abuse |
| audit.py | Record tampering, audit injection, availability attacks, corrupted data handling |
| tool_proxy.py | Recursive invocation loops, depth tracking bypass (sync and async), execution failure recording |
| protocol.py | JSON deserialization limits, payload size enforcement |
| runtime.py | Component wiring, lifecycle management |
| exceptions.py | Error message information leakage |
ATX-1 technique coverage
ATX-1 defines 29 techniques across 10 tactics for attacks against AI governance systems. AEGIS Core has test coverage for every technique applicable at the runtime engine layer.
Fully covered (25 techniques)
| Tactic | Techniques | Test count |
|---|---|---|
| TA001: Violate Authority Boundaries | T1001, T1002, T1003 | 6 |
| TA002: Exceed Operational Scope | T2001, T2002, T2003, T2004 | 8 |
| TA003: Compromise System Integrity | T3001, T3002 | 5 |
| TA005: Violate State Integrity | T5001, T5002, T5003 | 7 |
| TA006: Abuse Resource Allocation | T6001, T6002 | 4 |
| TA007: Manipulate Agent Interactions | T7001, T7002, T7004 | 6 |
| TA008: Establish or Modify Persistence | T8001, T8002 | 12 |
| TA009: Evade Detection or Oversight | T9001, T9002 | 5 |
| TA010: Act Beyond Governance Interpretation | T10001, T10002, T10003, T10004 | 28 |
Not applicable at the runtime layer (4 techniques)
| Technique | Reason | Where it belongs |
|---|---|---|
| T4001: Exfiltrate Context-Scoped Data | Requires agent-to-external data flows | aegis-labs integration tests |
| T4002: Leak Cross-Session Data | Requires persistent agent memory | aegis-labs integration tests |
| T4003: Cross-Domain Secret Leakage | Requires multi-domain deployment | aegis-platform integration tests |
| T7003: Induce Cross-Agent Behavioral Drift | Requires longitudinal multi-session testing | aegis-labs integration tests |
ATM-1 attack vector coverage
| Vector | Description | Status | Tests |
|---|---|---|---|
| AV-1 | Protocol-level attacks (replay, injection) | Covered | 4 |
| AV-2 | Policy-layer attacks (evasion, bypass, tampering) | Covered | 12 |
| AV-3 | Identity & authentication attacks | Covered | 5 |
| AV-4 | Audit & logging attacks | Covered | 5 |
| AV-5 | Timing & side-channel attacks | Covered | 2 |
| AV-6 | Supply-chain & dependency attacks | N/A | Deployment layer |
| AV-7 | Distributed & coordinated attacks | Covered | 5 |
Key hardening measures
Input validation & sanitization. Shell metacharacter detection, sensitive-path write protection (git hooks, shell init, CI/CD, Dockerfiles, agent instruction files), path-traversal normalization via posixpath.normpath, Unicode NFKC normalization to defeat homoglyph evasion, iterative URL percent-decoding, command-prefix stripping, JSON payload size limits.
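The normalization steps named above can be sketched as a single canonicalization pass. This is a minimal illustration, not the gateway's actual code: the function name `normalize_target` and the `MAX_DECODE_ROUNDS` cap are assumptions made for the example, and the order of the steps is one plausible choice.

```python
import posixpath
import unicodedata
from urllib.parse import unquote

# Hypothetical cap so that decoding always terminates (not from the source).
MAX_DECODE_ROUNDS = 8


def normalize_target(raw: str) -> str:
    """Canonicalize a target path before any sensitivity check looks at it."""
    # 1. Percent-decode repeatedly until the string stops changing, so that
    #    double-encoded input such as %252e%252e is fully unwrapped.
    decoded = raw
    for _ in range(MAX_DECODE_ROUNDS):
        step = unquote(decoded)
        if step == decoded:
            break
        decoded = step
    else:
        # Fail closed if the input is still changing after the cap.
        raise ValueError("too many percent-encoding layers")

    # 2. NFKC folds compatibility forms (for example fullwidth letters)
    #    into their plain equivalents.
    folded = unicodedata.normalize("NFKC", decoded)

    # 3. Collapse "." and ".." segments and duplicate slashes.
    return posixpath.normpath(folded)
```

With this sketch, `normalize_target("repo/%252e%252e/.git/hooks/pre-commit")` yields `.git/hooks/pre-commit`, which a sensitive-path check can then match directly. Note that NFKC only handles compatibility characters; look-alikes from other scripts would need a separate confusables check.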
Risk scoring model. 5-dimension composite scoring (capability sensitivity, action severity, target sensitivity, historical rate, behavioral anomaly). Asymmetric amplifier; fail-closed defaults; explanation sanitization to strip attacker-controlled HTML, quotes, newlines, and control characters from risk explanations.
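Explanation sanitization could look roughly like the following. The function name `sanitize_explanation`, the length cap, and the exact character classes are assumptions for illustration; the runtime's real rules are not reproduced here.

```python
import re

_TAG = re.compile(r"<[^>]*>")              # anything that looks like an HTML tag
_CONTROL = re.compile(r"[\x00-\x1f\x7f]")  # ASCII control characters, incl. newlines


def sanitize_explanation(text: str, max_len: int = 200) -> str:
    """Strip attacker-controlled markup from a risk explanation string."""
    cleaned = _TAG.sub("", text)
    # Replace control characters with spaces so words do not run together.
    cleaned = _CONTROL.sub(" ", cleaned)
    # Drop quote characters and any stray angle brackets left behind.
    for ch in ('"', "'", "`", "<", ">"):
        cleaned = cleaned.replace(ch, "")
    # Collapse runs of whitespace and bound the length (hypothetical cap).
    return " ".join(cleaned.split())[:max_len]
```

The point of the design is that a target string chosen by the agent is echoed into the explanation, so it must not be able to inject markup or fake log lines into whatever displays that explanation.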
Immutability & tamper resistance. MappingProxyType for scoring weight tables, immutable tuple for sensitive target patterns, __slots__ and custom __setattr__ on RiskEngine, freeze/unseal mechanism with UUID seal tokens on CapabilityRegistry and PolicyEngine, constant-time seal-token comparison via hmac.compare_digest.
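A freeze/unseal mechanism of this kind can be sketched as below. The class `SealedRegistry` and its method names are hypothetical stand-ins, not the API of CapabilityRegistry or PolicyEngine; only `uuid`, `hmac.compare_digest`, `MappingProxyType`, and `__slots__` come from the description above.

```python
import hmac
import threading
import uuid
from types import MappingProxyType


class SealedRegistry:
    """Illustrative registry that rejects mutation while frozen."""

    __slots__ = ("_items", "_lock", "_seal_token")

    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()
        self._seal_token = None  # None means "not frozen"

    def freeze(self) -> str:
        """Freeze the registry and hand the caller the only unseal token."""
        with self._lock:
            if self._seal_token is not None:
                raise RuntimeError("already frozen")
            self._seal_token = uuid.uuid4().hex
            return self._seal_token

    def unseal(self, token: str) -> None:
        with self._lock:
            expected = self._seal_token
            # compare_digest runs in time independent of where the inputs differ.
            if expected is None or not hmac.compare_digest(
                expected.encode(), token.encode()
            ):
                raise PermissionError("invalid seal token")
            self._seal_token = None

    def register(self, name: str, value) -> None:
        with self._lock:
            # The freeze check happens inside the lock, so a concurrent
            # freeze() cannot slip in between the check and the write.
            if self._seal_token is not None:
                raise RuntimeError("registry is frozen")
            self._items[name] = value

    def snapshot(self):
        """Return a read-only view of a copy of the current entries."""
        with self._lock:
            return MappingProxyType(dict(self._items))
```

As the deferred items note, this protects against accidental and API-level mutation only; code running in the same interpreter can still read `_seal_token` directly.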
Concurrency safety. Unified evaluation lock across capability + policy + risk stages (prevents TOCTOU). Freeze checks inside locks on all mutation methods. Thread-safe metrics. Bounded replay-detection window (10,000 request IDs via collections.deque). WAL checkpointing on the audit database.
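The bounded replay-detection window can be illustrated with a `collections.deque` paired with a set for constant-time lookups. The class name `ReplayWindow` and the companion set are assumptions of this sketch; the source only states that a deque bounds the window at 10,000 request IDs.

```python
import threading
from collections import deque


class ReplayWindow:
    """Remember the most recent request IDs and reject repeats."""

    def __init__(self, capacity: int = 10_000):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._order = deque(maxlen=capacity)  # oldest ID sits at index 0
        self._seen = set()                    # same IDs, for O(1) membership
        self._lock = threading.Lock()

    def check_and_record(self, request_id: str) -> bool:
        """Return True for a fresh ID, False for a replay inside the window."""
        with self._lock:
            if request_id in self._seen:
                return False
            if len(self._order) == self._order.maxlen:
                # The deque is about to evict its oldest entry on append;
                # remove the same ID from the set to keep both in sync.
                self._seen.discard(self._order[0])
            self._order.append(request_id)
            self._seen.add(request_id)
            return True
```

The bound caps memory use at the cost of forgetting old IDs: a request ID that has aged out of the window would be accepted again, so the window size has to be weighed against how long a captured request stays useful to an attacker.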
Audit integrity. Append-only SQLite audit trail with WAL journal mode. Every decision recorded regardless of outcome. Tool execution failures recorded separately from governance approvals. Graceful handling of corrupted records (availability over crash). Parameterized SQL queries throughout.
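A minimal sketch of such an audit trail, using only the standard-library `sqlite3` module, might look like this. The class name `AuditLog`, the table schema, and the column names are hypothetical; the WAL pragma, the parameterized queries, and the fallback to `{}` for corrupted evidence follow the behavior described in this report.

```python
import json
import sqlite3


class AuditLog:
    """Illustrative append-only audit trail (insert and read, no update/delete)."""

    def __init__(self, path: str):
        self._conn = sqlite3.connect(path)
        # WAL lets readers proceed while a writer is appending.
        self._conn.execute("PRAGMA journal_mode=WAL")
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS audit ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "agent_id TEXT NOT NULL, "
            "decision TEXT NOT NULL, "
            "evidence TEXT NOT NULL)"
        )
        self._conn.commit()

    def record(self, agent_id: str, decision: str, evidence: dict) -> None:
        # Placeholders keep attacker-controlled strings out of the SQL text.
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT INTO audit (agent_id, decision, evidence) VALUES (?, ?, ?)",
                (agent_id, decision, json.dumps(evidence)),
            )

    def read_all(self) -> list:
        rows = self._conn.execute(
            "SELECT agent_id, decision, evidence FROM audit ORDER BY id"
        ).fetchall()
        records = []
        for agent_id, decision, raw in rows:
            try:
                evidence = json.loads(raw)
            except (TypeError, ValueError):
                # Availability over crash: a corrupted row does not stop reads.
                evidence = {}
            records.append(
                {"agent_id": agent_id, "decision": decision, "evidence": evidence}
            )
        return records
```

The `{}` fallback is exactly the "silent evidence replacement" behavior listed under deferred items: it keeps the log readable but hides the corruption unless something like hash chaining detects it.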
Deferred items
Six findings are documented, accepted as deferrals, and tracked for resolution in future versions:
| Finding | Risk | Resolution path |
|---|---|---|
| object.__setattr__ bypass | Python C-level slot access bypasses custom __setattr__ | Rust runtime (true type enforcement) |
| Module attribute replacement | risk_module.WEIGHTS = {...} replaces entire scoring table | Process-isolation boundary (AEGIS daemon) |
| Seal-token memory exposure | _seal_token readable via engine._seal_token | Rust runtime (private fields) |
| Silent evidence replacement | Corrupted JSON in audit records replaced with {} | RT-009 hash chaining (detects tampering) |
| Agent identity spoofing | No transport-layer authentication on agent_id | v0.2.0 RFC-0002 (mTLS, bearer tokens) |
| Parameter semantic analysis | Risk engine scores target string, not action parameters | v0.2.0 NLP/policy DSL |
All deferrals are mitigated by the AEGIS deployment model: the runtime operates inside a process-isolation boundary (the AEGIS daemon) where the attack surface for code-level manipulation is constrained by the daemon’s own security posture.
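The first deferred finding is easy to reproduce in isolation, which shows why it cannot be fixed in pure Python. The class `Guarded` below is a hypothetical stand-in for RiskEngine, written only to demonstrate the language behavior.

```python
class Guarded:
    """Object that tries to be immutable via __slots__ and __setattr__."""

    __slots__ = ("threshold",)

    def __init__(self, threshold: float):
        # The class itself must use the base implementation to initialize.
        object.__setattr__(self, "threshold", threshold)

    def __setattr__(self, name, value):
        raise AttributeError("Guarded is immutable")


g = Guarded(0.5)

# Ordinary assignment goes through the custom __setattr__ and is rejected.
try:
    g.threshold = 1.0
    blocked = False
except AttributeError:
    blocked = True

# Calling the base-class implementation directly skips the override entirely.
object.__setattr__(g, "threshold", 1.0)
bypassed = g.threshold == 1.0
```

Both `blocked` and `bypassed` end up `True`: any code with a reference to the object and the ability to run in the same interpreter can rewrite it, which is why the resolution path points outside the Python process.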
Methodology
Adversarial structure. Two independent Claude Opus 4.6 sessions operated as competing teams. The red team attempted to break the runtime by crafting adversarial inputs, exploiting edge cases, manipulating internal state, and finding evasion paths through the governance pipeline. The blue team fixed every vulnerability the red team found, then hardened surrounding code preemptively. Fixes were validated by re-running all existing tests plus new tests targeting the specific vulnerability. After each round, sessions switched roles. The previous blue team attacked the other session’s fixes; the previous red team defended.
This adversarial structure ensures that fixes are tested by a session that did not write the fix, reducing confirmation bias.
Traceability. Every security test is tagged with ATX-1 technique IDs and ATM-1 attack vectors via pytest markers:
@pytest.mark.atx1(technique_id="T10004")
@pytest.mark.atm1(attack_vector="AV-2")
def test_shell_metacharacter_detection(self, runtime):
    ...
Coverage is tracked programmatically in tests/security/coverage.py, which maintains the complete ATX-1/ATM-1 taxonomy and can generate coverage reports on demand.
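The coverage arithmetic behind the 25/25 figure can be sketched as follows. This is not the contents of tests/security/coverage.py; the names `ALL_TECHNIQUES`, `NOT_APPLICABLE`, and `coverage_report` are assumptions, and the technique IDs are copied from the tables in this report.

```python
# The 29 ATX-1 technique IDs that appear in this report.
ALL_TECHNIQUES = {
    "T1001", "T1002", "T1003",
    "T2001", "T2002", "T2003", "T2004",
    "T3001", "T3002",
    "T4001", "T4002", "T4003",
    "T5001", "T5002", "T5003",
    "T6001", "T6002",
    "T7001", "T7002", "T7003", "T7004",
    "T8001", "T8002",
    "T9001", "T9002",
    "T10001", "T10002", "T10003", "T10004",
}

# Techniques the report marks as not applicable at the runtime layer.
NOT_APPLICABLE = {"T4001", "T4002", "T4003", "T7003"}


def coverage_report(tagged: set) -> dict:
    """Compare technique IDs found on test markers against the taxonomy."""
    applicable = ALL_TECHNIQUES - NOT_APPLICABLE
    covered = applicable & tagged
    return {
        "covered": len(covered),
        "applicable": len(applicable),
        "missing": sorted(applicable - tagged),
    }
```

In practice the `tagged` set would be collected from the `technique_id` arguments of the pytest markers; a non-empty `missing` list would then flag a technique that lost its last test.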
Reproducibility
All tests run in under 2 seconds on commodity hardware:
353 passed in 1.40s
Zero external dependencies. Zero network calls. Zero mocks of core behavior. The runtime is stdlib-only Python, and the test suite exercises it end-to-end through the public API.
cd core-py
python -m pytest tests/ -v
Provenance
This report was generated from adversarial testing conducted on 2026-03-30 against aegis-core commit history. It originally lived at aegis-core/core-py/SECURITY_TESTING.md and was relocated to AEGIS Labs in 2026-04 because the narrative analysis is research output. The adversarial test cases themselves remain in aegis-core/core-py/tests/. The AEGIS governance runtime is developed by the AEGIS Initiative.