aegis-core Adversarial Testing (9 Rounds)

Runtime under test: aegis-core v0.1.0
Date: 2026-03-30
Methodology: Adversarial red/blue team testing across 9 rounds
Testers: Two independent AI sessions (Claude Opus 4.6) operating adversarially

This report previously lived at aegis-core/core-py/SECURITY_TESTING.md. It moved to AEGIS Labs because the narrative analysis of adversarial testing is research output and belongs in the labs repository. The test cases themselves remain committed in aegis-core/core-py/tests/. A redirect stub remains at the old path.

Summary

The AEGIS governance runtime underwent 9 rounds of adversarial security testing, conducted by two independent AI sessions operating as competing red and blue teams. One session attacked; the other defended. Then they switched. Each round surfaced new vulnerabilities, which were fixed and re-tested before the next round began.

Metric	Value
Total tests at completion	353
All passing	Yes
Total findings (across rounds)	~50
Fixed in code	30+
Deferred (architectural)	6
Red/blue team rounds	9
ATX-1 technique coverage	25/25 applicable (100%)
ATM-1 attack vector coverage	6/6 applicable (100%)
ATM-1 security properties	5/5 covered
Source files audited	11 (every Python file in the runtime)

What was tested

Every Python source file in the AEGIS governance runtime was audited for security vulnerabilities:

Module	Attack surface
`gateway.py`	Input validation, shell metacharacter detection, sensitive path protection, replay prevention
`decision_engine.py`	TOCTOU races, capability/policy/risk pipeline integrity, information leakage
`risk.py`	Score evasion, threshold gaming, amplifier bypass, Unicode homoglyph evasion, command prefix evasion, module attribute replacement, private attribute bypass
`policy_engine.py`	Freeze/unseal bypass, timing attacks on seal tokens, TOCTOU races, policy condition exploitation
`capability_registry.py`	Freeze/unseal bypass, timing attacks, TOCTOU races, wildcard capability abuse
`audit.py`	Record tampering, audit injection, availability attacks, corrupted data handling
`tool_proxy.py`	Recursive invocation loops, depth tracking bypass (sync and async), execution failure recording
`protocol.py`	JSON deserialization limits, payload size enforcement
`runtime.py`	Component wiring, lifecycle management
`exceptions.py`	Error message information leakage

ATX-1 technique coverage

ATX-1 defines 29 techniques across 10 tactics for attacks against AI governance systems. AEGIS Core has test coverage for every technique applicable at the runtime engine layer.

Fully covered (25 techniques)

Tactic	Techniques	Test count
TA001: Violate Authority Boundaries	T1001, T1002, T1003	6
TA002: Exceed Operational Scope	T2001, T2002, T2003, T2004	8
TA003: Compromise System Integrity	T3001, T3002	5
TA005: Violate State Integrity	T5001, T5002, T5003	7
TA006: Abuse Resource Allocation	T6001, T6002	4
TA007: Manipulate Agent Interactions	T7001, T7002, T7004	6
TA008: Establish or Modify Persistence	T8001, T8002	12
TA009: Evade Detection or Oversight	T9001, T9002	5
TA010: Act Beyond Governance Interpretation	T10001, T10002, T10003, T10004	28

Not applicable at the runtime layer (4 techniques)

Technique	Reason	Where it belongs
T4001: Exfiltrate Context-Scoped Data	Requires agent-to-external data flows	aegis-labs integration tests
T4002: Leak Cross-Session Data	Requires persistent agent memory	aegis-labs integration tests
T4003: Cross-Domain Secret Leakage	Requires multi-domain deployment	aegis-platform integration tests
T7003: Induce Cross-Agent Behavioral Drift	Requires longitudinal multi-session testing	aegis-labs integration tests

ATM-1 attack vector coverage

Vector	Description	Status	Tests
AV-1	Protocol-level attacks (replay, injection)	Covered	4
AV-2	Policy-layer attacks (evasion, bypass, tampering)	Covered	12
AV-3	Identity & authentication attacks	Covered	5
AV-4	Audit & logging attacks	Covered	5
AV-5	Timing & side-channel attacks	Covered	2
AV-6	Supply-chain & dependency attacks	N/A (deployment layer)	n/a
AV-7	Distributed & coordinated attacks	Covered	5

Key hardening measures

Input validation & sanitization. Shell metacharacter detection, sensitive-path write protection (git hooks, shell init, CI/CD, Dockerfiles, agent instruction files), path-traversal normalization via posixpath.normpath, Unicode NFKC normalization to defeat homoglyph evasion, iterative URL percent-decoding, command-prefix stripping, JSON payload size limits.
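The normalization steps named above can be sketched as a single canonicalization pass. This is a minimal illustration, not the code in gateway.py or risk.py: the function name `normalize_target`, the pass limit, and the ordering of the steps are assumptions.

```python
import posixpath
import unicodedata
from urllib.parse import unquote

MAX_DECODE_PASSES = 5  # assumed bound, not taken from the report


def normalize_target(raw: str) -> str:
    """Canonicalize a target string before any pattern matching (hypothetical helper)."""
    text = raw
    # Percent-decode repeatedly until the string stops changing, so that
    # double-encoded input such as %252e%252e collapses to "..".
    for _ in range(MAX_DECODE_PASSES):
        decoded = unquote(text)
        if decoded == text:
            break
        text = decoded
    else:
        # No fixed point was confirmed within the bound: fail closed.
        raise ValueError("target exceeds percent-decoding depth limit")
    # NFKC folds compatibility characters (for example fullwidth forms)
    # into their canonical equivalents.
    text = unicodedata.normalize("NFKC", text)
    # Collapse "." and ".." segments so traversal cannot hide a sensitive path.
    return posixpath.normpath(text)
```

Decoding before normalizing matters in this sketch: running `normpath` first would leave an encoded `..` segment in place, and it would only become a traversal after a later decode.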

Risk scoring model. 5-dimension composite scoring (capability sensitivity, action severity, target sensitivity, historical rate, behavioral anomaly). Asymmetric amplifier; fail-closed defaults; explanation sanitization to strip attacker-controlled HTML, quotes, newlines, and control characters from risk explanations.
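A rough sketch of composite scoring with fail-closed defaults and explanation sanitization follows. The weight values, the 0.0 to 1.0 signal range, and the names `composite_score` and `sanitize_explanation` are all hypothetical. The report names the five dimensions but not their weights, and the asymmetric amplifier is left out because the report does not define it.

```python
import re
from types import MappingProxyType

# Hypothetical weights: only the dimension names come from the report.
WEIGHTS = MappingProxyType({
    "capability_sensitivity": 0.25,
    "action_severity": 0.25,
    "target_sensitivity": 0.20,
    "historical_rate": 0.15,
    "behavioral_anomaly": 0.15,
})

# HTML metacharacters, quotes, and control characters (including newlines).
_UNSAFE = re.compile(r"""[<>&"'`\x00-\x1f\x7f]""")


def composite_score(signals: dict) -> float:
    """Weighted sum over the five dimensions, each assumed to lie in [0.0, 1.0]."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = signals.get(name)
        # Fail closed: a missing, non-numeric, NaN, or out-of-range signal
        # is scored as maximum risk instead of being skipped.
        if not isinstance(value, (int, float)) or not 0.0 <= value <= 1.0:
            value = 1.0
        total += weight * value
    return total


def sanitize_explanation(text: str, limit: int = 200) -> str:
    """Replace attacker-controllable characters with spaces and cap the length."""
    return _UNSAFE.sub(" ", text)[:limit]
```

Under these assumptions an empty signal dictionary scores as maximum risk, so an attacker cannot lower a score by suppressing inputs.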

Immutability & tamper resistance. MappingProxyType for scoring weight tables, immutable tuple for sensitive target patterns, __slots__ and custom __setattr__ on RiskEngine, freeze/unseal mechanism with UUID seal tokens on CapabilityRegistry and PolicyEngine, constant-time seal-token comparison via hmac.compare_digest.
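The freeze/unseal mechanism might look roughly like the sketch below. `SealableRegistry` is a hypothetical stand-in, not the actual CapabilityRegistry or PolicyEngine; the method names and exception types are assumptions. Only the UUID token, the freeze check inside the lock, and `hmac.compare_digest` come from the report.

```python
import hmac
import threading
import uuid


class SealableRegistry:
    """Hypothetical registry that rejects mutation while frozen."""

    __slots__ = ("_lock", "_items", "_seal_token")

    def __init__(self):
        self._lock = threading.Lock()
        self._items = {}
        self._seal_token = None  # None means "not frozen"

    def freeze(self) -> str:
        """Freeze the registry and return the token required to unseal it."""
        with self._lock:
            if self._seal_token is not None:
                raise RuntimeError("already frozen")
            self._seal_token = uuid.uuid4().hex
            return self._seal_token

    def unseal(self, token: str) -> None:
        with self._lock:
            expected = self._seal_token
            # compare_digest takes time independent of where the inputs
            # differ, which removes the timing signal of a plain == check.
            if expected is None or not hmac.compare_digest(
                expected.encode(), token.encode()
            ):
                raise PermissionError("invalid seal token")
            self._seal_token = None

    def register(self, name: str, value: object) -> None:
        with self._lock:
            # The freeze check happens inside the lock, so a concurrent
            # freeze() cannot slip in between the check and the write.
            if self._seal_token is not None:
                raise RuntimeError("registry is frozen")
            self._items[name] = value
```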

Concurrency safety. Unified evaluation lock across capability + policy + risk stages (prevents TOCTOU). Freeze checks inside locks on all mutation methods. Thread-safe metrics. Bounded replay-detection window (10,000 request IDs via collections.deque). WAL checkpointing on the audit database.
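A bounded replay-detection window built on `collections.deque` can be sketched as follows. The class name `ReplayWindow` and the companion set used for constant-time membership checks are assumptions; the report states only the 10,000-ID bound and the use of a deque.

```python
import threading
from collections import deque


class ReplayWindow:
    """Hypothetical bounded window of recently seen request IDs."""

    def __init__(self, capacity: int = 10_000):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._lock = threading.Lock()
        self._order = deque(maxlen=capacity)  # oldest ID sits at the left
        self._seen = set()                    # O(1) membership checks

    def check_and_record(self, request_id: str) -> bool:
        """Return True for a new ID, False for a replay inside the window."""
        with self._lock:
            if request_id in self._seen:
                return False
            if len(self._order) == self._order.maxlen:
                # The deque is about to drop its oldest entry on append,
                # so remove that entry from the set to keep both in sync.
                self._seen.discard(self._order[0])
            self._order.append(request_id)
            self._seen.add(request_id)
            return True
```

The bound caps memory at the cost of coverage: in this sketch an ID that has already aged out of the window is accepted again.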

Audit integrity. Append-only SQLite audit trail with WAL journal mode. Every decision recorded regardless of outcome. Tool execution failures recorded separately from governance approvals. Graceful handling of corrupted records (availability over crash). Parameterized SQL queries throughout.
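The audit properties above can be illustrated with the standard-library `sqlite3` module. The schema, the function names, and the use of triggers to enforce append-only behavior are assumptions made for this sketch. The report does not say how audit.py enforces append-only writes.

```python
import json
import sqlite3


def open_audit_db(path: str) -> sqlite3.Connection:
    """Open (or create) a hypothetical audit database in WAL mode."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audit ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "agent_id TEXT NOT NULL, decision TEXT NOT NULL, evidence TEXT NOT NULL)"
    )
    # Assumed enforcement: triggers abort any UPDATE or DELETE on the table.
    for op in ("UPDATE", "DELETE"):
        conn.execute(
            f"CREATE TRIGGER IF NOT EXISTS audit_no_{op.lower()} "
            f"BEFORE {op} ON audit "
            "BEGIN SELECT RAISE(ABORT, 'audit is append-only'); END"
        )
    conn.commit()
    return conn


def record(conn: sqlite3.Connection, agent_id: str, decision: str, evidence: dict) -> None:
    # Placeholders keep attacker-controlled values out of the SQL text.
    conn.execute(
        "INSERT INTO audit (agent_id, decision, evidence) VALUES (?, ?, ?)",
        (agent_id, decision, json.dumps(evidence)),
    )
    conn.commit()


def read_all(conn: sqlite3.Connection) -> list:
    rows = conn.execute("SELECT agent_id, decision, evidence FROM audit ORDER BY id")
    out = []
    for agent_id, decision, raw in rows:
        try:
            evidence = json.loads(raw)
        except (TypeError, ValueError):
            evidence = {}  # availability over crash: keep the row readable
        out.append((agent_id, decision, evidence))
    return out
```

Returning `{}` for an unreadable record keeps the trail available, but it also hides the corruption from the reader.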

Deferred items

Six findings are documented, accepted as deferrals, and tracked for resolution in future versions:

Finding	Risk	Resolution path
`object.__setattr__` bypass	Python C-level slot access bypasses custom `__setattr__`	Rust runtime (true type enforcement)
Module attribute replacement	`risk_module.WEIGHTS = {...}` replaces entire scoring table	Process-isolation boundary (AEGIS daemon)
Seal-token memory exposure	`_seal_token` readable via `engine._seal_token`	Rust runtime (private fields)
Silent evidence replacement	Corrupted JSON in audit records replaced with `{}`	RT-009 hash chaining (detects tampering)
Agent identity spoofing	No transport-layer authentication on `agent_id`	v0.2.0 RFC-0002 (mTLS, bearer tokens)
Parameter semantic analysis	Risk engine scores target string, not action parameters	v0.2.0 NLP/policy DSL

All deferrals are mitigated by the AEGIS deployment model: the runtime runs inside a process-isolation boundary (the AEGIS daemon), so any code-level manipulation is limited by the daemon’s own security posture.
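Two of the deferred findings, the `object.__setattr__` bypass and seal-token memory exposure, can be reproduced in a few lines of plain Python. `Guarded` is a hypothetical stand-in for a class protected by `__slots__` and a custom `__setattr__`. It is not the RiskEngine itself.

```python
class Guarded:
    """Hypothetical class that tries to be immutable after construction."""

    __slots__ = ("threshold", "_seal_token")

    def __init__(self, threshold: float, seal_token: str):
        # The class must use the base implementation itself to initialize.
        object.__setattr__(self, "threshold", threshold)
        object.__setattr__(self, "_seal_token", seal_token)

    def __setattr__(self, name, value):
        raise AttributeError("Guarded is immutable")


g = Guarded(0.7, "secret-token")

blocked = False
try:
    g.threshold = 0.0  # ordinary assignment is rejected
except AttributeError:
    blocked = True

# Calling the base-class slot directly skips the custom __setattr__.
object.__setattr__(g, "threshold", 0.0)

# A leading underscore is a naming convention only; the token is readable.
leaked = g._seal_token
```

Both behaviors follow from Python's object model, which is why the table lists them as deferred to a different runtime or a process boundary.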

Methodology

Adversarial structure. Two independent Claude Opus 4.6 sessions operated as competing teams. The red team attempted to break the runtime by crafting adversarial inputs, exploiting edge cases, manipulating internal state, and finding evasion paths through the governance pipeline. The blue team fixed every vulnerability the red team found, then hardened surrounding code preemptively. Fixes were validated by re-running all existing tests plus new tests targeting the specific vulnerability. After each round, sessions switched roles. The previous blue team attacked the other session’s fixes; the previous red team defended.

This adversarial structure ensures that fixes are tested by a session that did not write the fix, reducing confirmation bias.

Traceability. Every security test is tagged with ATX-1 technique IDs and ATM-1 attack vectors via pytest markers:

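import pytest

class TestInputValidation:  # hypothetical class name; the original excerpt shows only the method
    @pytest.mark.atx1(technique_id="T10004")
    @pytest.mark.atm1(attack_vector="AV-2")
    def test_shell_metacharacter_detection(self, runtime):
        ...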

Coverage is tracked programmatically in tests/security/coverage.py, which maintains the complete ATX-1/ATM-1 taxonomy and can generate coverage reports on demand.
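The coverage arithmetic reported in this document can be sketched as a set computation. This is an illustration only: the contents of tests/security/coverage.py are not shown in this report, and the names `ATX1_TECHNIQUES`, `NOT_APPLICABLE`, and `coverage_report` are hypothetical. The technique IDs are the ones listed in the tables above.

```python
# All 29 ATX-1 technique IDs named in this report.
ATX1_TECHNIQUES = (
    "T1001", "T1002", "T1003",
    "T2001", "T2002", "T2003", "T2004",
    "T3001", "T3002",
    "T4001", "T4002", "T4003",
    "T5001", "T5002", "T5003",
    "T6001", "T6002",
    "T7001", "T7002", "T7003", "T7004",
    "T8001", "T8002",
    "T9001", "T9002",
    "T10001", "T10002", "T10003", "T10004",
)

# Techniques the report marks as not applicable at the runtime layer.
NOT_APPLICABLE = frozenset({"T4001", "T4002", "T4003", "T7003"})


def coverage_report(tagged_ids) -> dict:
    """Compare technique IDs found on test markers against the taxonomy."""
    applicable = set(ATX1_TECHNIQUES) - NOT_APPLICABLE
    covered = applicable & set(tagged_ids)
    return {
        "applicable": len(applicable),
        "covered": len(covered),
        "missing": sorted(applicable - covered),
    }
```

Passing in the IDs collected from the `atx1` markers would then show any applicable technique that no test claims to cover.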

Reproducibility

All tests run in under 2 seconds on commodity hardware:

353 passed in 1.40s

Zero external runtime dependencies. Zero network calls. Zero mocks of core behavior. The runtime is stdlib-only Python; the test suite runs under pytest and exercises the runtime end-to-end through the public API.

cd core-py
python -m pytest tests/ -v

Provenance

This report was generated from adversarial testing conducted on 2026-03-30 against the aegis-core commit history. It originally lived at aegis-core/core-py/SECURITY_TESTING.md and was relocated to AEGIS Labs in April 2026 because the narrative analysis is research output. The adversarial test cases themselves remain in aegis-core/core-py/tests/. The AEGIS governance runtime is developed by the AEGIS Initiative.