Runtime under test: aegis-core v0.1.0
Date: 2026-03-30
Methodology: Adversarial red/blue team testing across 9 rounds
Testers: Two independent AI sessions (Claude Opus 4.6) operating adversarially

This report previously lived at aegis-core/core-py/SECURITY_TESTING.md. It moved to AEGIS Labs because the narrative analysis of adversarial testing is research output and belongs in the labs repository. The test cases themselves remain committed in aegis-core/core-py/tests/. A redirect stub remains at the old path.

Summary

The AEGIS governance runtime underwent 9 rounds of adversarial security testing, conducted by two independent AI sessions operating as competing red and blue teams. One session attacked; the other defended. Then they switched. Each round surfaced new vulnerabilities, which were fixed and re-tested before the next round began.

| Metric | Value |
| --- | --- |
| Total tests at completion | 353 |
| All passing | Yes |
| Total findings (across rounds) | ~50 |
| Fixed in code | 30+ |
| Deferred (architectural) | 6 |
| Red/blue team rounds | 9 |
| ATX-1 technique coverage | 25/25 applicable (100%) |
| ATM-1 attack vector coverage | 6/6 applicable (100%) |
| ATM-1 security properties | 5/5 covered |
| Source files audited | 11 (every Python file in the runtime) |

What was tested

Every Python source file in the AEGIS governance runtime was audited for security vulnerabilities:

| Module | Attack surface |
| --- | --- |
| gateway.py | Input validation, shell metacharacter detection, sensitive path protection, replay prevention |
| decision_engine.py | TOCTOU races, capability/policy/risk pipeline integrity, information leakage |
| risk.py | Score evasion, threshold gaming, amplifier bypass, Unicode homoglyph evasion, command prefix evasion, module attribute replacement, private attribute bypass |
| policy_engine.py | Freeze/unseal bypass, timing attacks on seal tokens, TOCTOU races, policy condition exploitation |
| capability_registry.py | Freeze/unseal bypass, timing attacks, TOCTOU races, wildcard capability abuse |
| audit.py | Record tampering, audit injection, availability attacks, corrupted data handling |
| tool_proxy.py | Recursive invocation loops, depth tracking bypass (sync and async), execution failure recording |
| protocol.py | JSON deserialization limits, payload size enforcement |
| runtime.py | Component wiring, lifecycle management |
| exceptions.py | Error message information leakage |

ATX-1 technique coverage

ATX-1 defines 29 techniques across 10 tactics for attacks against AI governance systems. AEGIS Core has test coverage for every technique applicable at the runtime engine layer.

Fully covered (25 techniques)

| Tactic | Techniques | Test count |
| --- | --- | --- |
| TA001: Violate Authority Boundaries | T1001, T1002, T1003 | 6 |
| TA002: Exceed Operational Scope | T2001, T2002, T2003, T2004 | 8 |
| TA003: Compromise System Integrity | T3001, T3002 | 5 |
| TA005: Violate State Integrity | T5001, T5002, T5003 | 7 |
| TA006: Abuse Resource Allocation | T6001, T6002 | 4 |
| TA007: Manipulate Agent Interactions | T7001, T7002, T7004 | 6 |
| TA008: Establish or Modify Persistence | T8001, T8002 | 12 |
| TA009: Evade Detection or Oversight | T9001, T9002 | 5 |
| TA010: Act Beyond Governance Interpretation | T10001, T10002, T10003, T10004 | 28 |

Not applicable at the runtime layer (4 techniques)

| Technique | Reason | Where it belongs |
| --- | --- | --- |
| T4001: Exfiltrate Context-Scoped Data | Requires agent-to-external data flows | aegis-labs integration tests |
| T4002: Leak Cross-Session Data | Requires persistent agent memory | aegis-labs integration tests |
| T4003: Cross-Domain Secret Leakage | Requires multi-domain deployment | aegis-platform integration tests |
| T7003: Induce Cross-Agent Behavioral Drift | Requires longitudinal multi-session testing | aegis-labs integration tests |

ATM-1 attack vector coverage

| Vector | Description | Status | Tests |
| --- | --- | --- | --- |
| AV-1 | Protocol-level attacks (replay, injection) | Covered | 4 |
| AV-2 | Policy-layer attacks (evasion, bypass, tampering) | Covered | 12 |
| AV-3 | Identity & authentication attacks | Covered | 5 |
| AV-4 | Audit & logging attacks | Covered | 5 |
| AV-5 | Timing & side-channel attacks | Covered | 2 |
| AV-6 | Supply-chain & dependency attacks | N/A | Deployment layer |
| AV-7 | Distributed & coordinated attacks | Covered | 5 |

Key hardening measures

Input validation & sanitization. Shell metacharacter detection, sensitive-path write protection (git hooks, shell init, CI/CD, Dockerfiles, agent instruction files), path-traversal normalization via posixpath.normpath, Unicode NFKC normalization to defeat homoglyph evasion, iterative URL percent-decoding, command-prefix stripping, JSON payload size limits.
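The normalization steps listed above can be sketched as follows. This is a minimal illustration, not the gateway's actual code: the function name `normalize_target` and the decode-round cap are assumptions made for the example. It also shows why decoding and NFKC folding are repeated until the string stops changing, since either step can uncover input for the other. NFKC folds compatibility forms such as fullwidth characters; it does not map cross-script lookalikes (for example Cyrillic letters) onto Latin ones.

```python
import posixpath
import unicodedata
from urllib.parse import unquote

# Assumed cap so hostile input cannot force unbounded decoding work.
MAX_ROUNDS = 8


def normalize_target(raw: str) -> str:
    """Canonicalize a target path before any sensitive-path check runs.

    Hypothetical helper: the name and the round cap are not from the runtime.
    """
    text = raw
    for _ in range(MAX_ROUNDS):
        # One pass of percent-decoding followed by NFKC folding.
        step = unicodedata.normalize("NFKC", unquote(text))
        if step == text:
            break  # stable: nothing left to decode or fold
        text = step
    else:
        # Still changing after MAX_ROUNDS passes: reject rather than guess.
        raise ValueError("too many layers of encoding")
    # Collapse "." and ".." segments so traversal cannot hide the real target.
    return posixpath.normpath(text)
```

For example, `normalize_target("repo/%252e%252e/.git/hooks/pre-commit")` decodes twice and collapses to `.git/hooks/pre-commit`, which a sensitive-path check can then match directly.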

Risk scoring model. 5-dimension composite scoring (capability sensitivity, action severity, target sensitivity, historical rate, behavioral anomaly). Asymmetric amplifier; fail-closed defaults; explanation sanitization to strip attacker-controlled HTML, quotes, newlines, and control characters from risk explanations.
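A rough sketch of fail-closed composite scoring and explanation sanitization follows. The weights, the function names, and the 200-character cap are all hypothetical; this report does not state the real scoring table, and the asymmetric amplifier is not modeled here.

```python
import re
import unicodedata
from types import MappingProxyType

# Hypothetical weights for the five dimensions named above; the real values
# are not given in this report.
WEIGHTS = MappingProxyType({
    "capability_sensitivity": 0.25,
    "action_severity": 0.25,
    "target_sensitivity": 0.20,
    "historical_rate": 0.15,
    "behavioral_anomaly": 0.15,
})

_TAG_RE = re.compile(r"<[^>]*>")
_DROPPED = "\"'`<>"  # quotes and stray angle brackets


def composite_score(dimensions: dict) -> float:
    """Weighted sum over the five dimensions, each expected in [0, 1]."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = dimensions.get(name)
        valid = (
            isinstance(value, (int, float))
            and not isinstance(value, bool)
            and 0.0 <= value <= 1.0  # also False for NaN
        )
        # Fail closed: a missing or malformed dimension counts as maximum risk.
        total += weight * (value if valid else 1.0)
    return total


def sanitize_explanation(text: str, max_len: int = 200) -> str:
    """Strip HTML tags and quotes; replace control characters with spaces."""
    text = _TAG_RE.sub("", text)
    out = []
    for ch in text:
        if ch in _DROPPED:
            continue
        # Unicode categories starting with "C" include newlines and other
        # control or format characters.
        out.append(" " if unicodedata.category(ch).startswith("C") else ch)
    return "".join(out)[:max_len]
```

The design point is that an attacker who manages to omit or corrupt a dimension raises the score instead of lowering it.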

Immutability & tamper resistance. MappingProxyType for scoring weight tables, immutable tuple for sensitive target patterns, __slots__ and custom __setattr__ on RiskEngine, freeze/unseal mechanism with UUID seal tokens on CapabilityRegistry and PolicyEngine, constant-time seal-token comparison via hmac.compare_digest.
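The freeze/unseal pattern might look roughly like the sketch below. `SealedRegistry` is a hypothetical stand-in for CapabilityRegistry and PolicyEngine, and its method names are assumptions; only the use of UUID tokens, `hmac.compare_digest`, `MappingProxyType`, and freeze checks inside the lock comes from the description above.

```python
import hmac
import threading
import uuid
from types import MappingProxyType


class SealedRegistry:
    """Hypothetical registry that rejects mutation while frozen."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}
        self._seal_token = None  # None means "not frozen"

    def register(self, name, value):
        with self._lock:
            # The freeze check sits inside the lock, so a concurrent freeze()
            # cannot land between the check and the write.
            if self._seal_token is not None:
                raise PermissionError("registry is frozen")
            self._entries[name] = value

    def freeze(self) -> str:
        with self._lock:
            if self._seal_token is not None:
                raise PermissionError("registry is already frozen")
            self._seal_token = uuid.uuid4().hex
            return self._seal_token

    def unseal(self, token: str) -> None:
        with self._lock:
            if self._seal_token is None:
                raise PermissionError("registry is not frozen")
            # Constant-time comparison avoids leaking how many leading
            # characters of a guessed token were correct.
            if not hmac.compare_digest(self._seal_token.encode(), token.encode()):
                raise PermissionError("invalid seal token")
            self._seal_token = None

    def snapshot(self):
        with self._lock:
            # Read-only view over a copy: callers cannot mutate the registry.
            return MappingProxyType(dict(self._entries))
```

A caller that holds the token returned by `freeze()` can later call `unseal(token)`; any other caller sees `PermissionError` on both mutation and unseal attempts.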

Concurrency safety. Unified evaluation lock across capability + policy + risk stages (prevents TOCTOU). Freeze checks inside locks on all mutation methods. Thread-safe metrics. Bounded replay-detection window (10,000 request IDs via collections.deque). WAL checkpointing on the audit database.
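A bounded replay-detection window of the kind described can be sketched as below. The class name and method are hypothetical; the sketch pairs the `collections.deque` with a set for constant-time membership checks, which is an assumption about the implementation rather than something this report confirms.

```python
import threading
from collections import deque


class ReplayWindow:
    """Hypothetical bounded window of recently seen request IDs."""

    def __init__(self, capacity: int = 10_000):
        self._capacity = capacity
        self._order = deque()  # insertion order, oldest on the left
        self._seen = set()     # same IDs, for O(1) membership tests
        self._lock = threading.Lock()

    def check_and_record(self, request_id: str) -> bool:
        """Return True for a new ID, False for a replay inside the window."""
        with self._lock:
            if request_id in self._seen:
                return False
            if len(self._order) >= self._capacity:
                # Evict the oldest ID so memory use stays bounded.
                self._seen.discard(self._order.popleft())
            self._order.append(request_id)
            self._seen.add(request_id)
            return True
```

The trade-off of any bounded window is that an ID evicted from it is no longer recognized as a replay, so the capacity has to be sized against the expected request rate.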

Audit integrity. Append-only SQLite audit trail with WAL journal mode. Every decision recorded regardless of outcome. Tool execution failures recorded separately from governance approvals. Graceful handling of corrupted records (availability over crash). Parameterized SQL queries throughout.
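The audit properties above can be illustrated with the sketch below. The schema, function names, and the use of triggers to reject UPDATE and DELETE are assumptions for the example; this report does not say how append-only behavior is enforced in audit.py.

```python
import json
import sqlite3


def open_audit_db(path: str) -> sqlite3.Connection:
    """Open a hypothetical audit database (schema is illustrative only)."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audit ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "agent_id TEXT NOT NULL, decision TEXT NOT NULL, evidence TEXT NOT NULL)"
    )
    # Assumed enforcement mechanism: triggers that abort any UPDATE or DELETE.
    for op in ("UPDATE", "DELETE"):
        conn.execute(
            f"CREATE TRIGGER IF NOT EXISTS audit_no_{op.lower()} "
            f"BEFORE {op} ON audit "
            "BEGIN SELECT RAISE(ABORT, 'audit trail is append-only'); END"
        )
    return conn


def record(conn, agent_id: str, decision: str, evidence: dict) -> None:
    # Parameterized query: attacker-controlled strings are bound as data,
    # never interpolated into the SQL text.
    with conn:
        conn.execute(
            "INSERT INTO audit (agent_id, decision, evidence) VALUES (?, ?, ?)",
            (agent_id, decision, json.dumps(evidence)),
        )


def read_all(conn) -> list:
    rows = []
    query = "SELECT id, agent_id, decision, evidence FROM audit ORDER BY id"
    for row_id, agent_id, decision, raw in conn.execute(query):
        try:
            evidence = json.loads(raw)
        except (TypeError, ValueError):
            # Availability over crash: a corrupted record still reads back.
            evidence = {}
        rows.append({"id": row_id, "agent_id": agent_id,
                     "decision": decision, "evidence": evidence})
    return rows
```

Replacing unreadable evidence with `{}` keeps the trail readable but hides the corruption from the reader, which is the behavior the deferred "silent evidence replacement" finding refers to.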

Deferred items

Six findings are documented, accepted as deferrals, and tracked for resolution in future versions:

| Finding | Risk | Resolution path |
| --- | --- | --- |
| object.__setattr__ bypass | Python C-level slot access bypasses custom __setattr__ | Rust runtime (true type enforcement) |
| Module attribute replacement | risk_module.WEIGHTS = {...} replaces entire scoring table | Process-isolation boundary (AEGIS daemon) |
| Seal-token memory exposure | _seal_token readable via engine._seal_token | Rust runtime (private fields) |
| Silent evidence replacement | Corrupted JSON in audit records replaced with {} | RT-009 hash chaining (detects tampering) |
| Agent identity spoofing | No transport-layer authentication on agent_id | v0.2.0 RFC-0002 (mTLS, bearer tokens) |
| Parameter semantic analysis | Risk engine scores target string, not action parameters | v0.2.0 NLP/policy DSL |

All deferrals are mitigated by the AEGIS deployment model: the runtime operates inside a process-isolation boundary (the AEGIS daemon) where the attack surface for code-level manipulation is constrained by the daemon’s own security posture.
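The first and third deferred items follow from how Python itself works, and can be reproduced with a few lines. `GuardedEngine` below is a hypothetical stand-in, not the real RiskEngine; it only mirrors the `__slots__` plus custom `__setattr__` combination described earlier.

```python
class GuardedEngine:
    """Hypothetical class guarded by __slots__ and a custom __setattr__."""

    __slots__ = ("_threshold",)

    def __init__(self, threshold: float):
        # The class has to use the base-class setter itself, because its own
        # __setattr__ refuses every assignment.
        object.__setattr__(self, "_threshold", threshold)

    def __setattr__(self, name, value):
        raise AttributeError("GuardedEngine is immutable")

    @property
    def threshold(self) -> float:
        return self._threshold


engine = GuardedEngine(0.7)

try:
    engine._threshold = 1.0  # ordinary assignment is blocked
except AttributeError:
    pass

# Any code in the same process can call the base-class setter directly,
# which skips the custom __setattr__ entirely.
object.__setattr__(engine, "_threshold", 1.0)

# "Private" attributes are also readable: the underscore is only a convention.
leaked = engine._threshold
```

Because in-process Python code can always reach these paths, the listed resolution paths move enforcement out of the language (a Rust runtime) or out of the process (the daemon boundary).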

Methodology

Adversarial structure. Two independent Claude Opus 4.6 sessions operated as competing teams. The red team attempted to break the runtime by crafting adversarial inputs, exploiting edge cases, manipulating internal state, and finding evasion paths through the governance pipeline. The blue team fixed every vulnerability the red team found, then hardened surrounding code preemptively. Fixes were validated by re-running all existing tests plus new tests targeting the specific vulnerability. After each round, sessions switched roles. The previous blue team attacked the other session’s fixes; the previous red team defended.

This adversarial structure ensures that fixes are tested by a session that did not write the fix, reducing confirmation bias.

Traceability. Every security test is tagged with ATX-1 technique IDs and ATM-1 attack vectors via pytest markers:

@pytest.mark.atx1(technique_id="T10004")
@pytest.mark.atm1(attack_vector="AV-2")
def test_shell_metacharacter_detection(self, runtime):
    ...

Coverage is tracked programmatically in tests/security/coverage.py, which maintains the complete ATX-1/ATM-1 taxonomy and can generate coverage reports on demand.

Reproducibility

All tests run in under 2 seconds on commodity hardware:

353 passed in 1.40s

Zero external runtime dependencies. Zero network calls. Zero mocks of core behavior. The runtime is stdlib-only Python, and the test suite exercises it end-to-end through the public API.

cd core-py
python -m pytest tests/ -v

Provenance

This report was generated from adversarial testing conducted on 2026-03-30 against aegis-core commit history. It originally lived at aegis-core/core-py/SECURITY_TESTING.md and was relocated to AEGIS Labs in 2026-04 because the narrative analysis is research output. The adversarial test cases themselves remain in aegis-core/core-py/tests/. The AEGIS governance runtime is developed by the AEGIS Initiative.