Author: Ken Tannenbaum, AEGIS Initiative
Date: 2026-04-09
Status: Published
License: CC-BY-SA-4.0

Abstract

For most of the last decade, AI safety has meant making models behave better. Reinforcement learning from human feedback, Constitutional AI training, instruction tuning, output filters — these methods shape what a language model says when asked a question. They are good at what they do, and they have made conversational AI meaningfully safer to talk to.

But autonomous agents do not just talk. They act. They call APIs, modify files, command actuators, send emails, transfer funds, and coordinate with other agents across networks. The action layer is not the same as the conversation layer. An agent that is well-aligned in language can still take catastrophic actions in code — and in production deployments, this is already happening.

This paper argues that the action boundary is the correct place to intervene. It walks through the four enforcement points, the five root causes of action-boundary failure, the Anderson security properties as applied to agentic systems, and how AEGIS fits into a Plan / Build / Operate lifecycle.
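To make the conversation-layer / action-layer distinction concrete, here is a minimal sketch of action-boundary mediation. Every name in it (ActionRequest, Policy, gate, the two example policies) is hypothetical and invented for illustration; this is not AEGIS's API, only a picture of the idea that enforcement happens where the agent acts, by mediating every tool call, rather than where it talks.

```python
# Hypothetical sketch: a gate that mediates an agent's tool calls.
# None of these names come from AEGIS; they exist only to illustrate
# enforcement at the action boundary.

from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class ActionRequest:
    tool: str                               # e.g. "send_email", "transfer_funds"
    args: dict[str, Any] = field(default_factory=dict)

class PolicyViolation(Exception):
    pass

# A policy is a predicate over the requested action: True means "allow".
Policy = Callable[[ActionRequest], bool]

def deny_fund_transfers(req: ActionRequest) -> bool:
    return req.tool != "transfer_funds"

def cap_email_recipients(req: ActionRequest) -> bool:
    return req.tool != "send_email" or len(req.args.get("to", [])) <= 5

def gate(req: ActionRequest,
         policies: list[Policy],
         execute: Callable[[ActionRequest], Any]) -> Any:
    """Mediate every action: the agent never calls `execute` directly."""
    for policy in policies:
        if not policy(req):
            raise PolicyViolation(f"{policy.__name__} blocked {req.tool}")
    return execute(req)

# A permitted action passes through; a "transfer_funds" request would
# raise PolicyViolation before reaching the executor.
result = gate(ActionRequest("send_email", {"to": ["ops@example.com"]}),
              [deny_fund_transfers, cap_email_recipients],
              execute=lambda r: f"executed {r.tool}")
```

The design point the sketch tries to capture is the one the essay argues: the policies above never inspect what the model says, only what it is about to do.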

Contents

Format

A narrative essay aimed at general readers, arguing why the action boundary matters. The technical counterpart is the companion paper, which maps the same ideas to concrete architectural mechanisms in the AEGIS runtime.

Artifacts

The full essay is available as styled HTML and as a print-ready PDF in the aegis-labs repository (private). A public-distribution copy will be linked here when the labs site reaches v1.0.

Relation to AEGIS evidence

The “three numbers from the AEGIS edge laboratory” referenced in §1 are drawn from Round 1 — Edge Deployment Evaluation: the bare-metal benchmark figures (15.1 requests per second of adversarial throughput at a 110 ms p95 latency) and the 100% interception rate observed while AEGIS was active in Phase 2 of the multi-agent exercise.