§ 01 / SummaryWhat this document is
This document lists every security property AOSIQ enforces, with references to the files, migrations, and audit events that implement each property. It also lists, with equal specificity, the threats AOSIQ does not mitigate.
The honest framing matters. A threat model that claims comprehensive coverage is either overpromising or has stopped thinking about new attack vectors. Real security writing acknowledges what it doesn't cover, why, and what compensating control the operator is expected to provide.
Read alongside the file references table at the bottom — every claim made here can be verified against the codebase. Nothing in this document is aspirational. Items still in development are not included; they appear in the public roadmap separately.
§ 02 / CoverageWhat's in scope
Six classes of threat the runtime materially reduces. Each is implemented as a runtime invariant — not as an operator best practice, not as documentation, not as a configuration default.
Most properties apply to all actor types. Capability narrowing, the audit chain, the approval gate, cost ceilings, and backend constraints govern reasoning agents and deterministic actors uniformly. Prompt-injection defenses are reasoning-specific by definition — deterministic actors don't reason and therefore can't be prompt-injected. Where a property is reasoning-only, the bullet below says so.
-
Capability narrowing prevents actor-to-actor privilege
escalation. A child actor's grants are the
intersection of the parent's grants and the child's
request. Empty intersections raise
InsufficientAuthorityat spawn time, before any code runs. - Per-session audit chain detects PostgreSQL-write tampering when the audit anchor store is independently credentialed. Mid-chain row modifications break the SHA-256 linkage and fail verification.
-
Mandatory approval prevents autonomous destructive
actions when tools are registered with
reversible=False. The runtime cannot execute the tool; it inserts a pending review row and blocks the actor until an operator approves the specific(tool, args_hash)pair. - Cost ceilings prevent runaway spend. For reasoning agents this means LLM token cost; for deterministic actors it means compute and external API consumption. Either way, configured per-session ceilings raise an exception before the operation would push usage over budget — not after the bill arrives.
-
Per-class backend constraints prevent unintended
data egress. Applies to reasoning agents.
An agent class registered with
allowed_backends={"ollama"}cannot route to Anthropic, OpenAI, or any other provider regardless of operator misconfiguration. Deterministic actors have no LLM backend so this property doesn't apply to them. - Untrusted-content delimiters and pattern detection raise the floor against direct prompt injection. Applies to reasoning agents only. Tool results are wrapped with structural markers and scanned for injection-shaped patterns; detections prepend warnings and write audit events without breaking legitimate use.
§ 03 / LimitsWhat's explicitly out of scope
Six classes of threat AOSIQ does not mitigate today. Each is documented because operators evaluating the runtime need to understand what compensating control they remain responsible for.
- Fully-compromised infrastructure operator. An attacker holding both PostgreSQL write access and audit-anchor store credentials can rewrite the chain and the anchors together. Mitigation: route audit anchors to an external append-only service (object lock, immutable storage tier).
- Reasoning-redirection injection. Adversarial content designed to make the LLM reach a specific conclusion — rather than call a specific tool — produces structurally-valid proposals with proper citations that just arrive at the wrong answer. The judge-model defense pattern is the standard mitigation; deferred to a future release.
- KB content poisoning at ingest time. Documents embedded in the knowledge base are trusted at ingest; AOSIQ does not scan for prompt-injection patterns or PII before embedding. Operators are the trust boundary for what enters their corpora.
- Multi-turn injection drifting across many tool results. An adversary placing content across a long sequence of tool calls, gradually steering the agent's reasoning, evades single-result pattern detection. No good general defense exists; explicit deferral.
- Adversarially-crafted patterns that evade the regex detectors. The injection pattern set is best-effort, not exhaustive. A determined adversary can construct phrasings that look benign to the detector and still land as injection in the LLM's context.
- LLM hallucination of correct-looking-but-wrong tool arguments where capabilities permit. The seven-layer evidence stack catches most cases but cannot prove absence. Operators remain responsible for treating proposals as advisory, not authoritative, in scenarios where errors are costly.
§ 04 / ScenariosThreat scenarios with mitigations
Each scenario describes a specific attack class the runtime encounters in production deployments. The mitigation is the concrete code path or audit event that addresses the threat; the residual is what an operator should still watch for.
An attacker with network access to the AOSIQ API attempts to
POST /api/v1/agents with capability grants
they shouldn't possess, then directs the resulting agent
to perform actions in the operator's environment.
Bearer-token authentication on every API route via
aos/server/auth.py. Three modes (enforced,
warn, disabled); production
deployments require AOS_AUTH_MODE=enforced.
The warn mode adds an explicit
X-AOS-Auth-Warning response header so
misconfiguration surfaces in logs. Capability grants
requested at spawn are clamped to the caller's authorized
token claims; a caller cannot mint a token broader than
its own.
The single bearer-token model does not provide per-caller audit attribution. Multi-key auth with rotation is on the roadmap. Until then, deployments with multiple upstream callers should provision per-deployment instances rather than sharing a single token.
An MCP server, file-system content, or network response returns content designed to make the LLM emit an action the operator did not authorize. Applies to reasoning agents only. Deterministic actors cannot be redirected by tool-result content because they don't reason about it — they execute coded logic.
Three layers compose. Capability narrowing
limits the action surface: an actor without
delete_records in its capability token cannot
call it regardless of what the LLM is talked into.
Approval gating on irreversible tools
ensures destructive actions require explicit operator
sign-off at the moment of execution. These two protections
apply to all actors as defense in depth against many
attack classes, not just this one.
Untrusted-content delimiters
(<untrusted_tool_output> tags) plus
pattern detection raise warnings and write
APP_INJECTION_PATTERN_DETECTED audit events
when tool output looks injection-shaped — reasoning-agents
only.
Reasoning redirection (steering the LLM to a wrong conclusion within its granted capability set) is not detected. Critical workflows should treat agent proposals as advisory and route final decisions through human review.
An agent waiting hours on human approval for an irreversible action holds a JWT capability token that expires before approval arrives. The agent fails at resume time and the workflow stalls.
Token TTL is configurable via
AOS_TOKEN_TTL_SECONDS (default 3600).
Long-approval-cycle agent classes should be deployed
with longer TTLs. The scheduler's heartbeat-and-reaper
flow detects expired-token agents and surfaces them as
recoverable rather than terminal.
No automatic token refresh on dispatch. Agents whose TTL expires mid-execution require operator intervention to re-mint. Refresh-on-dispatch is a roadmap item.
running state
Limited
A scheduler worker crashes after claiming an agent but
before completing its step. Without cleanup, the agent's
row in agent_processes stays in
running indefinitely and is not re-queued.
Worker heartbeat (migration 015) records
last_heartbeat on every agent claim. An
orphan reaper loop detects agents whose worker has gone
silent past a threshold, transitions them back to
runnable, and re-queues them. Combined with
the three-part composite checkpoint, the agent resumes
cleanly from its last successful step. Tool-call
idempotency on the agent's tool implementations is the
expected compensating control.
At-least-once tool execution is possible if the worker crashes after a tool call started but before the result was recorded. Agent tool implementations are expected to be idempotent; this is documented as an integration requirement.
The approval gate is only as strong as the operator reviewing the prompt. An operator approving rapidly, under pressure, or without reading the proposed arguments can authorize a wrong or harmful action.
The dashboard renders proposed action, full arguments,
blast radius classification, evidence trail, and
capability scope at the moment of decision. Approvals
are bound to the specific
SHA-256(canonical_json(args)); modifying
the args after approval invalidates the approval.
Audit chain captures the approving operator's token
jti for forensic attribution.
Approval fatigue is an operational risk that grows with
volume. Operators should configure capability templates
tightly enough that high-volume actions are
reversible=true and only genuinely
destructive actions reach the queue.
MinIO or the configured S3-compatible anchor store becomes unreachable. Without the operator noticing, the tamper-evidence guarantee silently degrades — new anchors stop being written, and an attacker with PG write access has a longer window to rewrite chain rows before detection.
The runtime tracks anchor write success and failure
counters at the process level; both are surfaced in
/api/v1/health as
audit_anchors. Operators are expected to
alert on any failure count above zero. Anchor write
attempts are non-blocking — chain integrity continues
even if anchoring fails — but the gap is visible.
Operators who do not alert on
audit_anchors.failures can run for arbitrary
periods with degraded tamper evidence. The metric is
available; operational discipline to monitor it is the
compensating control.
§ 05 / RoadmapWhat's coming next
The threat model is a living document. The mitigations below are explicitly out of scope today and on the active roadmap. Each is sized, scoped, and has a target release. Threat scenarios graduate from this list to the in-scope section as mitigations ship.
High-severity proposals routed to a second LLM call with only the proposal and evidence — no original task, no tool history. Judge disagreement surfaces to operator. Closes the largest open category in the prompt-injection threat surface.
Pre-embedding scan for prompt-injection patterns, secrets, PII, and known-malicious content. Operators see a flagged list and approve before ingest. Closes the KB-poisoning gap documented in § 03.
Per-caller bearer tokens with independent audit attribution and rotation without downtime. Resolves the residual exposure noted in scenario S–01 for deployments with multiple upstream callers.
Federated identity for API callers, mutual-TLS as an alternative bearer-token mechanism. Targets enterprise deployment shapes where bearer tokens alone are insufficient for the security review process.
Published p50/p99 latency numbers, soak test results, capacity planning guidance under realistic concurrency. Required for first-customer production deployment at scale.
Container-isolated Python execution with no network, ephemeral filesystem, and resource limits. Replaces over-broad bash grants with a tighter primitive. Reversible by construction; no approval gate required.
Items marked deferred are intentional gaps where no good general defense exists today (multi-turn injection drift, adversarial pattern evasion). These remain operator-responsibility until the security research community converges on accepted mitigations.
§ 06 / OperatorOperator responsibilities
Several mitigations require operator action to remain effective. These are not features the runtime enforces — they are conditions the runtime depends on.
-
Run with
AOS_AUTH_MODE=enforcedin production. Thewarnmode is for development. TheX-AOS-Auth-Warningresponse header makes misconfiguration visible in logs and reverse-proxy traces. - Credential the audit anchor store independently from PostgreSQL. Tamper evidence collapses if both stores are reachable from the same compromise. Anchor credentials should live in a separate secrets path with separate rotation.
- Treat agent proposals as advisory in critical workflows. The seven-layer evidence stack catches most hallucination cases; it cannot prove absence. Production deployments where errors are costly should keep humans authoritative on outcomes.
- Vet KB ingest sources. Documents added to the knowledge base are trusted by the agents that retrieve them. Ingest from sources you control, or scan documents externally before embedding.
-
Alert on anchor write failures. The
/api/v1/healthendpoint surfacesaudit_anchors.failures. Any non-zero value indicates the tamper-evidence guarantee is degrading. - Configure capability templates restrictively by default. The runtime cannot prevent over-broad grants if the operator authorizes them. Build agent classes with the smallest capability set that completes the task.
§ 07 / ReferencesFile and migration references
Every claim in this document maps to specific files, migrations, or audit event types in the codebase. Operators with source access can verify each property directly.
| Reference | Property |
|---|---|
| aos/api/syscalls.py | Single policy enforcement boundary; capability check, audit append, approval gate, cost ledger record fire here |
| aos/security/capability.py | JWT capability token mint, verify, delegate (intersection-narrowing) |
| aos/audit/engine.py | Per-session SHA-256 hash chain; anchor write to MinIO/S3 |
| aos/server/auth.py | Three-mode bearer-token authentication scaffold |
| aos/kernel/scheduler.py | Worker heartbeat, orphan reaper, state-machine validated transitions |
| migration 012 | Per-session audit chain with chain_seq and pg_advisory_xact_lock |
| migration 014 | State-machine validation function; invalid transitions match zero rows |
| migration 015 | Worker heartbeat columns and orphan-detection index |
| APP_INJECTION_PATTERN_DETECTED | Audit event written when tool output matches injection-shaped patterns |
| HUMAN_APPROVAL_REQUESTED | Audit event written when an agent attempts an irreversible tool |
| APP_AGENT_ABANDONED_UNGROUNDED | Terminal state when an agent refuses to investigate before proposing |
| /api/v1/health | Returns auth_mode, audit_anchors success/failure counts, scheduler status |
§ 08 / LogChangelog
A maintained threat model has a changelog. New mitigations, new threat scenarios, and reframings are recorded here.
-
May 2026v0.7.1 — clarified that most governance properties apply to all actor types, not just reasoning agents. Prompt-injection defenses explicitly marked as reasoning-only. Mitigation language broadened in §§ 02 and 04 to acknowledge that capability narrowing, approval gating, audit chain, and cost ceilings govern deterministic actors and reasoning agents uniformly.
-
May 2026v0.7.0 — added scenario S–02 covering prompt-injection in tool outputs; documented untrusted-content delimiter mitigation and
APP_INJECTION_PATTERN_DETECTEDaudit event. -
Apr 2026v0.6.0 — clarified S–06 (audit anchor degradation); added
audit_anchorscounters to/api/v1/health; documented operator-monitoring expectation. -
Mar 2026v0.5.0 — added scenario S–05 (operator approval mistakes); bound approvals to
SHA-256(canonical_json(args))for argument integrity. -
Feb 2026v0.4.0 — orphan recovery via heartbeat and reaper (migration 015) added; S–04 documented.
-
Jan 2026v0.3.0 — per-session audit chains (migration 012); chain serialization moved from global to per-session via
pg_advisory_xact_lock.
Got specific questions about your environment?
The earliest engagements include a security-team walkthrough of the runtime against your specific deployment shape. If something in this document doesn't address your threat model, that's the conversation to have.