What we protect against. And what we don't.

AOSIQ provides specific, enforceable security properties — and is explicit about what it doesn't cover. Security teams reviewing the runtime should be able to read this document and decide either way without running into marketing fog.

Document
v0.7.2
Last updated
May 2026
Reading time
≈ 12 min
Audience
Security teams

§ 01 / SummaryWhat this document is

This document lists every security property AOSIQ enforces, with references to the files, migrations, and audit events that implement each property. It also lists, with equal specificity, the threats AOSIQ does not mitigate.

The honest framing matters. A threat model that claims comprehensive coverage is either overpromising or has stopped thinking about new attack vectors. Real security writing acknowledges what it doesn't cover, why, and what compensating control the operator is expected to provide.

Read alongside the file references table at the bottom — every claim made here can be verified against the codebase. Nothing in this document is aspirational. Items still in development are not included; they appear in the public roadmap separately.

§ 02 / CoverageWhat's in scope

Six classes of threat the runtime materially reduces. Each is implemented as a runtime invariant — not as an operator best practice, not as documentation, not as a configuration default.

Most properties apply to all actor types. Capability narrowing, the audit chain, the approval gate, cost ceilings, and backend constraints govern reasoning agents and deterministic actors uniformly. Prompt-injection defenses are reasoning-specific by definition — deterministic actors don't reason and therefore can't be prompt-injected. Where a property is reasoning-only, the bullet below says so.

Mitigated by runtime invariants
  • Capability narrowing prevents actor-to-actor privilege escalation. A child actor's grants are the intersection of the parent's grants and the child's request. Empty intersections raise InsufficientAuthority at spawn time, before any code runs.
  • Per-session audit chain detects PostgreSQL-write tampering when the audit anchor store is independently credentialed. Mid-chain row modifications break the SHA-256 linkage and fail verification.
  • Mandatory approval prevents autonomous destructive actions when tools are registered with reversible=False. The runtime cannot execute the tool; it inserts a pending review row and blocks the actor until an operator approves the specific (tool, args_hash) pair.
  • Cost ceilings prevent runaway spend. For reasoning agents this means LLM token cost; for deterministic actors it means compute and external API consumption. Either way, configured per-session ceilings raise an exception before the operation would push usage over budget — not after the bill arrives.
  • Per-class backend constraints prevent unintended data egress. Applies to reasoning agents. An agent class registered with allowed_backends={"ollama"} cannot route to Anthropic, OpenAI, or any other provider regardless of operator misconfiguration. Deterministic actors have no LLM backend so this property doesn't apply to them.
  • Untrusted-content delimiters and pattern detection raise the floor against direct prompt injection. Applies to reasoning agents only. Tool results are wrapped with structural markers and scanned for injection-shaped patterns; detections prepend warnings and write audit events without breaking legitimate use.
  • Continuous red-team probing with a published, honest defense rate. A versioned corpus of adversarial probes — jailbreaks, indirect injection, multi-turn smuggling, emergent tool misuse, and capability/role boundary violations — runs against the runtime's real structural defenses on every change and nightly. The deterministic boundaries (capability, role, approval) defend 100% of their probes; the closed-set injection detector and the open-set categories are reported at their actual rate, not rounded up. We claim "we defend against the patterns in our corpus, with continuous expansion" — never "we defend against prompt injection" in the abstract. Novel jailbreaks that target the model's own refusal remain out of scope by design (below).

§ 03 / LimitsWhat's explicitly out of scope

Six classes of threat AOSIQ does not mitigate today. Each is documented because operators evaluating the runtime need to understand what compensating control they remain responsible for.

Not covered — operator responsibility
  • Fully-compromised infrastructure operator. An attacker holding both PostgreSQL write access and audit-anchor store credentials can rewrite the chain and the anchors together. Mitigation: route audit anchors to an external append-only service (object lock, immutable storage tier).
  • Reasoning-redirection injection. Adversarial content designed to make the LLM reach a specific conclusion — rather than call a specific tool — produces structurally-valid proposals with proper citations that just arrive at the wrong answer. Two structural surfaces now ship: an asynchronous LLM classifier at ingest time flags prose-form injection the regex layer cannot judge, and an entailment judge on every proposal verifies that the cited evidence actually supports the claim before the proposal is recorded. For high- and critical-risk proposals the judge is a synchronous gate that refuses emission on entails-false. The judge model is itself an LLM, so the verdict inherits the underlying model's accuracy bound — but the structural pipeline is in place rather than deferred.
  • KB content poisoning at ingest time. Closed in two stages. The first sprint added a synchronous regex pre-filter on every chunk that DocumentStore.add_documents commits, with operator-configurable warn or reject policy, audit events per flagged chunk, and an inline warning at search time when a flagged chunk is returned. The follow-on kb-defense sprint added an asynchronous LLM classifier that stamps a supplementary verdict onto the same metadata field for prose-form injection the regex layer cannot reach, and a per-chunk operator quarantine surface (bearer-authed API endpoint plus per-chunk-CSRF dashboard button) that hides flagged chunks from default search until released.
  • Multi-turn injection drifting across many tool results. An adversary placing content across a long sequence of tool calls, gradually steering the agent's reasoning, evades single-result pattern detection. No good general defense exists; explicit deferral.
  • Adversarially-crafted patterns that evade the regex detectors. The injection pattern set is best-effort, not exhaustive. A determined adversary can construct phrasings that look benign to the detector and still land as injection in the LLM's context.
  • LLM hallucination of correct-looking-but-wrong tool arguments where capabilities permit. The seven-layer evidence stack catches most cases but cannot prove absence. Operators remain responsible for treating proposals as advisory, not authoritative, in scenarios where errors are costly.

§ 04 / ScenariosThreat scenarios with mitigations

Each scenario describes a specific attack class the runtime encounters in production deployments. The mitigation is the concrete code path or audit event that addresses the threat; the residual is what an operator should still watch for.

Critical
Direct path to compromise if exploited and unmitigated. Compensating controls are the operator's responsibility and should be considered required.
Significant
Material risk to integrity or availability. Mitigation reduces probability or blast radius; residual exposure remains and is documented.
Limited
Operationally bounded. Mitigation is structural; residual is recoverable through standard operator response.
S–01 Untrusted HTTP caller spawns agents with arbitrary tool grants Critical

An attacker with network access to the AOSIQ API attempts to POST /api/v1/agents with capability grants they shouldn't possess, then directs the resulting agent to perform actions in the operator's environment.

Bearer-token authentication on every API route via aos/server/auth.py. Three modes (enforced, warn, disabled); production deployments require AOS_AUTH_MODE=enforced. The warn mode adds an explicit X-AOS-Auth-Warning response header so misconfiguration surfaces in logs. Capability grants requested at spawn are clamped to the caller's authorized token claims; a caller cannot mint a token broader than its own.

The single bearer-token model does not provide per-caller audit attribution. Multi-key auth with rotation is on the roadmap. Until then, deployments with multiple upstream callers should provision per-deployment instances rather than sharing a single token.

S–02 Compromised tool result tampers with downstream state Significant

An MCP server, file-system content, or network response returns content designed to make the LLM emit an action the operator did not authorize. Applies to reasoning agents only. Deterministic actors cannot be redirected by tool-result content because they don't reason about it — they execute coded logic.

Three layers compose. Capability narrowing limits the action surface: an actor without delete_records in its capability token cannot call it regardless of what the LLM is talked into. Approval gating on irreversible tools ensures destructive actions require explicit operator sign-off at the moment of execution. These two protections apply to all actors as defense in depth against many attack classes, not just this one. The approval policy that drives this gating can itself be externalized to a signed, schema-validated artifact: changing it then requires a cryptographic signer key rather than a code deploy, and every change leaves a tamper-evident audit record — policy as a pre-committed, independently-auditable rule rather than a buried constant. Untrusted-content delimiters (<untrusted_tool_output> tags) plus pattern detection raise warnings and write APP_INJECTION_PATTERN_DETECTED audit events when tool output looks injection-shaped — reasoning-agents only.

Reasoning redirection (steering the LLM to a wrong conclusion within its granted capability set) is not detected. Critical workflows should treat agent proposals as advisory and route final decisions through human review.

S–03 Long-running agent's capability token expires mid-session Limited

An agent waiting hours on human approval for an irreversible action holds a JWT capability token that expires before approval arrives. The agent fails at resume time and the workflow stalls.

Token TTL is configurable via AOS_TOKEN_TTL_SECONDS (default 3600). Long-approval-cycle agent classes should be deployed with longer TTLs. The scheduler's heartbeat-and-reaper flow detects expired-token agents and surfaces them as recoverable rather than terminal.

No automatic token refresh on dispatch. Agents whose TTL expires mid-execution require operator intervention to re-mint. Refresh-on-dispatch is a roadmap item.

S–04 Worker crash leaves agents stuck in running state Limited

A scheduler worker crashes after claiming an agent but before completing its step. Without cleanup, the agent's row in agent_processes stays in running indefinitely and is not re-queued.

Worker heartbeat (migration 015) records last_heartbeat on every agent claim. An orphan reaper loop detects agents whose worker has gone silent past a threshold, transitions them back to runnable, and re-queues them. Combined with the three-part composite checkpoint, the agent resumes cleanly from its last successful step. Tool-call idempotency on the agent's tool implementations is the expected compensating control.

At-least-once tool execution is possible if the worker crashes after a tool call started but before the result was recorded. Agent tool implementations are expected to be idempotent; this is documented as an integration requirement.

S–05 Operator approves the wrong action Significant

The approval gate is only as strong as the operator reviewing the prompt. An operator approving rapidly, under pressure, or without reading the proposed arguments can authorize a wrong or harmful action.

The dashboard renders proposed action, full arguments, blast radius classification, evidence trail, and capability scope at the moment of decision. Approvals are bound to the specific SHA-256(canonical_json(args)); modifying the args after approval invalidates the approval. Audit chain captures the approving operator's token jti for forensic attribution.

Approval fatigue is an operational risk that grows with volume. Operators should configure capability templates tightly enough that high-volume actions are reversible=true and only genuinely destructive actions reach the queue.

S–06 Audit anchor store goes silent Significant

MinIO or the configured S3-compatible anchor store becomes unreachable. Without the operator noticing, the tamper-evidence guarantee silently degrades — new anchors stop being written, and an attacker with PG write access has a longer window to rewrite chain rows before detection.

The runtime tracks anchor write success and failure counters at the process level; both are surfaced in /api/v1/health as audit_anchors. Operators are expected to alert on any failure count above zero. Anchor write attempts are non-blocking — chain integrity continues even if anchoring fails — but the gap is visible.

Operators who do not alert on audit_anchors.failures can run for arbitrary periods with degraded tamper evidence. The metric is available; operational discipline to monitor it is the compensating control.

S–07 Approval gate bypassed at the operator boundary Significant

Operator-facing surfaces around the approval gate are attacked in six distinct ways: the dashboard router accepts unauthenticated form posts in enforced mode (asymmetric coverage with the API router); two operators race to approve the same review and both succeed, racing one of their notes out of the record; the operator approves without a written note so post-incident review has no context; the capability token underlying the review has already expired so the approval can never take effect; a malicious cross-origin form post mints an approval the operator did not actually click. Each axis on its own is a documented operator-trust hole.

Migration 022 (approval-gate-hardening sprint) closes the cluster: dashboard router carries Depends(verify_api_key) symmetric with the API router. The shared aos/kernel/approvals.py primitive uses SELECT … FOR UPDATE OF r and an UPDATE that re-checks status='pending', so two operators racing the path cannot both succeed. The reviewer-note textarea is required at both the HTML form layer (required minlength=1) and the application primitive (ReviewerNoteRequired → HTTP 422). Capability token TTL is checked at approve-time and surfaced in a server-rendered + JS-refreshed countdown with amber/red thresholds. Dashboard form posts validate a per-form CSRF token bound to (agent_id, operator_id, target_path, expiry) — tokens minted for /approve cannot be replayed against /reject or /rescind. AOS_AUTH_MODE default flipped from warn to enforced; warn mode refuses to start on a non-localhost bind unless AOS_AUTH_MODE_ALLOW_REMOTE_WARN=1.

Per-operator capability attribution still rides on a shared bearer token — multiple operators are indistinguishable in the audit record. Multi-key auth with per-identity capabilities is on the roadmap; deployments needing per-operator audit today provision per-operator instances rather than sharing a single token.

S–08 Tool registry runtime mutation disables the approval gate Significant

An attacker (or careless operator) with an API key calls POST /api/v1/tools to flip spawn_agent.reversible (or any other tool registered reversible=False) to reversible=True. The next agent invocation of that tool bypasses the approval gate entirely — a one-POST kill switch on the runtime's central safety mechanism. Equivalent threat: a malicious code path registers a capability template that shadows a real one and silently broadens an advisor's grants.

Migration 022 adds two layers. First, POST /api/v1/tools refuses any reversible=False → True upgrade on an existing tool with HTTP 409 and writes a TOOL_REVERSIBILITY_CHANGE_REJECTED audit row. The reverse direction (downgrading to False — adding safety) is allowed. Second, AOS_TOOL_REGISTRY_FROZEN defaults to 1 in enforced mode; any registration attempt after startup returns HTTP 403 and writes a TOOL_REGISTRATION_REFUSED_FROZEN audit row. The capability-template loader carries a matching freeze (AOS_TEMPLATE_PATHS_FROZEN); runtime register_template_path() calls raise TemplateRegistryFrozen post-startup. Both audit events flow through the per-session hash chain so forensic queries can answer "did anyone try to disable the gate?".

The 60-second rescind window (AOS_APPROVAL_RESCIND_WINDOW_S) covers operator-mistake-on-approve, but cannot reverse an irreversible action the agent has already executed — the audit row reflects the operator's intent in that case, not a reversal in the world. Tool-signing or content-hash verification at registration time is a defense-in-depth layer on the roadmap.

§ 05 / RoadmapWhat's coming next

The threat model is a living document. The mitigations below are explicitly out of scope today and on the active roadmap. Each is sized, scoped, and has a target release. Threat scenarios graduate from this list to the in-scope section as mitigations ship.

R–01
Entailment gate for reasoning-redirection injection

An entailment judge now runs on every proposal emission. For high- and critical-risk proposals the check is synchronous: the judge reconstructs the cited evidence and verifies the claim follows from it, refusing emission on entails-false or confidence below an operator-configurable threshold. Lower- risk proposals get the same verdict written as a forensic audit row. The judge is itself an LLM, so the verdict inherits the underlying model's accuracy bound — but the structural pipeline replaces what was previously deferred.

Shipped
R–02
KB ingest scanner

Shipped in two stages. Stage 1: a synchronous regex pre-filter scans every chunk before embed, with operator-configurable warn or reject policy and a backfill scanner for pre-existing corpora. Stage 2 (kb-defense follow-on): an asynchronous LLM classifier per chunk catches prose-form injection the regex cannot judge, plus a per-chunk operator quarantine surface that hides flagged chunks from default search until released.

Shipped
R–03
Multi-key authentication with rotation

Per-caller bearer tokens with independent audit attribution and rotation without downtime. Resolves the residual exposure noted in scenario S–01 for deployments with multiple upstream callers.

Researching
R–04
OIDC and mTLS for production deployments

Federated identity for API callers, mutual-TLS as an alternative bearer-token mechanism. Targets enterprise deployment shapes where bearer tokens alone are insufficient for the security review process.

Researching
R–05
Performance characterization under realistic load

Published p50/p99 latency numbers, soak test results, capacity planning guidance under realistic concurrency. Required for first-customer production deployment at scale.

Researching
R–06
Sandboxed code execution as a native tool

Container-isolated Python execution with no network, ephemeral filesystem, and resource limits. Replaces over-broad bash grants with a tighter primitive. Reversible by construction; no approval gate required.

Items marked deferred are intentional gaps where no good general defense exists today (multi-turn injection drift, adversarial pattern evasion). These remain operator-responsibility until the security research community converges on accepted mitigations.

§ 06 / OperatorOperator responsibilities

Several mitigations require operator action to remain effective. These are not features the runtime enforces — they are conditions the runtime depends on.

  • Run with AOS_AUTH_MODE=enforced in production. The warn mode is for development. The X-AOS-Auth-Warning response header makes misconfiguration visible in logs and reverse-proxy traces.
  • Credential the audit anchor store independently from PostgreSQL. Tamper evidence collapses if both stores are reachable from the same compromise. Anchor credentials should live in a separate secrets path with separate rotation.
  • Treat agent proposals as advisory in critical workflows. The seven-layer evidence stack catches most hallucination cases; it cannot prove absence. Production deployments where errors are costly should keep humans authoritative on outcomes.
  • Vet KB ingest sources. Documents added to the knowledge base are trusted by the agents that retrieve them. Ingest from sources you control, or scan documents externally before embedding.
  • Alert on anchor write failures. The /api/v1/health endpoint surfaces audit_anchors.failures. Any non-zero value indicates the tamper-evidence guarantee is degrading.
  • Configure capability templates restrictively by default. The runtime cannot prevent over-broad grants if the operator authorizes them. Build agent classes with the smallest capability set that completes the task.

§ 07 / ReferencesFile and migration references

Every claim in this document maps to specific files, migrations, or audit event types in the codebase. Operators with source access can verify each property directly.

Reference Property
aos/api/syscalls.py Single policy enforcement boundary; capability check, audit append, approval gate, cost ledger record fire here
aos/security/capability.py JWT capability token mint, verify, delegate (intersection-narrowing)
aos/audit/engine.py Per-session SHA-256 hash chain; anchor write to MinIO/S3
aos/server/auth.py Three-mode bearer-token authentication scaffold
aos/kernel/scheduler.py Worker heartbeat, orphan reaper, state-machine validated transitions
migration 012 Per-session audit chain with chain_seq and pg_advisory_xact_lock
migration 014 State-machine validation function; invalid transitions match zero rows
migration 015 Worker heartbeat columns and orphan-detection index
APP_INJECTION_PATTERN_DETECTED Audit event written when tool output matches injection-shaped patterns
HUMAN_APPROVAL_REQUESTED Audit event written when an agent attempts an irreversible tool
APP_AGENT_ABANDONED_UNGROUNDED Terminal state when an agent refuses to investigate before proposing
/api/v1/health Returns auth_mode, audit_anchors success/failure counts, scheduler status

§ 08 / LogChangelog

A maintained threat model has a changelog. New mitigations, new threat scenarios, and reframings are recorded here.

Document revisions
  • May 2026
    v0.7.1 — clarified that most governance properties apply to all actor types, not just reasoning agents. Prompt-injection defenses explicitly marked as reasoning-only. Mitigation language broadened in §§ 02 and 04 to acknowledge that capability narrowing, approval gating, audit chain, and cost ceilings govern deterministic actors and reasoning agents uniformly.
  • May 2026
    v0.7.0 — added scenario S–02 covering prompt-injection in tool outputs; documented untrusted-content delimiter mitigation and APP_INJECTION_PATTERN_DETECTED audit event.
  • Apr 2026
    v0.6.0 — clarified S–06 (audit anchor degradation); added audit_anchors counters to /api/v1/health; documented operator-monitoring expectation.
  • Mar 2026
    v0.5.0 — added scenario S–05 (operator approval mistakes); bound approvals to SHA-256(canonical_json(args)) for argument integrity.
  • Feb 2026
    v0.4.0 — orphan recovery via heartbeat and reaper (migration 015) added; S–04 documented.
  • Jan 2026
    v0.3.0 — per-session audit chains (migration 012); chain serialization moved from global to per-session via pg_advisory_xact_lock.

Got specific questions about your environment?

The earliest engagements include a security-team walkthrough of the runtime against your specific deployment shape. If something in this document doesn't address your threat model, that's the conversation to have.