§ 01 / SummaryWhat this document is
This document lists every security property AOSIQ enforces, with references to the files, migrations, and audit events that implement each property. It also lists, with equal specificity, the threats AOSIQ does not mitigate.
The honest framing matters. A threat model that claims comprehensive coverage is either overpromising or has stopped thinking about new attack vectors. Real security writing acknowledges what it doesn't cover, why, and what compensating control the operator is expected to provide.
Read alongside the file references table at the bottom — every claim made here can be verified against the codebase. Nothing in this document is aspirational. Items still in development are not included; they appear in the public roadmap separately.
§ 02 / CoverageWhat's in scope
Six classes of threat the runtime materially reduces. Each is implemented as a runtime invariant — not as an operator best practice, not as documentation, not as a configuration default.
Most properties apply to all actor types. Capability narrowing, the audit chain, the approval gate, cost ceilings, and backend constraints govern reasoning agents and deterministic actors uniformly. Prompt-injection defenses are reasoning-specific by definition — deterministic actors don't reason and therefore can't be prompt-injected. Where a property is reasoning-only, the bullet below says so.
-
Capability narrowing prevents actor-to-actor privilege
escalation. A child actor's grants are the
intersection of the parent's grants and the child's
request. Empty intersections raise
InsufficientAuthorityat spawn time, before any code runs. - Per-session audit chain detects PostgreSQL-write tampering when the audit anchor store is independently credentialed. Mid-chain row modifications break the SHA-256 linkage and fail verification.
-
Mandatory approval prevents autonomous destructive
actions when tools are registered with
reversible=False. The runtime cannot execute the tool; it inserts a pending review row and blocks the actor until an operator approves the specific(tool, args_hash)pair. - Cost ceilings prevent runaway spend. For reasoning agents this means LLM token cost; for deterministic actors it means compute and external API consumption. Either way, configured per-session ceilings raise an exception before the operation would push usage over budget — not after the bill arrives.
-
Per-class backend constraints prevent unintended
data egress. Applies to reasoning agents.
An agent class registered with
allowed_backends={"ollama"}cannot route to Anthropic, OpenAI, or any other provider regardless of operator misconfiguration. Deterministic actors have no LLM backend so this property doesn't apply to them. - Untrusted-content delimiters and pattern detection raise the floor against direct prompt injection. Applies to reasoning agents only. Tool results are wrapped with structural markers and scanned for injection-shaped patterns; detections prepend warnings and write audit events without breaking legitimate use.
- Continuous red-team probing with a published, honest defense rate. A versioned corpus of adversarial probes — jailbreaks, indirect injection, multi-turn smuggling, emergent tool misuse, and capability/role boundary violations — runs against the runtime's real structural defenses on every change and nightly. The deterministic boundaries (capability, role, approval) defend 100% of their probes; the closed-set injection detector and the open-set categories are reported at their actual rate, not rounded up. We claim "we defend against the patterns in our corpus, with continuous expansion" — never "we defend against prompt injection" in the abstract. Novel jailbreaks that target the model's own refusal remain out of scope by design (below).
§ 03 / LimitsWhat's explicitly out of scope
Six classes of threat AOSIQ does not mitigate today. Each is documented because operators evaluating the runtime need to understand what compensating control they remain responsible for.
- Fully-compromised infrastructure operator. An attacker holding both PostgreSQL write access and audit-anchor store credentials can rewrite the chain and the anchors together. Mitigation: route audit anchors to an external append-only service (object lock, immutable storage tier).
- Reasoning-redirection injection. Adversarial content designed to make the LLM reach a specific conclusion — rather than call a specific tool — produces structurally-valid proposals with proper citations that just arrive at the wrong answer. Two structural surfaces now ship: an asynchronous LLM classifier at ingest time flags prose-form injection the regex layer cannot judge, and an entailment judge on every proposal verifies that the cited evidence actually supports the claim before the proposal is recorded. For high- and critical-risk proposals the judge is a synchronous gate that refuses emission on entails-false. The judge model is itself an LLM, so the verdict inherits the underlying model's accuracy bound — but the structural pipeline is in place rather than deferred.
-
KB content poisoning at ingest time. Closed in
two stages. The first sprint added a synchronous regex
pre-filter on every chunk that
DocumentStore.add_documentscommits, with operator-configurable warn or reject policy, audit events per flagged chunk, and an inline warning at search time when a flagged chunk is returned. The follow-on kb-defense sprint added an asynchronous LLM classifier that stamps a supplementary verdict onto the same metadata field for prose-form injection the regex layer cannot reach, and a per-chunk operator quarantine surface (bearer-authed API endpoint plus per-chunk-CSRF dashboard button) that hides flagged chunks from default search until released. - Multi-turn injection drifting across many tool results. An adversary placing content across a long sequence of tool calls, gradually steering the agent's reasoning, evades single-result pattern detection. No good general defense exists; explicit deferral.
- Adversarially-crafted patterns that evade the regex detectors. The injection pattern set is best-effort, not exhaustive. A determined adversary can construct phrasings that look benign to the detector and still land as injection in the LLM's context.
- LLM hallucination of correct-looking-but-wrong tool arguments where capabilities permit. The seven-layer evidence stack catches most cases but cannot prove absence. Operators remain responsible for treating proposals as advisory, not authoritative, in scenarios where errors are costly.
§ 04 / ScenariosThreat scenarios with mitigations
Each scenario describes a specific attack class the runtime encounters in production deployments. The mitigation is the concrete code path or audit event that addresses the threat; the residual is what an operator should still watch for.
An attacker with network access to the AOSIQ API attempts to
POST /api/v1/agents with capability grants
they shouldn't possess, then directs the resulting agent
to perform actions in the operator's environment.
Bearer-token authentication on every API route via
aos/server/auth.py. Three modes (enforced,
warn, disabled); production
deployments require AOS_AUTH_MODE=enforced.
The warn mode adds an explicit
X-AOS-Auth-Warning response header so
misconfiguration surfaces in logs. Capability grants
requested at spawn are clamped to the caller's authorized
token claims; a caller cannot mint a token broader than
its own.
The single bearer-token model does not provide per-caller audit attribution. Multi-key auth with rotation is on the roadmap. Until then, deployments with multiple upstream callers should provision per-deployment instances rather than sharing a single token.
An MCP server, file-system content, or network response returns content designed to make the LLM emit an action the operator did not authorize. Applies to reasoning agents only. Deterministic actors cannot be redirected by tool-result content because they don't reason about it — they execute coded logic.
Three layers compose. Capability narrowing
limits the action surface: an actor without
delete_records in its capability token cannot
call it regardless of what the LLM is talked into.
Approval gating on irreversible tools
ensures destructive actions require explicit operator
sign-off at the moment of execution. These two protections
apply to all actors as defense in depth against many
attack classes, not just this one. The
approval policy that drives this gating can
itself be externalized to a signed, schema-validated
artifact: changing it then requires a cryptographic signer
key rather than a code deploy, and every change leaves a
tamper-evident audit record — policy as a pre-committed,
independently-auditable rule rather than a buried constant.
Untrusted-content delimiters
(<untrusted_tool_output> tags) plus
pattern detection raise warnings and write
APP_INJECTION_PATTERN_DETECTED audit events
when tool output looks injection-shaped — reasoning-agents
only.
Reasoning redirection (steering the LLM to a wrong conclusion within its granted capability set) is not detected. Critical workflows should treat agent proposals as advisory and route final decisions through human review.
An agent waiting hours on human approval for an irreversible action holds a JWT capability token that expires before approval arrives. The agent fails at resume time and the workflow stalls.
Token TTL is configurable via
AOS_TOKEN_TTL_SECONDS (default 3600).
Long-approval-cycle agent classes should be deployed
with longer TTLs. The scheduler's heartbeat-and-reaper
flow detects expired-token agents and surfaces them as
recoverable rather than terminal.
No automatic token refresh on dispatch. Agents whose TTL expires mid-execution require operator intervention to re-mint. Refresh-on-dispatch is a roadmap item.
running state
Limited
A scheduler worker crashes after claiming an agent but
before completing its step. Without cleanup, the agent's
row in agent_processes stays in
running indefinitely and is not re-queued.
Worker heartbeat (migration 015) records
last_heartbeat on every agent claim. An
orphan reaper loop detects agents whose worker has gone
silent past a threshold, transitions them back to
runnable, and re-queues them. Combined with
the three-part composite checkpoint, the agent resumes
cleanly from its last successful step. Tool-call
idempotency on the agent's tool implementations is the
expected compensating control.
At-least-once tool execution is possible if the worker crashes after a tool call started but before the result was recorded. Agent tool implementations are expected to be idempotent; this is documented as an integration requirement.
The approval gate is only as strong as the operator reviewing the prompt. An operator approving rapidly, under pressure, or without reading the proposed arguments can authorize a wrong or harmful action.
The dashboard renders proposed action, full arguments,
blast radius classification, evidence trail, and
capability scope at the moment of decision. Approvals
are bound to the specific
SHA-256(canonical_json(args)); modifying
the args after approval invalidates the approval.
Audit chain captures the approving operator's token
jti for forensic attribution.
Approval fatigue is an operational risk that grows with
volume. Operators should configure capability templates
tightly enough that high-volume actions are
reversible=true and only genuinely
destructive actions reach the queue.
MinIO or the configured S3-compatible anchor store becomes unreachable. Without the operator noticing, the tamper-evidence guarantee silently degrades — new anchors stop being written, and an attacker with PG write access has a longer window to rewrite chain rows before detection.
The runtime tracks anchor write success and failure
counters at the process level; both are surfaced in
/api/v1/health as
audit_anchors. Operators are expected to
alert on any failure count above zero. Anchor write
attempts are non-blocking — chain integrity continues
even if anchoring fails — but the gap is visible.
Operators who do not alert on
audit_anchors.failures can run for arbitrary
periods with degraded tamper evidence. The metric is
available; operational discipline to monitor it is the
compensating control.
Operator-facing surfaces around the approval gate are
attacked in six distinct ways: the dashboard router
accepts unauthenticated form posts in
enforced mode (asymmetric coverage with the
API router); two operators race to approve the same
review and both succeed, racing one of their notes out
of the record; the operator approves without a written
note so post-incident review has no context; the
capability token underlying the review has already
expired so the approval can never take effect; a
malicious cross-origin form post mints an approval the
operator did not actually click. Each axis on its own is
a documented operator-trust hole.
Migration 022 (approval-gate-hardening sprint) closes
the cluster: dashboard router carries
Depends(verify_api_key) symmetric with the
API router. The shared
aos/kernel/approvals.py primitive uses
SELECT … FOR UPDATE OF r and an UPDATE
that re-checks status='pending', so two
operators racing the path cannot both succeed. The
reviewer-note textarea is required at both the HTML
form layer (required minlength=1) and the
application primitive (ReviewerNoteRequired
→ HTTP 422). Capability token TTL is checked at
approve-time and surfaced in a server-rendered
+ JS-refreshed countdown with amber/red thresholds.
Dashboard form posts validate a per-form CSRF token
bound to (agent_id, operator_id, target_path,
expiry) — tokens minted for
/approve cannot be replayed against
/reject or /rescind.
AOS_AUTH_MODE default flipped from
warn to enforced; warn mode
refuses to start on a non-localhost bind unless
AOS_AUTH_MODE_ALLOW_REMOTE_WARN=1.
Per-operator capability attribution still rides on a shared bearer token — multiple operators are indistinguishable in the audit record. Multi-key auth with per-identity capabilities is on the roadmap; deployments needing per-operator audit today provision per-operator instances rather than sharing a single token.
An attacker (or careless operator) with an API key
calls POST /api/v1/tools to flip
spawn_agent.reversible (or any other tool
registered reversible=False) to
reversible=True. The next agent invocation
of that tool bypasses the approval gate entirely — a
one-POST kill switch on the runtime's central safety
mechanism. Equivalent threat: a malicious code path
registers a capability template that shadows a real one
and silently broadens an advisor's grants.
Migration 022 adds two layers. First,
POST /api/v1/tools refuses any
reversible=False → True upgrade on an
existing tool with HTTP 409 and writes a
TOOL_REVERSIBILITY_CHANGE_REJECTED audit
row. The reverse direction (downgrading to
False — adding safety) is allowed. Second,
AOS_TOOL_REGISTRY_FROZEN defaults to 1
in enforced mode; any registration attempt
after startup returns HTTP 403 and writes a
TOOL_REGISTRATION_REFUSED_FROZEN audit
row. The capability-template loader carries a matching
freeze (AOS_TEMPLATE_PATHS_FROZEN);
runtime register_template_path() calls
raise TemplateRegistryFrozen post-startup.
Both audit events flow through the per-session hash
chain so forensic queries can answer "did anyone try to
disable the gate?".
The 60-second rescind window
(AOS_APPROVAL_RESCIND_WINDOW_S) covers
operator-mistake-on-approve, but cannot reverse an
irreversible action the agent has already executed —
the audit row reflects the operator's intent in that
case, not a reversal in the world. Tool-signing or
content-hash verification at registration time is a
defense-in-depth layer on the roadmap.
§ 05 / RoadmapWhat's coming next
The threat model is a living document. The mitigations below are explicitly out of scope today and on the active roadmap. Each is sized, scoped, and has a target release. Threat scenarios graduate from this list to the in-scope section as mitigations ship.
An entailment judge now runs on every proposal emission. For high- and critical-risk proposals the check is synchronous: the judge reconstructs the cited evidence and verifies the claim follows from it, refusing emission on entails-false or confidence below an operator-configurable threshold. Lower- risk proposals get the same verdict written as a forensic audit row. The judge is itself an LLM, so the verdict inherits the underlying model's accuracy bound — but the structural pipeline replaces what was previously deferred.
Shipped in two stages. Stage 1: a synchronous regex pre-filter scans every chunk before embed, with operator-configurable warn or reject policy and a backfill scanner for pre-existing corpora. Stage 2 (kb-defense follow-on): an asynchronous LLM classifier per chunk catches prose-form injection the regex cannot judge, plus a per-chunk operator quarantine surface that hides flagged chunks from default search until released.
Per-caller bearer tokens with independent audit attribution and rotation without downtime. Resolves the residual exposure noted in scenario S–01 for deployments with multiple upstream callers.
Federated identity for API callers, mutual-TLS as an alternative bearer-token mechanism. Targets enterprise deployment shapes where bearer tokens alone are insufficient for the security review process.
Published p50/p99 latency numbers, soak test results, capacity planning guidance under realistic concurrency. Required for first-customer production deployment at scale.
Container-isolated Python execution with no network, ephemeral filesystem, and resource limits. Replaces over-broad bash grants with a tighter primitive. Reversible by construction; no approval gate required.
Items marked deferred are intentional gaps where no good general defense exists today (multi-turn injection drift, adversarial pattern evasion). These remain operator-responsibility until the security research community converges on accepted mitigations.
§ 06 / OperatorOperator responsibilities
Several mitigations require operator action to remain effective. These are not features the runtime enforces — they are conditions the runtime depends on.
-
Run with
AOS_AUTH_MODE=enforcedin production. Thewarnmode is for development. TheX-AOS-Auth-Warningresponse header makes misconfiguration visible in logs and reverse-proxy traces. - Credential the audit anchor store independently from PostgreSQL. Tamper evidence collapses if both stores are reachable from the same compromise. Anchor credentials should live in a separate secrets path with separate rotation.
- Treat agent proposals as advisory in critical workflows. The seven-layer evidence stack catches most hallucination cases; it cannot prove absence. Production deployments where errors are costly should keep humans authoritative on outcomes.
- Vet KB ingest sources. Documents added to the knowledge base are trusted by the agents that retrieve them. Ingest from sources you control, or scan documents externally before embedding.
-
Alert on anchor write failures. The
/api/v1/healthendpoint surfacesaudit_anchors.failures. Any non-zero value indicates the tamper-evidence guarantee is degrading. - Configure capability templates restrictively by default. The runtime cannot prevent over-broad grants if the operator authorizes them. Build agent classes with the smallest capability set that completes the task.
§ 07 / ReferencesFile and migration references
Every claim in this document maps to specific files, migrations, or audit event types in the codebase. Operators with source access can verify each property directly.
| Reference | Property |
|---|---|
| aos/api/syscalls.py | Single policy enforcement boundary; capability check, audit append, approval gate, cost ledger record fire here |
| aos/security/capability.py | JWT capability token mint, verify, delegate (intersection-narrowing) |
| aos/audit/engine.py | Per-session SHA-256 hash chain; anchor write to MinIO/S3 |
| aos/server/auth.py | Three-mode bearer-token authentication scaffold |
| aos/kernel/scheduler.py | Worker heartbeat, orphan reaper, state-machine validated transitions |
| migration 012 | Per-session audit chain with chain_seq and pg_advisory_xact_lock |
| migration 014 | State-machine validation function; invalid transitions match zero rows |
| migration 015 | Worker heartbeat columns and orphan-detection index |
| APP_INJECTION_PATTERN_DETECTED | Audit event written when tool output matches injection-shaped patterns |
| HUMAN_APPROVAL_REQUESTED | Audit event written when an agent attempts an irreversible tool |
| APP_AGENT_ABANDONED_UNGROUNDED | Terminal state when an agent refuses to investigate before proposing |
| /api/v1/health | Returns auth_mode, audit_anchors success/failure counts, scheduler status |
§ 08 / LogChangelog
A maintained threat model has a changelog. New mitigations, new threat scenarios, and reframings are recorded here.
-
May 2026v0.7.1 — clarified that most governance properties apply to all actor types, not just reasoning agents. Prompt-injection defenses explicitly marked as reasoning-only. Mitigation language broadened in §§ 02 and 04 to acknowledge that capability narrowing, approval gating, audit chain, and cost ceilings govern deterministic actors and reasoning agents uniformly.
-
May 2026v0.7.0 — added scenario S–02 covering prompt-injection in tool outputs; documented untrusted-content delimiter mitigation and
APP_INJECTION_PATTERN_DETECTEDaudit event. -
Apr 2026v0.6.0 — clarified S–06 (audit anchor degradation); added
audit_anchorscounters to/api/v1/health; documented operator-monitoring expectation. -
Mar 2026v0.5.0 — added scenario S–05 (operator approval mistakes); bound approvals to
SHA-256(canonical_json(args))for argument integrity. -
Feb 2026v0.4.0 — orphan recovery via heartbeat and reaper (migration 015) added; S–04 documented.
-
Jan 2026v0.3.0 — per-session audit chains (migration 012); chain serialization moved from global to per-session via
pg_advisory_xact_lock.
Got specific questions about your environment?
The earliest engagements include a security-team walkthrough of the runtime against your specific deployment shape. If something in this document doesn't address your threat model, that's the conversation to have.