The Autonomy Ladder — The Operator Stack

The Promise

Stop arguing with your AI tool all day about what it's allowed to do — write the contract once, pick the rung, and let the work run at the speed it's actually safe to run at.

The One-Sentence Setup

Most operators think autonomy is a setting they pick inside a tool — it is actually a contract they never wrote, which is why they live at the safest rung forever and call their AI “slow.”

The Core Insight

Autonomy is not a slider in a settings panel. It is a contract that names three things: what the agent does without asking, what the agent must ask about, and what the agent never does regardless of permission. Most operators feel like AI is “a slow second pair of eyes” because they sit at the ask-before-anything rung by default — not because their work belongs there, but because no one ever taught them how to write the contract that lets the agent climb. The Ladder names five distinct rungs and gives the operator a way to pick the right one for each workstream.

The Mechanism

1. Level 1 — Read-only

What: The agent reads anything in scope and writes nothing. How: Use this for the first 48 hours of a new tool or MCP connector: let the agent see the data, summarize it, and propose moves, but block every write. In Claude Code the default permission mode approximates this before an allowlist is configured, because reads run freely while every edit or command waits for approval; plan mode is stricter and keeps the agent from modifying anything. The agent earns the right to write by first proving it can read correctly. Miss this and: the agent acts on a system before the operator has verified it's pointed at the right data, and the first mistake is a real one.

2. Level 2 — Draft + ask

What: The agent drafts everything; the operator reviews each artifact before it ships or executes. How: This is the default state of 90% of operators and the right state when the work is novel enough that you don't yet trust the agent in the domain. The agent produces, the human approves, the human sends. Name the cost: the operator is the bottleneck, the agent is a writing assistant, and the leverage is mostly saved keystrokes. Miss this and: you never graduate. Level 2 is a starting rung, not a destination.

3. Level 3 — Act on reversibles, ask on irreversibles

What: The agent executes anything reversible without asking. It asks before anything irreversible. How: Define reversibility by the five-minute rule — if you can undo the action in under five minutes without phone calls or apologies, the agent acts. Local file edits, calendar tentatives, internal docs, draft branches, scratch databases — reversible. Sends, payments, deletions, public posts, third-party messages — irreversible. The contract names both lists explicitly. Most operators belong here for most work. Miss this and: the agent either freezes on every action (reversibility defined too narrowly) or fires off an email to a client (defined too loosely).
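The act-or-ask split above can be sketched as a small gate. This is a minimal illustration, not a feature of any tool: the action names are hypothetical stand-ins for the two lists a real contract would name, and defaulting unlisted actions to "ask" is an assumption of the sketch.

```python
from enum import Enum


class Decision(Enum):
    ACT = "act"
    ASK = "ask"


# Hypothetical action names; a real contract lists its own.
REVERSIBLE = {
    "edit_local_file", "calendar_tentative", "edit_internal_doc",
    "push_draft_branch", "write_scratch_db",
}
IRREVERSIBLE = {
    "send_email", "make_payment", "delete_record",
    "publish_post", "message_third_party",
}


def decide(action: str) -> Decision:
    """Level 3 gate: act on listed reversibles, ask on everything else."""
    if action in IRREVERSIBLE:
        return Decision.ASK
    if action in REVERSIBLE:
        return Decision.ACT
    # Assumption of this sketch: anything not on either list is treated
    # as irreversible until the operator adds it to the contract.
    return Decision.ASK
```

Defaulting the unknown case to "ask" keeps a too-short reversible list from turning into an accidental client email; the price is a few extra questions until the list fills in.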

4. Level 4 — Act + log + guardrails

What: The agent runs autonomously inside a written guardrail list and posts a daily debrief. How: Use this for well-scoped missions running under a Coach (see Framework 07). The guardrail floor covers four areas: spending money, outbound messages in the operator's name, the private-info firewall, and destructive actions. Everything else runs at full speed. The agent writes a single debrief file the operator reads once a day. Claude Code with --dangerously-skip-permissions plus a tight CLAUDE.md is the canonical setup. That flag removes every permission prompt, so the guardrails hold only as well as CLAUDE.md states them. Miss this and: the agent either over-escalates (every decision becomes a guardrail) or under-escalates (the list was too short and now there's a real mess).
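One way to picture the Level 4 loop is a check against the guardrail floor followed by a line in the debrief file. The sketch below assumes each action arrives tagged with the guardrail areas it touches; the tag names, the `Action` type, and the tab-separated log format are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical tags for the four guardrail areas named in the text.
GUARDRAILS = {"spend_money", "outbound_in_operator_name", "private_info", "destructive"}


@dataclass
class Action:
    name: str
    tags: set[str] = field(default_factory=set)


def run_or_escalate(action: Action, debrief: Path) -> bool:
    """Log the action to the debrief file; return True if it may run."""
    tripped = sorted(action.tags & GUARDRAILS)
    status = "ESCALATED " + ",".join(tripped) if tripped else "DONE"
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    # One line per action, appended so the operator reads a single file.
    with debrief.open("a", encoding="utf-8") as fh:
        fh.write(f"{stamp}\t{status}\t{action.name}\n")
    return not tripped
```

An untagged action runs and is logged as DONE; anything touching a guardrail is logged as ESCALATED and held for the operator.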

5. Level 5 — Full autonomy with kill-switch

What: The agent runs without supervision except a scheduled check-in and an emergency stop. How: Only for narrow, well-understood loops where the cost-of-mistake is genuinely low — log triage, inbox sorting, ticket routing, a self-contained data pipeline. The agent has a kill-switch (a file or env flag it checks each cycle) and a daily one-line status. Constitutional classifiers plus the kill-switch are the only safeties. Right for boring repeating loops. Wrong for anything novel. Miss this and: you grant Level 5 to a brand-new pattern, the agent does the wrong thing 200 times before you notice, and the cleanup costs more than the work would have at Level 3.
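The kill-switch described above, a file or env flag checked each cycle, might look like the following. The file name `STOP` and the variable `AGENT_KILL` are hypothetical; any path or flag the operator can flip quickly would do.

```python
import os
from pathlib import Path
from typing import Callable

KILL_FILE = Path("STOP")   # hypothetical file name
KILL_ENV = "AGENT_KILL"    # hypothetical environment variable


def kill_requested(kill_file: Path = KILL_FILE, env_var: str = KILL_ENV) -> bool:
    """True if the operator has created the file or set the flag to "1"."""
    return kill_file.exists() or os.environ.get(env_var) == "1"


def run_loop(step: Callable[[], None], max_cycles: int,
             kill_file: Path = KILL_FILE, env_var: str = KILL_ENV) -> int:
    """Run `step` until the kill-switch trips or max_cycles is reached.

    Returns the number of completed cycles, which can feed the daily
    one-line status.
    """
    done = 0
    for _ in range(max_cycles):
        # Check before every cycle so a stop request costs at most one step.
        if kill_requested(kill_file, env_var):
            break
        step()
        done += 1
    return done
```

Checking at the top of each cycle, rather than once at startup, is what bounds the damage: the agent can do the wrong thing once more after the operator notices, not 200 times.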

6. Score the workstream, write the contract, schedule the audit

What: Three quick scores pick the rung. A one-page contract locks it in. A 30-day audit decides whether to climb. How: Score on (a) reversibility: undoable in under five minutes? Yes = Level 3+, no = Level 2 or lower. (b) Blast radius: self-only or external in your name? Self = Level 4 candidate; external = Level 3 with first-send approval. (c) Experience reps: under 10 = stay at Level 2 or lower, over 30 clean = step up a rung. Then write the contract using the OVERNIGHT_AUTONOMY_RULES.md template, which has five sections: Decision-making posture, Hard rules, Workspace scope, Handoff format, Quality bar. Name the level on line one. File at vault/autonomy/<workstream>.md. Put a recurring calendar event at the 30-day mark titled "Autonomy audit: has the agent earned a step up?" Miss this and: the contract lives in your head, the agent guesses, you argue, and the Ladder collapses into the ask-before-anything default.
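The three scores can be combined into a single suggestion. How they interact is not spelled out in the text, so the sketch below makes one conservative reading explicit: irreversible work or thin experience holds at Level 2, external work holds at Level 3 with first-send approval, and self-only work reaches Level 4 only after more than 30 clean reps. Levels 1 and 5 are left as deliberate operator choices. The names `Rung` and `pick_rung` are hypothetical.

```python
from typing import NamedTuple


class Rung(NamedTuple):
    level: int
    first_send_approval: bool


def pick_rung(reversible: bool, external: bool, clean_reps: int) -> Rung:
    """Suggest a rung from reversibility, blast radius, and experience reps."""
    # (a) and (c): irreversible work, or fewer than 10 reps, stays at Level 2.
    if not reversible or clean_reps < 10:
        return Rung(2, False)
    # (b): external work in the operator's name holds at Level 3,
    # with the first send approved by hand.
    if external:
        return Rung(3, True)
    # Self-only work is a Level 4 candidate; this sketch grants the
    # step up only after more than 30 clean reps (an assumption).
    return Rung(4 if clean_reps > 30 else 3, False)
```

For example, `pick_rung(True, True, 50)` suggests Level 3 with first-send approval, while `pick_rung(True, False, 31)` suggests Level 4.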

The Pitfalls

The Default-to-Paranoid Trap. Operator sits at Level 2 forever because raising autonomy feels scary. Fix: name one workstream this week, score it, and step up one rung. The cost of staying at Level 2 is invisible — it shows up as “AI didn't really change anything” six months later.

The Default-to-Cowboy Trap. Operator grants Level 5 to a brand-new pattern because the tool can run that way. Fix: every new pattern starts at Level 2 for two weeks minimum, regardless of how confident the operator feels.

The Unwritten-Contract Trap. “You know what I mean” is not a contract. Fix: if it isn't in a file the agent can read on every invocation, it doesn't exist.

The Blanket-Rule Trap. One autonomy level across every workstream. Fix: different workstreams sit at different rungs. Inbox triage at Level 4 and outbound sales at Level 3 is a normal portfolio.

The No-Audit Trap. Setting a rung once and never reviewing. Fix: the 30-day calendar event is mandatory. The whole point of a ladder is that you climb it.

The Drill (this week)

Pick one workstream you currently run with AI: inbox, calendar, content drafts, code edits, research, outreach, anything. Score it on reversibility, blast radius, and experience reps. Decide the current rung honestly. Open a new file at vault/autonomy/<workstream>.md. Steal the structure of OVERNIGHT_AUTONOMY_RULES.md: Decision-making posture, Hard rules (including prohibited actions), Workspace scope, Handoff format, Quality bar. Name the current level on line one. Name the level you'd step up to in 30 days if the agent stays clean. Put the audit event on the calendar. Total time: 25 minutes. The contract exists now, so the next time the agent asks a question it shouldn't have to ask, you have a document to point at.
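The drill's file can be scaffolded in a few lines. This is a sketch under stated assumptions: the header format (`Level: N` on line one, `Audit date: YYYY-MM-DD` on line two) and the `TODO` placeholders are hypothetical, and the section names are the five listed in the contract template row of the tools table.

```python
from datetime import date, timedelta
from pathlib import Path

SECTIONS = [
    "Decision-making posture", "Hard rules", "Workspace scope",
    "Handoff format", "Quality bar",
]


def scaffold_contract(vault: Path, workstream: str, level: int,
                      next_level: int, today: date) -> Path:
    """Create vault/autonomy/<workstream>.md with the level on line one."""
    audit = today + timedelta(days=30)
    lines = [
        f"Level: {level}",                    # line one: current rung
        f"Audit date: {audit.isoformat()}",   # line two: 30-day audit
        f"Step-up target if clean: Level {next_level}",
        "",
    ]
    for section in SECTIONS:
        lines += [section, "", "TODO", ""]
    path = vault / "autonomy" / f"{workstream}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    if path.exists():
        # Never overwrite a contract the operator has already written.
        raise FileExistsError(path)
    path.write_text("\n".join(lines), encoding="utf-8")
    return path
```

Calling `scaffold_contract(Path("vault"), "inbox", 3, 4, date.today())` would leave a skeleton to fill in during the rest of the 25 minutes.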

The Tools

Layer	Primary	Alternates
Level 1-2 enforcement	Claude Code default permission mode (explicit allowlist)	Claude Project with no tools enabled
Level 3 enforcement	Claude Code `permissions` block in `settings.json` — allow reversible, deny irreversible	Cursor agent mode with file-scope guardrails
Level 4 enforcement	Claude Code `--dangerously-skip-permissions` + tight `CLAUDE.md` charter	Anthropic Claude Agent SDK with custom tool allowlist
Level 5 enforcement	Headless Claude Code run on cron + kill-switch file + constitutional classifiers	Scheduled Agent SDK job with daily one-line status email
Contract template	`OVERNIGHT_AUTONOMY_RULES.md` structure (Posture / Rules / Scope / Handoff / Quality)	Notion page with the same five sections
30-day audit	Recurring calendar event titled Autonomy audit — <workstream>	Obsidian dataview query over `vault/autonomy/` mtimes

The opinion: every operator running AI in production needs a vault/autonomy/ folder. One file per workstream, current rung on line one, audit date on line two. The folder is the operating system for the fleet.
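With the rung on line one and the audit date on line two, the folder can be scanned for audits that have come due. The sketch assumes a hypothetical header format of `Level: N` and `Audit date: YYYY-MM-DD`; files that do not match are skipped rather than guessed at.

```python
from datetime import date
from pathlib import Path


def audits_due(folder: Path, today: date) -> list[tuple[str, int, date]]:
    """Return (workstream, level, audit_date) for every contract due for audit."""
    due = []
    for path in sorted(folder.glob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        if len(lines) < 2:
            continue
        try:
            level = int(lines[0].removeprefix("Level:").strip())
            audit = date.fromisoformat(lines[1].removeprefix("Audit date:").strip())
        except ValueError:
            # Header does not follow the assumed format; leave it for a human.
            continue
        if audit <= today:
            due.append((path.stem, level, audit))
    return due
```

Run against `vault/autonomy/`, the result is the list of workstreams whose step-up question is waiting for an answer.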

Cross-references

This framework lives in the Act surface of The 4-Surface AI Stack — the Ladder is how the operator decides what the Act surface is allowed to do without asking. The Persona Cascade's Constraints layer is where the Ladder rules get enforced in-prompt — the rung is named, the irreversibles are listed, the agent reads them every invocation. The Orchestrator + Sub-Agent Pattern's Guardrails are the Level 4 contract — Framework 07 and Framework 09 are the same document viewed from two angles. Forward, the Ladder feeds The Hand-off Brief Pattern (Framework 10): every brief names the autonomy rung the work runs at, so the agent receiving the brief knows whether to ask, act, or just log.

One framework. One drill. One week at a time.
