The Promise
Design AI personas that hold their shape under pressure — so the model behaves the same way on day 90 as it did on day one, even when conflicting inputs hit it.
The One-Sentence Setup
Most operators write personas as wish lists ("be friendly, be smart, be concise"); real personas are a priority-ordered cascade where each layer overrides the one below it when they conflict.
The Core Insight
A persona is not personality. A persona is a conflict-resolution system. The moment an AI gets two inputs that pull in opposite directions — the user wants a long answer, the system says be brief; the data says one thing, the user's instinct says another — the model has to choose. If the persona is a flat list of adjectives, the model picks at random and you get drift. If the persona is a cascade, the model has a hard-coded priority order to fall back on. The cascade is what turns "vibes" into a deterministic operating instrument.
The Mechanism — The Six Layers
The Cascade reads top-down. Higher layers always win.
Layer 1 — The Identity
What: One paragraph naming who the AI is, its role, and its relationship to the operator. How: Role + tone keyword + relationship in three sentences max. Example: "You are the operator's chief of staff. You are crisp, candid, and protective of their time. You speak to them, not at them." Miss this and: the model substitutes its training-default identity, which is sycophantic, hedging, and verbose.
Layer 2 — The Constraints
What: The hard rules the AI never violates regardless of input. How: Frame as "never" and "always." Length caps, format limits, off-limits topics, never-fabricate categories. Hormozi's "what to NEVER do" frame — make the negatives explicit before the positives. Miss this and: the friendly Identity layer overrides the guardrails the first time a user pushes hard, and the persona collapses into pleasing the user.
Layer 3 — The Context Priority Order
What: A numbered list that resolves which input wins when sources disagree. How: Rank inputs top-down. Example for an operator AI: 1) most recent explicit instruction in this turn, 2) the operator's standing rules, 3) current business context from memory, 4) calendar + live data, 5) general knowledge. Then the cascade has an answer for every conflict. Miss this and: the AI defaults to "most recent token wins," which is why prompts mid-conversation override critical system rules.
Layer 4 — The Output Format
What: How responses are structured on the page. How: Length cap (in words, not "concise"), bullet vs. prose policy, markdown vs. plain, emoji policy, code-block rules, where structured output goes. Specify the negative too: "never end with 'Let me know if...'" Miss this and: the model regresses to its training-default format — long, padded, headered-up — and the operator's actual readable signal drops.
Layer 5 — The Emotional Register
What: The tone living underneath the words. How: Pick two or three adjectives and define each by what they exclude. "Warm" means "not cold" — but does it mean "casual" or "professional-warm"? Name it. "Direct" means "no hedging" — but does it mean "blunt" or "respectful-direct"? Name it. Miss this and: the AI averages toward enthusiastic-corporate-helpful, which is the default tone the operator wanted to escape by writing a persona in the first place.
Layer 6 — The Capability + Failure Mode
What: What the AI may invent vs. what it must refuse to guess. How: Two lists. Allowed-to-fabricate: brainstorms, drafts, hypotheticals. Must-say-I-don't-know: dates, numbers, internal facts, anything from systems the AI isn't connected to. Name the systems the AI does NOT have access to explicitly. Miss this and: the model hallucinates confidently in the exact zones where the operator most needs precision.
The Five-Step Design Process
- Name the constraint first. Before you write a friendly Identity, write the single thing the AI is most likely to mess up. That becomes Layer 2.
- Write Layer 2 before Layer 1. Operators reflexively start with personality. Start with the guardrails — they're harder, and they shape everything above them.
- Stress-test the Cascade with three conflict scenarios. Pick three cases where layers will fight (the user asks for something the constraints forbid; live data contradicts memory; the format rule conflicts with the user's request). If the priority order doesn't resolve them, add a layer or sharpen one.
- Cache the persona. Use Anthropic's prompt caching (
cache_control: ephemeral) on the persona block. You'll re-send the same persona on every call — caching cuts roughly 90% of the input-token cost on repeats. - Iterate weekly on the lowest-priority layer first. Constraints and Context Priority should stabilize early and stay stable. Tone and format are where you tune. Don't touch Identity until the bottom of the stack is solid.
The Pitfalls
- The wish-list persona. A flat paragraph of adjectives with no priority order. Fix: rewrite as six layers, ranked.
- Tone above Constraints. Putting "be helpful" above "never fabricate numbers." Fix: Constraints always sit at Layer 2, no exceptions.
- Silent access assumptions. Forgetting to name what the AI cannot see (calendar it isn't connected to, files it can't read). Fix: Layer 6 must include an explicit "you do NOT have access to X" list.
- The 3,000-word persona. Long personas dilute the priority order — the model loses the cascade in the noise. Fix: keep the persona under 1,200 words. Compression forces clarity.
- No adversarial testing. Shipping a persona without running three deliberate conflict prompts against it. Fix: build the test prompts into the persona file itself as a comment block, so you re-run them every time you edit.
The Drill — This Week (30 minutes)
Take whatever AI persona you're running today — a Custom GPT, a Claude Project system prompt, a Cursor rule, your CLAUDE.md. Restructure it into the six layers, in order, top-down. Now write one conflict scenario where two layers fight. Run the persona against the prompt. If the AI doesn't resolve the conflict the way your Cascade says it should, sharpen the priority order until it does. Then wrap the whole persona block in a cache_control: ephemeral marker so you stop paying full input-token cost on every call.
The Tools
| Tool | Use |
|---|---|
Anthropic prompt caching (cache_control: ephemeral) | Cache the persona block; ~90% input-token savings on repeats |
| Claude Projects (desktop) | Persistent system prompt + attached context — best for solo operator stack |
| Custom GPTs | OpenAI's equivalent; weaker on priority-order adherence, stronger on tool wiring |
Claude Code CLAUDE.md | Repo-level persona for engineering work — version-controlled with the code |
Cursor .cursor/rules | Editor-level persona for code completion + chat |
Cross-references
- Plugs into The 4-Surface AI Stack. The Persona lives on the Reason surface — it's the configuration layer on top of the model.
- Plugs into The Memory Architecture. The Cascade's Layer 3 (Context Priority Order) is where memory files get ranked against live input.
- Forward to The Autonomy Ladder. The higher the autonomy level you grant the AI, the stronger Layer 2 (Constraints) has to be — autonomy without constraints is how operators get burned.
The Coaching Cross-link
One framework. One drill. One week at a time.
The Operator Stack is the architecture. Verala is the practice that runs it on your own communication delivery — voice, pitch, pause, presence. One foundation per week, until it's automatic.
Take the free 5-Foundation Voice Audit → · Book a 30-min intro call →