The Context Pyramid: A PM’s Framework for AI Agent Context Engineering
Why Your AI Agent Breaks: A Diagnostic Framework for Product Managers
TL;DR
Context engineering is the skill that determines whether your AI agent is useful or a liability. Not model choice. Not prompt phrasing. Context.
Every agent has four distinct context layers: Identity, Knowledge, State, and Task. Each breaks in a different way, for a different reason, on a different schedule.
The Cadence Model (the PwA contribution here) tells you who touches each layer and when: rarely, occasionally, per turn, or per task.
The Diagnostic Loop maps agent failure symptoms to the exact layer that’s broken. No more guessing.
The Context Window Is Not Your Problem
Many teams assume context problems are about size.
Bigger window, better agent. Use the 1M-token model, problem solved.
Wrong.
The problem is not how much fits. It’s what we put in, in what order, with what structure, and how often we update each part.
Andrej Karpathy explained it like this in June 2025:
Context engineering is the delicate art and science of filling the context window with just the right information for the next step.
Not all information. Not the most information. The right information.
Most of the writing on this topic was produced by engineers for engineers. System design diagrams. RAG architecture tutorials. Vector DB benchmarks. Useful. But little of it is written from the PM seat, where the questions sound different.
Why does my agent suddenly act like it forgot the product strategy we discussed three sessions ago?
Why does it keep generating outputs we already rejected?
Why does a fresh agent outperform the one that’s been running for a week?
These are PM questions. And they require a PM-shaped answer.
That is what the Context Pyramid gives you.
Hey, I’m Karo Zieminski 🤗
AI Product Manager and builder.
I write Product with Attitude, an AI newsletter for 17,000+ subscribers who are building with AI and developing critical AI literacy through practice.
The kind of practice where you sit down on a Saturday morning, follow a guide, and walk away with a working agent, automation, or product.
Built by you. Understood by you. Owned by you.
If you’re new here, welcome! Here’s what you might have missed:
→ An Illustrated Guide to Context Engineering
→ The Only AI Prompting Guide That Works On Reasoning Models (And Our Cognition)
The Context Pyramid
The Context Pyramid is a four-layer model for thinking about the information that shapes agent behavior.
Each layer has a distinct purpose, a distinct home, and a distinct update cadence. The layers are: Identity, Knowledge, State, and Task.
They are ordered from the most stable (bottom) to the most volatile (top). The pyramid shape is intentional. Identity is the foundation. Task is the tip. Flip any layer and the whole structure wobbles.
If you’ve been rebuilding the Task layer five times a day, wondering why the agent keeps going off the rails, the actual problem is more likely a broken Identity layer that was set up six weeks ago and never revisited.
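To make the structure concrete, here is a minimal Python sketch of the pyramid as plain data. The class names, the `assemble_context` helper, and the `##` section markers are illustrative assumptions, not part of any agent framework. Concatenating the layers bottom-up has a useful side effect: the Task, the layer that changes most often, always lands at the end of the window.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Cadence(Enum):
    RARELY = "rarely"              # Identity: quarterly at most
    OCCASIONALLY = "occasionally"  # Knowledge: when underlying facts change
    PER_TURN = "per turn"          # State: effectively continuous
    PER_TASK = "per task"          # Task: rewritten for every task


@dataclass(frozen=True)
class Layer:
    name: str
    home: str
    cadence: Cadence


# Ordered from the most stable (bottom of the pyramid) to the most volatile (tip).
PYRAMID = (
    Layer("Identity", "system prompt or agent configuration", Cadence.RARELY),
    Layer("Knowledge", "RAG pipeline, knowledge base, external files", Cadence.OCCASIONALLY),
    Layer("State", "context window, scratchpad, session memory", Cadence.PER_TURN),
    Layer("Task", "current prompt", Cadence.PER_TASK),
)


def assemble_context(contents: Dict[str, str]) -> str:
    """Join each layer's text bottom-up, skipping layers with no content."""
    sections = []
    for layer in PYRAMID:
        text = contents.get(layer.name, "").strip()
        if text:
            sections.append(f"## {layer.name}\n{text}")
    return "\n\n".join(sections)


print(assemble_context({
    "Task": "Summarize Q3 churn drivers in five bullets.",
    "Identity": "You are a B2B product analyst focused on enterprise SaaS metrics.",
}))
```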
Let us go through each layer.
Identity Layer: Who Is This Agent?
Definition: Role, persona, and guardrails that define the agent’s character and constraints.
Where it lives: System prompt, project instructions or agent configuration.
Update cadence: Rarely. Think quarterly at most, or when the agent’s fundamental purpose changes.
The Identity layer is the agent’s constitution. It establishes who the agent is, what it cares about, and what it will never do. It is not a list of instructions for the current task. It is the pre-existing condition everything else runs inside.
A solid Identity layer defines: the agent’s role (e.g., “You are a B2B product analyst focused on enterprise SaaS metrics”), the communication style, hard constraints (“Never speculate about competitor financials”), and escalation behavior.
Many Identity layers are written once, at project kickoff, and then forgotten. That is fine. The update cadence is supposed to be rare. The problem is when Identity never gets reviewed, even as the product, the user base, or the agent’s scope evolves.
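One way to keep Identity reviewable is to store it as a small structured spec and render the system prompt from it. A minimal sketch, assuming a hypothetical `IdentitySpec` class; the 90-day review threshold mirrors “quarterly at most” but is an assumption, and `last_reviewed` exists to catch the spec nobody revisits.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class IdentitySpec:
    role: str
    style: str
    hard_constraints: List[str] = field(default_factory=list)
    escalation: str = ""
    last_reviewed: date = field(default_factory=date.today)

    def render(self) -> str:
        """Render the spec as a system-prompt section."""
        lines = [self.role, f"Communication style: {self.style}"]
        if self.hard_constraints:
            lines.append("Hard constraints:")
            lines.extend(f"- {c}" for c in self.hard_constraints)
        if self.escalation:
            lines.append(f"Escalation: {self.escalation}")
        return "\n".join(lines)

    def review_overdue(self, today: date, max_age_days: int = 90) -> bool:
        """True if nobody has re-read the spec in roughly a quarter."""
        return (today - self.last_reviewed).days > max_age_days


spec = IdentitySpec(
    role="You are a B2B product analyst focused on enterprise SaaS metrics.",
    style="Concise, numbers first, no marketing language.",
    hard_constraints=["Never speculate about competitor financials."],
    escalation="If a request is outside enterprise SaaS analytics, say so and hand off to a human.",
    last_reviewed=date(2025, 1, 15),
)
print(spec.render())
print(spec.review_overdue(today=date(2025, 6, 1)))  # True: 137 days since review
```

Keeping the spec as data rather than a free-text blob makes scope changes (new user types, new integrations) show up as explicit diffs during review.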
PM diagnostic questions:
If you copied the current Identity layer out of the system prompt and showed it to a new team member, would they understand exactly what this agent is and is not supposed to do?
Has the agent’s actual scope changed since the Identity layer was written? (New integrations, new user types, expanded tasks?)
Are there behaviors you keep correcting in Task prompts that could be fixed once in Identity instead?
Example failure mode: An agent built to assist with customer onboarding gets handed a support ticket workflow six months in. The Identity layer still says “focus on new users only.” The agent keeps refusing to engage with complex troubleshooting because it reads those requests as out of scope. The team patches each instance in the Task layer instead of updating Identity. The patches accumulate. The agent becomes incoherent.
Knowledge Layer: What Does It Know?
Definition: Facts, documents, product context, user preferences, and domain knowledge the agent draws on when reasoning.
Where it lives: Semantic memory, RAG pipelines, external files, knowledge bases.
Update cadence: Occasionally. When underlying facts change, not when tasks change.
The Knowledge layer is the agent’s long-term memory. It holds the things that are mostly stable but do need updating as the world changes: product documentation, pricing rules, customer segment profiles, brand guidelines, competitor data.
This is the layer that benefits from RAG (Retrieval-Augmented Generation) architectures. Instead of jamming everything into the system prompt and burning context on facts the agent does not need right now, you store knowledge externally and retrieve it just-in-time. Anthropic’s engineering team calls this “progressive disclosure”: agents incrementally discover relevant context through exploration rather than loading everything upfront.
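A toy version of the idea: knowledge lives outside the prompt, and only the documents that match the current query are loaded. The word-overlap scoring is a deliberately crude stand-in for embedding search, and the document names and contents are invented for illustration.

```python
import re
from typing import Dict, List, Set

# Hypothetical knowledge base kept outside the prompt.
KNOWLEDGE_BASE: Dict[str, str] = {
    "pricing.md": "The Pro plan costs $49 per seat per month. Enterprise pricing is custom.",
    "brand.md": "Tone is direct and friendly. Avoid jargon.",
    "segments.md": "Enterprise customers have more than 500 seats.",
}


def tokenize(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, kb: Dict[str, str], k: int = 1) -> List[str]:
    """Return up to k document names ranked by word overlap with the query."""
    q = tokenize(query)
    ranked = sorted(kb, key=lambda name: len(q & tokenize(kb[name])), reverse=True)
    # Drop documents that share no words with the query at all.
    return [name for name in ranked[:k] if q & tokenize(kb[name])]


def build_prompt(query: str, kb: Dict[str, str]) -> str:
    """Load only the retrieved documents, not the whole knowledge base."""
    docs = "\n".join(kb[name] for name in retrieve(query, kb))
    return f"Relevant knowledge:\n{docs}\n\nQuestion: {query}"


print(build_prompt("How much does the Pro plan cost per seat?", KNOWLEDGE_BASE))
```

The brand guidelines and segment definitions never enter the window for a pricing question, which is the token budget this layer is meant to protect.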
The failure mode here is deceptively quiet. Stale knowledge does not throw an error. It produces confident, plausible, wrong outputs.
PM diagnostic questions:
When was the Knowledge layer last updated? Is that still current? (If you shipped a pricing change three months ago and the agent still quotes old pricing, you have a Knowledge layer problem.)
Is the agent retrieving knowledge or hallucinating it? (Test with specific, verifiable facts from your knowledge base and compare agent outputs against the source.)
Is there knowledge that should be in this layer but exists only in someone’s head or a shared drive?
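The “retrieving or hallucinating?” question can become a repeatable spot check. A minimal sketch, assuming the agent can be wrapped as a function from question to answer; `stale_agent` and the facts are hypothetical, and substring matching is a crude check that only works for short, exact facts like prices or thresholds.

```python
from typing import Callable, Dict, List


def spot_check(agent: Callable[[str], str], facts: Dict[str, str]) -> List[str]:
    """Return the questions whose answers do not contain the expected fact."""
    failures = []
    for question, expected in facts.items():
        answer = agent(question)
        if expected.lower() not in answer.lower():
            failures.append(question)
    return failures


# Hypothetical stand-in for a real agent call; it still quotes last quarter's price.
def stale_agent(question: str) -> str:
    if "price" in question.lower():
        return "The Pro plan is $39 per seat."
    return "Enterprise means more than 500 seats."


facts = {
    "What is the Pro plan price?": "$49",
    "What counts as an enterprise customer?": "500 seats",
}
print(spot_check(stale_agent, facts))  # ['What is the Pro plan price?']
```

Running a check like this after every pricing or policy change turns the quiet failure mode into a visible one.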
Example failure mode: A sales agent is built with competitive battlecards from Q1. A competitor releases a major new product in Q3. The Knowledge layer is not updated. The agent continues producing outdated competitive positioning, with full confidence, through every customer-facing interaction. No one notices for weeks because the outputs look plausible.
State Layer: What Just Happened?
Definition: Session history, recent outputs, active blockers, and intermediate results from the current work session.
Where it lives: Context window, scratchpad, session memory.
Update cadence: Per turn. The State layer is essentially continuous.
State is the most mismanaged layer in production agents. It is also the one most responsible for the “context rot” phenomenon that makes agents degrade over long sessions.
Here is what context rot looks like in practice: you start a session with a crisp, capable agent. Forty turns in, it is repeating outputs it already generated, missing constraints it acknowledged earlier, and contradicting decisions it made thirty turns ago. The model has not changed. The context window has filled up with noise.
A 2023 Stanford paper by Liu et al., published in TACL, documented the pattern: LLM performance follows a U-shaped curve across input positions. Models perform best when the relevant information sits at the beginning or end of the context, and accuracy degrades significantly when it sits in the middle, by up to 30 percentage points in multi-document question answering tasks. Your agent does not have amnesia. It has a structural attention problem.
Chroma’s 2025 research confirmed this across 18 frontier models: every single one degrades as input length increases. The models are smart enough to solve the problem. The context is the constraint.
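One common mitigation for the U-shaped curve, not something the Liu et al. paper prescribes, is to order retrieved material so the most relevant items sit at the edges of the context and the least relevant in the middle. A minimal sketch, assuming the input list is already ranked most-relevant first:

```python
from typing import List, TypeVar

T = TypeVar("T")


def edge_order(ranked: List[T]) -> List[T]:
    """Reorder a relevance-ranked list so the strongest items sit at the start and end.

    Items alternate between the front and the back, so the weakest items end up
    in the middle, where long-context performance tends to be worst.
    """
    front: List[T] = []
    back: List[T] = []
    for i, item in enumerate(ranked):
        (front if i % 2 == 0 else back).append(item)
    return front + back[::-1]


print(edge_order(["doc1", "doc2", "doc3", "doc4", "doc5"]))
# ['doc1', 'doc3', 'doc5', 'doc4', 'doc2']
```

This only changes where information sits; it does nothing about how much accumulates, which is why State still needs active management.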
The fix is not a bigger context window. It is active State management: compacting completed reasoning, keeping the most recent relevant outputs surfaced, and offloading resolved threads to external memory.
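A minimal sketch of compaction with pinned notes, assuming a hypothetical `Turn` record and a placeholder summarizer. In a real agent the summarizer would be a model call; here it only counts turns so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Turn:
    role: str
    text: str


def placeholder_summary(turns: List[Turn]) -> str:
    # Hypothetical stand-in: a real agent would ask a model to summarize
    # decisions, open questions, and rejected outputs from these turns.
    return f"[{len(turns)} earlier turns compacted]"


def compact(
    history: List[Turn],
    keep_recent: int = 6,
    notes: Optional[List[str]] = None,
    summarize: Callable[[List[Turn]], str] = placeholder_summary,
) -> List[Turn]:
    """Replace all but the last `keep_recent` turns with a single summary turn.

    Key conclusions in `notes` are pinned into that first turn so they stay at
    the top of the window instead of drifting into the middle.
    """
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    if len(history) <= keep_recent:
        return list(history)
    older, recent = history[:-keep_recent], history[-keep_recent:]
    header = summarize(older)
    if notes:
        header += "\nKey conclusions:\n" + "\n".join(f"- {n}" for n in notes)
    return [Turn("system", header)] + recent


history = [Turn("user" if i % 2 == 0 else "assistant", f"turn {i}") for i in range(40)]
compacted = compact(history, keep_recent=4, notes=["Churn is driven by onboarding, not pricing."])
print(len(compacted))  # 5
print(compacted[0].text)
```

Listing rejected outputs in the summary is one way to stop an agent from regenerating work the team already turned down.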
PM diagnostic questions:
How many turns does your agent run before it starts repeating itself or contradicting earlier decisions?
Do you have any compaction or summarization strategy running, or is the context window just accumulating everything?
Is the agent’s scratchpad (if it has one) being used to hold intermediate conclusions, or is it growing unbounded?
Example failure mode: A research agent is asked to synthesize findings across a 90-minute session. By turn 60, the earliest retrieved sources, which contained the most important context, are now buried in the middle of a 40,000-token context window. The agent begins generating conclusions that contradict its own early findings because it can no longer effectively attend to the information it retrieved first. The session output is internally inconsistent. No error was thrown.
Task Layer: What Is It Doing Right Now?
Definition: Current goal, constraints, success criteria, and output format for this specific task.
Where it lives: Current prompt (user turn or orchestrator instruction).
Update cadence: Per task. This is the highest-velocity layer.
The Task layer is where most PM attention goes, and it is also the layer where most debugging effort is wasted. Rewriting the Task prompt is the PM equivalent of rebooting the router: it solves a lot of things, which makes it the default response to problems that actually live elsewhere.
A well-constructed Task layer does four things: states the goal precisely, specifies constraints (format, length, tone, scope), defines what “done” looks like, and provides the minimal context the agent needs for this particular task (not everything it might ever need).
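Those four jobs map onto a small template. A minimal sketch; the `TaskSpec` fields and the example file name `q3_trials.csv` are hypothetical, and putting the instruction last is a choice made here to keep it out of the middle of the prompt.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskSpec:
    goal: str
    constraints: List[str] = field(default_factory=list)
    done_when: str = ""
    context: List[str] = field(default_factory=list)  # only what this task needs

    def render(self) -> str:
        parts = []
        if self.context:
            parts.append("Context:\n" + "\n".join(f"- {c}" for c in self.context))
        if self.constraints:
            parts.append("Constraints:\n" + "\n".join(f"- {c}" for c in self.constraints))
        if self.done_when:
            parts.append(f"Done when: {self.done_when}")
        # The instruction goes last so it is not buried in the middle of the prompt.
        parts.append(f"Task: {self.goal}")
        return "\n\n".join(parts)


task = TaskSpec(
    goal="Draft a one-paragraph summary of why trial users churned in Q3.",
    constraints=["Max 120 words", "Plain language, no jargon"],
    done_when="Every claim cites a row from the attached churn export.",
    context=["Churn export: q3_trials.csv (attached)"],
)
prompt = task.render()
print(prompt)
print(len(prompt.split()), "words")
```

Note what is absent: no role definition, no company background. Those belong to Identity and Knowledge.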
The last point matters because of context rot. Every token you dump into the Task layer that is not relevant to the current task is a token that pushes important information toward the middle of the window. Anthropic frames this as finding “the smallest possible set of high-signal tokens”. Verbose Task prompts that include historical context, background explanation, and multiple tangential constraints are a context management anti-pattern.
PM diagnostic questions:
Does this Task prompt contain information that belongs in the Identity or Knowledge layer instead? (If yes, remove it from here and put it where it belongs.)
Is the success criterion explicit enough that the agent would know when it is done without being told?
If this Task prompt were handed to a new agent with no prior session context, would it still work?
Example failure mode: A PM writes a 2,000-word Task prompt that includes company background, the full product strategy, the agent’s role definition, and competitive context, with the actual task instruction buried somewhere in the middle. The agent produces a mediocre output. The PM rewrites the Task prompt. The output is still mediocre. The problem: the agent’s attention is spread across 2,000 words of low-signal context, and the actual task instruction sits in the middle, where it is least likely to be used. A 150-word Task prompt, with the Identity and Knowledge layers handled separately, would outperform it.
The Cadence Model and Context Layer Ownership
The original PwA contribution here is making the update cadence explicit. Most context engineering writeups treat all context as equivalent. It is not. Each layer has a natural clock. Violating those clocks, either by updating too often or too rarely, creates predictable failures.
Set the Identity layer correctly once. Let the Knowledge layer handle stable facts. Let the State layer carry session history, and manage it before it rots. Keep the Task layer surgical.
A common anti-pattern: a PM with a vague Identity layer compensates by being hyper-specific in every Task prompt. The Task layer bloats. Context rot accelerates. The PM works harder. The agent gets worse.
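The Cadence Model can be turned into a periodic check that flags clock violations in both directions. A minimal sketch; the thresholds, the inputs, and the `cadence_warnings` function are all assumptions to adapt, not calibrated values.

```python
from datetime import date
from typing import List

# Hypothetical thresholds; tune them to your own product's clocks.
MAX_IDENTITY_EDITS_PER_QUARTER = 2
MAX_TASK_PROMPT_WORDS = 400


def cadence_warnings(
    identity_edits_last_90_days: int,
    knowledge_last_updated: date,
    last_fact_change: date,
    task_prompt: str,
) -> List[str]:
    """Flag layers whose update rhythm breaks their natural clock."""
    warnings = []
    if identity_edits_last_90_days > MAX_IDENTITY_EDITS_PER_QUARTER:
        warnings.append("Identity is churning: it should change rarely, not per problem.")
    if knowledge_last_updated < last_fact_change:
        warnings.append("Knowledge is stale: facts changed after the last update.")
    if len(task_prompt.split()) > MAX_TASK_PROMPT_WORDS:
        warnings.append("Task prompt is bloated: look for Identity or Knowledge material.")
    return warnings


print(cadence_warnings(
    identity_edits_last_90_days=0,
    knowledge_last_updated=date(2025, 3, 1),
    last_fact_change=date(2025, 6, 10),  # e.g. a pricing change shipped in June
    task_prompt="word " * 2000,
))
```

A bloated Task prompt next to an Identity layer that has not moved in months is the anti-pattern above showing up in numbers.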
The Diagnostic Loop for Product Managers
Your agent is broken. Here is how to find the layer.
Symptom: Agent ignores constraints it was clearly given.
Start at Identity. Is the constraint something that should be permanent (a guardrail)? If yes, it belongs in Identity. If it is in the Task prompt instead, it will be treated as task-specific and may not generalize.
Fix: move it to Identity.
Symptom: Agent produces outdated or factually incorrect information.
Go to Knowledge. When was the Knowledge layer last updated? Is the source it is drawing on current? Does the RAG retrieval surface the right documents for this query type?
Fix: refresh Knowledge layer, audit retrieval relevance.
Symptom: Agent contradicts its own earlier outputs in the same session.
This is State. The context window has grown so large that earlier reasoning now sits in the middle of the window, where models use it least reliably.
Fix: implement compaction, reduce State accumulation, or use structured note-taking to surface key conclusions at the top of the window.
Symptom: Agent does the wrong thing with a clearly correct prompt.
Check State first: is there conflicting session history that overrides the current Task prompt? Then check Identity: is there a guardrail creating an implicit constraint the Task prompt is triggering?
Fix: debug context pollution in State, or revise the relevant Identity guardrail.
Symptom: Agent works perfectly on turn 1, degrades across a long session.
Pure State problem. Context rot in action.
Fix: active State management strategy (compaction, note-taking, sub-agent handoffs).
Symptom: Agent works in testing, fails with real users.
Knowledge gap. Real-user queries surface domain specifics that test cases do not cover.
Fix: expand Knowledge layer with real-world edge cases, preferences, and terminology.
Symptom: Agent is consistent but wrong: the wrong persona, the wrong tone, the wrong scope.
Identity layer was set incorrectly at the start. Not a State or Task problem.
Fix: rewrite Identity layer with explicit role definition, communication constraints, and scope boundaries.
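If you want the loop in a team runbook or a triage bot, it reduces to a lookup table. A minimal sketch; the symptom keys are paraphrased from the loop above, and the `diagnose` helper is hypothetical.

```python
from typing import Dict, List, Tuple

# Symptom -> (layers to check, in order; suggested fix).
DIAGNOSTIC_LOOP: Dict[str, Tuple[List[str], str]] = {
    "ignores constraints it was clearly given": (
        ["Identity"], "Move permanent constraints into Identity."),
    "outdated or factually incorrect information": (
        ["Knowledge"], "Refresh Knowledge; audit retrieval relevance."),
    "contradicts its own earlier outputs in a session": (
        ["State"], "Compact, reduce accumulation, pin key conclusions."),
    "wrong thing with a clearly correct prompt": (
        ["State", "Identity"], "Debug State pollution, then revise the Identity guardrail."),
    "works on turn 1, degrades over a long session": (
        ["State"], "Compaction, note-taking, sub-agent handoffs."),
    "works in testing, fails with real users": (
        ["Knowledge"], "Add real-world edge cases, preferences, and terminology."),
    "consistent but wrong persona, tone, or scope": (
        ["Identity"], "Rewrite role, communication constraints, and scope."),
}


def diagnose(symptom: str) -> Tuple[List[str], str]:
    """Look up which layers to inspect, in order, for a known symptom."""
    try:
        return DIAGNOSTIC_LOOP[symptom]
    except KeyError:
        raise ValueError(f"Unknown symptom: {symptom!r}") from None


layers, fix = diagnose("wrong thing with a clearly correct prompt")
print(layers, fix)
```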
Attribution
The term “context engineering” was popularized in June 2025, notably by Andrej Karpathy in a post on X:
In every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window with just the right information for the next step.
The empirical backbone for context rot and the lost-in-the-middle problem comes from Liu et al.’s 2023 Stanford paper published in TACL, which documented U-shaped performance curves across context positions, with up to 30-point accuracy drops for information in the middle of long contexts.
The practical implementation strategies for agent memory (compaction, structured note-taking, just-in-time retrieval) come from Anthropic’s engineering blog, specifically “Effective Context Engineering for AI Agents”, published September 2025, authored by the Applied AI team.
The four-layer pyramid structure, the Cadence Model, and the Diagnostic Loop are original PwA frameworks.
FAQ
Q: Is context engineering the same as prompt engineering?
No. Prompt engineering is about crafting the instruction for a single call. Context engineering is about managing what information exists across all layers of an agent’s context, across sessions, for the lifetime of that agent. Prompt engineering is a sub-skill inside context engineering.
Q: Do I need a RAG system to do context engineering?
Not necessarily. RAG is one way to manage the Knowledge layer. A well-maintained CLAUDE.md file, a Notion doc the agent can read, or a few well-curated files work too. RAG matters when your knowledge base is large and queried dynamically. If your agent has a fixed, small knowledge set, a flat file in the system prompt context is fine.
Q: How do I know which layer is broken?
Use the Diagnostic Loop above. The symptom pattern tells you the layer. If the agent is consistent but wrong, that is Identity. If it is factually stale, that is Knowledge. If it degrades over a session, that is State. If it does the wrong thing on a single clear prompt, check State for contamination first, then Identity for a conflicting guardrail.
Q: My agent works fine for a few turns but gets worse over time. What is happening?
Context rot. Your State layer is accumulating noise faster than the agent can process it. The lost-in-the-middle effect means earlier context is effectively disappearing from the agent’s attention even though it is technically present. Active State management (compaction, note-taking) is the fix.
Q: Who should own the Identity layer?
The PM or tech lead, in collaboration. The Identity layer encodes product decisions (what this agent is, what it will not do) and technical constraints (how it should interact with tools, what output format it produces). It is not a developer-only artifact. The PM who does not own the Identity layer will perpetually wonder why the agent feels off-brand.
Q: How often should I update the Knowledge layer?
When the underlying facts change, not on a schedule. Treat it like documentation: if you shipped a feature, updated pricing, changed a policy, or onboarded a new integration, the Knowledge layer needs a corresponding update. A useful heuristic: if a new employee would need to learn about a change during onboarding, the Knowledge layer needs it too.
Q: Can I build this framework into an existing agent, or do I need to start from scratch?
You can retrofit it. Start by auditing what you have: what is in the system prompt, what is retrieved, what is in session history, and how tasks are structured. Then map each piece to its correct layer. Consolidate Identity into a clean system prompt section. Move stable facts out of Task prompts and into a Knowledge source. Add State management (even basic note-taking works). The audit itself is valuable because it shows you what has been mixed together.
Final Thoughts
The agents running your business in 2027 will require context architecture, not just prompt wording.
The PM skill gap is still real. Engineering has been doing context engineering for a while, because broken context breaks code visibly. Product work surfaces failure more slowly. An agent that generates plausible-but-wrong competitive analysis, or drifts from brand voice across a thousand outputs, fails quietly. Those failures compound.
The Context Pyramid gives you a mental model for owning the full context stack, not just the Task layer you touch every day. Use it when agents break. Use it when building new ones. Use it to make the case for context infrastructure investment.
Context engineering is not a technical specialization. It’s a product skill. You just were not given the framework for it until now.
Product with Attitude is a newsletter for PMs and founders building with AI. If this issue was useful, the paid tier goes deeper: agent architecture walkthroughs, PwA diagnostic templates, and the full context engineering toolkit I use to build and maintain the automations running this newsletter.