
In Part 1, we argued that most dev teams start in the wrong place. They obsess over prompts, when the real problem is structural: agents are dropped into repositories that were never designed for them. The solution was to make the repository itself agent-native through a standardized instruction layer like AGENTS.md.
But even after you fix the environment, something still breaks.
The agent starts strong. It understands the problem, follows instructions, and seems genuinely intelligent.
Then, somewhere along the way, things begin to drift. The code still compiles, but the logic gets inconsistent. Small mistakes creep in. Constraints are ignored. Assumptions mutate.
Nothing fails loudly. Everything just gets slightly worse.
This is the second failure mode of AI systems: context rot.
Context Rot Is Not a Bug — It’s a Property
There is a persistent assumption in the industry that more context leads to better performance. If a model can handle large context windows, then giving it more information should improve accuracy.
In practice, the opposite is often true.
Recent research from Chroma shows that LLM performance degrades as input length increases, even when the model is operating well within its maximum context window. Independent analyses echo the same finding, from breakdowns of why models deteriorate in longer sessions to practical explorations of how context mismanagement hurts production systems.
This is not an edge case. It is a structural limitation.
Models do not “understand” context in a hierarchical way. They distribute attention across tokens. As context grows, signal competes with noise. Important instructions lose weight. Irrelevant details gain influence. Conflicts accumulate.
What looks like a reasoning failure is often just context degradation.
Why Long Sessions Break Down
If you’ve worked with AI coding agents for more than a few hours, you’ve already seen this pattern.
A session starts with clear instructions and aligned reasoning. Over time, it fills with partial implementations, outdated assumptions, repeated instructions, and exploratory dead-ends. The model doesn’t forget earlier information; it simply can no longer prioritize it effectively.
Detailed guides on context management highlight this exact failure mode: as sessions grow, models become increasingly sensitive to irrelevant or redundant tokens, which degrade output quality. Platform-level documentation reinforces the same principle: effective systems explicitly control how context is introduced, retained, and pruned.
In practice, this shows up as inconsistency. But underneath, it’s the predictable outcome of unmanaged context growth.
The Link Between Context Rot and Hallucination
This is where teams often misdiagnose the issue.
When agents hallucinate, the instinct is to blame the model. But hallucination is often downstream of context rot.
OpenAI’s work on hallucinations explains that models are optimized to produce plausible outputs even under uncertainty. When context degrades, uncertainty increases. The model fills gaps with statistically likely answers.
So the failure chain looks like this:
Context degradation → ambiguity → confident guessing → hallucination
In other words, hallucination is not always a knowledge problem.
It is often a context management problem.
Sessions Are Not Conversations
Most developers interact with AI through chat, so they treat sessions like conversations.
That mental model breaks at scale.
A long-running AI session is not a conversation. It is a stateful system.
And like any stateful system, it degrades without control.
Letting context accumulate indefinitely is equivalent to running a system without memory management. Eventually, performance collapses—not because the system is incapable, but because it is overloaded.
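The analogy can be made concrete. Here is a minimal sketch of deliberately bounded context, assuming a made-up token budget and a crude characters-per-token heuristic (both illustrative, not tied to any real model):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: about 4 characters per token (illustrative only).
    return len(text) // 4

def prune_context(turns: list[str], budget: int = 8000) -> list[str]:
    """Keep only the most recent turns that fit within the token budget."""
    kept, used = [], 0
    for turn in reversed(turns):       # walk from newest to oldest
        cost = estimate_tokens(turn)
        if used + cost > budget:       # older turns beyond the budget are dropped
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))        # restore chronological order
```

Real systems typically summarize old turns rather than drop them outright, but the principle is the same: context is bounded on purpose, not by accident.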
The Plan → Execute → Reset Discipline
Once you accept that context degrades, the solution becomes straightforward: you don’t try to out-prompt the problem. You control how context evolves.
Across production teams, a consistent pattern emerges:
Plan → Execute → Reset
This is not a trick. It is operational discipline.
Planning before execution
The most common mistake is asking the agent to write code immediately. This forces premature decisions and locks the model into an approach before it has fully understood the problem.
Instead, enforce a planning phase.
Have the model break down the task, identify dependencies, and surface uncertainties before implementation. This aligns closely with best practices in production-grade prompt engineering, where structured reasoning is prioritized over immediate generation.
Planning reduces unnecessary context growth and prevents incorrect assumptions from propagating.
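In practice, a planning phase can be as simple as a dedicated prompt that explicitly forbids implementation. A sketch, where `build_planning_prompt` is a hypothetical helper rather than any real SDK call:

```python
def build_planning_prompt(task: str) -> str:
    """Ask the agent to plan before it writes a single line of code."""
    return (
        "Before writing any code, produce a plan for the task below.\n"
        "1. Break the task into ordered steps.\n"
        "2. List the files and dependencies each step touches.\n"
        "3. Surface every uncertainty as an explicit question.\n"
        "Do not implement anything yet.\n\n"
        f"Task: {task}"
    )

# Usage: send this prompt first, review the plan, then allow implementation.
prompt = build_planning_prompt("Add rate limiting to the /login endpoint")
```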
Stepwise execution
Once the plan is validated, execution should be incremental.
Large, monolithic prompts create large, monolithic contexts—and those degrade fastest.
Stepwise execution keeps the working context focused. Each step introduces only the information required for that step. Errors are caught early, before they spread across the system.
This is not about slowing down development. It is about maintaining signal integrity.
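The discipline can be sketched as a loop that feeds the model one step at a time, with `run_step` standing in for a real model call (stubbed here, since the exact client API is an assumption):

```python
def execute_stepwise(plan, get_context, run_step):
    """Run each step of a validated plan with only the context that step needs."""
    results = []
    for step in plan:
        context = get_context(step)        # a narrow slice, not the full history
        output = run_step(step, context)   # model call (stubbed in this sketch)
        results.append((step, output))     # each step's output is reviewable alone
    return results

# Stubbed usage: a two-step plan with per-step context.
plan = ["add middleware", "write tests"]
contexts = {"add middleware": "src/app.py", "write tests": "tests/"}
results = execute_stepwise(
    plan,
    get_context=lambda step: contexts[step],
    run_step=lambda step, ctx: f"done: {step} using {ctx}",
)
```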
Resetting the session
Even with disciplined execution, context will eventually degrade.
The only reliable solution is to reset.
This may feel inefficient, but in practice, it is one of the highest-leverage actions you can take. A fresh session restores clarity, removes noise, and re-establishes correct prioritization of instructions.
Modern context management approaches consistently emphasize this: keep context bounded, and reintroduce only what is necessary for the task at hand.
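A reset does not mean starting from nothing. The pattern is to carry forward only distilled, validated state into the new session. A sketch (the field names are illustrative):

```python
def fresh_session(task: str, constraints: str, validated_summary: str) -> dict:
    """Seed a new session with essentials only - never the raw transcript."""
    return {
        "task": task,                  # the original goal, restated
        "constraints": constraints,    # e.g. the contents of AGENTS.md
        "summary": validated_summary,  # distilled state that has been checked
    }
```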
Meta-Prompting: Forcing the Model to Think First
One of the most effective techniques for preventing context rot is meta-prompting.
Instead of telling the model what to do, you tell it how to approach the task.
You explicitly require it to:
- identify assumptions
- highlight uncertainties
- ask clarifying questions
This interrupts the model’s default behavior of immediate generation.
Why does this work?
Because hallucinations are often driven by premature certainty. Meta-prompting introduces friction at exactly the right point—before incorrect assumptions become embedded in the context.
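A meta-prompt can be a fixed preamble prepended to every task. A minimal sketch (the wording is illustrative, not a canonical template):

```python
META_PROMPT = (
    "Before making any changes:\n"
    "1. State the assumptions you are making about the codebase.\n"
    "2. Highlight anything you are uncertain about.\n"
    "3. Ask clarifying questions if uncertainty blocks a correct solution.\n"
    "Only proceed to implementation once these are resolved.\n"
)

def with_meta_prompt(task: str) -> str:
    """Prepend the 'think first' preamble to a concrete task."""
    return f"{META_PROMPT}\nTask: {task}"
```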
Checkpoints: Turning Drift into Signal
Context rot is dangerous because it is gradual and often invisible.
Checkpoints make it observable.
At key moments, you force the model to validate its output against:
- the original task
- repository constraints (AGENTS.md)
- architectural expectations
This transforms hidden drift into explicit feedback.
Instead of discovering problems at the end, you correct them continuously.
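A checkpoint can likewise be a reusable prompt issued at key moments. A sketch, assuming the repository conventions live in AGENTS.md as described in Part 1:

```python
def checkpoint_prompt(original_task: str, constraints_file: str = "AGENTS.md") -> str:
    """Force the agent to validate its work-in-progress against fixed anchors."""
    return (
        "Pause and validate the work so far:\n"
        f"- Does it still solve the original task: {original_task}?\n"
        f"- Does it respect every constraint in {constraints_file}?\n"
        "- Does it match the agreed architecture?\n"
        "List any deviations explicitly before continuing."
    )
```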
The Connection to Part 1
Part 1 of the series solved the problem of what the agent sees.
Part 2 addresses what happens over time.
AGENTS.md provides structure.
Session discipline preserves that structure.
Without AGENTS.md, the agent guesses.
Without discipline, the agent drifts.
You need both to achieve reliable outcomes.
Why This Matters Now
As teams move from experimentation to production, sessions become longer and more complex. Agents interact with more systems, touch more code, and accumulate more context.
This is where most failures emerge.
Not because the model is incapable, but because the workflow is uncontrolled.
Context rot is one of the primary bottlenecks in real-world AI engineering today.
Before You Scale, You Stabilize
In Part 3, we turn to a different problem.
So far, we have focused on a single agent operating within a controlled session. That constraint makes it possible to reason about context, to reset it, and to keep it aligned with the task.
But most real systems do not stay within that boundary.
As soon as you introduce multiple agents, external tools, or retrieval systems, the problem changes. Context is no longer contained in a single session. It becomes distributed across components that do not share the same state or assumptions.
At that point, failures become harder to trace. Drift is no longer local. It propagates.
This is where orchestration becomes necessary, but also where it becomes risky.
Part 3 explores how to build these systems in a way that preserves the guarantees established here. We will look at how to introduce MCPs, subagents, and external integrations without losing control over context, consistency, or behavior.

