Context Window Mechanics
Understanding how context windows work — not just that they’re limited — enables you to make better engineering decisions across any AI coding agent (Claude Code, Cursor, Copilot Workspace, and others).
Transformer Attention
In transformer architectures, each token attends to every other token in the context window. This creates n² pairwise relationships. As context grows:
- Each token’s “share” of attention decreases
- The model must spread its attention budget across more relationships
- Earlier tokens receive progressively less attention
- Instructions at the beginning of context can be “diluted” by later content
This is why a 200k-token model doesn’t simply work “200k tokens well.” Performance varies dramatically across the window.
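To make the quadratic growth concrete, here is a purely illustrative sketch (it assumes uniform attention, which real learned attention is not) showing how the number of pairwise relationships explodes while each token's average share of attention shrinks:

```python
# Illustrative only: real attention weights are learned, not uniform.
def attention_stats(n_tokens: int) -> tuple[int, float]:
    """Return (pairwise relationships, average attention share per token)."""
    pairs = n_tokens * n_tokens   # every token attends to every token: n^2
    avg_share = 1.0 / n_tokens    # each token's attention weights sum to 1
    return pairs, avg_share

for n in (1_000, 10_000, 200_000):
    pairs, share = attention_stats(n)
    print(f"{n:>7} tokens -> {pairs:>15,} pairs, avg share {share:.6f}")
```

Going from 1k to 200k tokens multiplies the relationship count by 40,000 while dividing each token's average share by 200, which is why quality degrades rather than staying flat.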
The Recency Bias
Models exhibit strong recency bias — they attend more strongly to recent tokens. This has practical implications:
- Instructions near the end of context are followed more reliably
- The system prompt (beginning of context) can be overridden by later content
- Recent file reads have more influence than earlier ones
- The last correction you give matters more than the first
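One practical way to work with this bias is to assemble prompts so that must-follow instructions land at the end. The sketch below is hypothetical — `build_prompt` and its arguments are illustrative, not any specific tool's API:

```python
# Hypothetical prompt assembly exploiting recency bias.
def build_prompt(system: str, history: list[str], critical_instructions: str) -> str:
    """Place the must-follow instructions last, where recency bias helps."""
    parts = [system]                      # system prompt: weakest position (start)
    parts.extend(history)                 # conversation and file reads so far
    parts.append(critical_instructions)   # strongest position (end)
    return "\n\n".join(parts)

prompt = build_prompt(
    system="You are a careful refactoring assistant.",
    history=["<file read: src/api.py>", "<user: migrate to v2 endpoints>"],
    critical_instructions="IMPORTANT: do not modify files outside src/.",
)
```

Repeating a constraint as the final message is often more reliable than stating it once at the top.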
What Auto-Compaction Does
When an AI coding agent hits 95% context utilization, it typically triggers auto-compaction:
- The full conversation history is passed to a summarization model
- The model preserves: architectural decisions, unresolved issues, implementation details, modified file list
- The model discards: redundant tool outputs, duplicate messages, resolved discussions
- The conversation continues with the compressed summary
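The flow above can be sketched as a simple threshold check. This is a minimal sketch of the described behavior, assuming a `summarize` callable standing in for the summarization model — it is not any particular agent's implementation:

```python
# Sketch of the auto-compaction flow; the 0.95 threshold matches the text,
# and summarize() stands in for a summarization-model call.
COMPACTION_THRESHOLD = 0.95

def maybe_compact(messages: list[str], used_tokens: int, window: int,
                  summarize=lambda msgs: ["<summary of earlier conversation>"]):
    """If utilization crosses the threshold, replace history with a summary."""
    if used_tokens / window < COMPACTION_THRESHOLD:
        return messages        # plenty of room: history unchanged
    return summarize(messages) # conversation continues from the compressed summary
```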
What survives compaction:
- Key decisions and their rationale
- Current task state and progress
- File modification history
- Active constraints and requirements
What typically doesn’t survive:
- Exact code snippets from earlier reads
- Build/test output details
- The full text of earlier discussion
- Nuanced instructions from early in the conversation
Customizing Compaction
You can influence what compaction preserves via your agent configuration file:
```markdown
## Compaction Instructions

When compacting, always preserve:

- The full list of modified files
- All test commands that have been run
- The current implementation plan and progress
- Any architectural decisions made during this session
```

In tools that support manual compaction, you can trigger it directly with specific instructions. See Tool Configuration Reference for your tool's compact command.
```text
Compact your context. Focus on the API migration changes. Preserve the file list, migration sequence, and remaining steps. Discard exploration output.
```

The Selective Compaction Pattern
Some AI coding agents support selective compaction from a checkpoint rather than compacting the entire conversation. This condenses messages from a selected point forward while keeping earlier context intact — useful when exploration filled the context but you want to preserve the initial plan. Check your tool’s documentation for equivalent checkpoint or session-management features.
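Selective compaction from a checkpoint can be sketched like this — an assumed structure, not any tool's real implementation, with `summarize` again standing in for the summarization model:

```python
# Selective compaction sketch: keep everything before the checkpoint verbatim,
# condense everything after it. Structure is assumed, not a real tool's API.
def compact_from(messages: list[str], checkpoint: int,
                 summarize=lambda msgs: f"<summary of {len(msgs)} messages>"):
    kept = messages[:checkpoint]                  # early context (e.g. the plan) stays intact
    condensed = summarize(messages[checkpoint:])  # later exploration gets summarized
    return kept + [condensed]
```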
Practical Guidelines
| Situation | Action |
|---|---|
| Starting a new task | Clear your context or start a fresh session for a clean start |
| Between research and planning | Compact your context, preserving research findings |
| Between planning and implementation | Compact your context, preserving the plan |
| After fixing a bug | Clear your context if moving to unrelated work |
| Context at 60% during complex task | Consider compacting proactively |
| Context at 80%+ | Compact immediately or start fresh |
| After 2+ failed correction attempts | Clear your context and start with a better prompt |
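The utilization rows of the table can be encoded as a small decision helper. This is a hypothetical convenience function using the thresholds above, not part of any agent's API:

```python
# Hypothetical helper encoding the utilization thresholds from the table.
def context_action(utilization: float, task_complex: bool = True) -> str:
    """Map context utilization (0.0-1.0) to the recommended action."""
    if utilization >= 0.80:
        return "compact immediately or start fresh"
    if utilization >= 0.60 and task_complex:
        return "consider compacting proactively"
    return "continue"
```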