Context Window Mechanics
Understanding how context windows work — not just that they’re limited — enables you to make better engineering decisions across any AI coding agent (Claude Code, Cursor, Copilot Workspace, and others).
Transformer Attention
In transformer architectures, each token attends to every other token in the context window. This creates n² pairwise relationships. As context grows:
- Each token’s “share” of attention decreases
- The model must spread its attention budget across more relationships
- Earlier tokens receive progressively less attention
- Instructions at the beginning of context can be “diluted” by later content
This is why a 200k-token model doesn’t simply work “200k tokens well.” Performance varies dramatically across the window.
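To make the quadratic growth concrete, here is a purely illustrative sketch (it assumes uniform attention, which real learned attention is not) showing how the number of pairwise relationships explodes while each token's average share of attention shrinks:

```python
# Illustrative only: real attention weights are learned, not uniform.
def attention_stats(n_tokens: int) -> tuple[int, float]:
    """Return (pairwise relationships, average attention share per token)."""
    pairs = n_tokens * n_tokens   # every token attends to every token: n^2
    avg_share = 1.0 / n_tokens    # each token's attention weights sum to 1
    return pairs, avg_share

for n in (1_000, 10_000, 200_000):
    pairs, share = attention_stats(n)
    print(f"{n:>7} tokens -> {pairs:>15,} pairs, avg share {share:.6f}")
```

Going from 1k to 200k tokens multiplies the relationship count by 40,000 while dividing each token's average share by 200, which is why quality degrades rather than staying flat.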
The Recency Bias
Models exhibit strong recency bias — they attend more strongly to recent tokens. This has practical implications:
- Instructions near the end of context are followed more reliably
- The system prompt (beginning of context) can be overridden by later content
- Recent file reads have more influence than earlier ones
- The last correction you give matters more than the first
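One practical way to work with this bias is to assemble prompts so that must-follow instructions land at the end. The sketch below is hypothetical — `build_prompt` and its arguments are illustrative, not any specific tool's API:

```python
# Hypothetical prompt assembly exploiting recency bias.
def build_prompt(system: str, history: list[str], critical_instructions: str) -> str:
    """Place the must-follow instructions last, where recency bias helps."""
    parts = [system]                      # system prompt: weakest position (start)
    parts.extend(history)                 # conversation and file reads so far
    parts.append(critical_instructions)   # strongest position (end)
    return "\n\n".join(parts)

prompt = build_prompt(
    system="You are a careful refactoring assistant.",
    history=["<file read: src/api.py>", "<user: migrate to v2 endpoints>"],
    critical_instructions="IMPORTANT: do not modify files outside src/.",
)
```

Repeating a constraint as the final message is often more reliable than stating it once at the top.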
What Auto-Compaction Does
When an AI coding agent hits 95% context utilization, it typically triggers auto-compaction:
- The full conversation history is passed to a summarization model
- The model preserves: architectural decisions, unresolved issues, implementation details, modified file list
- The model discards: redundant tool outputs, duplicate messages, resolved discussions
- The conversation continues with the compressed summary
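The flow above can be sketched as a simple threshold check. This is a minimal sketch of the described behavior, assuming a `summarize` callable standing in for the summarization model — it is not any particular agent's implementation:

```python
# Sketch of the auto-compaction flow; the 0.95 threshold matches the text,
# and summarize() stands in for a summarization-model call.
COMPACTION_THRESHOLD = 0.95

def maybe_compact(messages: list[str], used_tokens: int, window: int,
                  summarize=lambda msgs: ["<summary of earlier conversation>"]):
    """If utilization crosses the threshold, replace history with a summary."""
    if used_tokens / window < COMPACTION_THRESHOLD:
        return messages        # plenty of room: history unchanged
    return summarize(messages) # conversation continues from the compressed summary
```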
What survives compaction:
- Key decisions and their rationale
- Current task state and progress
- File modification history
- Active constraints and requirements
What typically doesn’t survive:
- Exact code snippets from earlier reads
- Build/test output details
- The full text of earlier discussion
- Nuanced instructions from early in the conversation
Customizing Compaction
You can influence what compaction preserves via your agent configuration file:
```markdown
## Compaction Instructions

When compacting, always preserve:

- The full list of modified files
- All test commands that have been run
- The current implementation plan and progress
- Any architectural decisions made during this session
```

In tools that support manual compaction, you can trigger it directly with specific instructions. See Tool Configuration Reference for your tool's compact command.
```text
Compact your context. Focus on the API migration changes. Preserve the file list, migration sequence, and remaining steps. Discard exploration output.
```

The Selective Compaction Pattern
Some AI coding agents support selective compaction from a checkpoint rather than compacting the entire conversation. This condenses messages from a selected point forward while keeping earlier context intact — useful when exploration filled the context but you want to preserve the initial plan. Check your tool’s documentation for equivalent checkpoint or session-management features.
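Selective compaction from a checkpoint can be sketched like this — an assumed structure, not any tool's real implementation, with `summarize` again standing in for the summarization model:

```python
# Selective compaction sketch: keep everything before the checkpoint verbatim,
# condense everything after it. Structure is assumed, not a real tool's API.
def compact_from(messages: list[str], checkpoint: int,
                 summarize=lambda msgs: f"<summary of {len(msgs)} messages>"):
    kept = messages[:checkpoint]                  # early context (e.g. the plan) stays intact
    condensed = summarize(messages[checkpoint:])  # later exploration gets summarized
    return kept + [condensed]
```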
Practical Guidelines
| Situation | Action |
|---|---|
| Starting a new task | Clear your context or start a fresh session for a clean start |
| Between research and planning | Compact your context, preserving research findings |
| Between planning and implementation | Compact your context, preserving the plan |
| After fixing a bug | Clear your context if moving to unrelated work |
| Context at 60% during complex task | Consider compacting proactively |
| Context at 80%+ | Compact immediately or start fresh |
| After 2+ failed correction attempts | Clear your context and start with a better prompt |
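The utilization rows of the table can be encoded as a small decision helper. This is a hypothetical convenience function using the thresholds above, not part of any agent's API:

```python
# Hypothetical helper encoding the utilization thresholds from the table.
def context_action(utilization: float, task_complex: bool = True) -> str:
    """Map context utilization (0.0-1.0) to the recommended action."""
    if utilization >= 0.80:
        return "compact immediately or start fresh"
    if utilization >= 0.60 and task_complex:
        return "consider compacting proactively"
    return "continue"
```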