Anatomy of a Context Window
A context window is the total amount of text an LLM can process in a single request-response cycle, measured in tokens. Models like Claude 3.5 Sonnet or GPT-4 attend to everything in the context window when generating each response, but not all content has equal influence.
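To get a feel for what "measured in tokens" means, here is a minimal sketch using the tiktoken package (an assumption for illustration: its cl100k_base encoding matches GPT-4's tokenizer, while Claude uses a different, proprietary tokenizer, so counts there are approximations):

```python
import tiktoken

# cl100k_base is the encoding used by GPT-4; other models tokenize differently.
enc = tiktoken.get_encoding("cl100k_base")

text = "A context window is measured in tokens, not characters."
tokens = enc.encode(text)

print(f"{len(text)} characters -> {len(tokens)} tokens")
```

For English prose, a rough rule of thumb is about four characters per token, which is why character counts consistently overestimate how much text fits.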
What Fills the Context Window
The context window contains several types of content, all drawing on the same token budget (a sketch after this list makes the split concrete):
- System prompt: Instructions that define the model's behavior and capabilities. These persist across the entire conversation and typically consume 5-20% of available context.
- Conversation history: All previous messages in the thread, including both user inputs and model responses. This grows with each exchange.
- Tool outputs: Results from function calls, file reads, web fetches, or database queries. A single read command on a large file can consume thousands of tokens.
- Injected context: Documentation, code snippets, or reference material added to support the current task.
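The sketch below tallies hypothetical component sizes against a model's limit. The individual counts and the 200,000-token limit are illustrative assumptions, not measurements:

```python
# Hypothetical token counts for each component of a single request.
context_budget = {
    "system_prompt": 2_500,
    "conversation_history": 48_000,
    "tool_outputs": 31_000,
    "injected_context": 12_000,
}

CONTEXT_LIMIT = 200_000  # illustrative; actual limits vary by model

used = sum(context_budget.values())
for name, tokens in context_budget.items():
    print(f"{name:>22}: {tokens:>7,} tokens ({tokens / CONTEXT_LIMIT:.1%})")

print(f"{'total used':>22}: {used:>7,} tokens ({used / CONTEXT_LIMIT:.1%})")
print(f"{'remaining':>22}: {CONTEXT_LIMIT - used:>7,} tokens")
```

Note how a handful of large tool outputs can dominate the budget even when the system prompt and injected context are modest.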
How Context Affects Behavior
As the context window fills, model behavior shifts. The final 20-30% of the context—recent messages and tool outputs—has disproportionate influence on the response due to recency bias. Content earlier in the window still informs the model but carries less weight in decision-making.
When context nears capacity, the application must compress or truncate earlier conversation to fit new information. This compression can cause the model to lose track of earlier instructions or context: whatever gets dropped is simply no longer visible on subsequent turns. One common truncation strategy is sketched below.
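This is a minimal sketch of such a strategy, assuming a simple role/content message format and a crude characters-per-token heuristic; a production system would use the model's actual tokenizer:

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token for English prose.
    return max(1, len(text) // 4)

def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Drop the oldest non-system messages until the history fits the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    def total(msgs: list[dict]) -> int:
        return sum(estimate_tokens(m["content"]) for m in msgs)

    # Remove from the front (oldest first), always preserving the system prompt.
    while rest and total(system + rest) > budget:
        rest.pop(0)
    return system + rest

history = [
    {"role": "system", "content": "You are a careful coding assistant."},
    {"role": "user", "content": "Summarize this 10,000-line log file..."},
    {"role": "assistant", "content": "Here is a summary..."},
    {"role": "user", "content": "Now fix the bug it revealed."},
]
print(trim_history(history, budget=20))  # oldest turns are dropped first
```

Preserving the system prompt keeps the behavioral instructions intact, but any facts stated only in the trimmed turns are gone, which is exactly how a model "forgets" earlier context.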
Understanding context anatomy explains why progressive disclosure and giving agents the map first improve performance: they optimize what enters the context window and when.
Resources
- Anthropic's Context Window Documentation - Strategies for working with long context
- OpenAI's Token Counting Guide - Tool for understanding token usage
- What is a Token? - Foundational concept for understanding context measurement