---
title: "Extended Context & ECC"
description: "How large context windows change agent behavior, token economics, and what ECC means for agentic workflows"
section: "Ecosystem"
readTime: "10 min"
---

# Extended Context & ECC
Context window size determines what an agent can "see" at once. Understanding how context works — and how to use it efficiently — is one of the highest-leverage skills for AI-assisted development.
## What Is the Context Window?
The context window is the total amount of text (measured in tokens) that the model can process in a single request. Everything in the window — your conversation history, files, instructions, tool results — competes for this space.
Rough token estimates:
| Content | Approximate Tokens |
|---|---|
| 1 line of code | 5–15 tokens |
| 1 file (100 lines) | 500–1,500 tokens |
| Full README | 1,000–5,000 tokens |
| Small codebase (20 files) | 10,000–30,000 tokens |
| Large codebase (200 files) | 100,000–300,000 tokens |
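The estimates above can be approximated with a simple heuristic: English prose and code average roughly 4 characters per token. The sketch below uses that rule of thumb; it is an approximation, not the model's actual tokenizer.

```typescript
// Rough token estimate: ~4 characters per token for English text and code.
// This is a back-of-envelope heuristic, not the model's real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

const line = "export const MAX_RETRIES = 3; // retry budget";
console.log(estimateTokens(line)); // ≈ 12, within the 5-15 tokens/line range above
```

For precise counts, use the provider's own token-counting API; the heuristic is only for sizing context budgets quickly.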
## Context Windows in 2025
| Model | Context Window |
|---|---|
| Claude Sonnet 4.5 / Opus 4 | 200,000 tokens (~150K words) |
| GPT-4o | 128,000 tokens |
| Gemini 2.5 Pro | 1,000,000 tokens (~750K words) |
| Gemini 2.5 Flash | 1,000,000 tokens |
| DeepSeek V3 | 64,000 tokens |
## Extended Context Caching (ECC)
Extended Context Caching (ECC) is Anthropic's mechanism for reusing prompt prefixes across requests. When a large portion of your prompt stays the same between calls (e.g., a large codebase), ECC stores it on Anthropic's servers and charges a much lower cache read price instead of the full input price.
### ECC Cost Breakdown
| Operation | Price vs. Normal Input |
|---|---|
| Cache write (first call) | 25% more expensive |
| Cache read (subsequent calls) | 90% cheaper |
Example savings: Loading a 50,000-token codebase context into Claude for 10 agent iterations:
- Without ECC: 500,000 tokens × input price
- With ECC: 50,000 tokens cache write + 450,000 tokens cache read → ~80% cheaper
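The arithmetic above can be checked with a short sketch, with prices expressed relative to the normal input-token rate (per the table):

```typescript
// Prices relative to the normal input-token price (1.0), per the ECC table.
const CACHE_WRITE = 1.25; // first call: 25% surcharge
const CACHE_READ = 0.10;  // subsequent calls: 90% discount

function eccSavings(contextTokens: number, iterations: number): number {
  const without = contextTokens * iterations;
  const withEcc =
    contextTokens * CACHE_WRITE +                  // one cache write
    contextTokens * (iterations - 1) * CACHE_READ; // cache reads for the rest
  return 1 - withEcc / without; // fraction saved vs. no caching
}

console.log(eccSavings(50_000, 10)); // ≈ 0.785, i.e. ~80% cheaper
```

Note the break-even: a single call is 25% *more* expensive with caching, so ECC only pays off when the prefix is reused at least once.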
### Enabling ECC in Claude Code
ECC is enabled by default in Claude Code for CLAUDE.md and imported files. For custom usage:
```typescript
// Direct API usage with ECC
const response = await anthropic.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 4096,
  system: [
    {
      type: "text",
      text: largeCodebaseContext,
      cache_control: { type: "ephemeral" } // Enable ECC for this block
    }
  ],
  messages: [{ role: "user", content: userPrompt }]
});
```

### CLAUDE.md and ECC
Your CLAUDE.md file is always included in the context prefix. With ECC enabled, it's cached after the first call and reused cheaply for the entire session.
Keep your CLAUDE.md comprehensive — the ECC savings make it nearly free after the first read.
## How Context Affects Agent Behavior

### The "Lost in the Middle" Problem
Models process context unevenly. Information at the start and end of the context window gets the most attention. Information buried in the middle of a long context is more likely to be ignored.
```
[High attention]        System prompt, CLAUDE.md
[Low attention]         Middle of conversation
[Low/medium attention]  Code files added mid-session
[High attention]        Most recent message
```
Mitigation: Put critical constraints in the system prompt (CLAUDE.md), not as mid-conversation notes.
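This ordering can be sketched against the messages API; the variable contents here are placeholders, and the request is shown as a plain object rather than an actual SDK call:

```typescript
// Placeholder contents, for illustration only.
const claudeMd = "Always use TypeScript strict mode.";
const fileDump = "=== src/auth.ts ===\n// ...file contents...";
const task = "Fix the token validation in src/auth.ts.";

// Order the context so critical content sits at the high-attention edges.
const request = {
  model: "claude-sonnet-4-5",
  max_tokens: 4096,
  // Start of context (high attention): constraints live in the system prompt
  system: [{ type: "text", text: claudeMd }],
  messages: [
    // Middle (lowest attention): bulk file content
    { role: "user", content: fileDump },
    { role: "assistant", content: "Loaded the files." },
    // End of context (high attention): the actual task, stated last
    { role: "user", content: task },
  ],
};

console.log(request.messages[request.messages.length - 1].content);
```

The key property is that the task is the final message and the constraints are in the system block; the bulk file dump is the only thing left in the low-attention middle.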
### Context Compaction in Claude Code
When the context fills up, Claude Code automatically compacts it: it summarizes earlier conversation turns into a dense summary, freeing space for new content. You can observe this with:
```bash
# Show context usage
claude --verbose "your task here"
```

When compaction happens, some details from earlier in the session may be lost. For long sessions:
```bash
# Checkpoint: ask Claude to summarize its progress before compaction
claude "Summarize what we've accomplished and what files have been changed,
then continue with the next step."
```

## Optimizing Context Usage
### What to Include
| Content | Include? | Why |
|---|---|---|
| CLAUDE.md / project context | Always | Free after ECC; critical for accuracy |
| Directly relevant source files | Yes | Enable accurate edits |
| Test files for the module | Yes | Ground truth for behavior |
| Unrelated modules | No | Wastes context, increases cost |
| Entire node_modules | Never | Millions of tokens, no value |
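The include/exclude decisions above can be applied as a simple filter before assembling context. The exclusion patterns below are illustrative assumptions, not Claude Code's actual ignore rules:

```typescript
// Illustrative exclusion patterns — adjust for your project's layout.
const EXCLUDE = [/node_modules\//, /\.lock$/, /dist\//];

// Keep only paths worth spending context tokens on.
function contextWorthy(paths: string[]): string[] {
  return paths.filter((p) => !EXCLUDE.some((re) => re.test(p)));
}

const files = [
  "src/routes/auth.ts",        // directly relevant source — keep
  "src/routes/auth.test.ts",   // ground truth for behavior — keep
  "node_modules/lodash/index.js", // millions of tokens, no value — drop
  "dist/bundle.js",            // build artifact — drop
];
console.log(contextWorthy(files)); // keeps only the two src/ files
```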
### Use @file Sparingly in Cursor
Cursor's @file reference includes the full file content in context. For large files:
```
# Inefficient — includes entire 3,000-line router file
@src/routes/index.ts Fix the auth middleware
```

```
# Better — scope to relevant function
@src/routes/auth.ts lines 45-89 Fix the token validation
```
### Context Pruning in Claude Code
```bash
# Start a fresh session for unrelated tasks (don't carry old context)
# Each `claude` invocation starts fresh unless you use --resume

# Resume a specific session
claude --resume <session-id>

# List recent sessions
claude sessions list
```

## Large Codebase Strategies
### Repo Map (aider approach)
Rather than loading all files, load a "map" — just the file paths, class names, and function signatures:
```bash
# Generate a repo map: file paths plus their exported signatures
find src/ -name "*.ts" -print0 | \
  xargs -0 grep -HnE '^export (function|class|const|interface)' \
  > /tmp/repo-map.txt

claude "Using this repository map for context, find which modules handle authentication:
$(cat /tmp/repo-map.txt)"
```

### Chunking for Gemini 2.5
With Gemini's 1M token window, you can genuinely fit entire large repos:
```bash
# Concatenate entire src/ tree with per-file headers
find src/ \( -name "*.ts" -o -name "*.tsx" \) | sort | \
  xargs -I {} sh -c 'echo "=== $1 ==="; cat "$1"' _ {} > /tmp/full-context.ts

# Feed to Gemini API
cat /tmp/full-context.ts | gemini-cli "Analyze all inter-module dependencies
and identify the top 5 architectural coupling issues"
```