Extended Context & ECC

How large context windows change agent behavior, token economics, and what ECC means for agentic workflows

Read time: 10 min

Context window size determines what an agent can "see" at once. Understanding how context works — and how to use it efficiently — is one of the highest-leverage skills for AI-assisted development.

What Is the Context Window?

The context window is the total amount of text (measured in tokens) that the model can process in a single request. Everything in the window — your conversation history, files, instructions, tool results — competes for this space.

Rough token estimates:

| Content | Approximate tokens |
|---|---|
| 1 line of code | 5–15 |
| 1 file (100 lines) | 500–1,500 |
| Full README | 1,000–5,000 |
| Small codebase (20 files) | 10,000–30,000 |
| Large codebase (200 files) | 100,000–300,000 |
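
These estimates follow the common rule of thumb of roughly 4 characters per token for English text and code. A minimal sketch of that heuristic (it approximates real tokenizers; exact counts vary by model):

```typescript
// Rough token estimator using the ~4 characters/token heuristic.
// Real tokenizers differ; use this only for ballpark budgeting.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// A 100-line file at ~37 chars/line lands in the table's 500–1,500 range.
const fileText = "const x = 1; // example line of code\n".repeat(100);
const fileTokens = estimateTokens(fileText);
```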

Context Windows in 2025

| Model | Context window |
|---|---|
| Claude Sonnet 4.5 / Opus 4 | 200,000 tokens (~150K words) |
| GPT-4o | 128,000 tokens |
| Gemini 2.5 Pro | 1,000,000 tokens (~750K words) |
| Gemini 2.5 Flash | 1,000,000 tokens |
| DeepSeek V3 | 64,000 tokens |

Extended Context Caching (ECC)

Extended Context Caching (ECC) — exposed in the Anthropic API as prompt caching — reuses prompt prefixes across requests. When a large portion of your prompt stays the same between calls (e.g., a large codebase), ECC stores that prefix on Anthropic's servers and charges a much lower cache-read price instead of the full input price.

ECC Cost Breakdown

| Operation | Price vs. normal input |
|---|---|
| Cache write (first call) | 25% more expensive |
| Cache read (subsequent calls) | 90% cheaper |

Example savings: Loading a 50,000-token codebase context into Claude for 10 agent iterations:

  • Without ECC: 500,000 tokens × input price
  • With ECC: 50,000 tokens cache write + 450,000 tokens cache read → ~80% cheaper
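
The arithmetic behind that figure, as a quick sanity check (prices expressed relative to the normal input rate — 1.25× for a cache write, 0.10× for a cache read, per the table above):

```typescript
// Relative prices: cache write = 1.25x input, cache read = 0.10x input.
const CACHE_WRITE = 1.25;
const CACHE_READ = 0.1;

const contextTokens = 50_000;
const iterations = 10;

// Without caching: the full context is billed at input price on every call.
const withoutEcc = contextTokens * iterations;        // 500,000 token-units

// With caching: one write on the first call, then nine cheap reads.
const withEcc =
  contextTokens * CACHE_WRITE +                       // 62,500
  contextTokens * (iterations - 1) * CACHE_READ;      // 45,000

const savings = 1 - withEcc / withoutEcc;             // ≈ 0.785, i.e. "~80% cheaper"
```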

Enabling ECC in Claude Code

ECC is enabled by default in Claude Code for CLAUDE.md and imported files. For custom usage:

// Direct API usage with ECC
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const response = await anthropic.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 4096,
  system: [
    {
      type: "text",
      text: largeCodebaseContext,
      cache_control: { type: "ephemeral" }  // Enable ECC for this block
    }
  ],
  messages: [{ role: "user", content: userPrompt }]
});

CLAUDE.md and ECC

Your CLAUDE.md file is always included in the context prefix. With ECC enabled, it's cached after the first call and reused cheaply for the entire session.

Keep your CLAUDE.md comprehensive — the ECC savings make it nearly free after the first read.

How Context Affects Agent Behavior

The "Lost in the Middle" Problem

Models process context unevenly. Information at the start and end of the context window gets the most attention. Information buried in the middle of a long context is more likely to be ignored.

[High attention]  System prompt, CLAUDE.md
[Medium attention] Middle of conversation
[Low/medium attention] Code files added mid-session
[High attention]  Most recent message

Mitigation: Put critical constraints in the system prompt (CLAUDE.md), not as mid-conversation notes.
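
A sketch of the difference, using the Messages API request shape shown earlier on this page (the constraint text and conversation history here are illustrative):

```typescript
// Put hard constraints in the system block (high-attention prefix),
// not as a note buried mid-conversation (the low-attention middle).
const constraint = "Never modify files under migrations/.";

const request = {
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  system: [{ type: "text", text: constraint }], // high attention, cacheable
  messages: [
    { role: "user", content: "Refactor the auth module." },
    // ...long conversation history (the low-attention middle)...
    { role: "user", content: "Continue with the next file." }, // recent = high attention
  ],
};
```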

Context Compaction in Claude Code

When the context fills up, Claude Code automatically compacts it: it summarizes earlier conversation turns into a dense summary, freeing space for new content. You can observe this with:

# Show context usage
claude --verbose "your task here"

When compaction happens, some details from earlier in the session may be lost. For long sessions:

# Checkpoint: ask Claude to summarize its progress before compaction
claude "Summarize what we've accomplished and what files have been changed, 
then continue with the next step."

Optimizing Context Usage

What to Include

| Content | Include? | Why |
|---|---|---|
| CLAUDE.md / project context | Always | Free after ECC; critical for accuracy |
| Directly relevant source files | Yes | Enable accurate edits |
| Test files for the module | Yes | Ground truth for behavior |
| Unrelated modules | No | Wastes context, increases cost |
| Entire node_modules | Never | Millions of tokens, no value |
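
The same policy can be expressed as a simple context budgeter: rank candidate files by relevance and include them until a token budget is exhausted. A hedged sketch — the relevance scores, the 0.3 floor, and the file names are illustrative; real tools rank via retrieval or heuristics:

```typescript
interface Candidate {
  path: string;
  tokens: number;     // estimated size
  relevance: number;  // 0–1; higher = more related to the current task
}

// Greedily include the most relevant files that fit in the budget,
// skipping anything below a relevance floor (the "unrelated modules" row).
function selectContext(candidates: Candidate[], budget: number): string[] {
  const picked: string[] = [];
  let used = 0;
  for (const c of [...candidates].sort((a, b) => b.relevance - a.relevance)) {
    if (c.relevance < 0.3) continue;          // unrelated: wastes context
    if (used + c.tokens > budget) continue;   // would blow the window
    picked.push(c.path);
    used += c.tokens;
  }
  return picked;
}

const files: Candidate[] = [
  { path: "src/auth.ts", tokens: 1200, relevance: 0.9 },
  { path: "src/auth.test.ts", tokens: 800, relevance: 0.8 },
  { path: "src/big.ts", tokens: 50000, relevance: 0.7 },
  { path: "src/unrelated.ts", tokens: 500, relevance: 0.1 },
];
const picked = selectContext(files, 3000);
```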

Use @file Sparingly in Cursor

Cursor's @file reference includes the full file content in context. For large files:

# Inefficient — includes entire 3,000-line router file
@src/routes/index.ts Fix the auth middleware

# Better — scope to relevant function
@src/routes/auth.ts lines 45-89 Fix the token validation
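
Scoping can also happen before the prompt reaches the tool: extract just the relevant line range and paste that instead of referencing the whole file. A minimal sketch of the range extraction (file contents here are illustrative):

```typescript
// Return lines [start, end] (1-indexed, inclusive) of a source string,
// so only the relevant function enters the context window.
function extractLines(source: string, start: number, end: number): string {
  return source.split("\n").slice(start - 1, end).join("\n");
}

const src = Array.from({ length: 100 }, (_, i) => `line ${i + 1}`).join("\n");
const chunk = extractLines(src, 45, 89); // 45 lines instead of 100
```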

Context Pruning in Claude Code

# Start a fresh session for unrelated tasks (don't carry old context)
# Each `claude` invocation starts fresh unless you use --resume
 
# Resume a specific session
claude --resume <session-id>
 
# Pick from recent sessions interactively
claude --resume

Large Codebase Strategies

Repo Map (aider approach)

Rather than loading all files, load a "map" — just the file paths, class names, and function signatures:

# Generate a repo map: file paths plus exported signatures
# (grep -E for the alternation; sed trims trailing function bodies)
grep -rE "^export (function|class|const|interface)" src/ --include="*.ts" | \
  sed 's/ *{.*//' > /tmp/repo-map.txt
 
claude "Using this repository map for context, find which modules handle authentication:
$(cat /tmp/repo-map.txt)"

Whole-Repo Loading for Gemini 2.5

With Gemini's 1M token window, you can genuinely fit entire large repos:

# Concatenate entire src/ tree
find src/ -name "*.ts" -o -name "*.tsx" | sort | \
  xargs -I {} bash -c 'echo "=== {} ==="; cat {}' > /tmp/full-context.ts
 
# Feed to Gemini API
cat /tmp/full-context.ts | gemini-cli "Analyze all inter-module dependencies 
and identify the top 5 architectural coupling issues"

Checklist