Phase 1, Session 1: The Context Window

The Big Idea

Claude doesn't "remember" your previous conversations. It has no brain quietly storing memories between sessions. Instead, it has a context window: a fixed-size workspace holding everything Claude can "see" right now.

Think of it as a desk. Everything Claude needs to do its job has to fit on this desk. When the desk fills up, something has to give: older material gets summarized or pushed off the edge. When a new session starts, the desk is cleared.

Analogy

Imagine you're a contractor who gets amnesia every night. Each morning, someone hands you a briefcase. Inside: your company manual (CLAUDE.md), notes from yesterday's meeting (HANDOFF.md), the client's specifications (supplements), and today's task list (the conversation). Everything you know about the job has to fit in that briefcase. If something isn't in the briefcase, you don't know it. Period.

What's In the Context Window?

1. System Prompt (Anthropic's instructions) ~4,000 tokens

What: Instructions from Anthropic that tell Claude how to behave — safety rules, formatting guidelines, knowledge cutoff date, and what tools are available.

You control this? No. This is baked in by Anthropic. You never see it, but it's always there, taking up space on the desk.

Why it matters: The system prompt is the "factory settings," and it is part of why Claude behaves similarly across all users. Your CLAUDE.md is your customization on top of these defaults.

In Cowork: The system prompt also includes your available skills list, connected MCP tool descriptions, file handling instructions, and any personal preferences you've set.

2. CLAUDE.md (Your boot context) ~3,500 tokens

What: Your custom instructions. Identity, rules, knowledge map, team structure, tool gotchas, shorthand commands. This is loaded AUTOMATICALLY at the start of every session.

You control this? Yes, 100%. This is your most powerful tool.

Why it matters: Every token in CLAUDE.md is a tax you pay on EVERY interaction. A 3,500-token CLAUDE.md means Claude has 3,500 fewer tokens available for actual work in every single message. This is why "lean boot context" is a conservation measure — not because the content doesn't matter, but because it costs you on every turn.
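To make the "tax" concrete, here is a minimal Python sketch that totals how many input tokens a fixed boot context consumes over a session. It assumes every turn re-sends the full context and that no prompt caching applies; caching lowers the billed cost of repeated tokens, but they still occupy the window. The 30-turn session length is an illustrative number.

```python
def boot_tokens_processed(boot_tokens: int, turns: int) -> int:
    """Total input tokens spent on boot context over `turns` turns.

    Assumes the whole boot context is re-sent on every turn with no caching.
    """
    if boot_tokens < 0 or turns < 0:
        raise ValueError("boot_tokens and turns must be non-negative")
    return boot_tokens * turns


# A 3,500-token CLAUDE.md over an illustrative 30-turn session...
full = boot_tokens_processed(3_500, 30)
# ...versus the same file trimmed to 2,000 tokens.
trimmed = boot_tokens_processed(2_000, 30)

print(f"{full:,} vs {trimmed:,}")  # 105,000 vs 60,000
```

With these numbers, trimming the file by 1,500 tokens saves 45,000 input tokens over the session.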

Example setup: A main CLAUDE.md with @-imported rules files (hard-rules.md, session-protocol.md, team-structure.md). Modular design keeps it organized without bloat — each file has one theme.

3. @-Imported Rules Files ~2,000 tokens

What: Separate .md files pulled in by the @.claude/rules/ references in CLAUDE.md. Hard rules, session protocol, team structure, marketing rules.

You control this? Yes. This is an extension of CLAUDE.md — same power, better organization.

Why it matters: By splitting rules into separate files, you can update one area (say, marketing rules) without touching the core CLAUDE.md. But they ALL load at boot, so they ALL count against your token budget.
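The "they ALL load at boot" point can be sketched in Python. This is a hypothetical expander, not Claude Code's actual loader: it assumes a line beginning with @ names a file whose contents replace that line, it uses a dict in place of the filesystem, and the rule text is placeholder content. The depth limit is an illustrative guard against import cycles.

```python
MAX_DEPTH = 5  # illustrative guard against runaway or circular imports


def expand_imports(name: str, files: dict[str, str], depth: int = 0) -> str:
    """Replace every '@path' line with the (recursively expanded) file it names."""
    if depth > MAX_DEPTH:
        raise RecursionError(f"import chain too deep at {name}")
    out = []
    for line in files[name].splitlines():
        stripped = line.strip()
        if stripped.startswith("@"):
            out.append(expand_imports(stripped[1:], files, depth + 1))
        else:
            out.append(line)
    return "\n".join(out)


# Hypothetical in-memory "filesystem" with placeholder rule text.
files = {
    "CLAUDE.md": "# Company\n@.claude/rules/hard-rules.md\n@.claude/rules/team-structure.md",
    ".claude/rules/hard-rules.md": "Rule: never push to main without review.",
    ".claude/rules/team-structure.md": "Team: design, backend, marketing.",
}

boot = expand_imports("CLAUDE.md", files)
print(boot)
```

However the rules are split across files, the expanded result is one block of text, and every line of it counts against the window from the first message.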

Key insight: Position matters. Long-context research (the "lost in the middle" effect) suggests models follow material near the start and end of their context more reliably than material buried in the middle. Put your most critical rules first in each file.

4. Supplements (loaded at startup) ~2,500 tokens

What: Product-specific context files loaded during the session startup protocol. Example: an acme-product-supplement.md that holds current sprint state, pricing details, and feature flags — things that change often and don't belong in the main CLAUDE.md.

You control this? Yes, but these are manually loaded (session protocol says "read supplements at startup"), not auto-loaded like CLAUDE.md.

Why it matters: This is the "Layer 2" of context — product details, current state, recent changes. Keeps CLAUDE.md lean (company-level) while supplements carry product-level detail.

The tradeoff: More supplement = more context = better answers about that product. But also = fewer tokens left for conversation. Finding the right balance is context engineering.
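A Python sketch of that tradeoff: read only the supplement for the product this session is about, rather than all of them. The product names, the second and third file names, and all token sizes are hypothetical; only acme-product-supplement.md comes from the example above.

```python
# Hypothetical registry: product -> (supplement file, approximate token size).
SUPPLEMENTS = {
    "acme": ("acme-product-supplement.md", 2_500),
    "beta": ("beta-product-supplement.md", 3_000),
    "gamma": ("gamma-product-supplement.md", 2_000),
}


def plan_supplements(products: set[str]) -> tuple[list[str], int]:
    """Return the files to read at startup and their combined token cost."""
    chosen = [SUPPLEMENTS[p] for p in sorted(products)]
    return [name for name, _ in chosen], sum(size for _, size in chosen)


files, cost = plan_supplements({"acme"})
print(files, cost)  # ['acme-product-supplement.md'] 2500

_, cost_all = plan_supplements(set(SUPPLEMENTS))
print(cost_all)  # 7500
```

Here, loading one supplement instead of all three leaves 5,000 more tokens for the conversation.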

5. Conversation History (grows with each message)

What: Every message you've sent and every response Claude has given in THIS session. Plus every tool call and its results.

You control this? Indirectly. Longer conversations = more history = more tokens consumed.

Why it matters: This is why sessions get "dumber" over time. Early in a session, most of the context window is available for thinking. Late in a session, the window is packed with conversation history, leaving less room. This is also why compaction happens — when the window fills up, older conversation gets summarized to free space.

The hidden cost: Tool calls are conversation history too. Every search_docs result, every mac-shell output, every file read — it ALL goes into the context window. A single large file read can consume thousands of tokens.
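Before pulling a large file into the window, a rough estimate helps. This Python sketch assumes about 4 characters per token, a common rule of thumb for English text; actual tokenizers vary, and code or non-English text often costs more per character. For exact counts, Anthropic's API offers a token-counting endpoint.

```python
CHARS_PER_TOKEN = 4  # rough average for English prose; an assumption, not a spec


def estimate_tokens(text: str) -> int:
    """Ballpark token count: characters divided by 4, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


# Stand-in for a tool result: a log file read in full.
report = "2024-01-01 INFO request ok\n" * 1_500
print(len(report), estimate_tokens(report))  # 40500 10125
```

By this estimate, a roughly 40 KB log read in full adds about 10,000 tokens of history that stay in the window for the rest of the session.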

Common experience: "Why does Claude seem to forget rules by the end of a long session?" The rules (CLAUDE.md) are still there — but they're now competing with 50,000+ tokens of conversation for Claude's attention. The rules didn't disappear. They just got crowded out.

6. Your Current Message (size varies)

What: The message you just typed. This is the most recent thing in the context window, which means it gets the highest "attention."

You control this? Completely.

Why it matters: Recency bias is real. Claude pays more attention to recent messages than older ones. This is why repeating an important instruction in your message ("remember, no em dashes") works even if it's already in CLAUDE.md — you're putting it at the point of highest attention.

Pro tip: If Claude is drifting from your rules late in a session, restating the key rule in your message is more effective than hoping Claude is still paying attention to line 47 of hard-rules.md.

7. Claude's Thinking + Response Space (whatever's left)

What: The remaining space in the context window where Claude actually does its work — reasoning through the problem, planning tool calls, composing the response.

You control this? Indirectly — by managing everything above, you determine how much room is left here.

Why it matters: This is the ACTUAL workspace. Everything above is setup. If your CLAUDE.md is 5,000 tokens, your supplements are 3,000, and your conversation history is 80,000... Claude has less room to think. Quality degrades not because Claude is "tired" but because the workspace is cramped.
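The same arithmetic as a worked Python example. The 200,000-token window and the ~4,000-token system prompt come from elsewhere in this lesson; the late-session figures are the ones in this paragraph, and the fresh-session figures are the layer estimates listed above (CLAUDE.md ~3,500, rules ~2,000, supplements ~2,500).

```python
WINDOW = 200_000  # Claude's context window, in tokens


def remaining(window: int, **layers: int) -> int:
    """Tokens left over after the given layers are loaded."""
    return window - sum(layers.values())


fresh = remaining(WINDOW, system_prompt=4_000, claude_md=3_500, rules=2_000, supplements=2_500)
late = remaining(WINDOW, system_prompt=4_000, claude_md=5_000, supplements=3_000, history=80_000)

print(f"Fresh session: {fresh:,} left ({fresh / WINDOW:.0%})")  # Fresh session: 188,000 left (94%)
print(f"Late session: {late:,} left ({late / WINDOW:.0%})")  # Late session: 108,000 left (54%)
```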

This is why token conservation matters: It's not just about billing. Less token waste = more thinking room = better output.

See It In Action: Token Budget

Chart: a stacked bar from 0 to 200,000 tokens showing how the context window fills up in different scenarios, with segments for System Prompt, CLAUDE.md + Rules, Supplements, Conversation, Your Message, and Thinking Space.

Fresh Session: Most of the window is empty — available for conversation and thinking. This is when Claude is at its best. CLAUDE.md and rules are loaded, supplements read during startup protocol. Claude has maximum room to think.

Context Window Sizes — What You're Working With

Model	Context Window	Notes
Claude Opus 4.6	200,000 tokens	~150,000 words or ~500 pages
Claude Sonnet 4.6	200,000 tokens	Same size; faster and cheaper than Opus
Claude Haiku 4.5	200,000 tokens	Same size; fastest and cheapest
GPT-4o (OpenAI)	128,000 tokens	64% of Claude's window
Gemini 1.5 Pro (Google)	1,000,000+ tokens	5x Claude's window, but recall drops for material buried mid-context

Key insight: Bigger isn't always better. Gemini can hold 1M tokens, but long-context research has found that language models in general retrieve information from the middle of very long contexts less reliably than information near the start or end (the "lost in the middle" problem). A 200K window with good context engineering can outperform a 1M window stuffed with everything.

Why Claude "Forgets" — Three Different Things People Call Forgetting

1. Session Boundary (the hard reset)

When a session ends and a new one starts, the context window is completely cleared. Claude doesn't carry anything over. This is why CLAUDE.md exists — it's the only thing that automatically reloads. Everything else (supplements, conversation, decisions made) must be explicitly preserved through files (HANDOFF.md, KB, session-memory.md).

2. Compaction (the emergency summary)

When the context window fills up during a session, the system automatically summarizes older conversation to free space. This is lossy — details, nuance, and specific instructions from early in the session may be simplified or lost. This is what happened earlier today when our session compacted and you got that "continued from previous conversation" summary.
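A minimal Python sketch of the mechanism, with the lossiness made visible. The summarize function is a hypothetical stand-in (it keeps only the first sentence of each message; a real system would have a model write the summary), and the 30,000-token limit and four kept messages are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str
    tokens: int


def summarize(messages: list[Message]) -> Message:
    # Hypothetical stand-in: keep only the first sentence of each message.
    gist = " ".join(m.text.split(".")[0] + "." for m in messages)
    text = f"[Summary of earlier conversation] {gist}"
    return Message("summary", text, tokens=len(text) // 4)  # rough token estimate


def compact(history: list[Message], limit: int, keep_recent: int = 4) -> list[Message]:
    """Once over the limit, fold all but the newest messages into one summary."""
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    if sum(m.tokens for m in history) <= limit or len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(old)] + recent


history = [Message("user", f"Step {i}. Exact wording for step {i}.", 5_000) for i in range(10)]
compacted = compact(history, limit=30_000)
print(len(compacted))  # 5
print(compacted[0].text)  # [Summary of earlier conversation] Step 0. Step 1. Step 2. Step 3. Step 4. Step 5.
```

The "Exact wording" from the first six messages is gone; only the four most recent messages survive intact. That is the lossy step in miniature.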

3. Attention Fade (the gradual drift)

Even within a session where nothing is lost, Claude's "attention" to instructions degrades as the conversation grows. CLAUDE.md instructions that were perfectly followed at message 3 may be inconsistently followed at message 30 — not because they're gone, but because they're now competing with 50,000+ tokens of newer content for attention weight. This is the "laziness" you've been experiencing, compounded by the Opus 4.6 regression.

This is why you write everything down. Decision logs, session summaries, knowledge base entries — none of this is bureaucracy. It's survival. If it's not written somewhere Claude can find it next session, it functionally doesn't exist.
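A minimal Python sketch of "write it down": at session close, decisions and next steps go into a HANDOFF.md, and the next session starts by reading it. The section layout and sample entries are illustrative, not a required format, and a temporary directory stands in for your project folder.

```python
import tempfile
from datetime import date
from pathlib import Path


def write_handoff(path: Path, decisions: list[str], next_steps: list[str]) -> None:
    """Session close: persist what the next session needs to know."""
    lines = [f"# Handoff {date.today().isoformat()}", "", "## Decisions"]
    lines += [f"- {d}" for d in decisions]
    lines += ["", "## Next steps"]
    lines += [f"- {s}" for s in next_steps]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def read_handoff(path: Path) -> str:
    """Session start: only what was written to disk survives the reset."""
    return path.read_text(encoding="utf-8") if path.exists() else ""


with tempfile.TemporaryDirectory() as tmp:
    handoff = Path(tmp) / "HANDOFF.md"
    write_handoff(
        handoff,
        decisions=["Keep product detail in supplements, not CLAUDE.md"],
        next_steps=["Move the most critical hard rules to the top of the file"],
    )
    restored = read_handoff(handoff)
    missing = read_handoff(Path(tmp) / "nothing-here.md")

print(restored)
```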

What This Means for Your Setup

Good Claude architecture is context window management in disguise:

Architecture Decision	Why It Works (Context Window Reason)
Modular CLAUDE.md with @-imports	Keeps boot context organized by theme, so each file is easy to audit and trim (imported files still load at boot and still count against the budget)
Separate sessions for different roles	Each session gets its own fresh 200K window instead of one session trying to hold everything
Knowledge base with on-demand retrieval	Hundreds of docs NOT loaded at boot — only retrieved when needed. Massive savings vs. stuffing everything into CLAUDE.md
Supplements separate from CLAUDE.md	Company-level context always loaded (cheap). Product-level loaded per session (only when needed)
Periodic checkpoint rule	Forces doc updates before the window gets too full to act on them reliably
Session close-out protocol	Writes critical state to files BEFORE compaction can erase it from the window
"Write it down or it doesn't exist"	Anything only in the context window dies at session end. Files are permanent.
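The periodic checkpoint rule can be expressed as a threshold check. This Python sketch assumes you can see roughly how much of the window is in use; the 50% and 75% thresholds are illustrative choices, not documented limits, picked so that state gets written to files well before compaction would trigger.

```python
WINDOW = 200_000  # Claude's context window, in tokens
THRESHOLDS = (0.50, 0.75)  # illustrative checkpoint points, not documented limits


def checkpoints_due(used_before: int, used_now: int, window: int = WINDOW) -> list[float]:
    """Thresholds crossed since the last check; each one means 'update the docs now'."""
    return [t for t in THRESHOLDS if used_before < t * window <= used_now]


print(checkpoints_due(90_000, 105_000))  # [0.5]
print(checkpoints_due(105_000, 160_000))  # [0.75]
print(checkpoints_due(10_000, 20_000))  # []
```

A jump that crosses both thresholds at once returns both, which signals that the checkpoint is overdue.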

Next: Phase 1, Session 2