Phase 1, Session 3

Tokens: The Real Currency — What Claude Actually Counts

The Big Idea

When people hear "200,000 token context window," they usually think in words or sentences. That's the wrong unit. Tokens are the actual unit Claude thinks in — and they don't map cleanly to words, sentences, or characters.

Understanding tokens explains why some operations feel expensive, why code costs more than prose, why other languages burn through context faster, and why "context window" is not the same as "how much Claude can read."

Analogy
Tokens are like syllables, not words. "Cat" is one token. "Catastrophically" might be three. "Pneumonoultramicroscopicsilicovolcanoconiosis" is many more. The model doesn't see words — it sees these sub-word chunks. Some map to whole common words, some to word fragments, some to punctuation, some to whitespace. The vocabulary was chosen to be efficient for English, which means it's less efficient for other languages and code.

See Tokens in Real Time

Type anything below and watch it get split into tokens. Each color is a different token.

[Interactive tokenizer: live counters for tokens, words, characters, and tokens-per-word update as you type.]
Key rule of thumb: For everyday English prose, 1 token ≈ 0.75 words (or about 4 characters). So 200,000 tokens ≈ 150,000 words ≈ about 500 pages of a novel. But this breaks down fast with code, special characters, and non-English text.
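That rule of thumb is easy to sketch in code. This is a heuristic only; real tokenizers split on learned sub-word boundaries, not character counts:

```python
# Rough token math for English prose: ~4 characters per token,
# ~0.75 words per token. A heuristic, not a real tokenizer.
def estimate_tokens(text: str) -> int:
    return max(1, round(len(text) / 4))

def words_for(tokens: int) -> int:
    return round(tokens * 0.75)

print(estimate_tokens("The quick brown fox jumps over the lazy dog."))  # 11
print(words_for(200_000))  # 150000 words in a full 200K window
```

Useful for quick budgeting; expect it to undercount on code and non-English text, as the sections below show.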

The Math: What 200K Tokens Actually Gets You

| Content type | Tokens per unit | What fits in 200K tokens |
|---|---|---|
| Average English word | ~1.3 tokens | ~150,000 words |
| Novel page (250 words) | ~330 tokens | ~600 pages (a long novel) |
| This lesson (full text) | ~2,500 tokens | ~80 lessons like this one |
| Tweet (280 chars) | ~80 tokens | ~2,500 tweets |
| Python function (20 lines) | ~150-250 tokens | ~1,000 small functions |
| JSON API response (large) | ~2,000-5,000 tokens | 40-100 API calls |
| PDF page (dense text) | ~400-600 tokens | ~350-500 pages |
| Spanish/French text | ~1.5-2x English | Roughly half the English capacity |
| Chinese/Japanese/Arabic | ~2-4x English | A quarter or less of English capacity |
The non-English problem: Claude's tokenizer was optimized for English. A paragraph in Japanese might cost 3-4x the tokens of the same meaning in English. If you're building for international users, this has real cost and context implications.
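A rough sketch of what those multipliers do to effective capacity. The multiplier values here are illustrative midpoints taken from the table above, not measured tokenizer output:

```python
# Effective word capacity of a 200K-token window by language,
# using illustrative tokens-per-word multipliers relative to English.
MULTIPLIER = {"english": 1.0, "spanish": 1.75, "japanese": 3.0}

def effective_word_capacity(language: str, window: int = 200_000) -> int:
    tokens_per_word = 1.3 * MULTIPLIER[language]  # ~1.3 tokens/word in English
    return round(window / tokens_per_word)

print(effective_word_capacity("english"))   # 153846
print(effective_word_capacity("japanese"))  # roughly a third of English
```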

Why Code Costs More Than It Looks

Code is tokenized differently from prose. Variable names, symbols, whitespace, and syntax all fragment into more tokens than equivalent English text.

| Text type | Example | Tokens |
|---|---|---|
| Plain English | "Get all users from the database" | ~8 |
| SQL equivalent | `SELECT * FROM users WHERE active = 1;` | ~15 |
| Python equivalent | `users = db.query(User).filter(User.active==1).all()` | ~22 |
| JSON config (indented) | 4-space indented JSON object, 10 fields | ~60-100 |
| Markdown with headers | `# Heading` / `**bold**` / `- list items` | ~10-15% overhead vs plain text |
Practical takeaway: When Claude reads a large file (code, JSON config, tool output), that read goes into the context window as tokens. A 500-line Python file might consume 8,000-12,000 tokens. If Claude reads three files in a session, you've burned 25,000-35,000 tokens just on file reads before any thinking happens.
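The arithmetic above can be sketched directly. The ~20 tokens-per-line figure is an assumed midpoint of the 16-24 tokens/line implied by "500 lines ≈ 8,000-12,000 tokens":

```python
# Back-of-envelope cost of reading source files into context.
# ~20 tokens per line of Python is an assumption, the midpoint of
# the 16-24 tokens/line implied by "500 lines = 8,000-12,000 tokens".
def file_read_tokens(line_count: int, tokens_per_line: int = 20) -> int:
    return line_count * tokens_per_line

files = [500, 500, 500]  # three 500-line files read in one session
total = sum(file_read_tokens(n) for n in files)
print(total)  # 30000, inside the 25,000-35,000 range quoted above
```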

The Hidden Token Costs

Most people only think about message length. The real costs are often invisible.

- Every tool call result (file read, search result, API response): 500-5,000 tokens each
- Claude's thinking output (extended thinking mode): 1,000-20,000 tokens per response
- System prompt (Anthropic's baseline): ~4,000 tokens, every turn
- Your CLAUDE.md + rules files: ~6,000-10,000 tokens, every turn
- Entire conversation history (re-read every turn): grows by 1,000-5,000 tokens per exchange
- Large image attached to a message: ~1,000-1,700 tokens per image
- Screenshot from the computer-use tool: ~1,000-2,000 tokens per screenshot
The compounding effect: Every turn in a conversation, Claude re-reads the ENTIRE context window — system prompt, CLAUDE.md, every previous message, every tool result. It's not reading just your latest message; it reads everything, every time. This is why cumulative token usage grows roughly quadratically with session length, not linearly: each new exchange is re-read on every turn that follows.
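A small simulation makes the compounding concrete. The 12K baseline (system prompt + CLAUDE.md) and ~3K of new content per exchange are illustrative assumptions, not measurements:

```python
# Cumulative input tokens across a session when the whole history
# is re-read on every turn.
BASELINE = 12_000       # system prompt + CLAUDE.md, sent every turn (assumed)
PER_EXCHANGE = 3_000    # new message + response + tool results (assumed)

def cumulative_input_tokens(exchanges: int) -> int:
    total = 0
    history = 0
    for _ in range(exchanges):
        history += PER_EXCHANGE       # history grows each exchange
        total += BASELINE + history   # entire window re-read this turn
    return total

# Doubling the session length roughly triples the cumulative cost:
for n in (5, 10, 20, 40):
    print(n, cumulative_input_tokens(n))
```

Under these assumptions, 10 exchanges cost 285K cumulative input tokens but 20 exchanges cost 870K: the growth is quadratic, not linear.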

Visualize Your Session Budget

Use the slider to set how many tool calls and messages are in your session, and see where the tokens actually go.

[Interactive budget visualizer: a stacked bar of system prompt (~4K), CLAUDE.md + rules (~8K), tool call results, and conversation history vs. free space. At 20 exchanges (~1 hour of work): 24,000 of 200,000 tokens used, the window is 12% full, ~176K tokens remain available for thinking, and compaction risk is low. Fresh; Claude is at full capacity.]

Tokens = Money (What It Actually Costs)

Anthropic charges by token on the API. The consumer plans bundle tokens into monthly limits. Either way, tokens are the unit.

| Model | Input tokens | Output tokens | When to use |
|---|---|---|---|
| Claude Opus 4.6 | $15 / 1M tokens | $75 / 1M tokens | Complex reasoning, critical decisions |
| Claude Sonnet 4.6 | $3 / 1M tokens | $15 / 1M tokens | Most tasks — the smart default |
| Claude Haiku 4.5 | $0.80 / 1M tokens | $4 / 1M tokens | High-volume, simple tasks |
The math that matters: A single long Opus session with lots of tool calls (say, 150K tokens in + 20K tokens out) costs about $3.75 in API terms. Multiply that by multiple sessions per day and it adds up fast. This is why "Sonnet for everything, Opus only when you actually need it" is the right operating principle — not cheapness, but efficiency. Sonnet at 20% of Opus cost produces 80-90% of the quality for most tasks.
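The pricing table turns into a quick session-cost calculator. The model keys and rates below are taken from the table above; this is a sketch, not an official SDK:

```python
# USD per 1M tokens, from the pricing table above.
RATES = {
    "opus":   {"in": 15.00, "out": 75.00},
    "sonnet": {"in": 3.00,  "out": 15.00},
    "haiku":  {"in": 0.80,  "out": 4.00},
}

def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    r = RATES[model]
    return (input_tokens / 1e6) * r["in"] + (output_tokens / 1e6) * r["out"]

# The long Opus session described above: 150K in, 20K out.
print(round(session_cost("opus", 150_000, 20_000), 2))    # 3.75
print(round(session_cost("sonnet", 150_000, 20_000), 2))  # 0.75
```

The same session on Sonnet is exactly 20% of the Opus price, which is the arithmetic behind "Sonnet for everything, Opus only when you need it."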

Output tokens cost more than input — here's why

When Claude generates output, it runs a full forward pass for every single token — predicting the next most likely token, one at a time. Reading input is comparatively cheap: all input tokens are processed in parallel in a single pass. Output has to be generated sequentially, which is why a long response costs significantly more to produce than a long input costs to read.

This explains "be concise" at a mechanical level. If you ask Claude to "explain everything in detail," that output is 5-10x more expensive than a concise answer. Not just slower to read — more expensive to generate. For your own use, this matters less. For anyone building with the API at scale, it's critical.

Token Conservation in Practice

With this mental model, conservation strategies make intuitive sense:

| Strategy | Why it works (token reason) | Savings |
|---|---|---|
| Use Sonnet instead of Opus | Same token count, 80% cost reduction. Doesn't reduce tokens used, but reduces cost per token. | $$$ |
| KB retrieval vs. boot-context loading | `search_docs` returns ~200-500 tokens of targeted info. Loading the entire doc into CLAUDE.md would cost that every single turn. | High |
| Separate sessions for separate roles | Each session has its own clean context. No cross-contamination, no need to re-explain role A to session B. | Medium |
| Short, specific messages | Input is cheap, but it also shapes the response. Vague prompts produce longer, less useful responses, costing more output tokens. | Low-Medium |
| Read specific file sections, not whole files | Reading lines 200-250 of a file costs ~500 tokens; reading the whole file might cost 8,000. | High (for large files) |
| Start fresh sessions for new topics | A new session has minimal conversation history. Continuing an old session means Claude re-reads everything from 3 hours ago on every turn. | High (late sessions) |

The Number That Actually Matters

People obsess over context window size. The number that actually determines quality is:

Available thinking space
  = 200,000 (total window) − (system prompt + CLAUDE.md + supplements + conversation history + tool results)
  ≈ 176K tokens at the start of a fresh session
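The same arithmetic in code. The ~12K "supplements" figure is an assumption, chosen so that a fresh session lands at the ~24K used / ~176K free numbers this lesson uses:

```python
# Free thinking space = total window minus everything already loaded.
WINDOW = 200_000

def thinking_space(system: int = 4_000, claude_md: int = 8_000,
                   supplements: int = 12_000, history: int = 0,
                   tool_results: int = 0) -> int:
    used = system + claude_md + supplements + history + tool_results
    return WINDOW - used

print(thinking_space())  # 176000 at the start of a fresh session
```

Every token you shave off the defaults goes straight into the return value, which is the point of the paragraph below.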

Everything you do to reduce the non-thinking parts is directly invested into Claude's ability to reason. Lean CLAUDE.md, targeted retrieval, short sessions — these aren't style preferences. They're how you buy yourself more thinking space per dollar.

Next: Phase 1, Session 4

Why Claude "Forgets" — Not a bug. A precise architectural reality. The three types of forgetting, what compaction really does, and what you can do about each one.

We'll look at what actually survives compaction and what doesn't — and why the answer might surprise you.