Phase 1, Session 3

Tokens: The Real Currency — What Claude Actually Counts

The Big Idea

When people hear "200,000 token context window," they usually think in words or sentences. That's the wrong unit. Tokens are the actual unit Claude thinks in — and they don't map cleanly to words, sentences, or characters.

Understanding tokens explains why some operations feel expensive, why code costs more than prose, why other languages burn through context faster, and why "context window" is not the same as "how much Claude can read."

Analogy
Tokens are like syllables, not words. "Cat" is one token. "Catastrophically" might be three. "Pneumonoultramicroscopicsilicovolcanoconiosis" is many more. The model doesn't see words — it sees these sub-word chunks. Some map to whole common words, some to word fragments, some to punctuation, some to whitespace. The vocabulary was chosen to be efficient for English, which means it's less efficient for other languages and code.

See Tokens in Real Time

Type anything below and watch it get split into tokens. Each color is a different token.

[Interactive tokenizer: live counters for tokens, words, characters, and tokens-per-word update as you type.]
Key rule of thumb: For everyday English prose, 1 token ≈ 0.75 words (or about 4 characters). So 200,000 tokens ≈ 150,000 words ≈ about 500 pages of a novel. But this breaks down fast with code, special characters, and non-English text.
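That rule of thumb is easy to sketch in code. This is a heuristic only; real tokenizers split on learned sub-word boundaries, not character counts:

```python
# Rough token math for English prose: ~4 characters per token,
# ~0.75 words per token. A heuristic, not a real tokenizer.
def estimate_tokens(text: str) -> int:
    return max(1, round(len(text) / 4))

def words_for(tokens: int) -> int:
    return round(tokens * 0.75)

print(estimate_tokens("The quick brown fox jumps over the lazy dog."))  # 11
print(words_for(200_000))  # 150000 words in a full 200K window
```

Useful for quick budgeting; expect it to undercount on code and non-English text, as the sections below show.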

The Math: What 200K Tokens Actually Gets You

| Content type | Tokens per unit | What fits in 200K tokens |
|---|---|---|
| Average English word | ~1.3 tokens | ~150,000 words |
| Novel page (250 words) | ~330 tokens | ~600 pages (a long novel) |
| This lesson (full text) | ~2,500 tokens | ~80 lessons like this one |
| Tweet (280 chars) | ~80 tokens | ~2,500 tweets |
| Python function (20 lines) | ~150-250 tokens | ~1,000 small functions |
| JSON API response (large) | ~2,000-5,000 tokens | 40-100 API calls |
| PDF page (dense text) | ~400-600 tokens | ~350-500 pages |
| Spanish/French text | ~1.5-2x English | Roughly half the English capacity |
| Chinese/Japanese/Arabic | ~2-4x English | A quarter or less of English capacity |
The non-English problem: Claude's tokenizer was optimized for English. A paragraph in Japanese might cost 3-4x the tokens of the same meaning in English. If you're building for international users, this has real cost and context implications.
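A rough sketch of what those multipliers do to effective capacity. The multiplier values here are illustrative midpoints taken from the table above, not measured tokenizer output:

```python
# Effective word capacity of a 200K-token window by language,
# using illustrative tokens-per-word multipliers relative to English.
MULTIPLIER = {"english": 1.0, "spanish": 1.75, "japanese": 3.0}

def effective_word_capacity(language: str, window: int = 200_000) -> int:
    tokens_per_word = 1.3 * MULTIPLIER[language]  # ~1.3 tokens/word in English
    return round(window / tokens_per_word)

print(effective_word_capacity("english"))   # 153846
print(effective_word_capacity("japanese"))  # roughly a third of English
```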

Why Code Costs More Than It Looks

Code is tokenized differently from prose. Variable names, symbols, whitespace, and syntax all fragment into more tokens than equivalent English text.

| Text type | Example | Tokens |
|---|---|---|
| Plain English | "Get all users from the database" | ~8 |
| SQL equivalent | `SELECT * FROM users WHERE active = 1;` | ~15 |
| Python equivalent | `users = db.query(User).filter(User.active==1).all()` | ~22 |
| JSON config (indented) | 4-space indented JSON object, 10 fields | ~60-100 |
| Markdown with headers | `# Heading` / `**bold**` / `- list items` | ~10-15% overhead vs plain text |
Practical takeaway: When Claude reads a large file (code, JSON config, tool output), that read goes into the context window as tokens. A 500-line Python file might consume 8,000-12,000 tokens. If Claude reads three files in a session, you've burned 25,000-35,000 tokens just on file reads before any thinking happens.
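The arithmetic above can be sketched directly. The ~20 tokens-per-line figure is an assumed midpoint of the 16-24 tokens/line implied by "500 lines ≈ 8,000-12,000 tokens":

```python
# Back-of-envelope cost of reading source files into context.
# ~20 tokens per line of Python is an assumption, the midpoint of
# the 16-24 tokens/line implied by "500 lines = 8,000-12,000 tokens".
def file_read_tokens(line_count: int, tokens_per_line: int = 20) -> int:
    return line_count * tokens_per_line

files = [500, 500, 500]  # three 500-line files read in one session
total = sum(file_read_tokens(n) for n in files)
print(total)  # 30000, inside the 25,000-35,000 range quoted above
```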

The Hidden Token Costs

Most people only think about message length. The real costs are often invisible.

- Every tool call result (file read, search result, API response): 500-5,000 tokens each
- Claude's thinking output (extended thinking mode): 1,000-20,000 tokens per response
- System prompt (Anthropic's baseline): ~4,000 tokens, every turn
- Your CLAUDE.md + rules files: ~6,000-10,000 tokens, every turn
- Entire conversation history (re-read every turn): grows by 1,000-5,000 tokens per exchange
- Large image attached to a message: ~1,000-1,700 tokens per image
- Screenshot from the computer-use tool: ~1,000-2,000 tokens per screenshot
The compounding effect: Every turn in a conversation, Claude re-reads the ENTIRE context window — system prompt, CLAUDE.md, every previous message, every tool result. It's not reading just your latest message; it reads everything, every time. This is why cumulative token usage grows roughly quadratically with session length, not linearly: each new exchange is re-read on every turn that follows.
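A small simulation makes the compounding concrete. The 12K baseline (system prompt + CLAUDE.md) and ~3K of new content per exchange are illustrative assumptions, not measurements:

```python
# Cumulative input tokens across a session when the whole history
# is re-read on every turn.
BASELINE = 12_000       # system prompt + CLAUDE.md, sent every turn (assumed)
PER_EXCHANGE = 3_000    # new message + response + tool results (assumed)

def cumulative_input_tokens(exchanges: int) -> int:
    total = 0
    history = 0
    for _ in range(exchanges):
        history += PER_EXCHANGE       # history grows each exchange
        total += BASELINE + history   # entire window re-read this turn
    return total

# Doubling the session length roughly triples the cumulative cost:
for n in (5, 10, 20, 40):
    print(n, cumulative_input_tokens(n))
```

Under these assumptions, 10 exchanges cost 285K cumulative input tokens but 20 exchanges cost 870K: the growth is quadratic, not linear.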

Visualize Your Session Budget

Use the slider to set how many tool calls and messages are in your session, and see where the tokens actually go.

[Interactive budget visualizer: a stacked bar of system prompt (~4K), CLAUDE.md + rules (~8K), tool call results, and conversation history vs. free space. At 20 exchanges (~1 hour of work): 24,000 of 200,000 tokens used, the window is 12% full, ~176K tokens remain available for thinking, and compaction risk is low. Fresh; Claude is at full capacity.]

Tokens = Money (What It Actually Costs)

Anthropic charges by token on the API. The consumer plans bundle tokens into monthly limits. Either way, tokens are the unit.

| Model | Input tokens | Output tokens | When to use |
|---|---|---|---|
| Claude Opus 4.6 | $15 / 1M tokens | $75 / 1M tokens | Complex reasoning, critical decisions |
| Claude Sonnet 4.6 | $3 / 1M tokens | $15 / 1M tokens | Most tasks — the smart default |
| Claude Haiku 4.5 | $0.80 / 1M tokens | $4 / 1M tokens | High-volume, simple tasks |
The math that matters: A single long Opus session with lots of tool calls (say, 150K tokens in + 20K tokens out) costs about $3.75 in API terms. Multiply that by multiple sessions per day and it adds up fast. This is why "Sonnet for everything, Opus only when you actually need it" is the right operating principle — not cheapness, but efficiency. Sonnet at 20% of Opus cost produces 80-90% of the quality for most tasks.
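The pricing table turns into a quick session-cost calculator. The model keys and rates below are taken from the table above; this is a sketch, not an official SDK:

```python
# USD per 1M tokens, from the pricing table above.
RATES = {
    "opus":   {"in": 15.00, "out": 75.00},
    "sonnet": {"in": 3.00,  "out": 15.00},
    "haiku":  {"in": 0.80,  "out": 4.00},
}

def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    r = RATES[model]
    return (input_tokens / 1e6) * r["in"] + (output_tokens / 1e6) * r["out"]

# The long Opus session described above: 150K in, 20K out.
print(round(session_cost("opus", 150_000, 20_000), 2))    # 3.75
print(round(session_cost("sonnet", 150_000, 20_000), 2))  # 0.75
```

The same session on Sonnet is exactly 20% of the Opus price, which is the arithmetic behind "Sonnet for everything, Opus only when you need it."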

Output tokens cost more than input — here's why

When Claude generates output, it runs a full forward pass for every single token — predicting the next most likely token, one at a time. Reading input is comparatively cheap: all input tokens are processed in parallel in a single pass. Output has to be generated sequentially, which is why a long response costs significantly more to produce than a long input costs to read.

This explains "be concise" at a mechanical level. If you ask Claude to "explain everything in detail," that output is 5-10x more expensive than a concise answer. Not just slower to read — more expensive to generate. For your own use, this matters less. For anyone building with the API at scale, it's critical.

Token Conservation in Practice

With this mental model, conservation strategies make intuitive sense:

| Strategy | Why it works (token reason) | Savings |
|---|---|---|
| Use Sonnet instead of Opus | Same token count, 80% cost reduction. Doesn't reduce tokens used, but reduces cost per token. | $$$ |
| KB retrieval vs. boot-context loading | `search_docs` returns ~200-500 tokens of targeted info. Loading the entire doc into CLAUDE.md would cost that every single turn. | High |
| Separate sessions for separate roles | Each session has its own clean context. No cross-contamination, no need to re-explain role A to session B. | Medium |
| Short, specific messages | Input is cheap, but it also shapes the response. Vague prompts produce longer, less useful responses, costing more output tokens. | Low-Medium |
| Read specific file sections, not whole files | Reading lines 200-250 of a file costs ~500 tokens; reading the whole file might cost 8,000. | High (for large files) |
| Start fresh sessions for new topics | A new session has minimal conversation history. Continuing an old session means Claude re-reads everything from 3 hours ago on every turn. | High (late sessions) |

The Number That Actually Matters

People obsess over context window size. The number that actually determines quality is:

Available thinking space
  = 200,000 (total window) − (system prompt + CLAUDE.md + supplements + conversation history + tool results)
  ≈ 176K tokens at the start of a fresh session
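The same arithmetic in code. The ~12K "supplements" figure is an assumption, chosen so that a fresh session lands at the ~24K used / ~176K free numbers this lesson uses:

```python
# Free thinking space = total window minus everything already loaded.
WINDOW = 200_000

def thinking_space(system: int = 4_000, claude_md: int = 8_000,
                   supplements: int = 12_000, history: int = 0,
                   tool_results: int = 0) -> int:
    used = system + claude_md + supplements + history + tool_results
    return WINDOW - used

print(thinking_space())  # 176000 at the start of a fresh session
```

Every token you shave off the defaults goes straight into the return value, which is the point of the paragraph below.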

Everything you do to reduce the non-thinking parts is directly invested into Claude's ability to reason. Lean CLAUDE.md, targeted retrieval, short sessions — these aren't style preferences. They're how you buy yourself more thinking space per dollar.

Next: Phase 1, Session 4

Why Claude "Forgets" — Not a bug. A precise architectural reality. The three types of forgetting, what compaction really does, and what you can do about each one.

We'll look at what actually survives compaction and what doesn't — and why the answer might surprise you.