Save Tokens with Claude Code Subagents: A Practical Guide
Learn how to use Claude Code's subagent system to dramatically reduce token usage on complex tasks by delegating research and exploration to lightweight agents.
Every file Claude reads eats into your context window. Ask it to "investigate" something and it may read 10+ files, burning 15,000+ tokens on content you'll never reference again. This CLAUDE.md snippet instructs Claude to spawn subagents for exploration automatically, keeping your main conversation clean. One paste. Every session. Zero effort.
Setup Steps
1. Open your project's CLAUDE.md (or create one)
2. Paste the Context Management section (shown below)
3. Start a new session to pick up the changes
4. Ask Claude to investigate something and watch it spawn agents
The CLAUDE.md Snippet: Copy This
```markdown
## Context Management

Context is your most important resource.
Proactively use subagents (Agent tool) to keep exploration,
research, and verbose operations out of the main conversation.

**Default to spawning agents for:**
- Codebase exploration (reading 3+ files to answer a question)
- Research tasks (web searches, doc lookups, investigating how something works)
- Code review or analysis (produces verbose output)
- Any investigation where only the summary matters

**Stay in main context for:**
- Direct file edits the user requested
- Short, targeted reads (1-2 files)
- Conversations requiring back-and-forth
- Tasks where user needs intermediate steps

**Rule of thumb:** If a task will read more than ~3 files
or produce output the user doesn't need to see verbatim,
delegate it to a subagent and return a summary.
```
The Decision Rule
| Spawn Agent | Stay in Main |
|---|---|
| 3+ files to read | Direct file edits |
| Web searches / doc lookups | 1-2 file reads |
| Code review | Back-and-forth iteration |
| Investigations where only the summary matters | User needs to see steps |
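The decision rule can be sketched as a small predicate over a single step of work, such as the exploration phase of a refactor. This is only an illustration, under the assumption that the rule is applied mechanically. Claude Code does not expose such a function; Claude applies the CLAUDE.md guidance through its own judgment. The `Task` fields and `should_delegate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of one step of work; field names are illustrative."""
    files_to_read: int                   # estimated number of files the step reads
    is_direct_edit: bool = False         # the user asked for a specific edit
    needs_back_and_forth: bool = False   # iterative conversation expected
    user_needs_steps: bool = False       # user wants to see intermediate output
    verbose_output: bool = False         # e.g. web searches, doc lookups, code review


def should_delegate(task: Task) -> bool:
    """Apply the article's rule of thumb to a single step (illustrative only)."""
    # "Stay in main" cases take priority.
    if task.is_direct_edit or task.needs_back_and_forth or task.user_needs_steps:
        return False
    # Delegate when the step reads 3+ files or produces output
    # the user doesn't need to see verbatim.
    return task.files_to_read >= 3 or task.verbose_output


print(should_delegate(Task(files_to_read=8)))
print(should_delegate(Task(files_to_read=5, is_direct_edit=True)))
```

Checking the "stay in main" conditions first mirrors the table: a direct edit stays in the main conversation even if it touches several files.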
How It Works
1. You add the snippet to CLAUDE.md
2. You ask a question (e.g. "how does auth work?")
3. Claude spawns an Explore agent automatically
4. The agent reads 8+ files in its own isolated context
5. Only a clean summary returns to your conversation
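The isolation in steps 4 and 5 can be pictured with a toy model. This is a sketch under stated assumptions, not Claude Code's implementation. Token counts use a rough 4-characters-per-token heuristic rather than a real tokenizer, and `Context`, `run_subagent` and the fake file contents are hypothetical.

```python
from dataclasses import dataclass, field


def approx_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token (an assumption, not a real tokenizer)."""
    return max(1, len(text) // 4)


@dataclass
class Context:
    """Toy model of a context window: everything added stays until it is discarded."""
    messages: list[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        self.messages.append(text)

    @property
    def tokens(self) -> int:
        return sum(approx_tokens(m) for m in self.messages)


def run_subagent(prompt: str, files: dict[str, str]) -> str:
    """Hypothetical subagent: reads files in a fresh context, returns only a summary."""
    sub = Context()  # starts empty: no access to the main conversation's history
    sub.add(prompt)
    for content in files.values():
        sub.add(content)  # every file read lands in the subagent's own context
    # Only this short string leaves the function; `sub` goes out of scope here.
    return f"Summary of {len(files)} files (subagent read ~{sub.tokens} tokens)"


# Hypothetical project: 8 files of ~2,000 characters each.
files = {f"src/auth/file{i}.ts": "x" * 2000 for i in range(8)}

main = Context()
main.add("How does auth work?")
main.add(run_subagent("Investigate auth system", files))
print(main.tokens)  # small: just the question plus the summary
```

The file contents are counted inside the subagent's context, but the main context only ever holds the question and the one-line summary.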
What You See in the Terminal
Without the Snippet: Reads Everything
● Read src/auth/login.ts
● Read src/auth/session.ts
● Read src/auth/oauth.ts
● Read src/auth/refresh.ts
● Read src/middleware/auth.ts
● Read src/utils/tokens.ts
● Read src/auth/providers.ts
● Read src/config/auth.yaml
~15,000 tokens consumed · Context used: 78%
With the Snippet: Delegates to an Agent
● Agent "Investigate auth system"
● Agent reads 8 files in isolation
● Agent analyzes patterns
● Task completed
● Auth uses OAuth2 + JWT refresh.
● Session middleware in src/...
~500 tokens returned · Context used: 12%
Types of Subagents
| Agent Type | Best For | Model |
|---|---|---|
| Explore | Finding files, understanding implementations, searching patterns | sonnet |
| Plan | Designing architecture, planning multi-file refactors | sonnet |
| General Purpose | Complex multi-step tasks that need file writes | sonnet |
| Haiku | Simple lookups, listing files, quick searches | haiku (cheaper) |
Real Token Savings
Before: ~80,000 tokens
You: "Refactor the auth system"
Claude reads 15 files, all in main context: ~50,000 tokens
Claude reasons about changes: ~10,000 tokens
Claude makes edits: ~20,000 tokens
After: ~20,000 tokens
You: "Refactor the auth system"
Claude spawns an Explore agent and gets back a summary: ~500 tokens
Claude reasons from the summary: ~5,000 tokens
Claude reads only the files it needs to edit: ~15,000 tokens
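Note: the subagent still spends ~50k tokens, but in its own isolated context, which is discarded once it returns the summary. Those tokens never enter your main conversation, so they aren't carried forward into every later turn. Your main conversation stays lean.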
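The before and after totals are simple sums of the per-step estimates. The quick check below uses only the article's illustrative numbers; the step labels are paraphrased, and nothing here is a measurement.

```python
# Per-step token estimates from the example above (illustrative, not measured).
before = {
    "read 15 files in main context": 50_000,
    "reason about changes": 10_000,
    "make edits": 20_000,
}
after = {
    "Explore agent summary": 500,
    "reason from summary": 5_000,
    "read only files to edit": 15_000,
}

before_total = sum(before.values())
after_total = sum(after.values())
saved = before_total - after_total

print(f"before: {before_total:,} tokens")
print(f"after:  {after_total:,} tokens")
print(f"saved:  {saved:,} tokens ({saved / before_total:.0%})")
```

The "after" steps add up to 20,500 tokens, which the example rounds to ~20,000, a saving of about 74% in the main context. The subagent's own ~50k tokens are deliberately left out of both totals because they never enter the main conversation.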
Tips for Maximum Savings
Run Multiple Agents in Parallel
Agents that don't depend on each other run simultaneously:
Agent("Find all database migration files")    ─┐
Agent("Find all API routes for payments")     ─┤ parallel
Agent("Check test coverage for payment code") ─┘
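Running independent agents in parallel means wall-clock time tracks the slowest agent rather than the sum of all three. The sketch below illustrates that idea with Python's asyncio. `run_agent` is a hypothetical stand-in that sleeps instead of calling Claude, and the delays are made up.

```python
import asyncio
import time


async def run_agent(prompt: str, seconds: float) -> str:
    """Hypothetical stand-in for a subagent: simulate work, then return a summary."""
    await asyncio.sleep(seconds)
    return f"summary for: {prompt}"


async def main() -> list[str]:
    # The three tasks don't depend on each other, so they can be awaited together.
    # gather() returns results in the order the tasks were passed in.
    return await asyncio.gather(
        run_agent("Find all database migration files", 0.2),
        run_agent("Find all API routes for payments", 0.2),
        run_agent("Check test coverage for payment code", 0.2),
    )


start = time.perf_counter()
summaries = asyncio.run(main())
elapsed = time.perf_counter() - start

for s in summaries:
    print(s)
print(f"elapsed ~{elapsed:.1f}s (sequential would be ~0.6s)")
```

The same logic explains why only independent tasks benefit. If one agent needed another's summary as input, it would have to wait, and the times would add up again.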
Be Specific in Prompts
| Bad | Good |
|---|---|
| "Look at the auth code" | "Find the JWT validation middleware, list its dependencies, and identify where tokens are refreshed" |
| "Check the tests" | "List all failing tests in the payments module with their error messages" |
| "How does the API work" | "Map all API route handlers: HTTP method, path, auth requirement, and response shape" |
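The good prompts share a pattern: a concrete task, an optional scope, and an explicit list of what to report. A subagent starts without your conversation history, so spelling out that structure helps. The helper below is hypothetical and not part of Claude Code. It is a sketch of assembling such a prompt under that assumption.

```python
from typing import Optional


def build_agent_prompt(task: str, deliverables: list[str], scope: Optional[str] = None) -> str:
    """Compose a self-contained subagent prompt (hypothetical helper for illustration)."""
    lines = [f"{task}."]
    if scope:
        lines.append(f"Limit the search to {scope}.")
    lines.append("Report:")
    # One explicit deliverable per line, so the summary has a clear shape.
    lines.extend(f"- {item}" for item in deliverables)
    return "\n".join(lines)


prompt = build_agent_prompt(
    task="Find the JWT validation middleware",
    deliverables=["its dependencies", "where tokens are refreshed"],
)
print(prompt)
```

Listing deliverables up front also shapes the summary the agent returns. That summary is the only part that reaches your main context.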
Use Haiku for Simple Tasks
For straightforward searches, specify model: "haiku" for faster, cheaper results:
Agent(model: "haiku", prompt: "List all TypeScript files
in src/components/ that import useState")
Gotchas
⚠️ Subagents can't spawn other subagents (no nesting). Each subagent starts fresh, with no access to your conversation history, so its prompt must contain everything it needs. Results come back summarized (which is the point).
When NOT to Use Subagents
Subagents add latency (spawning takes time). Skip them when:
- You're reading a single known file
- The search is simple (use Grep/Glob directly)
- You're in a quick edit cycle
- The task doesn't need research
Success Check
Ask Claude to investigate anything in your codebase. You should see Agent tool calls instead of multiple Read calls; that means it's delegating.