8 min read · March 5, 2026

Save Tokens with Claude Code Subagents: A Practical Guide

Learn how to use Claude Code's subagent system to dramatically reduce token usage on complex tasks by delegating research and exploration to lightweight agents.

💡 Every file Claude reads eats your context window. Ask it to "investigate" something and it reads 10+ files — burning 15,000+ tokens on content you'll never reference again. This CLAUDE.md snippet teaches Claude to automatically spawn subagents for exploration, keeping your main conversation clean. One paste. Every session. Zero effort.

Setup Steps

1. Open your project's CLAUDE.md (or create one)
2. Paste the Context Management section (shown below)
3. Start a new session to pick up the changes
4. Ask Claude to investigate something and watch it spawn agents
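
If you maintain several projects, the first two steps can be scripted. The following is a minimal sketch, not part of Claude Code itself: `merge_section` and `update_claude_md` are hypothetical helper names, and the script assumes the section is identified by a line reading exactly `## Context Management`. It appends the section only when that heading is missing, so running it twice does not duplicate the text.

```python
from pathlib import Path

HEADING = "## Context Management"  # heading used to detect an existing section


def merge_section(existing: str, section: str, heading: str = HEADING) -> str:
    """Return `existing` with `section` appended, unless `heading` is already a line in it."""
    if any(line.strip() == heading for line in existing.splitlines()):
        return existing  # already present, leave the file content unchanged
    body = section.strip("\n") + "\n"
    if not existing.strip():
        return body  # empty or missing file: the section becomes the whole file
    # Separate the new section from prior content with one blank line.
    return existing.rstrip("\n") + "\n\n" + body


def update_claude_md(path: Path, section: str) -> bool:
    """Create or update the file at `path`; return True if it was changed."""
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    merged = merge_section(existing, section)
    if merged == existing:
        return False
    path.write_text(merged, encoding="utf-8")
    return True
```

A call such as `update_claude_md(Path("CLAUDE.md"), snippet_text)` would then cover steps 1 and 2, where `snippet_text` holds the full section you want to paste.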

The CLAUDE.md Snippet — Copy This

## Context Management

Context is your most important resource.
Proactively use subagents (Agent tool) to keep exploration,
research, and verbose operations out of the main conversation.

**Default to spawning agents for:**
- Codebase exploration (reading 3+ files to answer a question)
- Research tasks (web searches, doc lookups, investigating how something works)
- Code review or analysis (produces verbose output)
- Any investigation where only the summary matters

**Stay in main context for:**
- Direct file edits the user requested
- Short, targeted reads (1-2 files)
- Conversations requiring back-and-forth
- Tasks where user needs intermediate steps

**Rule of thumb:** If a task will read more than ~3 files
or produce output the user doesn't need to see verbatim,
delegate it to a subagent and return a summary.

The Decision Rule

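| Spawn Agent | Stay in Main |
| --- | --- |
| 3+ files to read | Direct file edits |
| Web searches / doc lookups | 1-2 file reads |
| Code review or analysis | Back-and-forth iteration |
| Any investigation where only the summary matters | User needs to see intermediate steps |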
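
The rule can also be read as a small predicate. This is an illustrative sketch only: Claude applies the guidance from CLAUDE.md through its own judgment, not through code like this. The `Task` fields, the `should_delegate` name, the threshold of 3 files, and the choice to let the "stay in main" cases take precedence are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    files_to_read: int = 0
    needs_web_or_docs: bool = False  # web searches, doc lookups
    verbose_output: bool = False     # e.g. code review or analysis
    direct_edit: bool = False        # the user asked for a specific file edit
    interactive: bool = False        # back-and-forth, or user wants intermediate steps


def should_delegate(task: Task, file_threshold: int = 3) -> bool:
    """Return True when the task fits the 'Spawn Agent' column."""
    # Assumption: the 'Stay in Main' cases win when both columns apply.
    if task.direct_edit or task.interactive:
        return False
    return (
        task.files_to_read >= file_threshold
        or task.needs_web_or_docs
        or task.verbose_output
    )
```

For example, `should_delegate(Task(files_to_read=8))` is true, while a two-file read or a direct edit stays in the main conversation.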

How It Works

1. You add the snippet to CLAUDE.md
2. You ask a question (e.g. "how does auth work?")
3. Claude spawns an Explore agent automatically
4. The agent reads 8+ files in its own isolated context
5. Only a clean summary returns to your conversation
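
To make the isolation in steps 4 and 5 concrete, here is a toy model of two separate contexts. It is a sketch under stated assumptions, not how Claude Code is implemented: the `Context` class, `explore_in_subagent`, the placeholder files, and the 4-characters-per-token estimate are all invented for illustration.

```python
from typing import Callable, Dict, Tuple

CHARS_PER_TOKEN = 4  # rough heuristic, assumed for illustration only


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)


class Context:
    """Tracks how many tokens have been loaded into one conversation."""

    def __init__(self) -> None:
        self.tokens = 0

    def add(self, text: str) -> None:
        self.tokens += estimate_tokens(text)


def explore_in_subagent(
    files: Dict[str, str],
    summarize: Callable[[Dict[str, str]], str],
) -> Tuple[str, int]:
    """Read every file into a throwaway context and return only a summary."""
    sub = Context()  # isolated: never shared with the main conversation
    for content in files.values():
        sub.add(content)
    summary = summarize(files)
    return summary, sub.tokens  # `sub` itself is discarded after this call


# Eight placeholder files of 8,000 characters each.
files = {f"file_{i}.ts": "x" * 8000 for i in range(8)}

main = Context()
summary, subagent_tokens = explore_in_subagent(
    files, lambda f: "Auth uses OAuth2 + JWT refresh."
)
main.add(summary)  # only the summary reaches the main context
print(f"subagent: {subagent_tokens} tokens, main: {main.tokens} tokens")
```

The file contents are counted only against the subagent's context, while the main context grows by the size of the summary alone.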

What You See in the Terminal

Without — Reads Everything

→ Read src/auth/login.ts
→ Read src/auth/session.ts
→ Read src/auth/oauth.ts
→ Read src/auth/refresh.ts
→ Read src/middleware/auth.ts
→ Read src/utils/tokens.ts
→ Read src/auth/providers.ts
→ Read src/config/auth.yaml
~15,000 tokens consumed        Context: ████████████░░ 78%

With — Delegates to Agent

→ Agent "Investigate auth system"
  → Agent reads 8 files in isolation
  → Agent analyzes patterns
→ Task completed
→ Auth uses OAuth2 + JWT refresh.
→ Session middleware in src/...
~500 tokens returned            Context: █░░░░░░░░░░░░░ 12%

Types of Subagents

| Agent Type | Best For | Model |
| --- | --- | --- |
| Explore | Finding files, understanding implementations, searching patterns | sonnet |
| Plan | Designing architecture, planning multi-file refactors | sonnet |
| General Purpose | Complex multi-step tasks that need file writes | sonnet |
| Haiku | Simple lookups, listing files, quick searches | haiku (cheaper) |

Real Token Savings

Before: ~80,000 tokens

You: "Refactor the auth system"
Claude reads 15 files → all in main context    ~50,000 tokens
Claude reasons about changes                   ~10,000 tokens
Claude makes edits                             ~20,000 tokens

After: ~20,000 tokens

You: "Refactor the auth system"
Claude spawns Explore agent                    → summary ~500 tokens
Claude reasons from summary                    ~5,000 tokens
Claude reads only files it needs to edit       ~15,000 tokens

✅ The subagent still used ~50k tokens in its own isolated context, but that context is discarded once it returns the summary. Only the ~500-token summary enters your main conversation, which stays lean.
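
The totals above can be checked with a few lines of arithmetic. The figures are the article's own estimates; the variable names are only for this example.

```python
# Main-context token estimates from the "Before" and "After" breakdowns.
before = {
    "read 15 files in main context": 50_000,
    "reason about changes": 10_000,
    "make edits": 20_000,
}
after = {
    "summary returned by Explore agent": 500,
    "reason from summary": 5_000,
    "read only files to edit": 15_000,
}

total_before = sum(before.values())
total_after = sum(after.values())
saved = total_before - total_after
percent_saved = 100 * saved / total_before

# Tokens the subagent used in isolation; they never enter the main context.
subagent_isolated = 50_000
overall_after = total_after + subagent_isolated

print(f"main context before: {total_before}")
print(f"main context after:  {total_after}")
print(f"main context saved:  {saved} ({percent_saved:.1f}%)")
print(f"after, including the subagent's isolated work: {overall_after}")
```

The reduction applies to the main conversation's context. The subagent's isolated reading is counted separately in the last line.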

Tips for Maximum Savings

Run Multiple Agents in Parallel

Agents that don't depend on each other can run simultaneously:

Agent("Find all database migration files")        ─┐
Agent("Find all API routes for payments")           ├─ parallel
Agent("Check test coverage for payment code")      ─┘
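
As an analogy for why independence matters, the sketch below runs three stand-in tasks concurrently with `asyncio.gather`. It does not call Claude Code: `run_agent` is a hypothetical placeholder that just sleeps and returns a string, and the delay is arbitrary. The point is that independent tasks finish in roughly the time of the slowest one rather than the sum of all three.

```python
import asyncio
from typing import List

PROMPTS = [
    "Find all database migration files",
    "Find all API routes for payments",
    "Check test coverage for payment code",
]


async def run_agent(prompt: str, seconds: float = 0.05) -> str:
    """Hypothetical stand-in for one subagent: wait, then return a summary."""
    await asyncio.sleep(seconds)  # simulates the agent doing its work
    return f"summary for: {prompt}"


async def run_parallel(prompts: List[str]) -> List[str]:
    """Start every agent at once and collect results in prompt order."""
    return await asyncio.gather(*(run_agent(p) for p in prompts))


summaries = asyncio.run(run_parallel(PROMPTS))
for line in summaries:
    print(line)
```

If one task needed another's result as input, it would have to wait for it, which is why only independent agents benefit from this pattern.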

Be Specific in Prompts

| Bad | Good |
| --- | --- |
| "Look at the auth code" | "Find the JWT validation middleware, list its dependencies, and identify where tokens are refreshed" |
| "Check the tests" | "List all failing tests in the payments module with their error messages" |
| "How does the API work" | "Map all API route handlers: HTTP method, path, auth requirement, and response shape" |

Use Haiku for Simple Tasks

For straightforward searches, specify model: "haiku" — faster and cheaper:

Agent(model: "haiku", prompt: "List all TypeScript files
  in src/components/ that import useState")

Gotchas

⚠️ Subagents can't spawn other subagents (no nesting). Each subagent starts fresh — no access to your conversation history. Results are summarized when returned (which is the point).

When NOT to Use Subagents

Subagents add latency (spawning takes time). Skip them when:

- You're reading a single known file
- The search is simple (use Grep/Glob directly)
- You're in a quick edit cycle
- The task doesn't need research

Success Check

✅ Ask Claude to investigate anything in your codebase. You should see Agent tool calls instead of multiple Read calls — that means it's delegating.
