Tutorial · 8 min read · March 5, 2026

Save Tokens with Claude Code Subagents: A Practical Guide

Learn how to use Claude Code's subagent system to dramatically reduce token usage on complex tasks by delegating research and exploration to lightweight agents.

💡 Every file Claude reads eats your context window. Ask it to "investigate" something and it reads 10+ files — burning 15,000+ tokens on content you'll never reference again. This CLAUDE.md snippet teaches Claude to automatically spawn subagents for exploration, keeping your main conversation clean. One paste. Every session. Zero effort.

Setup Steps

  1. Open your project's CLAUDE.md (or create one)
  2. Paste the Context Management section (shown below)
  3. Start a new session to pick up the changes
  4. Ask Claude to investigate something and watch it spawn agents
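
Steps 1 and 2 can also be scripted. The following is a minimal sketch, not part of Claude Code itself: `ensure_section` and `SECTION_HEADING` are hypothetical names chosen for illustration, and the script assumes the pasted section starts with the `## Context Management` heading used in this guide. It creates CLAUDE.md if it is missing and skips the paste if the section is already there.

```python
from pathlib import Path

# Assumption: the snippet in this guide opens with this heading.
SECTION_HEADING = "## Context Management"


def ensure_section(claude_md: Path, section_text: str) -> bool:
    """Append section_text to claude_md unless its heading is already present.

    Returns True if the file was written, False if nothing changed.
    """
    existing = claude_md.read_text(encoding="utf-8") if claude_md.exists() else ""
    if SECTION_HEADING in existing:
        return False  # already pasted; avoid duplicating the section
    # Separate the new section from existing content with one blank line.
    separator = "\n\n" if existing.strip() else ""
    new_text = existing.rstrip("\n") + separator + section_text.strip() + "\n"
    claude_md.write_text(new_text, encoding="utf-8")
    return True
```

Calling `ensure_section(Path("CLAUDE.md"), snippet_text)` from the project root would cover both the "open or create" and "paste" steps; running it twice leaves the file unchanged the second time.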

The CLAUDE.md Snippet — Copy This

## Context Management

Context is your most important resource.
Proactively use subagents (Agent tool) to keep exploration,
research, and verbose operations out of the main conversation.

**Default to spawning agents for:**
- Codebase exploration (reading 3+ files to answer a question)
- Research tasks (web searches, doc lookups, investigating how something works)
- Code review or analysis (produces verbose output)
- Any investigation where only the summary matters

**Stay in main context for:**
- Direct file edits the user requested
- Short, targeted reads (1-2 files)
- Conversations requiring back-and-forth
- Tasks where the user needs to see intermediate steps

**Rule of thumb:** If a task will read roughly 3 or more files
or produce output the user doesn't need to see verbatim,
delegate it to a subagent and return a summary.

The Decision Rule

| Spawn Agent | Stay in Main |
|---|---|
| 3+ files to read | Direct file edits |
| Web searches / doc lookups | 1-2 file reads |
| Code review | Back-and-forth iteration |
| Any investigation | User needs to see steps |
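
To make the rule concrete, here is a rough sketch of the table above as a Python predicate. It only illustrates the heuristic; Claude applies the CLAUDE.md guidance in natural language, not through code like this. The `Task` fields and the `should_delegate` name are hypothetical, and giving the "stay in main" conditions priority when both columns apply is an assumption, not something the snippet states.

```python
from dataclasses import dataclass


@dataclass
class Task:
    files_to_read: int               # estimated number of files the task must read
    is_direct_edit: bool = False     # the user asked for a specific file edit
    needs_web_research: bool = False # web searches or doc lookups
    is_code_review: bool = False     # review or analysis with verbose output
    user_needs_steps: bool = False   # back-and-forth, or steps the user wants to see


def should_delegate(task: Task, file_threshold: int = 3) -> bool:
    """Return True when the decision table suggests spawning a subagent."""
    # Assumption: "stay in main" conditions win when both columns apply.
    if task.is_direct_edit or task.user_needs_steps:
        return False
    if task.needs_web_research or task.is_code_review:
        return True
    # Otherwise fall back to the file-count rule of thumb (3+ files).
    return task.files_to_read >= file_threshold
```

For example, `should_delegate(Task(files_to_read=8))` is `True`, while a two-file read or a direct edit stays in the main conversation.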

How It Works

  1. You add the snippet to CLAUDE.md
  2. You ask a question (e.g. "how does auth work?")
  3. Claude spawns an Explore agent automatically
  4. The agent reads 8+ files in its own isolated context
  5. Only a clean summary returns to your conversation

What You See in the Terminal

Without — Reads Everything

→ Read src/auth/login.ts
→ Read src/auth/session.ts
→ Read src/auth/oauth.ts
→ Read src/auth/refresh.ts
→ Read src/middleware/auth.ts
→ Read src/utils/tokens.ts
→ Read src/auth/providers.ts
→ Read src/config/auth.yaml
~15,000 tokens consumed        Context: ████████████░░ 78%

With — Delegates to Agent

→ Agent "Investigate auth system"
  → Agent reads 8 files in isolation
  → Agent analyzes patterns
→ Task completed
→ Auth uses OAuth2 + JWT refresh.
→ Session middleware in src/...
~500 tokens returned            Context: █░░░░░░░░░░░░░ 12%

Types of Subagents

| Agent Type | Best For | Model |
|---|---|---|
| Explore | Finding files, understanding implementations, searching patterns | sonnet |
| Plan | Designing architecture, planning multi-file refactors | sonnet |
| General Purpose | Complex multi-step tasks that need file writes | sonnet |
| Haiku | Simple lookups, listing files, quick searches | haiku (cheaper) |

Real Token Savings

Before: ~80,000 tokens

You: "Refactor the auth system"
Claude reads 15 files → all in main context    ~50,000 tokens
Claude reasons about changes                   ~10,000 tokens
Claude makes edits                             ~20,000 tokens

After: ~20,000 tokens

You: "Refactor the auth system"
Claude spawns Explore agent                    → summary ~500 tokens
Claude reasons from summary                    ~5,000 tokens
Claude reads only files it needs to edit       ~15,000 tokens

✅ The subagent still uses ~50k tokens in its own isolated context, and those tokens still count toward your usage, but that context is discarded once the summary is returned. What shrinks is your main conversation, which stays lean.
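
The arithmetic behind these figures can be checked with a short sketch. The numbers are the approximate ones from the breakdown above; the variable names are only for illustration. It separates what lands in the main conversation from what is spent overall, since the subagent's tokens are still consumed even though they never enter the main context.

```python
# Approximate figures copied from the before/after breakdown above.
before = {"read 15 files": 50_000, "reason about changes": 10_000, "make edits": 20_000}
after = {"explore summary": 500, "reason from summary": 5_000, "read files to edit": 15_000}
subagent_isolated = 50_000  # spent in the subagent's own context, then discarded

main_before = sum(before.values())
main_after = sum(after.values())
reduction = 1 - main_after / main_before      # share of main context saved
total_after = main_after + subagent_isolated  # everything spent, including the subagent

print(f"main context before: {main_before:,}")   # 80,000
print(f"main context after:  {main_after:,}")    # 20,500
print(f"main context saved:  {reduction:.0%}")   # 74%
print(f"total spent after:   {total_after:,}")   # 70,500
```

On these figures the main conversation shrinks by roughly three quarters, while the total spent across both contexts drops more modestly.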

Tips for Maximum Savings

Run Multiple Agents in Parallel

Agents that don't depend on each other run simultaneously:

Agent("Find all database migration files")        ─┐
Agent("Find all API routes for payments")           ├─ parallel
Agent("Check test coverage for payment code")      ─┘
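
The fan-out pattern in the diagram can be sketched in Python with `asyncio`. This is an analogy for how independent tasks overlap, not how Claude Code is invoked: `run_agent` is a hypothetical stand-in that only sleeps and returns a string, and no real Claude Code API is called.

```python
import asyncio


async def run_agent(prompt: str) -> str:
    """Hypothetical stand-in for a subagent call; not a real Claude Code API."""
    await asyncio.sleep(0.01)  # pretend the agent is busy reading files
    return f"summary for: {prompt}"


async def investigate(prompts: list[str]) -> list[str]:
    # gather() runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(run_agent(p) for p in prompts))


prompts = [
    "Find all database migration files",
    "Find all API routes for payments",
    "Check test coverage for payment code",
]
summaries = asyncio.run(investigate(prompts))
for summary in summaries:
    print(summary)
```

Because none of the three prompts needs another's result, the total wait is about as long as the slowest one, not the sum of all three. A task that depends on an earlier summary would have to run after it instead.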

Be Specific in Prompts

| Bad | Good |
|---|---|
| "Look at the auth code" | "Find the JWT validation middleware, list its dependencies, and identify where tokens are refreshed" |
| "Check the tests" | "List all failing tests in the payments module with their error messages" |
| "How does the API work" | "Map all API route handlers: HTTP method, path, auth requirement, and response shape" |

Use Haiku for Simple Tasks

For straightforward searches, specify model: "haiku" — faster and cheaper:

Agent(model: "haiku", prompt: "List all TypeScript files
  in src/components/ that import useState")

Gotchas

⚠️ Subagents can't spawn other subagents (no nesting). Each subagent starts fresh — no access to your conversation history. Results are summarized when returned (which is the point).

When NOT to Use Subagents

Subagents add latency (spawning takes time). Skip them when:

  • You're reading a single known file
  • The search is simple (use Grep/Glob directly)
  • You're in a quick edit cycle
  • The task doesn't need research

Success Check

✅ Ask Claude to investigate anything in your codebase. You should see Agent tool calls instead of multiple Read calls — that means it's delegating.
