Model Selection & Cost

Claude Code supports multiple models with different strengths, speeds, and costs. Picking the right model for the task avoids overpaying for simple work and underperforming on complex work.

Available models

Model	Alias	Best for	Speed	Cost
Opus 4.6	`opus`	Complex architecture, subtle bugs, multi-file reasoning	Slower	Highest
Sonnet 4.6	`sonnet`	Daily coding, feature work, code review, most tasks	Fast	Moderate
Haiku 4.5	`haiku`	Quick questions, simple generation, boilerplate	Fastest	Lowest

Sonnet is the default and the right choice for most work. Use Opus when you need Claude to hold more complexity in its head simultaneously. Use Haiku for tasks where speed matters more than depth. You can also use full model IDs like claude-opus-4-6 or claude-sonnet-4-6 if you need a specific version.

Switching models

# During a session
/model sonnet
/model opus

# At launch
claude --model opus
claude --model sonnet

You can also press Option+P (macOS) or Alt+P (Linux/Windows) to switch models without clearing your prompt.

Effort levels

Effort controls how much reasoning Claude does before responding. Lower effort means faster, cheaper responses. Higher effort means deeper analysis.

/effort low       # Quick, surface-level responses
/effort medium    # Balanced (default for most modes)
/effort high      # Thorough analysis
/effort max       # Maximum reasoning depth (Opus only)
/effort auto      # Reset to the model's default effort level

When to change effort:

Task	Recommended effort
Quick lookups and explanations	low
Writing tests, feature implementation	medium
Security review, complex debugging	high
Architecture design, multi-file reasoning	high or max

Fast mode

/fast on     # Toggle fast mode on
/fast off    # Toggle fast mode off

Or press Option+O (macOS) / Alt+O (Linux/Windows) to toggle mid-session.

Fast mode is specific to Opus 4.6 — it makes Opus roughly 2.5x faster at a higher per-token cost. If you are on a different model when you enable fast mode, Claude Code switches you to Opus 4.6 automatically. Use it for iterative work where you are going back and forth rapidly — writing code, running tests, fixing errors. Turn it off when you want to save cost or need Claude to reason more carefully.

Tracking costs

Token usage in this session

/cost

Shows how many tokens you have used, the cost so far, and a breakdown by input vs. output tokens.

Plan limits and rate status

/usage

Shows your current plan’s limits and how much you have used. Useful for knowing if you are approaching rate limits.

Context window usage

/context

Visualizes how full your context window is. A full context window costs more per message because Claude processes the entire context with every response. See Sessions & Context for strategies to keep this manageable.

Budget guardrails

For automated or unattended runs, set hard limits:

# Stop after spending $5
claude -p "refactor the billing module" --max-budget-usd 5

# Stop after 10 agentic turns (prevents runaway loops)
claude -p "fix all test failures" --max-turns 10

Both flags only work in non-interactive (print/batch) mode (claude -p). They are not available in interactive sessions. They are especially important for CI/CD pipelines where a bug in the prompt could cause Claude to loop indefinitely.

Cost-saving patterns

Right-size the model

Do not use Opus for writing boilerplate. Do not use Haiku for architecture review. Match the model to the task.

# Quick: generate a test file (Sonnet is fine)
claude -p "write tests for src/utils.ts" --model sonnet

# Deep: find a subtle concurrency bug (worth Opus)
claude -p "find race conditions in src/services/" --model opus

Compact regularly

A conversation at 80% context costs roughly 4x per message compared to 20%. Use /compact to keep the window lean.

Clear between tasks

/clear is free and prevents paying to process irrelevant context from a previous task.

Use effort levels

/effort low for quick lookups saves significant tokens compared to the default. The response will be shorter and less thorough, but that is often exactly what you want.

Batch mode for CI

In CI/CD, always set --max-turns and --max-budget-usd to prevent surprise costs:

claude -p "review staged changes" \
  --max-turns 5 \
  --max-budget-usd 2 \
  --output-format json

Tips

Start most sessions with Sonnet. Upgrade to Opus only when you notice Claude struggling with complexity.
/cost is instant and non-disruptive — check it periodically during long sessions.
Effort levels have a bigger impact on cost than most people expect. /effort low for a quick question can be 5-10x cheaper than /effort high.
Fast mode makes Opus 4.6 faster at higher cost. Use it liberally during implementation loops when speed matters more than cost.
--max-budget-usd is your safety net for any automated run. Set it even if you think the task is cheap.
When demonstrating Claude Code to leadership, /cost at the end of a session gives concrete numbers for ROI discussions.