Test-driven development with Claude Code follows the same Red-Green-Refactor cycle as traditional TDD, but with AI writing both the tests and the implementations at your direction. You stay in the driver’s seat — deciding what to test, reviewing each test for correctness, and controlling the pace.
The TDD Cycle with Claude Code
1. RED — Ask Claude to write a failing test for a specific behavior
2. GREEN — Ask Claude to write the minimum code to make it pass
3. REFACTOR — After 3-5 green tests, ask Claude to clean up
4. REPEAT — Next behavior
The key difference from solo TDD: Claude can write both the test and the implementation, so your job shifts from typing to reviewing and directing. You must verify each test captures your intent, not just Claude’s interpretation.
Step-by-Step Process
Step 1: Describe What You Want (Not How)
Start by telling Claude the behavior you need. Be specific about requirements, not implementation.
Write a failing JUnit 5 test for a UserRegistrationService.register() method.
The test should verify that registering with an email missing the @ symbol
returns a failure result. Use AssertJ assertions and Mockito for the
UserRepository dependency.
What Claude does: Reads your existing files to understand the project structure, then generates a test that:
- Uses @ExtendWith(MockitoExtension.class) and @Mock
- Follows the methodName_givenCondition_expectedResult naming convention
- Uses assertThat(...).isFalse() (AssertJ style)
What to check before accepting:
- Does the test name clearly describe the scenario?
- Does it test ONE specific behavior?
- Is the assertion checking the right thing?
Step 2: Verify It Fails (Red)
Ask Claude to run the test and confirm it fails, or run it yourself:
./gradlew test --tests "*UserRegistrationServiceTest.register_givenEmailWithoutAtSymbol*"
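Expected: The test fails because register() is not implemented yet. This is the “red” phase. Read the failure message as well: it should point at the missing behavior, not at a typo in the test or a misconfigured mock.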
Why this matters: If the test passes without implementation, the test is wrong. A test that cannot fail is useless.
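One way to make the red phase fail for the right reason in Java is to give register() a body that compiles but has no behavior yet. The sketch below assumes a hypothetical register(username, email, password) signature and a RegistrationResult record; neither comes from the original project, and the UserRepository dependency is left out because no test uses it yet.

```java
import java.util.List;

// Hypothetical result type (an assumption, not the project's real API).
record RegistrationResult(boolean success, List<String> errors) {}

class UserRegistrationService {
    // Red phase: the method exists so the test compiles, but it has no behavior.
    // A test that calls it fails with this exception instead of a compile error.
    RegistrationResult register(String username, String email, String password) {
        throw new UnsupportedOperationException("register() is not implemented yet");
    }
}
```

When the test fails with this exception, you know it is actually reaching register() rather than failing somewhere in its own setup.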
Step 3: Write Minimal Implementation (Green)
Now write the minimum code in UserRegistrationService.register() to make
this test pass. Don't implement anything beyond what this one test requires.
What to watch for: Claude sometimes over-implements. If it adds validation for requirements you have not tested yet, push back:
That implementation handles password validation too, but we don't have a test
for that yet. Remove everything except the email @ check.
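A minimal green-phase implementation might look like the sketch below, again assuming a hypothetical register(username, email, password) signature and RegistrationResult record. It handles only the missing-@ case. A two-character password still succeeds, which is correct at this point: no test demands otherwise yet.

```java
import java.util.List;

// Hypothetical result type (an assumption, not the project's real API).
record RegistrationResult(boolean success, List<String> errors) {
    static RegistrationResult ok() {
        return new RegistrationResult(true, List.of());
    }

    static RegistrationResult failure(String error) {
        return new RegistrationResult(false, List.of(error));
    }
}

class UserRegistrationService {
    // Green phase: exactly what the one email test requires, nothing more.
    RegistrationResult register(String username, String email, String password) {
        if (!email.contains("@")) {
            return RegistrationResult.failure("Email must contain @");
        }
        return RegistrationResult.ok();
    }
}
```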
Step 4: Verify It Passes
Run the tests.
Expected: The new test passes. Any existing tests still pass. This is the “green” phase.
Step 5: Write the Next Test
Write a failing test for registering with a password shorter than 8 characters.
The test should verify the result contains an error message about password length.
Repeat the cycle: write test, verify it fails, implement, verify it passes.
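After the second cycle, a sketch of the service grows by exactly one rule. The 8-character minimum comes from the prompt; the register(username, email, password) signature, RegistrationResult record, constant name, and error message are illustrative assumptions. The new check sits after the email check, so the first test's result is unchanged.

```java
import java.util.List;

// Hypothetical result type (an assumption, not the project's real API).
record RegistrationResult(boolean success, List<String> errors) {
    static RegistrationResult ok() {
        return new RegistrationResult(true, List.of());
    }

    static RegistrationResult failure(String error) {
        return new RegistrationResult(false, List.of(error));
    }
}

class UserRegistrationService {
    // Hypothetical constant name; the value 8 comes from the requirement.
    static final int MIN_PASSWORD_LENGTH = 8;

    RegistrationResult register(String username, String email, String password) {
        // Cycle 1: driven by the missing-@ test.
        if (!email.contains("@")) {
            return RegistrationResult.failure("Email must contain @");
        }
        // Cycle 2: driven by the short-password test.
        if (password.length() < MIN_PASSWORD_LENGTH) {
            return RegistrationResult.failure(
                    "Password must be at least " + MIN_PASSWORD_LENGTH + " characters");
        }
        return RegistrationResult.ok();
    }
}
```

The boundary is worth its own test: 7 characters should fail and exactly 8 should pass.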
Step 6: Refactor (After Several Cycles)
Once you have 3-5 tests passing, the implementation may have grown messy. Now refactor:
The register method is getting long. Extract the email validation and password
validation into private helper methods. Keep all tests passing.
After Claude refactors, verify:
- No behavior changed (all tests still pass)
- The code is cleaner and more readable
- Each method has a single responsibility
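The refactoring prompt might produce something like the sketch below, assuming a hypothetical register(username, email, password) signature and RegistrationResult record. Each rule moves into a private helper that returns an optional error message, and register() only sequences the checks. Because the email check still runs first, existing tests see identical results, which is what "no behavior changed" means in practice.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical result type (an assumption, not the project's real API).
record RegistrationResult(boolean success, List<String> errors) {
    static RegistrationResult ok() {
        return new RegistrationResult(true, List.of());
    }

    static RegistrationResult failure(String error) {
        return new RegistrationResult(false, List.of(error));
    }
}

class UserRegistrationService {
    private static final int MIN_PASSWORD_LENGTH = 8;

    // After the refactor, register() only sequences the checks.
    // Optional.or is lazy: the password check runs only if the email check passes.
    RegistrationResult register(String username, String email, String password) {
        return validateEmail(email)
                .or(() -> validatePassword(password))
                .map(RegistrationResult::failure)
                .orElseGet(RegistrationResult::ok);
    }

    // One rule per helper; each returns an error message if its rule is violated.
    private Optional<String> validateEmail(String email) {
        return email.contains("@") ? Optional.empty() : Optional.of("Email must contain @");
    }

    private Optional<String> validatePassword(String password) {
        return password.length() >= MIN_PASSWORD_LENGTH
                ? Optional.empty()
                : Optional.of("Password must be at least " + MIN_PASSWORD_LENGTH + " characters");
    }
}
```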
The Complete Cycle Visualized
Iteration 1:
Test: "email without @ returns failure"
Impl: Add @ check
Tests: 1 passing
Iteration 2:
Test: "password under 8 chars returns failure"
Impl: Add length check
Tests: 2 passing
Iteration 3:
Test: "taken username returns failure"
Impl: Add repository check
Tests: 3 passing
Iteration 4:
Test: "null username throws IllegalArgumentException"
Impl: Add null guard
Tests: 4 passing
Refactor:
Extract validateEmail(), validatePassword() helper methods
Tests: still 4 passing
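A sketch of where the four iterations and the refactor could end up. It assumes a hypothetical UserRepository with an existsByUsername method, plus the same hypothetical register(username, email, password) signature and RegistrationResult record; in the JUnit test the repository would be a Mockito mock, while a lambda can serve as a simple fake. Note the deliberate split: a null username is treated as a programming error and throws, while invalid user input returns a failure result.

```java
import java.util.List;
import java.util.Objects;
import java.util.Optional;

// Hypothetical result type (an assumption, not the project's real API).
record RegistrationResult(boolean success, List<String> errors) {
    static RegistrationResult ok() {
        return new RegistrationResult(true, List.of());
    }

    static RegistrationResult failure(String error) {
        return new RegistrationResult(false, List.of(error));
    }
}

// Hypothetical repository port; the method name is an assumption.
interface UserRepository {
    boolean existsByUsername(String username);
}

class UserRegistrationService {
    private static final int MIN_PASSWORD_LENGTH = 8;
    private final UserRepository repository;

    UserRegistrationService(UserRepository repository) {
        this.repository = Objects.requireNonNull(repository);
    }

    RegistrationResult register(String username, String email, String password) {
        // Iteration 4: null guard. Throws rather than returning a failure result.
        if (username == null) {
            throw new IllegalArgumentException("username must not be null");
        }
        return validateEmail(email)                           // Iteration 1
                .or(() -> validatePassword(password))         // Iteration 2
                .or(() -> validateUsernameAvailable(username)) // Iteration 3
                .map(RegistrationResult::failure)
                .orElseGet(RegistrationResult::ok);
    }

    private Optional<String> validateEmail(String email) {
        return email.contains("@") ? Optional.empty() : Optional.of("Email must contain @");
    }

    private Optional<String> validatePassword(String password) {
        return password.length() >= MIN_PASSWORD_LENGTH
                ? Optional.empty()
                : Optional.of("Password must be at least " + MIN_PASSWORD_LENGTH + " characters");
    }

    // Only reached when the cheaper local checks pass, so the repository is queried last.
    private Optional<String> validateUsernameAvailable(String username) {
        return repository.existsByUsername(username)
                ? Optional.of("Username is already taken")
                : Optional.empty();
    }
}
```

Usage with a fake repository in which only "taken" exists: new UserRegistrationService(name -> "taken".equals(name)).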
Example Prompts for Each Phase
Writing Failing Tests
| What You Need | Prompt |
|---|---|
| First failing test | “Write a failing test for [specific behavior]” |
| Edge case tests | “Write tests for null inputs, empty strings, and boundary values” |
| Error handling test | “Write a test that verifies [method] throws [exception] when [condition]” |
| Integration test | “Write an integration test that hits [endpoint] with [payload] and verifies [expected response]” |
Minimal Implementation
| What You Need | Prompt |
|---|---|
| Pass one test | “Write the minimum code to make this test pass” |
| Push back on over-engineering | “Remove everything except what’s needed for the current tests” |
| Follow patterns | “Add the minimum code to pass. Follow the same pattern as the existing validatePassword method.” |
Refactoring
| What You Need | Prompt |
|---|---|
| Extract method | “Extract [X] into a helper method. Keep all tests green.” |
| Remove duplication | “The setup in these tests is duplicated. Extract a common setup.” |
| Check coverage | “What behaviors in [file] don’t have tests yet?” |
Real-World Example: Adding a Feature with TDD
Here is how an engineer might use Claude Code to add a “password must contain a special character” requirement to an existing service:
Read UserRegistrationService.java and UserRegistrationServiceTest.java.
I need to add a requirement: passwords must contain at least one special
character (!@#$%^&*). Write a failing test for this.
Claude reads both files, sees the existing test patterns, and generates a consistent test.
Run the test to verify it fails.
Fails as expected — the current implementation does not check for special characters.
Add the minimum code to make this test pass. Follow the same pattern as
the existing validatePassword method.
Claude adds a regex check, and the full suite passes. For a small, well-tested service like this one, the whole cycle can take only a few minutes.
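A minimal sketch of the kind of regex check described, written as a standalone validatePassword helper. The helper's shape, the Optional return type, and the messages are assumptions; only the character set comes from the prompt. One Java regex detail matters here: inside a character class, ^ is literal unless it comes first, and a single & is literal, whereas && would mean class intersection.

```java
import java.util.Optional;
import java.util.regex.Pattern;

class PasswordRules {
    private static final int MIN_PASSWORD_LENGTH = 8;

    // Matches any one of the required specials: ! @ # $ % ^ & *
    private static final Pattern SPECIAL_CHARACTER = Pattern.compile("[!@#$%^&*]");

    // Hypothetical helper: returns an error message, or empty if the password is acceptable.
    static Optional<String> validatePassword(String password) {
        if (password.length() < MIN_PASSWORD_LENGTH) {
            return Optional.of("Password must be at least " + MIN_PASSWORD_LENGTH + " characters");
        }
        // find() succeeds if the class matches anywhere in the string.
        if (!SPECIAL_CHARACTER.matcher(password).find()) {
            return Optional.of("Password must contain at least one special character (!@#$%^&*)");
        }
        return Optional.empty();
    }
}
```

Compiling the Pattern once as a constant avoids recompiling the regex on every call.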
Writing Tests for Existing Code
TDD assumes you write the test before the code. But what about code that already exists without tests? Here is a workflow for adding test coverage retroactively.
Step 1: Analyze What Needs Testing
Read src/services/OrderService.java. List every public method and
for each one tell me:
- What it does (one sentence)
- What the happy path is
- What could go wrong (edge cases, error conditions)
- Whether it has any existing test coverage
Step 2: Generate Comprehensive Tests
Write JUnit 5 tests for OrderService. For each public method, include:
- Happy path test
- Null/empty input test
- Boundary condition test (zero, one, max)
- Error handling test (what happens when the repository throws?)
Use our existing test patterns from OrderServiceTest.java.
Name tests: should_[expected]_when_[condition]
Step 3: Find Coverage Gaps
Compare OrderService.java with OrderServiceTest.java.
What code paths are NOT exercised by any test?
List them by risk level (high/medium/low) and suggest a test
for each gap.
Why this works: Instead of asking for “more tests,” you get a systematic analysis of what is actually missing and what matters most.
When TDD with AI Works Best
| Scenario | Why TDD + AI Excels |
|---|---|
| New service or module | Clean slate, easy to define behaviors incrementally |
| Adding validation rules | Each rule is a discrete, testable behavior |
| Business logic with clear rules | Requirements map directly to test cases |
| Bug fix (reproduce then fix) | Write a failing test for the bug, then fix it |
| Refactoring existing code | Characterization tests lock down behavior before changes |
When to Skip TDD
| Scenario | Why |
|---|---|
| Prototyping / spike work | You’re exploring, not building for production. Write tests after. |
| UI layout and styling | Visual output is hard to test meaningfully with unit tests |
| Glue code / wiring | Simple plumbing that’s covered by integration tests |
| One-off scripts | Throwaway code that won’t be maintained |
Testing Anti-Patterns to Avoid
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Testing implementation details | Tests break when you refactor | Test behavior and outputs, not internal state |
| One giant test method | Hard to diagnose failures | One behavior per test, descriptive names |
| Copy-pasting test setup | Duplication makes maintenance hard | Use @BeforeEach, factories, or builders |
| Testing only happy paths | Bugs hide in edge cases | Always include null, empty, boundary, and error tests |
| Mocking everything | Tests pass but production breaks | Use real dependencies where practical |
| Flaky async tests | Inconsistent CI results | Use proper wait mechanisms, not Thread.sleep |
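For the flaky-async row, a JDK-only alternative to Thread.sleep is to have the code under test signal completion and wait on that signal with a timeout. The sketch assumes a hypothetical AsyncNotifier that accepts a completion callback. CountDownLatch.await(timeout, unit) returns true as soon as the count reaches zero and false only if the timeout elapses, so the wait is both fast and bounded.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical async component that reports completion through a callback.
class AsyncNotifier {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    void sendWelcomeEmail(String email, Runnable onSent) {
        executor.execute(() -> {
            // Real sending work would happen here before signalling completion.
            onSent.run();
        });
    }

    void shutdown() {
        executor.shutdown();
    }
}

class AwaitInsteadOfSleep {
    // Waits for the callback with an upper bound instead of sleeping a fixed time.
    static boolean welcomeEmailSent(AsyncNotifier notifier, long timeoutMillis)
            throws InterruptedException {
        CountDownLatch sent = new CountDownLatch(1);
        notifier.sendWelcomeEmail("alice@example.com", sent::countDown);
        return sent.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

In a JUnit test you would assert on the returned boolean, for example with assertThat(sent).isTrue(). When the code under test offers no callback, a polling library such as Awaitility is a common alternative.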
Common Mistakes with TDD and AI
1. Skipping the “red” step
Always verify the test fails first. If you skip this, you might write a test that passes for the wrong reason.
2. Letting Claude implement too much
TDD works because each test drives exactly one piece of logic. If Claude adds code for untested scenarios, tell it to remove it.
3. Not reviewing the test
Claude writes reasonable tests, but you need to verify the test captures YOUR intent. A test for the wrong behavior is worse than no test.
4. Forgetting to refactor
After 3-5 green tests, look for duplication. This is when TDD pays off — you refactor with confidence because tests catch regressions.
5. Writing tests after the code
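Tests written after implementation tend to check “what the code does” rather than “what the code should do,” which makes them prone to being tautological: they pass because they mirror the implementation, not because the behavior is right. Describe the test cases first, then implement.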
Setting Up for TDD Workflows
Add test runner commands to .claude/settings.json so Claude can run tests automatically:
{
"permissions": {
"allow": [
"Bash(./gradlew test*)",
"Bash(npm test*)",
"Bash(bundle exec rspec*)"
]
}
}
This eliminates the confirmation prompt on every test run, keeping the Red-Green-Refactor cycle fast.
Quick Reference
| Task | Prompt |
|---|---|
| TDD new feature | “Write a failing test for [behavior]. Then write the minimal code to pass.” |
| Test existing code | “Read [file]. Write comprehensive tests covering happy paths, edge cases, and errors.” |
| Reproduce a bug | “This bug occurs when [condition]. Write a test that reproduces it, then fix the code.” |
| Find coverage gaps | “Compare [source] with [test file]. What paths are not tested? Prioritize by risk.” |
| Refactor safely | “Refactor [method] using [pattern]. Run tests after each change to verify nothing broke.” |
| Integration test | “Write an integration test that hits [endpoint] with [payload] and verifies [expected response].” |