
TDD with AI

Test-driven development workflows enhanced by AI assistance: the Red-Green-Refactor cycle with Claude Code.

Test-driven development with Claude Code follows the same Red-Green-Refactor cycle as traditional TDD, but with AI writing both the tests and the implementations at your direction. You stay in the driver’s seat — deciding what to test, reviewing each test for correctness, and controlling the pace.


The TDD Cycle with Claude Code

1. RED    — Ask Claude to write a failing test for a specific behavior
2. GREEN  — Ask Claude to write the minimum code to make it pass
3. REFACTOR — After 3-5 green tests, ask Claude to clean up
4. REPEAT — Next behavior

The key difference from solo TDD: Claude can write both the test and the implementation, so your job shifts from typing to reviewing and directing. You must verify each test captures your intent, not just Claude’s interpretation.


Step-by-Step Process

Step 1: Describe What You Want (Not How)

Start by telling Claude the behavior you need. Be specific about requirements, not implementation.

Write a failing JUnit 5 test for a UserRegistrationService.register() method.
The test should verify that registering with an email missing the @ symbol
returns a failure result. Use AssertJ assertions and Mockito for the
UserRepository dependency.

What Claude does: Reads your existing files to understand the project structure, then generates a test that:

  • Uses @ExtendWith(MockitoExtension.class) and @Mock
  • Follows the methodName_givenCondition_expectedResult naming convention
  • Uses assertThat(...).isFalse() (AssertJ style)

What to check before accepting:

  • Does the test name clearly describe the scenario?
  • Does it test ONE specific behavior?
  • Is the assertion checking the right thing?
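The test Claude generates will use your project's own libraries. As a dependency-free illustration, the sketch below shows the same test shape in plain JDK Java. A hand-written fake replaces the Mockito `@Mock`, and a plain check replaces `assertThat(result.success()).isFalse()`. The `register(username, email, password)` signature, `RegistrationResult`, `UserRepository`, and `existsByUsername` are assumed names for illustration, not an existing API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical result type: a success flag plus an error message.
record RegistrationResult(boolean success, String message) {
    static RegistrationResult ok() { return new RegistrationResult(true, ""); }
    static RegistrationResult failure(String message) { return new RegistrationResult(false, message); }
}

// Hypothetical dependency; the real test would mock it with @Mock.
interface UserRepository {
    boolean existsByUsername(String username);
}

// Hand-written fake standing in for the Mockito mock.
class FakeUserRepository implements UserRepository {
    final Set<String> takenUsernames = new HashSet<>();

    @Override
    public boolean existsByUsername(String username) {
        return takenUsernames.contains(username);
    }
}

// Red phase: the method exists so the test compiles, but it has no behavior yet.
class UserRegistrationService {
    private final UserRepository repository;

    UserRegistrationService(UserRepository repository) {
        this.repository = repository;
    }

    RegistrationResult register(String username, String email, String password) {
        throw new UnsupportedOperationException("register() is not implemented yet");
    }
}

class UserRegistrationServiceTest {
    // Name follows methodName_givenCondition_expectedResult and covers ONE behavior.
    void register_givenEmailWithoutAtSymbol_returnsFailure() {
        UserRegistrationService service = new UserRegistrationService(new FakeUserRepository());

        RegistrationResult result = service.register("alice", "alice.example.com", "longenough1");

        // AssertJ equivalent: assertThat(result.success()).isFalse();
        if (result.success()) {
            throw new AssertionError("expected failure for an email without @");
        }
    }
}
```

Calling the test method against this stub throws, which is exactly the red state you want to see before writing any implementation.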

Step 2: Verify It Fails (Red)

Run the test to confirm it fails.

Or run it yourself:

./gradlew test --tests "*UserRegistrationServiceTest.register_givenEmailWithoutAtSymbol*"

Expected: The test fails because register() is not implemented yet. This is the “red” phase.

Why this matters: If the test passes without implementation, the test is wrong. A test that cannot fail is useless.
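Running the test first is what exposes weak assertions. In the sketch below (plain Java, hypothetical names and a simplified one-argument `register`), a placeholder accepts every email. A check that only asserts that some result came back passes against it, so it could never have detected the missing `@` validation. The behavioral check fails, as a red test should.

```java
// Hypothetical placeholder result type.
record RegistrationResult(boolean success) {}

// Placeholder that accepts everything: no validation exists yet.
class UserRegistrationService {
    RegistrationResult register(String email) {
        return new RegistrationResult(true);
    }
}

class RedPhaseCheck {
    // Weak: passes with or without validation, so it cannot fail for the right reason.
    static boolean weakTestPasses() {
        RegistrationResult result = new UserRegistrationService().register("alice.example.com");
        return result != null;
    }

    // Behavioral: fails until register() rejects an email without '@'.
    static boolean behavioralTestPasses() {
        RegistrationResult result = new UserRegistrationService().register("alice.example.com");
        return !result.success();
    }
}
```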

Step 3: Write Minimal Implementation (Green)

Now write the minimum code in UserRegistrationService.register() to make
this test pass. Don't implement anything beyond what this one test requires.

What to watch for: Claude sometimes over-implements. If it adds validation for requirements you have not tested yet, push back:

That implementation handles password validation too, but we don't have a test
for that yet. Remove everything except the email @ check.
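After the pushback, a minimal implementation might contain nothing but the `@` check. This sketch assumes a hypothetical `RegistrationResult` type, a `UserRepository` dependency, and a `register(username, email, password)` signature. The repository is injected but not used yet, because no test requires it.

```java
// Hypothetical result type: a success flag plus an error message.
record RegistrationResult(boolean success, String message) {
    static RegistrationResult ok() { return new RegistrationResult(true, ""); }
    static RegistrationResult failure(String message) { return new RegistrationResult(false, message); }
}

// Hypothetical dependency, injected now but unused until a test needs it.
interface UserRepository {
    boolean existsByUsername(String username);
}

class UserRegistrationService {
    private final UserRepository repository;

    UserRegistrationService(UserRepository repository) {
        this.repository = repository;
    }

    RegistrationResult register(String username, String email, String password) {
        // The only rule any test demands so far: an email must contain '@'.
        if (!email.contains("@")) {
            return RegistrationResult.failure("Email must contain @");
        }
        // No password, username, or null handling yet: each waits for its own test.
        return RegistrationResult.ok();
    }
}
```

Note that a one-character password still registers successfully here. That is intentional at this stage: the password rule arrives only when a test asks for it.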

Step 4: Verify It Passes

Run the tests.

Expected: The new test passes. Any existing tests still pass. This is the “green” phase.

Step 5: Write the Next Test

Write a failing test for registering with a password shorter than 8 characters.
The test should verify the result contains an error message about password length.

Repeat the cycle: write test, verify it fails, implement, verify it passes.
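Because this test asserts on the error message, the message text becomes part of the contract. Below is a sketch of the second green step together with a plain-Java version of the new test. The message wording and the `MIN_PASSWORD_LENGTH` constant are assumptions, and the repository dependency is omitted for brevity. In AssertJ, the message check would read `assertThat(result.message()).contains("at least 8 characters")`.

```java
// Hypothetical result type: a success flag plus an error message.
record RegistrationResult(boolean success, String message) {
    static RegistrationResult ok() { return new RegistrationResult(true, ""); }
    static RegistrationResult failure(String message) { return new RegistrationResult(false, message); }
}

class UserRegistrationService {
    static final int MIN_PASSWORD_LENGTH = 8;

    RegistrationResult register(String username, String email, String password) {
        if (!email.contains("@")) {
            return RegistrationResult.failure("Email must contain @");
        }
        // Iteration 2 adds only the rule the new test demands.
        if (password.length() < MIN_PASSWORD_LENGTH) {
            return RegistrationResult.failure("Password must be at least 8 characters");
        }
        return RegistrationResult.ok();
    }
}

class PasswordLengthTest {
    void register_givenPasswordShorterThan8Chars_returnsLengthError() {
        // "short1!" is 7 characters long: one below the minimum.
        RegistrationResult result = new UserRegistrationService()
                .register("alice", "alice@example.com", "short1!");

        // Check both the outcome and that the message is about length.
        if (result.success() || !result.message().contains("at least 8 characters")) {
            throw new AssertionError("expected a password-length error, got: " + result);
        }
    }
}
```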

Step 6: Refactor (After Several Cycles)

Once you have 3-5 tests passing, the implementation may have grown messy. Now refactor:

The register method is getting long. Extract the email validation and password
validation into private helper methods. Keep all tests passing.

After Claude refactors, verify:

  • No behavior changed (all tests still pass)
  • The code is cleaner and more readable
  • Each method has a single responsibility

The Complete Cycle Visualized

Iteration 1:
  Test: "email without @ returns failure"
  Impl: Add @ check
  Tests: 1 passing

Iteration 2:
  Test: "password under 8 chars returns failure"
  Impl: Add length check
  Tests: 2 passing

Iteration 3:
  Test: "taken username returns failure"
  Impl: Add repository check
  Tests: 3 passing

Iteration 4:
  Test: "null username throws IllegalArgumentException"
  Impl: Add null guard
  Tests: 4 passing

Refactor:
  Extract validateEmail(), validatePassword() helper methods
  Tests: still 4 passing
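Putting the four iterations and the refactor together, the service might end up like the sketch below. As before, `RegistrationResult`, `UserRepository.existsByUsername`, and the message strings are hypothetical. `validateEmail()` and `validatePassword()` are the helpers named in the refactor step. Here they return `Optional<String>`, so `register()` reads as a short sequence of checks.

```java
import java.util.Optional;

// Hypothetical result type: a success flag plus an error message.
record RegistrationResult(boolean success, String message) {
    static RegistrationResult ok() { return new RegistrationResult(true, ""); }
    static RegistrationResult failure(String message) { return new RegistrationResult(false, message); }
}

// Hypothetical repository dependency.
interface UserRepository {
    boolean existsByUsername(String username);
}

class UserRegistrationService {
    private static final int MIN_PASSWORD_LENGTH = 8;
    private final UserRepository repository;

    UserRegistrationService(UserRepository repository) {
        this.repository = repository;
    }

    RegistrationResult register(String username, String email, String password) {
        // Iteration 4: a null username is a caller bug, so it throws instead of returning a result.
        if (username == null) {
            throw new IllegalArgumentException("username must not be null");
        }
        // Iterations 1-2, extracted into helpers during the refactor.
        Optional<String> error = validateEmail(email).or(() -> validatePassword(password));
        if (error.isPresent()) {
            return RegistrationResult.failure(error.get());
        }
        // Iteration 3: query the repository only after the cheap format checks pass.
        if (repository.existsByUsername(username)) {
            return RegistrationResult.failure("Username is already taken");
        }
        return RegistrationResult.ok();
    }

    private Optional<String> validateEmail(String email) {
        return email.contains("@") ? Optional.empty() : Optional.of("Email must contain @");
    }

    private Optional<String> validatePassword(String password) {
        return password.length() >= MIN_PASSWORD_LENGTH
                ? Optional.empty()
                : Optional.of("Password must be at least 8 characters");
    }
}
```

The ordering of checks is a design choice: format validation runs before the repository lookup, so invalid input never costs a database call. Because all four tests still pass after the extraction, you know the refactor preserved behavior.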

Example Prompts for Each Phase

Writing Failing Tests

What You Need | Prompt
--- | ---
First failing test | “Write a failing test for [specific behavior]”
Edge case tests | “Write tests for null inputs, empty strings, and boundary values”
Error handling test | “Write a test that verifies [method] throws [exception] when [condition]”
Integration test | “Write an integration test that hits [endpoint] with [payload] and verifies [expected response]”

Minimal Implementation

What You Need | Prompt
--- | ---
Pass one test | “Write the minimum code to make this test pass”
Push back on over-engineering | “Remove everything except what’s needed for the current tests”
Follow patterns | “Add the minimum code to pass. Follow the same pattern as the existing validatePassword method.”

Refactoring

What You Need | Prompt
--- | ---
Extract method | “Extract [X] into a helper method. Keep all tests green.”
Remove duplication | “The setup in these tests is duplicated. Extract a common setup.”
Check coverage | “What behaviors in [file] don’t have tests yet?”

Real-World Example: Adding a Feature with TDD

Here is how an engineer might use Claude Code to add a “password must contain a special character” requirement to an existing service:

Read UserRegistrationService.java and UserRegistrationServiceTest.java.
I need to add a requirement: passwords must contain at least one special
character (!@#$%^&*). Write a failing test for this.

Claude reads both files, sees the existing test patterns, and generates a consistent test.

Run the test to verify it fails.

Fails as expected — the current implementation does not check for special characters.

Add the minimum code to make this test pass. Follow the same pattern as
the existing validatePassword method.

Claude adds a regex check, and the full suite passes, existing tests included. For a change this small, the whole red-green cycle can take just a couple of minutes.
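The regex check might look like the following sketch, written as a hypothetical standalone `PasswordRules.validatePassword` that returns an error message or nothing; your existing method's shape will differ. Inside a Java character class, `^` is literal unless it comes first, and a single `&` is literal (only `&&` means intersection), so `[!@#$%^&*]` matches exactly the listed characters.

```java
import java.util.Optional;
import java.util.regex.Pattern;

class PasswordRules {
    private static final int MIN_PASSWORD_LENGTH = 8;
    // Matches any one of the required special characters: ! @ # $ % ^ & *
    private static final Pattern SPECIAL_CHARACTER = Pattern.compile("[!@#$%^&*]");

    // Returns an error message, or empty when the password satisfies every rule.
    static Optional<String> validatePassword(String password) {
        if (password.length() < MIN_PASSWORD_LENGTH) {
            return Optional.of("Password must be at least 8 characters");
        }
        // New rule: find() succeeds if a special character appears anywhere in the password.
        if (!SPECIAL_CHARACTER.matcher(password).find()) {
            return Optional.of("Password must contain at least one special character (!@#$%^&*)");
        }
        return Optional.empty();
    }
}
```

Using `find()` rather than `matches()` matters here: `matches()` would require the entire password to be a single special character.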


Writing Tests for Existing Code

TDD drives new code, but what about code that already exists without tests? Here is a workflow for adding test coverage retroactively.

Step 1: Analyze What Needs Testing

Read src/services/OrderService.java. List every public method and
for each one tell me:
- What it does (one sentence)
- What the happy path is
- What could go wrong (edge cases, error conditions)
- Whether it has any existing test coverage

Step 2: Generate Comprehensive Tests

Write JUnit 5 tests for OrderService. For each public method, include:
- Happy path test
- Null/empty input test
- Boundary condition test (zero, one, max)
- Error handling test (what happens when the repository throws?)

Use our existing test patterns from OrderServiceTest.java.
Name tests: should_[expected]_when_[condition]

Step 3: Find Coverage Gaps

Compare OrderService.java with OrderServiceTest.java.
What code paths are NOT exercised by any test?
List them by risk level (high/medium/low) and suggest a test
for each gap.

Why this works: Instead of asking for “more tests,” you get a systematic analysis of what is actually missing and what matters most.


When TDD with AI Works Best

Scenario | Why TDD + AI Excels
--- | ---
New service or module | Clean slate, easy to define behaviors incrementally
Adding validation rules | Each rule is a discrete, testable behavior
Business logic with clear rules | Requirements map directly to test cases
Bug fix (reproduce then fix) | Write a failing test for the bug, then fix it
Refactoring existing code | Characterization tests lock down behavior before changes

When to Skip TDD

Scenario | Why
--- | ---
Prototyping / spike work | You’re exploring, not building for production. Write tests after.
UI layout and styling | Visual output is hard to test meaningfully with unit tests
Glue code / wiring | Simple plumbing that’s covered by integration tests
One-off scripts | Throwaway code that won’t be maintained

Testing Anti-Patterns to Avoid

Anti-Pattern | Problem | Fix
--- | --- | ---
Testing implementation details | Tests break when you refactor | Test behavior and outputs, not internal state
One giant test method | Hard to diagnose failures | One behavior per test, with descriptive names
Copy-pasting test setup | Duplication makes maintenance hard | Use @BeforeEach, factories, or builders
Testing only happy paths | Bugs hide in edge cases | Always include null, empty, boundary, and error tests
Mocking everything | Tests pass but production breaks | Use real dependencies where practical
Flaky async tests | Inconsistent CI results | Use proper wait mechanisms, not Thread.sleep
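For flaky async tests, one JDK-only fix is to have the code under test signal completion and have the test wait on that signal with a timeout. The sketch below uses `CountDownLatch` for this. `WelcomeMailer` is a hypothetical async collaborator, not part of the registration example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical async collaborator: builds a welcome message on a background thread.
class WelcomeMailer {
    private final ExecutorService executor;

    WelcomeMailer(ExecutorService executor) {
        this.executor = executor;
    }

    void sendAsync(String address, Consumer<String> onSent) {
        executor.execute(() -> onSent.accept("Welcome, " + address));
    }
}

class AsyncWaitExample {
    // Returns the delivered message, or null if nothing arrived before the timeout.
    static String sendAndAwait(String address, long timeoutMillis) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            CountDownLatch sent = new CountDownLatch(1);
            AtomicReference<String> delivered = new AtomicReference<>();

            new WelcomeMailer(executor).sendAsync(address, message -> {
                delivered.set(message);
                sent.countDown(); // signal completion instead of guessing with Thread.sleep
            });

            // Returns as soon as the latch opens; the timeout turns a hang into a clear failure.
            boolean completed = sent.await(timeoutMillis, TimeUnit.MILLISECONDS);
            return completed ? delivered.get() : null;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A fixed `Thread.sleep` is either too short on a slow CI machine or wastes time on a fast one. The latch waits only as long as the work actually takes. In a JUnit 5 test, the same check would typically be `assertTrue(sent.await(2, TimeUnit.SECONDS))`.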

Common Mistakes with TDD and AI

1. Skipping the “red” step

Always verify the test fails first. If you skip this, you might write a test that passes for the wrong reason.

2. Letting Claude implement too much

TDD works because each test drives exactly one piece of logic. If Claude adds code for untested scenarios, tell it to remove it.

3. Not reviewing the test

Claude writes reasonable tests, but you need to verify the test captures YOUR intent. A test for the wrong behavior is worse than no test.

4. Forgetting to refactor

After 3-5 green tests, look for duplication. This is when TDD pays off — you refactor with confidence because tests catch regressions.

5. Writing tests after the code

Tests written after implementation tend to test “what the code does” rather than “what the code should do,” so they risk becoming tautological: they confirm the current behavior, bugs included. Describe the test cases first, then implement.


Setting Up for TDD Workflows

Add test runner commands to .claude/settings.json so Claude can run tests automatically:

{
  "permissions": {
    "allow": [
      "Bash(./gradlew test*)",
      "Bash(npm test*)",
      "Bash(bundle exec rspec*)"
    ]
  }
}

This eliminates the confirmation prompt on every test run, keeping the Red-Green-Refactor cycle fast.


Quick Reference

Task | Prompt
--- | ---
TDD new feature | “Write a failing test for [behavior]. Then write the minimal code to pass.”
Test existing code | “Read [file]. Write comprehensive tests covering happy paths, edge cases, and errors.”
Reproduce a bug | “This bug occurs when [condition]. Write a test that reproduces it, then fix the code.”
Find coverage gaps | “Compare [source] with [test file]. What paths are not tested? Prioritize by risk.”
Refactor safely | “Refactor [method] using [pattern]. Run tests after each change to verify nothing broke.”
Integration test | “Write an integration test that hits [endpoint] with [payload] and verifies [expected response].”