Session 3: Testing — AI Workshop

Sessions 1 and 2 gave you the fundamentals: writing specific prompts, iterating on responses, and using the Explore-Plan-Code-Commit (EPCC) workflow. This session applies those skills to testing: driving new code with test-driven development (TDD), generating tests for existing code, and systematically finding coverage gaps.

The underlying principle: giving Claude a way to verify its own work is the single most important thing you can do for output quality. Tests, linters, build checks, and browser testing all serve as verification loops — feedback mechanisms that let Claude iterate until the output actually works, not just looks right. TDD is the most disciplined form of this pattern, but the principle applies everywhere.

TDD with AI (Red-Green-Refactor)

What it is: Test-driven development becomes faster and more consistent when you use Claude as a pair partner in the red-green-refactor loop. You describe the behavior you want, Claude writes a failing test, you verify it fails (red), Claude writes the minimal implementation to pass (green), and after a few cycles you refactor together while keeping tests green. The key discipline is the same as traditional TDD — never let Claude skip the “verify it fails” step, and push back if it implements more than the current test requires. Claude sometimes over-engineers; telling it “only implement what this one test needs” keeps each cycle tight.

Demo prompt (Red — write the failing test):

Write a failing test for a UserRegistrationService.register() method.
The test should verify that registering with an email missing the @
symbol returns a failure result. Use our existing test conventions.

After you write the test, stop. Do not write the implementation yet.
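
For reference, the result of the Red prompt might look roughly like the sketch below. Everything in it is an assumption made for illustration: the `RegistrationResult` type, the `register(String email)` signature, and the test name are hypothetical, and the check is written without a test framework so it runs standalone. In a real project Claude should use your framework and conventions instead.

```java
// Hypothetical result type; the real project may model outcomes differently.
final class RegistrationResult {
    private final boolean success;
    private final String reason;

    private RegistrationResult(boolean success, String reason) {
        this.success = success;
        this.reason = reason;
    }

    static RegistrationResult success() {
        return new RegistrationResult(true, "");
    }

    static RegistrationResult failure(String reason) {
        return new RegistrationResult(false, reason);
    }

    boolean isSuccess() {
        return success;
    }

    String reason() {
        return reason;
    }
}

// Stub only: it exists so the test compiles, and it has no behavior yet.
class UserRegistrationService {
    RegistrationResult register(String email) {
        throw new UnsupportedOperationException("register() is not implemented yet");
    }
}

class UserRegistrationServiceTest {
    // Red: against the stub above this test fails, which is what you confirm first.
    static void registerFailsWhenEmailIsMissingAtSymbol() {
        UserRegistrationService service = new UserRegistrationService();

        RegistrationResult result = service.register("alice.example.com");

        if (result.isSuccess()) {
            throw new AssertionError("expected a failure result for an email without '@'");
        }
    }
}
```

Calling `registerFailsWhenEmailIsMissingAtSymbol()` against the stub throws `UnsupportedOperationException`. That is the red state: the test fails because the behavior does not exist yet, not because the test itself is broken.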

Follow-up prompt (Green — minimal implementation):

The test fails as expected. Now write the minimum code in
UserRegistrationService.register() to make this test pass.
Don't implement anything beyond what this one test requires.
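
A minimal sketch of what the Green step could produce, under the same assumptions as before (a hypothetical `RegistrationResult` type and a `register(String email)` signature that the session does not specify):

```java
// Hypothetical result type, repeated here so the sketch is self-contained.
final class RegistrationResult {
    private final boolean success;
    private final String reason;

    private RegistrationResult(boolean success, String reason) {
        this.success = success;
        this.reason = reason;
    }

    static RegistrationResult success() {
        return new RegistrationResult(true, "");
    }

    static RegistrationResult failure(String reason) {
        return new RegistrationResult(false, reason);
    }

    boolean isSuccess() {
        return success;
    }

    String reason() {
        return reason;
    }
}

class UserRegistrationService {
    // Green: just enough logic to make the missing-'@' test pass.
    // No null handling, format validation, or persistence: no test asks for them yet.
    RegistrationResult register(String email) {
        if (!email.contains("@")) {
            return RegistrationResult.failure("email must contain '@'");
        }
        return RegistrationResult.success();
    }
}
```

Note what is absent: a null email would still throw, and nothing is saved anywhere. Each of those behaviors should arrive with its own failing test. A stricter reading of "minimum" would even return a failure unconditionally until a second test forces the success branch; how strictly to apply that is a judgment call for you and your team.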

Try it now: Pick a small piece of functionality you need to add to your project. Describe the first behavior to Claude and ask it to write only the failing test. Run the test to confirm it fails. Then ask Claude for the minimal implementation. Run the test again. Repeat for a second behavior, then refactor.

Go deeper: TDD Workflow — the full red-green-refactor workflow with step-by-step examples and prompts for each phase.

Test Generation for Existing Code

What it is: Most codebases have untested or under-tested code. Instead of writing tests from scratch, you can point Claude at an existing source file and ask it to generate comprehensive tests covering happy paths, edge cases, and error conditions. This follows the same EPCC pattern from Session 2: Explore (read the source and existing tests), Plan (decide what categories to cover), Code (generate the tests), Commit (review and merge). The key is telling Claude what test framework and conventions to follow, and then reviewing the generated tests to confirm they test meaningful behavior — not implementation details. A good prompt asks Claude to read both the source file and any existing tests so it matches your naming patterns and assertion style.

Demo prompt:

Read src/services/OrderService.java and the existing tests in
test/services/OrderServiceTest.java. Write additional tests covering:
1. Happy path for each public method that lacks a test
2. Null and empty input handling
3. Boundary conditions (zero, one, maximum values)
4. Error handling (what happens when the repository throws?)

Match the naming and assertion patterns in the existing test file.
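
To make the categories concrete, here is a sketch of the kind of tests such a prompt might yield. The `OrderService` below is a hypothetical stand-in, since the real class is not shown in this session; the `placeOrder` method, the `OrderRepository` interface, and the quantity limit of 100 are all assumptions. The checks return booleans so the sketch runs without a test framework; generated tests in your project should follow your existing test file instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical collaborator; the real repository interface is not shown here.
interface OrderRepository {
    void save(String customerId, int quantity);
}

// Hypothetical service under test, with an assumed quantity limit of 100.
class OrderService {
    static final int MAX_QUANTITY = 100;
    private final OrderRepository repository;

    OrderService(OrderRepository repository) {
        this.repository = repository;
    }

    void placeOrder(String customerId, int quantity) {
        if (customerId == null || customerId.isEmpty()) {
            throw new IllegalArgumentException("customerId is required");
        }
        if (quantity < 1 || quantity > MAX_QUANTITY) {
            throw new IllegalArgumentException("quantity must be between 1 and " + MAX_QUANTITY);
        }
        try {
            repository.save(customerId, quantity);
        } catch (RuntimeException e) {
            throw new IllegalStateException("order could not be saved", e);
        }
    }
}

class OrderServiceTest {
    // Returns true when the action throws an exception of the expected type.
    static boolean throwsType(Class<? extends Throwable> expected, Runnable action) {
        try {
            action.run();
            return false;
        } catch (Throwable t) {
            return expected.isInstance(t);
        }
    }

    // Category 2: null and empty input handling.
    static boolean rejectsNullAndEmptyCustomerId() {
        OrderService service = new OrderService((id, qty) -> { });
        return throwsType(IllegalArgumentException.class, () -> service.placeOrder(null, 1))
            && throwsType(IllegalArgumentException.class, () -> service.placeOrder("", 1));
    }

    // Categories 1 and 3: happy path plus boundaries (zero, one, maximum, one past maximum).
    static boolean enforcesQuantityBoundaries() {
        List<Integer> saved = new ArrayList<>();
        OrderService service = new OrderService((id, qty) -> saved.add(qty));
        boolean zeroRejected = throwsType(IllegalArgumentException.class,
                () -> service.placeOrder("c1", 0));
        service.placeOrder("c1", 1);
        service.placeOrder("c1", OrderService.MAX_QUANTITY);
        boolean overMaxRejected = throwsType(IllegalArgumentException.class,
                () -> service.placeOrder("c1", OrderService.MAX_QUANTITY + 1));
        return zeroRejected && overMaxRejected
            && saved.equals(List.of(1, OrderService.MAX_QUANTITY));
    }

    // Category 4: error handling when the repository throws.
    static boolean wrapsRepositoryFailure() {
        OrderService service = new OrderService((id, qty) -> {
            throw new RuntimeException("database unavailable");
        });
        return throwsType(IllegalStateException.class, () -> service.placeOrder("c1", 1));
    }
}
```

Each check asserts on observable behavior (what is thrown, what reaches the repository) rather than on private fields or call order, which is the distinction to look for when reviewing generated tests.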

Try it now: Find a source file in your project that you know has incomplete test coverage. Ask Claude to read both the source and existing test file, then generate tests that fill the gaps. Run the new tests. Then validate them: temporarily break something in the source code and confirm at least one new test catches it. A test that cannot fail when the behavior it describes is broken is testing the wrong thing.
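
The "break it and see" validation can be illustrated with a small sketch. The discount rule, the method names, and the two checks below are hypothetical, invented only to show how a weak test survives a bug that a boundary test catches.

```java
import java.util.function.IntUnaryOperator;

class BreakItCheck {
    // Hypothetical rule: totals of 10,000 cents or more get a 10% discount.
    static int correctTotal(int cents) {
        return cents >= 10_000 ? cents - cents / 10 : cents;
    }

    // Deliberately broken copy: '>' instead of '>=' drops the discount at exactly 10,000.
    static int brokenTotal(int cents) {
        return cents > 10_000 ? cents - cents / 10 : cents;
    }

    // Weak check: only looks at a value far from the threshold.
    static boolean weakCheckPasses(IntUnaryOperator total) {
        return total.applyAsInt(20_000) == 18_000;
    }

    // Stronger check: pins the behavior at the threshold itself.
    static boolean boundaryCheckPasses(IntUnaryOperator total) {
        return total.applyAsInt(10_000) == 9_000;
    }
}
```

Here `weakCheckPasses` returns true for both `correctTotal` and `brokenTotal`, so it would not notice the bug. `boundaryCheckPasses` returns true for `correctTotal` and false for `brokenTotal`, which is the signal you are looking for when you temporarily break the source.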

This same technique — generating tests for existing behavior — is called “characterization testing” when used before refactoring. You will use it in Sessions 4 and 5 to create safety nets before changing code.

Go deeper: Common Prompting Mistakes — mistakes that undermine AI-assisted workflows, including accepting generated code without review, over-implementation, and missing context.

Finding Coverage Gaps

What it is: Beyond adding tests for individual files, Claude can systematically compare source code with test code to identify what is not tested. A line-coverage report shows which lines never ran; Claude can go further and reason about which untested paths carry the most risk. For example, error handling in payment processing matters more than a getter method. Ask Claude to read both files, list every untested code path, and rank them by risk so you know where to invest your testing effort first.

Demo prompt:

Compare src/services/PaymentService.java with
test/services/PaymentServiceTest.java.

List every code path in the source that is NOT exercised by any test.
For each gap, tell me:
1. What the untested path does
2. Risk level (high/medium/low) based on what could go wrong
3. A specific test case that would cover it

Sort by risk level, highest first.

Try it now: Pick a critical service in your project — something that handles money, authentication, or data integrity. Ask Claude to compare the source with its test file and produce a risk-ranked list of coverage gaps. Write a test for the highest-risk gap it identifies.
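
As an illustration of a high-risk gap versus a low-risk one, consider the sketch below. The `PaymentService` shown is a hypothetical stand-in, not the real class from your project; the `PaymentGateway` interface, the `pay` method, and the test name are assumptions. The check returns a boolean so the sketch runs without a test framework.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical gateway; the real PaymentService dependencies are not shown here.
interface PaymentGateway {
    void charge(String orderId, long amountCents);
}

// Hypothetical service with one low-risk path and one high-risk path.
class PaymentService {
    private final PaymentGateway gateway;
    private final List<String> paidOrderIds = new ArrayList<>();

    PaymentService(PaymentGateway gateway) {
        this.gateway = gateway;
    }

    // Low risk: a plain accessor. Leaving it untested costs little.
    List<String> getPaidOrderIds() {
        return List.copyOf(paidOrderIds);
    }

    // High risk: if the failure branch is wrong, a failed charge could be recorded as paid.
    boolean pay(String orderId, long amountCents) {
        try {
            gateway.charge(orderId, amountCents);
            paidOrderIds.add(orderId);
            return true;
        } catch (RuntimeException e) {
            return false;
        }
    }
}

class PaymentServiceTest {
    // Covers the high-risk gap: the gateway fails, so the order must not be marked paid.
    static boolean failedChargeIsNotRecordedAsPaid() {
        PaymentService service = new PaymentService((orderId, amountCents) -> {
            throw new RuntimeException("gateway timeout");
        });
        boolean paid = service.pay("order-1", 2_500L);
        return !paid && service.getPaidOrderIds().isEmpty();
    }
}
```

A line-coverage tool would count the untested getter and the untested catch branch as one gap each. A risk-ranked list should put the catch branch first, because a mistake there affects money.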

Go deeper: Before and After Examples — real code transformations showing the impact of AI-assisted development, from test generation to security fixes and refactoring.

Key Takeaways

TDD with AI follows the same red-green-refactor discipline as traditional TDD. Always verify the test fails before asking for the implementation.
Push back when Claude over-implements. “Only write what this test requires” keeps each cycle focused and prevents untested logic from creeping in.
When generating tests for existing code, have Claude read both the source and existing tests so it matches your conventions and avoids duplicating coverage.
Coverage gap analysis is most valuable when you ask Claude to rank gaps by risk, so you invest testing effort where it matters most.
Run the tests after every change. Tests are Claude’s self-correction mechanism: when a test fails and Claude can see the output, it can often diagnose and fix the issue without your intervention. Without tests, you are the only feedback loop.
Verification loops are the #1 force multiplier for AI-assisted development. Tests are one form; browser testing, build checks, linters, and log analysis are others. The more verification Claude can do automatically, the better the output. See Verification Loops for the full pattern.
TDD scales beyond individual files. For full project planning with TDD at every step, see Structured Project Planning — a workflow that combines EPCC, plan mode, and TDD into a complete lifecycle.

Practice

Apply your testing skills with the Intermediate Scenarios — TDD exercises, test generation challenges, and coverage gap analysis using your own codebase.