Session 4: Code Review

In Session 3 you learned to write and generate tests. Now you’ll apply those verification skills to reviewing code. This session covers using Claude as a code review partner for your own changes and others’ PRs, and how to evaluate AI-generated code so you know what to trust, what to verify, and what to rewrite.

Using Claude to Review Code

What it is: Claude can act as a first-pass reviewer that catches mechanical issues — missing error handling, security vulnerabilities, inconsistent patterns — so human reviewers can focus on design and intent. The key to a useful review is specificity: “review this code” produces generic feedback, while “check this payment handler for SQL injection and missing error handling” produces actionable findings. You can review your own staged changes before pushing, review a teammate’s PR diff, or ask Claude to scan a module for anti-patterns. Providing business context (“this handles credit card refunds, we have had duplicate refund bugs”) dramatically improves relevance.
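
To make "SQL injection" concrete, here is a minimal sketch of the kind of mechanical issue a focused review prompt targets. The refunds table, its columns, and both function names are hypothetical, invented for this illustration; Python's built-in sqlite3 module is used only so the example runs without setup.

```python
import sqlite3

def make_db():
    """Build an in-memory database with a hypothetical refunds table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE refunds (id INTEGER PRIMARY KEY, customer TEXT, cents INTEGER)"
    )
    conn.executemany(
        "INSERT INTO refunds (customer, cents) VALUES (?, ?)",
        [("alice", 500), ("bob", 1200)],
    )
    return conn

def refunds_for_unsafe(conn, customer):
    # Vulnerable: user input is spliced directly into the SQL text.
    query = f"SELECT customer, cents FROM refunds WHERE customer = '{customer}'"
    return conn.execute(query).fetchall()

def refunds_for_safe(conn, customer):
    # Parameterized: the value is passed separately from the SQL text.
    query = "SELECT customer, cents FROM refunds WHERE customer = ?"
    return conn.execute(query, (customer,)).fetchall()

conn = make_db()
payload = "x' OR '1'='1"
leaked = refunds_for_unsafe(conn, payload)  # the injected OR clause matches every row
blocked = refunds_for_safe(conn, payload)   # the payload is treated as a literal name
```

A prompt that names SQL injection gives Claude a reason to look for string-built queries like the first function, which a generic "review this code" request may or may not surface.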

Demo prompt:

Review the staged changes. Check for:
1. Security issues (injection, auth bypass, exposed secrets)
2. Missing error handling
3. Edge cases that aren't covered
4. Anything that would break existing behavior

Categorize each finding as CRITICAL, IMPORTANT, or MINOR.

Try it now: Stage a recent change in your project (or check out a branch with pending changes). Ask Claude to review the staged diff with a specific focus area — security, performance, or correctness. Compare the findings to issues you already know about. Note any false positives and any real issues you missed.
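
If you want to script the pre-review step, one possible approach is to capture the staged diff and assemble a focused prompt around it. This is a sketch under assumptions: the helper names staged_diff and build_review_prompt are hypothetical, and the prompt wording simply mirrors the demo prompt above. The only external command used is git diff --staged.

```python
import subprocess

def staged_diff() -> str:
    """Return the staged diff of the current Git repository."""
    result = subprocess.run(
        ["git", "diff", "--staged"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def build_review_prompt(diff: str, focus: list[str], context: str = "") -> str:
    """Assemble a focused review request from a diff, focus areas, and context."""
    lines = ["Review the staged changes below. Check for:"]
    lines += [f"{i}. {item}" for i, item in enumerate(focus, start=1)]
    if context:
        lines += ["", f"Business context: {context}"]
    lines += ["", "Categorize each finding as CRITICAL, IMPORTANT, or MINOR.", "", diff]
    return "\n".join(lines)

# Placeholder diff text; in a real repository you would pass staged_diff().
example = build_review_prompt(
    diff="(output of staged_diff() goes here)",
    focus=["SQL injection", "Missing error handling"],
    context="This handles credit card refunds; we have had duplicate refund bugs.",
)
```

Keeping the focus areas and business context as explicit parameters makes it harder to fall back on a generic "review this code" request.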

Go deeper: Code Review Workflow — review strategies, prompt techniques, and severity-based feedback workflows.

Evaluating AI-Generated Code

What it is: Not all AI output deserves the same level of scrutiny. Test boilerplate and formatting are high-trust (verify quickly), business logic and SQL queries are medium-trust (verify carefully), and security code and concurrency are low-trust (verify thoroughly). Learning to calibrate your review effort by output type saves time without sacrificing safety. Always run /diff after Claude edits files — it shows exactly what changed, including modifications you didn’t ask for. Watch for common hallucination patterns: invented API methods that do not exist, plausible-but-wrong logic (correct syntax, wrong semantics), outdated framework patterns, and confidently explained incorrect code. The best defense is running the code — the testing skills from Session 3 are your strongest review tool, because tests catch what eyes miss.
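
A small illustration of "correct syntax, wrong semantics" may help. The example below is hypothetical and not taken from any real generated output: both functions split a payment into installments and both look reasonable at a glance, but only a test on the total exposes the first one.

```python
def split_evenly_wrong(total_cents: int, parts: int) -> list[int]:
    # Reads fine, but floor division silently drops the remainder.
    return [total_cents // parts] * parts

def split_evenly(total_cents: int, parts: int) -> list[int]:
    # Give one extra cent to the first `remainder` installments
    # so the installments always add up to the total.
    base, remainder = divmod(total_cents, parts)
    return [base + 1 if i < remainder else base for i in range(parts)]

def test_installments_add_up():
    # The invariant a reviewer might not think to check by eye.
    assert sum(split_evenly(1000, 3)) == 1000

test_installments_add_up()
lost_cents = 1000 - sum(split_evenly_wrong(1000, 3))  # one cent goes missing
```

The flawed version would pass a quick read and any test that uses evenly divisible amounts, which is why running an edge-case test is a stronger check than inspection alone.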

Demo prompt:

You just generated a PaymentProcessor class for me. Before I accept it,
walk me through:
1. What API methods and libraries does this code call? Do they exist
   in the versions we use?
2. What happens on each error path — does every failure get handled?
3. What edge cases could produce wrong results?
4. What would you change if this had to handle 10,000 requests/second?

Try it now: Take a piece of code Claude recently generated for you (or generate something now — ask it to write a service method). Before accepting it, ask Claude to critique its own output using the prompt above. Then independently verify: check that every method it calls actually exists in your dependencies, and write a test for the most likely edge case.
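
For Python dependencies, the "does this method actually exist" check can be partly automated. The helper below is a minimal sketch, and the name api_exists is hypothetical. It only confirms that a dotted name resolves in the installed version; it says nothing about whether the arguments or behavior match what the generated code assumes.

```python
import importlib

def api_exists(dotted_name: str) -> bool:
    """Return True if a name like "json.loads" resolves in the installed packages."""
    parts = dotted_name.split(".")
    # Try the longest importable module prefix first, then walk the
    # remaining parts as attributes (functions, classes, methods).
    for i in range(len(parts), 0, -1):
        module_name = ".".join(parts[:i])
        try:
            obj = importlib.import_module(module_name)
        except (ImportError, ValueError):
            continue
        for attr in parts[i:]:
            if not hasattr(obj, attr):
                return False
            obj = getattr(obj, attr)
        return True
    return False

real = api_exists("json.loads")     # a real standard-library function
invented = api_exists("json.parse") # plausible-sounding, but not in Python's json module
```

Note that this imports the modules it checks, so run it in the same environment as your project rather than against untrusted packages.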

Go deeper: Before and After Examples — real code transformations showing prompts and results across testing, security, refactoring, and review.

Review Workflows and Prompts

What it is: Effective code review with AI requires structure beyond a single prompt. Three workflows cover most situations:

Pre-push self-review: stage your changes, ask Claude to review the diff, and fix issues before pushing. This saves a full review round-trip.
PR review with focus areas: break large diffs into logical groups (model changes, then service layer, then API layer, then tests) and review each group separately so Claude’s context stays focused.
Anti-pattern scanning: give Claude a checklist of specific code smells (god classes, swallowed exceptions, hardcoded config, dead code) and ask it to scan a directory, reporting the file and line number for each finding.

For UI changes, paste a screenshot directly into Claude (Cmd+V on macOS) and ask it to identify visual issues such as misaligned elements, missing states, or accessibility problems:

[paste screenshot]
This is the checkout page after our latest changes. Check for:
1. Visual alignment issues
2. Missing error states
3. Accessibility problems (contrast, touch targets)

See Images & Screenshots for more on using Claude’s vision capabilities. For automated review at scale, the Code Review reference page covers Claude Code Review — a managed service that reviews PRs automatically using specialized agents, with customization via REVIEW.md.

Demo prompt:

This PR has many files changed. Let's review in layers:
1. First, review only the database migration and model changes.
   Are the schema changes correct and safe?
2. Then review the service layer changes. Does the business logic
   match the requirements?
3. Then review the API layer. Are error responses consistent?
4. Finally, review the test changes. Are they sufficient?

For each layer, tell me if it looks correct before moving on.
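
Before starting a layered review, it can help to sort the changed files into those layers. The sketch below assumes a hypothetical repository layout (migrations/, models/, services/, api/, tests/); the prefixes and the function name group_by_layer are illustrative and would need adjusting to your project.

```python
# Hypothetical path conventions; adjust the prefixes to your repository layout.
LAYERS = [
    ("models", ("migrations/", "models/")),
    ("services", ("services/",)),
    ("api", ("api/",)),
    ("tests", ("tests/",)),
]

def group_by_layer(paths):
    """Group changed file paths by review layer, in review order."""
    groups = {name: [] for name, _ in LAYERS}
    groups["other"] = []
    for path in paths:
        for name, prefixes in LAYERS:
            # str.startswith accepts a tuple of candidate prefixes.
            if path.startswith(prefixes):
                groups[name].append(path)
                break
        else:
            # No layer matched; keep the file visible instead of dropping it.
            groups["other"].append(path)
    return groups

changed = [
    "services/refund.py",
    "migrations/0007_refunds.py",
    "README.md",
    "api/refunds.py",
    "tests/test_refund.py",
]
layers = group_by_layer(changed)
```

The input list could come from git diff --name-only; files that match no layer land in "other" so nothing is silently skipped.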

Try it now: Find a recent PR in your project (yours or a teammate’s). Ask Claude to review it using the layered approach above. If the PR is small, try the anti-pattern scan instead: give Claude a checklist of five code smells relevant to your codebase and ask it to scan the changed files.
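
Some checklist items can also be verified mechanically, which gives you a baseline to compare against Claude's findings. As one example, the sketch below uses Python's standard ast module to report swallowed exceptions, defined narrowly here as except blocks whose body is only pass. The function name and the sample source are hypothetical.

```python
import ast

def find_swallowed_exceptions(source: str, filename: str = "<string>"):
    """Return (filename, line) for each except block whose body is only `pass`."""
    findings = []
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        if isinstance(node, ast.ExceptHandler):
            if all(isinstance(stmt, ast.Pass) for stmt in node.body):
                findings.append((filename, node.lineno))
    return findings

# Hypothetical code under review; it is parsed, never executed.
SAMPLE = """\
def refund(order):
    try:
        charge_back(order)
    except Exception:
        pass
"""

findings = find_swallowed_exceptions(SAMPLE, "refund.py")
```

A narrow scan like this will miss variants such as an except block that only logs and continues, which is where a checklist-driven review by Claude can add coverage.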

Go deeper: Common Prompting Mistakes — how to avoid the “wall of feedback” problem and other review anti-patterns.

Key Takeaways

Specific review prompts produce actionable findings. “Check this handler for SQL injection” gives Claude a concrete target; “review this code” does not.
Provide business context so Claude focuses on what matters. “This handles refunds” changes the entire review.
Calibrate trust by output type: verify boilerplate quickly, verify business logic and SQL queries carefully, and verify security and concurrency code thoroughly.
Run /diff after every Claude edit. It shows exactly what changed, including modifications you didn’t ask for — this is your primary tool for catching scope creep in AI output.
Watch for hallucinated APIs, plausible-but-wrong logic, outdated patterns, and confidently explained incorrect code. These are common failure modes in AI-generated code.
Structure large reviews by breaking diffs into layers. Reviewing 50 files at once overwhelms both humans and AI; reviewing by layer keeps feedback focused and accurate.

Practice

Sharpen your review skills with the Intermediate Scenarios — code review exercises, refactoring challenges, and quality evaluation drills.