Session 8: CI/CD & Measurement

In Session 7 you set up shared configuration and adoption strategy for your team. This session covers the final layer: running Claude Code non-interactively in scripts and CI pipelines, establishing metrics to measure impact, and building habits that sustain these practices after the training ends.

Batch Mode and CI Integration

What it is: Beyond interactive use, Claude Code runs non-interactively with claude -p (print mode), which takes a single prompt, writes its output to stdout, and exits. This makes it scriptable: you can redirect its output to a file, chain it with other commands, or embed it in CI/CD pipelines. Common patterns include automated code review on pull requests (a GitHub Action that runs claude -p against the PR diff and posts findings as a comment), pre-commit hooks that check staged changes for security issues, and batch scripts that generate test coverage reports across multiple files. In CI environments, use --output-format json for machine-readable results, --allowedTools to restrict operations to safe read-only actions, --max-turns (or CLAUDE_CODE_MAX_TURNS) to prevent runaway sessions, and --model to choose a cost-appropriate model for each pipeline stage. When scripting multi-step workflows, keep context under control with /compact or session management techniques, such as continuing an earlier session with --continue or --resume instead of starting fresh on every call. The key principle: Claude should never block your pipeline. If the API is down or the review times out, the build should proceed. Gate on your real test suite, not on AI review.
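
The "never block your pipeline" principle can be sketched as a small wrapper around the review step. This is a minimal sketch under stated assumptions, not an official integration: run_review, the REVIEW_CMD override (there only so the wrapper can be exercised without the CLI), and REVIEW_TIMEOUT are hypothetical names, the --max-turns value is arbitrary, and the runner is assumed to provide the timeout utility from GNU coreutils.

```shell
#!/usr/bin/env bash
# Non-blocking AI review step (sketch). Whatever happens to the review,
# run_review returns 0 so the pipeline moves on to the real test suite.

run_review() {
  local prompt="$1" out="$2"
  local limit="${REVIEW_TIMEOUT:-300}"   # seconds; hypothetical knob

  if [ -n "${REVIEW_CMD:-}" ]; then
    # Hypothetical override for local testing, e.g. REVIEW_CMD="echo".
    set -- $REVIEW_CMD "$prompt"
  else
    set -- claude -p "$prompt" --max-turns 5 --output-format json
  fi

  # stdin comes from /dev/null so the CLI never waits for input in CI.
  if timeout "$limit" "$@" > "$out" 2>/dev/null < /dev/null; then
    echo "review: finished, output in $out"
  else
    echo "review: skipped (CLI missing, API error, or timeout)"
    echo "Review skipped." > "$out"
  fi
  return 0
}
```

A CI step would call run_review with the prompt and an output path, then run the normal test command as the actual gate. A missing CLI, an API failure, and a timeout all land in the same "skipped" branch, which is the point: the review is advisory.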

Demo prompt:

claude -p "Read the files changed in this branch compared to main.
For each file, check for:
1. Security issues (SQL injection, hardcoded secrets, XSS)
2. Missing error handling
3. Obvious bugs

Report only CRITICAL findings. If nothing critical, respond
with just 'LGTM'." > review-output.md

Try it now: Run a batch mode review on your own recent work. First, check what files you changed: git diff --name-only main...HEAD. Then run claude -p "Review [file] for security issues and bugs. Be concise -- list only actionable findings." and redirect the output to a file. Compare what Claude finds to what you would catch in a manual review. If you want to go further, write a small shell script that loops through all changed files and produces a combined review report.
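
The looping script suggested in this exercise could look like the following sketch. review_files and the REVIEW_CMD override are hypothetical names introduced for illustration; the prompt text is the one from the exercise, and the report layout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Review each file named on stdin and collect the findings in one report.

review_files() {
  local report="$1" file
  : > "$report"                      # start with an empty report
  while IFS= read -r file; do
    [ -f "$file" ] || continue       # skip files that no longer exist
    {
      echo "== $file =="
      # < /dev/null keeps the CLI from consuming the file list on stdin.
      ${REVIEW_CMD:-claude -p} "Review $file for security issues and bugs. Be concise -- list only actionable findings." < /dev/null \
        || echo "(review failed for $file)"
      echo
    } >> "$report"
  done
}

# Usage:
#   git diff --name-only main...HEAD | review_files review-report.md
```

Reading the file list from stdin keeps the function independent of how the list is produced, so the same loop works for staged files or any other selection.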

Go deeper: Hooks — configure PreToolUse and PostToolUse hooks that run deterministically on Claude Code events, complementing CI-level automation with local checks.

Long-Running Automation

What it is: Beyond one-shot batch commands, Claude Code can run tasks on recurring intervals. The /loop command runs a prompt repeatedly on a local schedule (default: every 10 minutes, auto-expires after 7 days) — useful for shepherding PRs through review, monitoring build status, or sweeping for issues on a timer. Tasks are session-scoped and stop when you exit. For durable scheduling that survives restarts, cloud scheduled tasks run on Anthropic’s infrastructure and can be configured via /schedule in the CLI or at claude.ai/code. Desktop scheduled tasks offer a middle ground — persistent scheduling with access to local files.

Demo prompt (local loop):

/loop 10m Check the CI status of my open PRs with `gh pr checks`.
If any checks have failed, investigate the failure logs and suggest fixes.

Demo prompt (scheduled cloud agent):

/schedule a daily job that reviews all PRs merged yesterday,
checks if any introduced changes that should be reflected in
our documentation, and opens an issue for each doc gap found.

Try it now: Think about a recurring task you do manually — checking CI, monitoring error rates, cleaning up stale branches. Try running /loop 5m [your task] and let Claude handle one cycle while you watch. Evaluate whether the output is useful enough to leave running unattended.

Go deeper: Hooks — hooks complement loops by running deterministic checks on every Claude Code event, while loops handle time-based recurring work.

Productivity Metrics and Measurement

What it is: Without measurement, AI adoption is a gut feeling. With measurement, it is a business case. The most useful metrics combine objective data from your tools (git log timestamps, CI pipeline data, PR review cycles) with subjective data from your team (weekly confidence scores, time-on-boilerplate estimates, overall productivity ratings). Establish baselines before AI adoption by analyzing git history and surveying the team once, then repeat both after 30 days and compare. Early adopters have reported improvements of 50-70% faster time to PR, 30-50% fewer review cycles, and 15-25% higher test coverage on new code, though your results will vary depending on codebase, team experience, and workflow fit. The point is to measure your own baseline and track changes, not to hit a specific number. The measurement system itself is lightweight: a git log analysis script and a four-question weekly survey are enough to demonstrate ROI to leadership.
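
The git log analysis script mentioned here can start very small. The sketch below counts commits and file touches from git log output. summarize_log and the "@@commit" marker are hypothetical choices for this example, and metrics such as time to PR or review cycles would need data from your code host as well as from git.

```shell
#!/usr/bin/env bash
# Summarize `git log --name-only` output read from stdin.
# Each commit is expected to start with a line beginning "@@commit ".

summarize_log() {
  awk '
    /^@@commit / { commits++; next }   # header line for one commit
    NF           { touches++ }         # any other non-blank line is a file path
    END {
      avg = commits ? touches / commits : 0
      printf "commits=%d file_touches=%d avg_files_per_commit=%.1f\n",
             commits, touches, avg
    }'
}

# Usage (baseline for the past 30 days):
#   git log --since="30 days ago" --no-merges \
#       --pretty=format:"@@commit %h" --name-only | summarize_log
```

A file changed in three commits counts as three touches, so the average describes commit size, not the number of distinct files.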

Demo prompt:

Help me set up a productivity measurement system for my team.

1. Write a bash script that analyzes our git log for the past
   30 days: average time from first commit to PR creation,
   average PR size, and number of review cycles per PR.
2. Draft a 4-question weekly survey to capture subjective
   metrics (developer confidence, time on boilerplate,
   overall productivity, AI usefulness) on a 1-5 scale.
3. Create a simple comparison template where I can put
   "before" and "after" numbers side by side.

Try it now: Run a quick baseline measurement on your own work. Ask Claude: “Analyze my git log for the past 30 days. How many PRs did I create? What was the average time between first commit and PR creation? What was the average number of files changed per PR?” Save the output — this becomes your “before” snapshot. In 30 days, run the same analysis and compare. Even a rough comparison gives you data to share with your team and leadership.
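
For the 30-day comparison, the arithmetic is a plain percentage change between the "before" and "after" values. A minimal sketch, with pct_change as a hypothetical helper name and made-up numbers in the usage lines:

```shell
pct_change() {
  # $1 = before, $2 = after; prints the signed change relative to "before".
  awk -v before="$1" -v after="$2" 'BEGIN {
    if (before == 0) { print "n/a"; exit }   # no baseline, no percentage
    printf "%+.1f%%\n", (after - before) / before * 100
  }'
}

# Illustrative values only, not real measurements:
pct_change 10 7    # prints -30.0%
pct_change 4 5     # prints +25.0%
```

Whether a negative number is good depends on the metric: fewer review cycles is an improvement, lower test coverage is not, so label each row of the comparison accordingly.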

Go deeper: Cheat Sheets — quick-reference prompts organized by development phase, useful for tracking which AI techniques your team actually uses day to day.

Sustaining AI-Assisted Development Practices

What it is: The hardest part of AI adoption is not the first month — it is month three, when the novelty wears off and old habits creep back. Sustaining AI-assisted development requires three things: evolving your CLAUDE.md as the codebase changes (treat it like living documentation, not a one-time setup), watching for signs of over-reliance (engineers accepting code without reading it, test suites that only cover happy paths, architecture decisions deferred to AI), and continuing to learn as the tools evolve. Build habits that reinforce good practices: review CLAUDE.md changes in PRs like code changes, hold monthly retros on what AI techniques are working, rotate the “AI champion” role so everyone stays engaged, and schedule periodic “unplugged” sessions where engineers work without AI to maintain their fundamental skills. The goal is not to maximize AI usage — it is to maximize the quality and velocity of your team’s output, using AI as one tool among many.

Demo prompt:

We've been using Claude Code for 6 weeks now. Help me run a
team retrospective:

1. What questions should I ask the team about their AI usage?
2. What warning signs of over-reliance should I look for in
   our recent PRs and commit history?
3. Draft an agenda for a 30-minute retro focused on what's
   working, what's not, and what we should change.
4. Suggest 3 experiments we could try next month to level up.

Try it now: Reflect on your own Claude Code usage over this training program. Ask Claude: “Based on our conversation so far, what types of tasks have I been asking you to help with? What tasks could I be using you for but am not? What’s one area where I should rely on you less?” Use the answers to set a personal development goal for the next 30 days. Then check out Anthropic’s continued learning resources to find courses that address your growth areas.

Go deeper: Continued Learning — Anthropic’s Skilljar courses (Claude 101, MCP, Claude Code Skills, AI Fluency), documentation, community resources, and recommended learning paths for ongoing development.

Key Takeaways

Batch mode (claude -p) makes Claude Code scriptable for CI/CD pipelines, pre-commit hooks, and automated review — but never let AI review block your build pipeline.
Measure AI impact with both objective metrics (git timestamps, PR cycles, test coverage) and subjective metrics (weekly developer surveys). Baselines before adoption make the comparison meaningful.
Early adopters report 50-70% faster time to PR, 30-50% fewer review cycles, and 15-25% higher test coverage — but measure your own baseline rather than targeting specific numbers.
Sustaining practices requires treating CLAUDE.md as living documentation, watching for over-reliance, and continuing to learn as the tools evolve.
The goal is not maximum AI usage — it is maximum team output quality and velocity, with AI as one powerful tool in your workflow.

What’s Next

You’ve completed the workshop. To keep building skills, revisit any Scenarios you haven’t tried, explore the Reference pages for techniques you want to deepen, and check Continued Learning for Anthropic’s official courses and community resources.