Appendix · Part 3

Review-and-Verify
File Skeletons

The review command, the review-dimensions reference, the review-gates policy, a sanitized GHA workflow, and the deploy playbook.

April 16, 2026

Review Command

commands/agent-review-change.md Auto-triggered structured review
---
description: Review a PR diff against the seven review dimensions and post a structured comment.
---

Workflow:
1. Resolve the diff range from `$ARGUMENTS` (or from the `BASE_SHA` and
   `HEAD_SHA` environment variables when invoked by CI).
2. Load these docs in order:
   - `$CLAUDE_HOME/docs/agents/reference/review-dimensions.md`
   - `$CLAUDE_HOME/docs/agents/policy/review-gates.md`
   - `$CLAUDE_HOME/docs/agents/playbooks/review-change.md`
3. Run each of the seven dimensions against the diff:
   - correctness, safety, docs, tests, compatibility, security, performance
4. Emit structured JSON output: one entry per finding with dimension,
   severity, file, line, and message.
5. Apply the gate rules from `policy/review-gates.md`. Any finding above
   the configured severity in a gating dimension escalates to a blocking
   check.
6. Emit a Markdown rendering for the PR comment. The rendering is
   idempotent: it updates an existing comment in place rather than
   appending a new one.

This command is invoked automatically by CI on PR events. It can also
be called by hand for phase-completion, pre-deploy, or post-incident
review.

Pass any `$ARGUMENTS` through to the review flow.
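The structured output from step 4 could be sketched like this. The field names follow the step's wording (dimension, severity, file, line, message); the concrete findings and files are hypothetical, and the real schema lives in the framework, not here.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Finding:
    dimension: str   # one of the seven review dimensions
    severity: str    # info | low | medium | high | critical
    file: str
    line: int
    message: str

# Two hypothetical findings, serialized the way step 4 describes:
# one JSON entry per finding.
findings = [
    Finding("security", "high", "app/db.py", 42, "string-built SQL query"),
    Finding("docs", "low", "README.md", 1, "CLI flag table is stale"),
]
payload = json.dumps([asdict(f) for f in findings], indent=2)
print(payload)
```

The gate policy then reads this list and decides, per gate, whether any entry escalates to a blocking check.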

Policy

docs/agents/policy/review-gates.md Hard stops that block merge
---
kind: policy
binding: true
scope: framework
phases: [verify, commit]
source_of_truth: $CLAUDE_HOME/docs/agents/policy/review-gates.md
---

# Review Gates

## Default gates

A PR cannot merge while any of these are open:

- **Tests gate.** Any required test suite is failing on the PR branch.
- **Docs gate.** Behavior changed but an evergreen doc referenced by the
  changed area was not updated in the same branch.
- **Security gate.** A security finding above the configured severity
  threshold is open and unresolved.
- **Compatibility gate.** A breaking change is introduced without a
  Decision Record documenting the migration path.

## Gate mechanics

- Gates are implemented as required status checks in the platform (GitHub
  Actions, GitLab CI, etc.). The review command emits a check result per
  gate.
- Failing a gate escalates the PR check to `failure`. All other findings
  post as advisory comments and do not block the merge button.
- Re-running the review on a new push re-evaluates every gate against
  the latest diff.

## Repo-local gates

- Repos may add gates by writing `docs/agents/policy/review-gates.md` at
  the repo level. Repo gates augment the framework defaults; they do not
  replace them.
- A repo gate must name:
  1. the dimension it belongs to
  2. the severity threshold
  3. the failure mode (blocking or advisory)
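A repo-local gate entry might look like the following. The front-matter shape is an assumption for illustration; only the three required fields come from the list above.

```yaml
# docs/agents/policy/review-gates.md (repo level) — hypothetical entry
gates:
  - dimension: performance     # which of the seven dimensions
    threshold: high            # minimum severity that triggers the gate
    failure_mode: blocking     # blocking | advisory
```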

## Change management

Gates are sticky. Disabling a gate requires a Decision Record explaining
why the gate was loosened and what compensating control replaces it.

Reference

docs/agents/reference/review-dimensions.md The seven dimensions, expanded
---
kind: reference
binding: false
scope: framework
phases: [verify]
source_of_truth: $CLAUDE_HOME/docs/agents/reference/review-dimensions.md
---

# Review Dimensions

Seven dimensions structure every review run. Each dimension answers a
specific question, and every finding maps to exactly one dimension. A
finding that seems to fit two dimensions is usually shallow; split it
into two findings instead.

## Correctness

- Does the change do what it claims in the PR description?
- Are invariants preserved? Are state transitions complete?
- Are error paths handled, not just absent?
- Are off-by-one, boundary, and empty-case conditions covered?

## Safety

- Can this code corrupt or destroy data?
- Are destructive operations (deletes, truncates, migrations) gated?
- Are concurrent callers safe?
- Are there foot-guns a future caller will step on?

## Docs

- Are evergreen docs (CLAUDE.md, README, policy, reference) updated
  when behavior they describe changed?
- Is the commit message sufficient? Does it explain the why?
- Does any durable rationale need a Decision Record?
- Do new public APIs have inline documentation?

## Tests

- Does the change ship with tests?
- Do the tests test behavior, not merely absence of panic?
- Are edge cases covered (boundary, empty, malformed, concurrent)?
- Are tests hermetic, or do they depend on external state?

## Compatibility

- Does the change break API consumers?
- Does it break database schemas or stored data formats?
- Does it break serialized formats or on-disk encodings?
- Does it break configuration contracts?
- If a break is intended, does a Decision Record explain the migration
  path?

## Security

- Injection (SQL, shell, template, NoSQL)
- Authn / authz bypass or privilege escalation
- Exposed secrets, credentials, or sensitive data in logs
- Unsafe deserialization
- Permissioning on file, network, and service boundaries
- Supply-chain signals (new unpinned dependencies)

## Performance

- New hot-path regressions (latency, CPU, memory)
- N+1 queries, unbounded result sets
- Unbounded allocations, unbounded retries, unbounded concurrency
- Blocking I/O on hot paths
- Cache invalidation correctness

## Severity

Each finding carries a severity in {info, low, medium, high, critical}.
Gates are keyed on severity thresholds; the gate policy at
`policy/review-gates.md` defines the cutoff per dimension.
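The threshold check is just an ordering over the five severities; a minimal sketch, with the cutoff value as a stand-in for whatever the gate policy configures:

```python
# Severities in ascending order, matching the set above.
SEVERITIES = ["info", "low", "medium", "high", "critical"]

def meets_threshold(severity: str, cutoff: str) -> bool:
    """True when a finding's severity is at or above the gate's cutoff."""
    return SEVERITIES.index(severity) >= SEVERITIES.index(cutoff)

assert meets_threshold("high", "medium")
assert not meets_threshold("low", "medium")
```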

Playbooks

docs/agents/playbooks/review-change.md Run review dimensions against a diff
---
kind: playbook
binding: true
scope: framework
phases: [verify]
source_of_truth: $CLAUDE_HOME/docs/agents/playbooks/review-change.md
---

# Review Change Playbook

## Inputs

- A diff range (base..head) or a PR number.
- The review-dimensions reference.
- The review-gates policy.

## Steps

1. Fetch the diff. For CI invocations, use `git diff "$BASE_SHA..$HEAD_SHA"`.
2. For each file in the diff, for each review dimension, produce zero or
   more findings. Each finding carries: dimension, severity, file, line,
   message.
3. Apply gate rules to the findings. Any finding that triggers a gate
   escalates to a blocking check result.
4. Emit the structured JSON output (see reference/review-output-schema.md
   if present).
5. Render the PR comment Markdown. Update in place if a prior review
   comment exists.
6. Emit per-gate check results for the CI platform.

## Rules

- One finding per dimension per location. Do not stack multiple findings
  under the same dimension on the same line.
- Do not alter the diff. Review only.
- Prefer specific, actionable findings over general observations.
- Cite the exact file and line. A finding without coordinates is noise.
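The first rule can be enforced mechanically. A sketch, assuming findings are dicts shaped as in step 2 (the sample findings are hypothetical):

```python
def dedupe(findings):
    """Keep one finding per (dimension, file, line); the first wins,
    per the 'do not stack' rule."""
    seen = set()
    kept = []
    for f in findings:
        key = (f["dimension"], f["file"], f["line"])
        if key not in seen:
            seen.add(key)
            kept.append(f)
    return kept

raw = [
    {"dimension": "tests", "file": "a.py", "line": 3, "message": "no test"},
    {"dimension": "tests", "file": "a.py", "line": 3, "message": "weak test"},
    {"dimension": "docs",  "file": "a.py", "line": 3, "message": "stale doc"},
]
assert len(dedupe(raw)) == 2
```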
docs/agents/playbooks/verify.md Prove the change works
---
kind: playbook
binding: true
scope: framework
phases: [verify]
source_of_truth: $CLAUDE_HOME/docs/agents/playbooks/verify.md
---

# Verify Playbook

## Four layers

1. **Unit tests.** Isolated logic on documented inputs.
2. **Integration tests.** Components wired together, real dependencies
   where feasible.
3. **Staging / canary.** Production-like traffic on production-like data,
   without affecting all users.
4. **Observability check.** Metrics, logs, and traces confirm the change
   works in the running system.

## Steps

1. Confirm unit and integration tests pass (CI handles this).
2. If the change carries any deployment risk, exercise it in staging or
   a canary tier before full rollout.
3. After rollout, inspect metrics, logs, and traces for the affected
   code paths. Compare against a pre-change baseline.
4. If the change is user-facing, exercise the feature manually: at
   375 / 768 / 1200 px for UI, with realistic data, along a realistic
   user path.
5. Record the verify outcome in the PR (or linked incident doc if this
   is post-deploy).

## Rules

- "Tests pass" is necessary, not sufficient.
- Verify is not complete until the change has been observed running
  under real conditions without regression.
- If staging or observability coverage is missing for the area, flag
  that as a gap. Do not skip verify because the tooling is missing;
  the framework treats missing verify tooling as a governance finding.
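Step 3's baseline comparison can be as small as this sketch; the metric (an error rate) and the tolerance are illustrative assumptions, not framework values:

```python
def regressed(baseline: float, current: float, tolerance: float = 0.10) -> bool:
    """True when the post-deploy error rate exceeds the pre-change
    baseline by more than the allowed relative tolerance (10% here)."""
    return current > baseline * (1 + tolerance)

# Hypothetical error-rate readings from the observability layer.
assert not regressed(baseline=0.020, current=0.021)
assert regressed(baseline=0.020, current=0.030)
```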
docs/agents/playbooks/deploy.md Rollout, smoke, rollback
---
kind: playbook
binding: true
scope: framework
phases: [verify, commit]
source_of_truth: $CLAUDE_HOME/docs/agents/playbooks/deploy.md
---

# Deploy Playbook

## Pre-deploy

1. Confirm the review's gates are all green.
2. Confirm verify has reached the observability layer (or is deliberately
   waived with a Decision Record).
3. Confirm the rollback path is documented. If the rollback is not a
   simple revert, the PR description or a linked runbook explains the
   actual procedure.

## Rollout

1. Deploy to staging or canary first, per the repo's rollout strategy.
2. Watch for the rollout duration specified by the repo's deploy config.
3. Proceed to full rollout only after canary has stabilized.

## Post-deploy smoke

1. Execute the feature-specific smoke step. A health check alone is not
   sufficient; the smoke must exercise the specific feature that changed.
2. Inspect metrics and logs for the affected code paths.
3. Record the smoke outcome in the PR or deploy tracker.

## Rollback

1. If smoke fails or a regression surfaces, roll back immediately.
2. Use the documented rollback procedure. Do not improvise.
3. Open a Decision Record capturing what went wrong if the lesson is
   durable.

## Incident triage

Treat incidents as a review that ran too late. Run `/agent-review-change`
against the change that caused the incident, capture the findings, and
feed them back into the review-dimensions and review-gates docs.

CI Workflow (Sanitized)

.github/workflows/agent-review.yml Auto-trigger on PR events
name: agent-review

on:
  pull_request:
    types: [opened, reopened, synchronize, ready_for_review]

permissions:
  contents: read
  pull-requests: write
  checks: write

jobs:
  review:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Run agent review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.AGENT_REVIEW_KEY }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
          BASE_SHA: ${{ github.event.pull_request.base.sha }}
          HEAD_SHA: ${{ github.event.pull_request.head.sha }}
        run: |
          agent-review-change \
            --diff-range "$BASE_SHA..$HEAD_SHA" \
            --pr "$PR_NUMBER" \
            --dimensions correctness,safety,docs,tests,compatibility,security,performance \
            --output structured \
            --output-path agent-review-output.json

      - name: Post or update review comment
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const review = JSON.parse(fs.readFileSync('agent-review-output.json', 'utf8'));
            const marker = '<!-- agent-review:comment -->';
            const body = marker + '\n' + review.markdown;

            const { data: comments } = await github.rest.issues.listComments({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
            });
            const existing = comments.find(c => c.body && c.body.includes(marker));
            if (existing) {
              await github.rest.issues.updateComment({
                owner: context.repo.owner,
                repo: context.repo.repo,
                comment_id: existing.id,
                body,
              });
            } else {
              await github.rest.issues.createComment({
                owner: context.repo.owner,
                repo: context.repo.repo,
                issue_number: context.issue.number,
                body,
              });
            }

      - name: Emit gate check results
        if: always()
        env:
          HEAD_SHA: ${{ github.event.pull_request.head.sha }}
        run: |
          agent-review-change emit-checks \
            --input agent-review-output.json \
            --sha "$HEAD_SHA"

The workflow is sanitized per the public-repo security rules: no account IDs, no bucket names, no real secret names beyond a placeholder. A production deployment substitutes the organization's own secrets, runner, and platform adapters (GitHub Checks vs. GitLab CI vs. Buildkite, etc.).