Review Command
commands/agent-review-change.md Auto-triggered structured review
---
description: Review a PR diff against the seven review dimensions and post a structured comment.
---
Workflow:
1. Resolve the diff range from `$ARGUMENTS` (or from the `BASE_SHA` and
   `HEAD_SHA` environment variables when invoked by CI).
2. Load these docs in order:
   - `$CLAUDE_HOME/docs/agents/reference/review-dimensions.md`
   - `$CLAUDE_HOME/docs/agents/policy/review-gates.md`
   - `$CLAUDE_HOME/docs/agents/playbooks/review-change.md`
3. Run each of the seven dimensions against the diff:
   - correctness, safety, docs, tests, compatibility, security, performance
4. Emit structured JSON output: one entry per finding with dimension,
   severity, file, line, and message.
5. Apply the gate rules from `policy/review-gates.md`. Any finding above
   the configured severity in a gating dimension escalates to a blocking
   check.
6. Emit a Markdown rendering for the PR comment. The rendering is
   idempotent: it updates an existing comment in place rather than
   appending a new one.
This command is invoked automatically by CI on PR events. It can also
be called by hand for phase-completion, pre-deploy, or post-incident
review.
Pass any `$ARGUMENTS` through to the review flow.
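Step 1 of the workflow can be sketched roughly as below. This is a minimal illustration, not the command's actual implementation: the function name `resolve_diff_range` is hypothetical, and it assumes `$ARGUMENTS` carries a `base..head` range (resolving a PR number is left out).

```python
import os
from typing import Mapping, Optional


def resolve_diff_range(arguments: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return a base..head range from explicit arguments, else from CI variables."""
    env = os.environ if env is None else env
    explicit = arguments.strip()
    if explicit:
        # Assumption: explicit arguments are already a base..head range.
        if ".." not in explicit:
            raise ValueError(f"expected a base..head range, got {explicit!r}")
        return explicit
    # CI fallback: both variables must be present to build the range.
    base, head = env.get("BASE_SHA"), env.get("HEAD_SHA")
    if not base or not head:
        raise ValueError("no diff range in arguments and BASE_SHA/HEAD_SHA are not both set")
    return f"{base}..{head}"
```

Explicit arguments win over the environment, so a manual invocation inside a CI shell still reviews the range the caller asked for.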
Policy
docs/agents/policy/review-gates.md Hard stops that block merge
---
kind: policy
binding: true
scope: framework
phases: [verify, commit]
source_of_truth: $CLAUDE_HOME/docs/agents/policy/review-gates.md
---
# Review Gates
## Default gates
A PR cannot merge while any of these are open:
- **Tests gate.** Any required test suite is failing on the PR branch.
- **Docs gate.** Behavior changed but an evergreen doc referenced by the
changed area was not updated in the same branch.
- **Security gate.** A security finding above the configured severity
threshold is open and unresolved.
- **Compatibility gate.** A breaking change is introduced without a
Decision Record documenting the migration path.
## Gate mechanics
- Gates are implemented as required status checks in the platform (GitHub
Actions, GitLab CI, etc.). The review command emits a check result per
gate.
- Failing a gate escalates the PR check to `failure`. All other findings
post as advisory comments and do not block the merge button.
- Re-running the review on a new push re-evaluates every gate against
the latest diff.
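The escalation rule above can be sketched as a small pure function. This is an illustration under stated assumptions: findings are plain dicts with the fields the review command emits, `thresholds` is a hypothetical mapping from each gating dimension to its configured severity, and only severity-keyed gates are modeled (the tests and docs gates depend on other signals).

```python
from typing import Any, Dict, Iterable, Mapping

# Assumption: severities rank in this ascending order.
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]


def evaluate_gates(
    findings: Iterable[Mapping[str, Any]], thresholds: Dict[str, str]
) -> Dict[str, str]:
    """Map each gating dimension to a check conclusion: 'success' or 'failure'."""
    results = {dimension: "success" for dimension in thresholds}
    for finding in findings:
        threshold = thresholds.get(finding["dimension"])
        if threshold is None:
            continue  # non-gating dimension: the finding stays advisory
        # "Above the configured severity" is read here as strictly greater.
        if SEVERITY_ORDER.index(finding["severity"]) > SEVERITY_ORDER.index(threshold):
            results[finding["dimension"]] = "failure"
    return results
```

Because the function takes the full finding list each time, re-running it on a new push naturally re-evaluates every gate from scratch.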
## Repo-local gates
- Repos may add gates by writing `docs/agents/policy/review-gates.md` at
  the repo level. Repo gates augment the framework defaults; they do not
  replace them.
- A repo gate must name:
  1. the dimension it belongs to
  2. the severity threshold
  3. the failure mode (blocking or advisory)
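One way to picture the "augment, not replace" rule is the sketch below. The `Gate` type, the dict keys (`dimension`, `severity_threshold`, `failure_mode`), and both function names are hypothetical; the policy does not prescribe a file format for repo gates.

```python
from dataclasses import dataclass
from typing import List, Mapping

REQUIRED_FIELDS = ("dimension", "severity_threshold", "failure_mode")
FAILURE_MODES = ("blocking", "advisory")


@dataclass(frozen=True)
class Gate:
    dimension: str
    severity_threshold: str
    failure_mode: str


def parse_repo_gate(raw: Mapping[str, str]) -> Gate:
    """Reject a repo gate that does not name all three required properties."""
    missing = [field for field in REQUIRED_FIELDS if not raw.get(field)]
    if missing:
        raise ValueError(f"repo gate is missing: {', '.join(missing)}")
    if raw["failure_mode"] not in FAILURE_MODES:
        raise ValueError(f"failure mode must be one of {FAILURE_MODES}")
    return Gate(raw["dimension"], raw["severity_threshold"], raw["failure_mode"])


def merge_gates(framework: List[Gate], repo: List[Gate]) -> List[Gate]:
    """Repo gates are appended; framework defaults are never dropped or overridden."""
    return list(framework) + list(repo)
```

Appending keeps every framework default in force even when a repo gate targets the same dimension.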
## Change management
Gates are sticky. Disabling a gate requires a Decision Record explaining
why the gate was loosened and what compensating control replaces it.
Reference
docs/agents/reference/review-dimensions.md The seven dimensions, expanded
---
kind: reference
binding: false
scope: framework
phases: [verify]
source_of_truth: $CLAUDE_HOME/docs/agents/reference/review-dimensions.md
---
# Review Dimensions
Seven dimensions structure every review run. Each dimension answers a
specific question and produces findings mapped one-to-one to that
dimension. A finding that fits two dimensions is usually shallow; split
it into two findings instead.
## Correctness
- Does the change do what it claims in the PR description?
- Are invariants preserved? Are state transitions complete?
- Are error paths handled explicitly rather than simply missing?
- Are off-by-one, boundary, and empty-case conditions covered?
## Safety
- Can this code corrupt or destroy data?
- Are destructive operations (deletes, truncates, migrations) gated?
- Are concurrent callers safe?
- Are there foot-guns a future caller will step on?
## Docs
- Are evergreen docs (CLAUDE.md, README, policy, reference) updated
when behavior they describe changed?
- Is the commit message sufficient? Does it explain the why?
- Does any durable rationale need a Decision Record?
- Do new public APIs have inline documentation?
## Tests
- Does the change ship with tests?
- Do the tests assert behavior, not merely the absence of a panic?
- Are edge cases covered (boundary, empty, malformed, concurrent)?
- Are tests hermetic, or do they depend on external state?
## Compatibility
- Does the change break API consumers?
- Does it break database schemas or stored data formats?
- Does it break serialized formats or on-disk encodings?
- Does it break configuration contracts?
- If a break is intended, does a Decision Record explain the migration
path?
## Security
- Injection (SQL, shell, template, NoSQL)
- Authn / authz bypass or privilege escalation
- Exposed secrets, credentials, or sensitive data in logs
- Unsafe deserialization
- Permissioning on file, network, and service boundaries
- Supply-chain signals (new unpinned dependencies)
## Performance
- New hot-path regressions (latency, CPU, memory)
- N+1 queries, unbounded result sets
- Unbounded allocations, unbounded retries, unbounded concurrency
- Blocking I/O on hot paths
- Cache invalidation correctness
## Severity
Each finding carries a severity in {info, low, medium, high, critical}.
Gates are keyed on severity thresholds; the gate policy at
`policy/review-gates.md` defines the cutoff per dimension.
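A threshold comparison needs an ordering over the severity set. The sketch below assumes the set is listed in ascending order and that "above the threshold" means strictly greater; the `Severity` enum and `exceeds_threshold` are illustrative names, not part of the framework.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Assumption: ranks follow the order in which the severities are listed.
    INFO = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def exceeds_threshold(severity: str, threshold: str) -> bool:
    """True when `severity` ranks strictly above `threshold`."""
    return Severity[severity.upper()] > Severity[threshold.upper()]
```

With a `medium` cutoff, a `high` finding trips the gate while a `medium` finding does not; an unknown severity name raises `KeyError` instead of passing silently.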
Playbooks
docs/agents/playbooks/review-change.md Run review dimensions against a diff
---
kind: playbook
binding: true
scope: framework
phases: [verify]
source_of_truth: $CLAUDE_HOME/docs/agents/playbooks/review-change.md
---
# Review Change Playbook
## Inputs
- A diff range (base..head) or a PR number.
- The review-dimensions reference.
- The review-gates policy.
## Steps
1. Fetch the diff. For CI invocations, use `git diff BASE_SHA..HEAD_SHA`.
2. For each file in the diff, for each review dimension, produce zero or
more findings. Each finding carries: dimension, severity, file, line,
message.
3. Apply gate rules to the findings. Any finding that triggers a gate
escalates to a blocking check result.
4. Emit the structured JSON output (see reference/review-output-schema.md
if present).
5. Render the PR comment Markdown. Update in place if a prior review
comment exists.
6. Emit per-gate check results for the CI platform.
## Rules
- One finding per dimension per location. Do not stack multiple findings
under the same dimension on the same line.
- Do not alter the diff. Review only.
- Prefer specific, actionable findings over general observations.
- Cite the exact file and line. A finding without coordinates is noise.
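Two of these rules are mechanical enough to check in code. The following is a rough sketch, assuming a finding record with the five fields from step 2; the `Finding` class and `rule_violations` are hypothetical names, and "location" is taken to mean a (file, line) pair.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Finding:
    dimension: str
    severity: str
    file: str
    line: int
    message: str


def rule_violations(findings: List[Finding]) -> List[str]:
    """Report findings without coordinates and findings stacked on one location."""
    problems = []
    for finding in findings:
        if not finding.file or finding.line < 1:
            problems.append(f"missing coordinates: {finding.message!r}")
    # One finding per dimension per location: count by (dimension, file, line).
    counts = Counter((f.dimension, f.file, f.line) for f in findings)
    for (dimension, file, line), count in counts.items():
        if count > 1:
            problems.append(f"{count} {dimension} findings stacked at {file}:{line}")
    return problems
```

Findings from different dimensions may share a line; only repeats within the same dimension are flagged.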
docs/agents/playbooks/verify.md Prove the change works
---
kind: playbook
binding: true
scope: framework
phases: [verify]
source_of_truth: $CLAUDE_HOME/docs/agents/playbooks/verify.md
---
# Verify Playbook
## Four layers
1. **Unit tests.** Isolated logic on documented inputs.
2. **Integration tests.** Components wired together, real dependencies
where feasible.
3. **Staging / canary.** Production-like traffic on production-like data,
without affecting all users.
4. **Observability check.** Metrics, logs, and traces confirm the change
works in the running system.
## Steps
1. Confirm unit and integration tests pass (CI handles this).
2. If the change carries any deployment risk, exercise it in staging or
a canary tier before full rollout.
3. After rollout, inspect metrics, logs, and traces for the affected
code paths. Compare against a pre-change baseline.
4. If the change is user-facing, exercise the feature manually: at
375 / 768 / 1200 px for UI, with realistic data, along a realistic
user path.
5. Record the verify outcome in the PR (or linked incident doc if this
is post-deploy).
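The baseline comparison in step 3 could look something like this. It is only a sketch: the metric names, the 10% tolerance, and the assumption that every metric is "higher is worse" (latency, error rate) are illustrative choices, not framework requirements.

```python
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float], current: Dict[str, float], tolerance: float = 0.10
) -> List[str]:
    """Return metrics that got worse than baseline by more than `tolerance`."""
    regressed = []
    for name, before in baseline.items():
        after = current.get(name)
        if after is None:
            # A metric that vanished after rollout is an observability gap.
            regressed.append(name)
        elif after > before * (1 + tolerance):
            regressed.append(name)
    return sorted(regressed)
```

For example, with a baseline of `{"p95_latency_ms": 200.0, "error_rate": 0.01}`, a post-rollout reading of `{"p95_latency_ms": 215.0, "error_rate": 0.02}` flags only `error_rate`, since the latency change stays inside the tolerance.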
## Rules
- "Tests pass" is necessary, not sufficient.
- Verify is not complete until the change has been observed running
under real conditions without regression.
- If staging or observability coverage is missing for the area, flag
that as a gap. Do not skip verify because the tooling is missing;
the framework treats missing verify tooling as a governance finding.
docs/agents/playbooks/deploy.md Rollout, smoke, rollback
---
kind: playbook
binding: true
scope: framework
phases: [verify, commit]
source_of_truth: $CLAUDE_HOME/docs/agents/playbooks/deploy.md
---
# Deploy Playbook
## Pre-deploy
1. Confirm that all review gates are green.
2. Confirm verify has reached the observability layer (or is deliberately
waived with a Decision Record).
3. Confirm the rollback path is documented. If the rollback is not a
simple revert, the PR description or a linked runbook explains the
actual procedure.
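The three pre-deploy confirmations can be expressed as a checklist that returns every open blocker at once. This is a hedged sketch; the `DeployCandidate` fields and `pre_deploy_blockers` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DeployCandidate:
    gates_green: bool
    verify_reached_observability: bool
    verify_waiver_record: str  # Decision Record reference, "" when there is no waiver
    rollback_documented: bool


def pre_deploy_blockers(candidate: DeployCandidate) -> List[str]:
    """List every unmet pre-deploy condition; an empty list means ready."""
    blockers = []
    if not candidate.gates_green:
        blockers.append("review gates are not all green")
    # Verify must reach the observability layer unless a Decision Record waives it.
    if not (candidate.verify_reached_observability or candidate.verify_waiver_record):
        blockers.append("verify has not reached the observability layer and no waiver exists")
    if not candidate.rollback_documented:
        blockers.append("rollback path is not documented")
    return blockers
```

Returning all blockers, rather than stopping at the first, lets the author fix them in one pass.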
## Rollout
1. Deploy to staging or canary first, per the repo's rollout strategy.
2. Monitor the staging or canary deployment for the rollout duration
specified by the repo's deploy config.
3. Proceed to full rollout only after canary has stabilized.
## Post-deploy smoke
1. Execute the feature-specific smoke step. A health check alone is not
sufficient; the smoke must exercise the specific feature that changed.
2. Inspect metrics and logs for the affected code paths.
3. Record the smoke outcome in the PR or deploy tracker.
## Rollback
1. If smoke fails or a regression surfaces, roll back immediately.
2. Use the documented rollback procedure. Do not improvise.
3. Open a Decision Record capturing what went wrong if the lesson is
durable.
## Incident triage
Treat incidents as a review that ran too late. Run `/agent-review-change`
against the change that caused the incident, capture the findings, and
feed them back into the review-dimensions and review-gates docs.
CI Workflow (Sanitized)
.github/workflows/agent-review.yml Auto-trigger on PR events
name: agent-review
on:
  pull_request:
    types: [opened, reopened, synchronize, ready_for_review]
permissions:
  contents: read
  pull-requests: write
  checks: write
jobs:
  review:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history so the base..head range resolves
      - name: Run agent review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.AGENT_REVIEW_KEY }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
          BASE_SHA: ${{ github.event.pull_request.base.sha }}
          HEAD_SHA: ${{ github.event.pull_request.head.sha }}
        run: |
          agent-review-change \
            --diff-range "$BASE_SHA..$HEAD_SHA" \
            --pr "$PR_NUMBER" \
            --dimensions correctness,safety,docs,tests,compatibility,security,performance \
            --output structured \
            --output-path agent-review-output.json
      - name: Post or update review comment
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const review = JSON.parse(fs.readFileSync('agent-review-output.json', 'utf8'));
            // Hidden marker that identifies the review comment across runs.
            const marker = '<!-- agent-review:comment -->';
            const body = marker + '\n' + review.markdown;
            // Paginate: a single listComments call returns only the first page,
            // so a busy PR could otherwise get a duplicate comment.
            const comments = await github.paginate(github.rest.issues.listComments, {
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
            });
            const existing = comments.find(c => c.body && c.body.includes(marker));
            if (existing) {
              await github.rest.issues.updateComment({
                owner: context.repo.owner,
                repo: context.repo.repo,
                comment_id: existing.id,
                body,
              });
            } else {
              await github.rest.issues.createComment({
                owner: context.repo.owner,
                repo: context.repo.repo,
                issue_number: context.issue.number,
                body,
              });
            }
      - name: Emit gate check results
        if: always()
        env:
          # On pull_request events GITHUB_SHA is the merge commit, so pass
          # the PR head SHA explicitly to attach checks to the PR.
          HEAD_SHA: ${{ github.event.pull_request.head.sha }}
        run: |
          agent-review-change emit-checks \
            --input agent-review-output.json \
            --sha "$HEAD_SHA"
The workflow is sanitized per the public-repo security rules: no account IDs, no bucket names, no real secret names beyond a placeholder. A production deployment substitutes the organization's own secrets, runner, and platform adapters (GitHub Checks vs. GitLab CI vs. Buildkite, etc.).
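The in-place comment update in the workflow reduces to a small decision: find a comment carrying the hidden marker, then update it or create a new one. A minimal sketch of that decision follows, assuming comments arrive as dicts with `id` and `body` keys; `plan_comment` is a hypothetical helper, not part of any platform adapter.

```python
from typing import List, Mapping, Optional, Tuple

MARKER = "<!-- agent-review:comment -->"


def plan_comment(
    existing_comments: List[Mapping[str, object]], markdown: str
) -> Tuple[str, Optional[object], str]:
    """Return (action, comment_id, body) where action is 'update' or 'create'."""
    body = MARKER + "\n" + markdown
    for comment in existing_comments:
        # Comments with a missing or null body are skipped.
        if MARKER in str(comment.get("body") or ""):
            return ("update", comment["id"], body)
    return ("create", None, body)
```

Keeping the decision separate from the API call makes it portable across platform adapters: each adapter only has to list comments and perform the returned action.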