Code Review
Input: $ARGUMENTS
Overview
Procedure for conducting effective code reviews that improve quality and share knowledge. This procedure leverages Claude Code’s unique capabilities: exhaustive pattern search, codebase-wide consistency checking, and automated verification.
Step 0: Context Assessment
| Factor | Value | Notes |
|---|---|---|
| Change size | SMALL (<50 lines) / MED (50-300) / LARGE (300+) | |
| Risk level | HIGH (security, data) / MED / LOW | |
| Review depth | THOROUGH / STANDARD / QUICK | |
If SMALL + LOW risk: Steps 1, 3, 6, 7 only.
If LARGE or HIGH risk: All steps + extra scrutiny.
Step 1: Understand the Context
Before reviewing code, understand what it’s trying to accomplish.
What to understand:
- PR title and description
- Linked issues or tickets
- Scope: bug fix, feature, refactor?
- Specific review requests from author
- Part of larger initiative?
LLM Execution:
# If given a PR URL/number, get PR info
→ Use Bash: gh pr view [number] --json title,body,files
# If given files directly, understand from context
→ Use Read: [changed files] to understand purpose
# Check for linked issues
→ Use Bash: gh issue view [number] (if referenced)
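For example, when the input is a PR number, the context-gathering commands might look like this (the PR and issue numbers below are placeholders):
```bash
# Fetch PR metadata and the list of changed files
gh pr view 123 --json title,body,files,url
gh pr diff 123 --name-only

# Only if the PR body references an issue
gh issue view 456
```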
Output format:
CONTEXT
=======
Purpose: [what this change accomplishes]
Scope: [bug fix / feature / refactor / other]
Risk areas: [security, performance, data, none]
Review focus: [what author specifically wants reviewed]
Step 2: High-Level Architecture Review
Review the overall approach before diving into details.
What to check:
- Files changed - does scope make sense?
- Approach aligns with existing architecture?
- Change is in the right place?
- Any simpler approaches?
- Cross-cutting concerns (logging, errors, auth)?
LLM Execution:
# See all changed files
→ Use Bash: gh pr diff [number] --name-only
OR: git diff --name-only [base]...[head]
# Understand file organization
→ Use Glob: pattern="**/*.{py,js,ts}" (or relevant extensions)
# Check if similar functionality exists elsewhere
→ Use Grep: pattern="[similar_function_name]" to find existing implementations
# Understand dependencies of changed files
→ Use Grep: pattern="import.*[changed_module]|from [changed_module]"
Output format:
ARCHITECTURE
============
Files changed: [count and summary]
Scope appropriate: [YES / NO - reason]
Architecture alignment: [YES / concern if NO]
Simpler approach: [suggestion or "none identified"]
Cross-cutting concerns: [noted or "none"]
Step 3: Detailed Line-by-Line Review
Review each changed file systematically.
What to check per file:
- Correctness - logic works for all cases?
- Common bug patterns (null checks, boundaries, error handling)
- Security implications
- Readability and maintainability
- Code style consistency
LLM Execution:
# Read each changed file
→ Use Read: [file_path]
# For each file, also check:
# - Related test file
→ Use Glob: pattern="**/test*[filename]*" or "**/[filename]*test*"
# - How the changed code is used
→ Use Grep: pattern="[function_name]\(" to find callers
# - If modifying existing function, check all usages
→ Use Grep: pattern="[modified_function]" output_mode="content" -C 2
Common Bug Patterns to Check:
| Pattern | What to look for | Severity |
|---|---|---|
| Null safety | Unguarded access, missing null checks | CONCERN |
| Boundaries | Off-by-one, empty arrays, max values | CONCERN |
| Error handling | Swallowed exceptions, missing handlers | CONCERN |
| Resource leaks | Unclosed files/connections | BLOCKING |
| Injection | Unsanitized user input in SQL/shell/HTML | BLOCKING |
| Race conditions | Shared mutable state without synchronization | BLOCKING |
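Some of these patterns can be pre-screened with rough greps before reading closely (Python-flavoured heuristics; expect false positives):
```bash
# File handles opened outside a context manager (possible resource leak)
grep -rn "open(" src/ | grep -v "with open"

# Exceptions swallowed with a bare pass (possible error-handling gap)
grep -rn -A 1 "except" src/ | grep -w "pass"
```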
Security Patterns (Always BLOCKING):
| Pattern | What to look for | How to detect |
|---|---|---|
| SQL Injection | String concatenation in queries | Grep: `query.*\+` or `execute(.*%` |
| XSS | Unescaped user input in HTML | Grep: `innerHTML *=` |
| Auth Bypass | Missing auth checks, hardcoded credentials | Grep: `password *= *["']` |
| Path Traversal | User input in file paths | Grep: `open(.*request` |
| SSRF | User input in URLs/requests | Grep: `fetch(.*req\.` |
| Secrets in Code | API keys, passwords in source | Grep: `api[_-]?key *=` or `secret *=` |
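Concrete invocations of the detection greps above (the regexes are heuristics, not proofs; every hit still needs a manual look):
```bash
grep -rnE "execute\(.*(\+|%)" src/      # string-built SQL
grep -rnE "innerHTML *=" src/           # direct HTML injection points
grep -rnE "password *= *['\"]" src/     # hardcoded credentials
grep -rniE "api[_-]?key *=" src/        # possible secrets in source
```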
Performance Patterns (Usually CONCERN):
| Pattern | What to look for | How to detect |
|---|---|---|
| N+1 Queries | Loop with database call inside | Grep: `for` loops containing `.query(` or `.execute(` |
| Missing Index | Query on unindexed field | Check query fields against schema indexes |
| Memory Bloat | Loading large datasets entirely | Grep: `.all()` on query results |
| Unbounded Growth | Collections that grow without limit | Grep: `.append(` without a size limit or eviction |
| Blocking I/O | Sync I/O in async context | Grep: `open(` inside `async def` functions |
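Rough greps for the N+1 and memory-bloat rows above (Python-style ORM calls assumed; adapt to the project's data layer):
```bash
# Database calls appearing shortly after a for loop (N+1 candidates)
grep -rn -A 3 "for .*:" src/ | grep -E "\.query\(|\.execute\("

# Full result sets loaded into memory
grep -rn "\.all()" src/
```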
Maintainability Patterns (Usually NIT or CONCERN):
| Pattern | What to look for | How to detect |
|---|---|---|
| Long Methods | Functions > 50 lines | Count lines per function |
| Deep Nesting | Indentation > 4 levels | Count indentation depth |
| God Objects | Classes with > 10 public methods | Grep: def count per class |
| Circular Dependencies | A imports B, B imports A | Trace import chains |
| Magic Numbers | Unexplained numeric literals | Grep: `[^0-9][0-9]{2,}[^0-9]` without a named constant |
| Commented Code | Large blocks of commented code | Grep: `#.*def ` |
| TODO in Critical Path | TODOs in security/auth code | Grep: `TODO` in auth/security modules |
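Two of these checks sketched as shell one-liners (crude heuristics; confirm by reading the code, and note the auth/security paths are assumptions):
```bash
# Unexplained multi-digit literals outside tests
grep -rnE "[^0-9.][0-9]{3,}[^0-9]" src/ | grep -v test

# TODO markers in security-sensitive code
grep -rn "TODO" src/auth/ src/security/
```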
Concurrency Patterns (Often BLOCKING):
| Pattern | What to look for | How to detect |
|---|---|---|
| Deadlock Risk | Multiple locks acquired in different orders | Trace lock acquisition order |
| Data Race | Shared mutable state without locks | Grep: shared variables + threads |
| Lost Update | Read-modify-write without atomicity | Grep: `x = x +` patterns on shared state |
| Starvation | Unfair resource allocation | Check lock fairness, priority inversion |
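Concurrency issues rarely show up in a single grep; a reasonable starting point is to locate every lock site and then trace acquisition order by hand (Python threading assumed):
```bash
grep -rnE "threading\.(R)?Lock|asyncio\.Lock|\.acquire\(" src/
```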
Output per file:
FILE: [path]
------------
Purpose: [what this file change does]
Issues found:
- [Line N]: [issue] - [severity: BLOCKING/CONCERN/NIT]
- ...
Questions:
- [question about unclear code]
Positive notes:
- [good practices observed]
Step 4: Review Tests
Evaluate the test coverage and quality.
What to check:
- Tests exist for new functionality?
- Main success path covered?
- Edge cases and error conditions tested?
- Tests are meaningful (not just coverage)?
- Tests are maintainable?
LLM Execution:
# Find test files for changed code
→ Use Glob: pattern="**/test*" or "**/tests/**"
# Read test files
→ Use Read: [test_file_path]
# Check what's tested
→ Use Grep: pattern="def test_|it\(|describe\(" in test files
# Verify tests cover the changed functions
→ Use Grep: pattern="[changed_function_name]" in test files
# Run the tests
→ Use Bash: pytest [test_path] -v
OR: npm test -- [test_path]
OR: [appropriate test command]
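When the suite is large, a narrower run keyed to the changed area is often enough for a first pass (the `-k` keyword below is hypothetical and assumes test names mention the changed module):
```bash
pytest tests/ -v -k "invoice"
```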
Output format:
TESTS
=====
Test files: [list]
Coverage of changes: [GOOD / PARTIAL / MISSING]
Test gaps:
- [functionality not tested]
- [edge case not covered]
Test quality:
- Meaningful assertions: [YES/NO]
- Edge cases: [covered/missing which]
- Error cases: [covered/missing which]
Step 5: Run Verification (Automated Checks)
Run automated checks to catch issues.
What to run:
- Test suite
- Linter/formatter
- Type checker (if applicable)
- Build (if applicable)
LLM Execution:
# Run test suite
→ Use Bash: pytest -v
OR: npm test
OR: [project-specific test command]
# Run linter
→ Use Bash: ruff check . (Python)
OR: eslint . (JS)
OR: [project-specific linter]
# Run type checker
→ Use Bash: mypy [path] (Python)
OR: tsc --noEmit (TypeScript)
# Run formatter check
→ Use Bash: black --check . (Python)
OR: prettier --check . (JS)
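For a Python project, the checks above can be chained into a single pass that stops at the first failure (commands and the `src/` path are the defaults listed here; substitute the project's own tooling):
```bash
pytest -q && ruff check . && mypy src/ && black --check .
```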
Output format:
AUTOMATED CHECKS
================
Tests: [PASS / FAIL - details]
Linter: [PASS / FAIL - details]
Types: [PASS / FAIL - details]
Build: [PASS / FAIL - details]
Step 6: LLM Superpowers (Exhaustive Checks)
Leverage LLM capabilities that humans can't easily replicate.
6.1: Codebase-Wide Pattern Search
# Find all usages of modified functions
→ Use Grep: pattern="[modified_function]\(" output_mode="content"
# Check: Do callers handle new behavior/errors correctly?
# Find similar code patterns
→ Use Grep: pattern="[similar_pattern]" to check consistency
# Check: Is the same bug/pattern present elsewhere?
6.2: Cross-Reference Verification
# For renamed functions, check for old name usage
→ Use Grep: pattern="[old_function_name]"
# For moved files, check import statements
→ Use Grep: pattern="from [old_path] import|import [old_module]"
# For changed APIs, check all callers
→ Use Grep: pattern="[api_function]\("
6.3: Consistency Checking
# Check naming conventions match project
→ Use Grep: pattern="def [pattern]|class [pattern]" to see existing conventions
# Check error handling patterns match
→ Use Grep: pattern="raise|except|try:" to see project patterns
# Check logging patterns match
→ Use Grep: pattern="logger\.|logging\." to see project patterns
6.4: Documentation Check
# Check if public APIs have docstrings
→ Use Grep: pattern='def [function_name]\([^)]*\):\s*"""' -A 5
# Check if README needs update
→ Use Read: README.md and compare to changes
Output format:
LLM EXHAUSTIVE CHECKS
=====================
All callers handled: [YES / NO - list affected]
Similar patterns checked: [findings]
Naming consistent: [YES / NO - inconsistencies]
Documentation: [adequate / needs update]
Step 6.5: Severity Calibration
Before writing feedback, calibrate issue severity using these explicit criteria:
BLOCKING Criteria (Must Fix Before Merge)
An issue is BLOCKING if ANY of these are true:
| Criterion | Test | Example |
|---|---|---|
| Security vulnerability | Could this be exploited? | SQL injection, auth bypass, XSS |
| Data loss risk | Could this lose/corrupt user data? | Missing transaction, race on write |
| Crash risk | Could this crash in production? | Unhandled null, divide by zero |
| Test failure | Do tests fail with this change? | Broken test, assertion fails |
| Breaking change | Does this break existing callers? | Changed API without migration |
| Compliance violation | Does this violate legal/regulatory? | GDPR, HIPAA, PCI violations |
CONCERN Criteria (Should Fix, May Defer)
An issue is CONCERN if ANY of these are true AND it’s not BLOCKING:
| Criterion | Test | Example |
|---|---|---|
| Performance risk | Could this cause slowdown in prod? | N+1 query, missing index |
| Maintainability harm | Will this make future changes harder? | Deep nesting, god object |
| Error handling gap | Could errors go unnoticed? | Swallowed exception |
| Missing test | Is important functionality untested? | New feature without test |
| Unclear code | Would another dev struggle to understand? | Complex logic, missing comments |
NIT Criteria (Minor, Optional)
An issue is NIT if it’s about:
| Area | Examples |
|---|---|
| Style | Naming convention, formatting |
| Preference | This approach vs. that approach (both valid) |
| Minor clarity | Could be slightly clearer |
| Optimization | Slightly more efficient alternative |
Calibration Examples
BLOCKING examples:
- Missing parameterized query → SQL injection → BLOCKING
- Auth check missing on admin endpoint → Auth bypass → BLOCKING
- NullPointerException possible on common path → Crash risk → BLOCKING
- Test fails on main assertion → Test failure → BLOCKING
CONCERN examples:
- Database call in loop → N+1 → CONCERN (perf in prod)
- Exception caught and logged but not re-raised → Error handling gap → CONCERN
- 100-line function → Maintainability → CONCERN
- New endpoint has no test → Missing test → CONCERN
NIT examples:
- Variable named 'x' could be 'userCount' → Style → NIT
- Could use list comprehension instead of loop → Preference → NIT
- Comment would help here → Minor clarity → NIT
Step 7: Write Review Feedback
Compose clear, constructive review comments.
Issue categories (now calibrated by Step 6.5):
| Category | Symbol | Meaning | Calibrated By |
|---|---|---|---|
| Blocking | [BLOCKING] | Must fix before merge | Step 6.5 BLOCKING criteria |
| Concern | [CONCERN] | Should address, might defer | Step 6.5 CONCERN criteria |
| Suggestion | [SUGGESTION] | Take it or leave it | Not in criteria, just ideas |
| Nit | [NIT] | Minor style/preference | Step 6.5 NIT criteria |
| Question | [QUESTION] | Need clarification | Uncertainty about code intent |
| Praise | [PRAISE] | Positive feedback | Good patterns observed |
Comment structure:
[CATEGORY] File:Line - Brief summary
What: [describe the issue]
Why: [explain why it matters]
Suggestion: [how to fix, if applicable]
Output format:
REVIEW COMMENTS
===============
[BLOCKING] src/auth.py:42 - SQL injection vulnerability
What: User input passed directly to query
Why: Security risk - allows arbitrary SQL execution
Suggestion: Use a parameterized query: cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))
[CONCERN] src/api.py:105 - Missing error handling
What: API call can throw, not caught
Why: Will crash on network errors
Suggestion: Wrap in try/except with appropriate error response
[PRAISE] src/utils.py:20 - Good use of type hints
What: Clear typing improves maintainability
... [more comments]
Step 8: Submit Review Decision
Determine and submit the overall review outcome.
Decision criteria:
| Condition | Decision |
|---|---|
| Any BLOCKING issues | Request Changes |
| Only CONCERN/SUGGESTION/NIT | Approve (with comments) |
| Only questions | Comment (request discussion) |
| Clean | Approve |
Output format:
REVIEW DECISION
===============
Decision: [APPROVE / REQUEST CHANGES / COMMENT]
Summary:
- Blocking issues: [count]
- Concerns: [count]
- Suggestions: [count]
- Questions: [count]
Overall assessment: [1-2 sentence summary]
Next steps: [what author should do]
When to Use
- Pull request submitted and ready for review
- Code changes need verification before merge
- Pre-merge quality check required
- Knowledge sharing through code examination
- Onboarding developers to unfamiliar code
- Security-sensitive changes need scrutiny
- Architectural changes need validation
Verification Criteria
| Step | Verification |
|---|---|
| Step 1 | Context understood (purpose, scope, risk) |
| Step 2 | Architecture assessed |
| Step 3 | All files reviewed with issues categorized |
| Step 4 | Test coverage assessed |
| Step 5 | Automated checks run |
| Step 6 | Exhaustive LLM checks completed |
| Step 7 | Comments written with categories |
| Step 8 | Decision made with rationale |
Overall verification:
- All changed files reviewed
- Correctness verified for main paths
- Security implications considered
- Test coverage assessed
- Feedback is specific and actionable
- Blocking issues clearly identified
- Decision is clear and appropriate
Integration Points
- Often invoked from: /procedure_engine (code review request), User request to review PR
- Routes to: /debugging (if bug found), /refactoring (if structural issues), /testing_strategy (if test gaps)
- Related: /debugging, /refactoring, /architecture_design