Reviewing AI-generated code: a practical checklist
Othman Shareef · June 16, 2026 · 7 min read
Most working developers now have an AI assistant in the loop: Stack Overflow’s 2024 survey put it at three-quarters of respondents using or planning to use AI tools. That means most code review is now, at least partly, review of generated code. The awkward truth: generated code doesn’t fail the way human code fails, and a checklist tuned for human failure modes will miss it.
1. Intent match comes first
The signature failure of generated code is being a correct answer to a slightly different question. It compiles, it’s idiomatic, the tests pass, and yet it implements subtly different behavior from what the issue asked for, because the prompt (or the model’s reading of the codebase) drifted. Before reading any implementation, re-read the requirement, then check the change’s observable behavior against it. Every other item on this list is cheaper to check, which is exactly why this one gets skipped.
2. Verify that everything it calls exists
Models still invent: a config option that was never real, a method from a different library’s API, an import that resolves only in the model’s memory of an older version. Types and CI catch most of it. The dangerous remainder is the call that exists but does something different than the generated comment claims. If a usage looks unfamiliar, check the docs, not the comment above it.
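As a rough illustration of the existence check, here is a minimal Python sketch. The helper name `missing_attributes` is hypothetical, not part of any tool. It only confirms that a name resolves in the installed module; it says nothing about what the call does, so the docs are still the authority on behavior.

```python
import importlib


def missing_attributes(module_name: str, names: list[str]) -> list[str]:
    """Return the names that the installed module does not actually expose."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module itself does not resolve in this environment.
        return list(names)
    return [name for name in names if not hasattr(module, name)]


# json.loads is real; json.parse is the kind of name a model might borrow
# from JavaScript's JSON.parse.
print(missing_attributes("json", ["loads", "parse"]))  # ['parse']
```

A type checker or CI run does the same job at scale; a snippet like this is mainly useful in a REPL when one unfamiliar call in a diff looks suspicious.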
3. Audit the edges, not the middle
Generated code is strongest on the happy path, because that’s where the training data lives. Spend your attention where it’s weakest: error handling that swallows instead of propagating, empty and null inputs, timeouts, partial failures, and any shared state. A quick heuristic: find every catch and every default value, and ask whether a human with context would have chosen it.
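To make the catch-and-default heuristic concrete, here is a small Python sketch. Both function names and the settings-loading scenario are invented for illustration. The first version is the pattern worth questioning in review; the second is one possible alternative, assuming that empty input is legitimate and malformed input is not.

```python
import json


def load_settings_swallowing(raw: str) -> dict:
    # Pattern to question: every failure silently becomes an empty default,
    # so a corrupted file looks the same as "no settings".
    try:
        return json.loads(raw)
    except Exception:
        return {}


def load_settings_propagating(raw: str) -> dict:
    # Assumed policy: empty input is a valid "no settings" case,
    # anything malformed should fail loudly.
    if not raw.strip():
        return {}
    try:
        settings = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"settings are not valid JSON: {exc}") from exc
    if not isinstance(settings, dict):
        raise ValueError("settings must be a JSON object")
    return settings
```

Whether the empty default is right depends on context the model did not have, which is the point of asking the question for each catch.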
4. Test the tests
When the same model writes the code and its tests, the tests often assert what the code does, not what it should do: a tautology with good coverage numbers. Read the assertions against the requirement. The tell: tests with exact-value assertions copied from implementation output, and no test for the case the ticket was actually about.
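A small Python sketch of the tautology, built on a made-up requirement: "orders of 100.00 or more get 10% off". The function and check names are hypothetical, and amounts are in integer cents to keep the arithmetic exact.

```python
def apply_discount(total_cents: int) -> int:
    # Generated implementation with a boundary bug: it uses > where the
    # (hypothetical) requirement says "100.00 or more".
    if total_cents > 10_000:
        return total_cents * 90 // 100
    return total_cents


def check_copied_from_output() -> None:
    # Expected values copied from what the code returned. Both assertions
    # pass, and the second one locks the boundary bug in.
    assert apply_discount(15_000) == 13_500
    assert apply_discount(10_000) == 10_000


def check_from_requirement() -> None:
    # Expected value derived from the requirement text instead.
    # This assertion fails against the implementation above.
    assert apply_discount(10_000) == 9_000
```

The first check gives full line coverage and passes; only the second one, written from the ticket rather than from the output, exposes the bug.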
5. Check codebase fit
Generators reach for the median solution from everywhere, not your conventions. Watch for re-implemented helpers that already exist, a third state-management pattern, or styles that fight the file around them. Industry analyses such as GitClear’s code-quality reports have flagged rising duplication and churn alongside AI adoption. Review is where that trend gets stopped, one duplicate helper at a time.
6. Sweep the security surface
Generated code inherits the average security posture of its training data, which is not a compliment. The usual suspects are string-built SQL or shell commands, permissive CORS, broad IAM defaults, and tokens logged “temporarily”. None of these are exotic; all of them show up in generated diffs regularly. If the change touches input handling, auth, or anything that executes, slow down there.
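For the string-built SQL case, a minimal sketch using Python’s standard `sqlite3` module shows what to look for in a diff. The table, rows, and function names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "admin"), ("bob", "viewer")],
)


def find_user_string_built(name: str) -> list[tuple]:
    # Pattern to flag: user input is interpolated into the SQL text.
    query = f"SELECT name, role FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_parameterized(name: str) -> list[tuple]:
    # The driver binds the value, so it is never parsed as SQL.
    query = "SELECT name, role FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


payload = "x' OR '1'='1"
print(len(find_user_string_built(payload)))   # 2: every row matches
print(len(find_user_parameterized(payload)))  # 0: no user has that literal name
```

The same habit carries over to shell commands: passing an argument list is safer than building a command string from input.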
7. Ask whether a human read it first
The cheapest quality gate for generated code is upstream of review: the author reading their own diff before requesting eyes. Teams adopting agents are formalizing this as pre-PR review: reading the agent’s work on your own branch before it becomes a pull request. It’s a workflow we believe in enough to have built into Pyor (our product, free for individuals): review the agent’s diff locally, leave notes, land a cleaner PR. Your reviewers get changes a human has already vouched for, and this checklist gets shorter every time.
For the size dimension of the same problem (agents make big diffs cheap), see “How big should a pull request be?”