Verification-First Development
The Verification Gap
AI coding agents can generate code with 1.75x more logic errors than human-written code (ACM 2025). Without verification:
- Code looks right but doesn’t handle edge cases
- You become the only feedback loop
- Every mistake requires your manual attention
- Bugs compound silently through rapid iteration
With verification:
- The agent catches its own errors before you see them
- Tests act as executable specifications
- Code quality improves with each iteration
- You review verified, working code instead of untested experiments
The TDD-Agent Synergy
Test-driven development turns out to be a natural fit for coding agents. Here’s why:
- Tests are precise specs: a test describes exactly what the code should do, in a form the agent can run, which reduces ambiguity
- Tests provide instant feedback: the agent knows immediately whether its implementation works
- Tests prevent regressions: as the agent iterates, the existing tests flag any behavior it breaks
- Tests keep focus small: TDD encourages implementing one behavior at a time, preventing bloated implementations
The Red-Green-Refactor Cycle for Agents
1. Red: Write a failing test

    Write a test for the rate limiter that verifies:
    - A client can make 100 requests per minute
    - The 101st request returns 429 Too Many Requests
    - After 60 seconds, the client can make requests again
    Run the test and confirm it fails.

2. Green: Implement minimum code

    Implement the rate limiter to pass the failing tests.
    Use the minimum code necessary; don't over-engineer.
    Run the tests and confirm they pass.

3. Refactor: Clean up while green

    Refactor the rate limiter for clarity and performance.
    Keep all tests green. Run the full test suite after refactoring.
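As an illustration of where the Green step might land, here is a minimal sketch of a rate limiter that satisfies the three behaviors from the Red prompt. The class name `RateLimiter`, the `check` method, the fixed-window strategy, and the injected clock are all assumptions made for this example; the prompts above do not prescribe any of them.

```typescript
// Hypothetical fixed-window rate limiter. Names and strategy are assumptions.
type Clock = () => number; // current time in milliseconds

class RateLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: Clock;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number, now: Clock = () => Date.now()) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  /** Returns the HTTP status the request would receive: 200 or 429. */
  check(clientId: string): 200 | 429 {
    const t = this.now();
    const current = this.windows.get(clientId);
    // Start a fresh window on the first request or once the old window has expired.
    if (!current || t - current.start >= this.windowMs) {
      this.windows.set(clientId, { start: t, count: 1 });
      return 200;
    }
    if (current.count >= this.limit) return 429;
    current.count += 1;
    return 200;
  }
}

// The three behaviors from the prompt, driven by a fake clock instead of real waiting.
let fakeNow = 0;
const limiter = new RateLimiter(100, 60_000, () => fakeNow);
const statuses = Array.from({ length: 101 }, () => limiter.check("client-a"));
fakeNow += 60_000; // simulate 60 seconds passing
const afterWindow = limiter.check("client-a");
```

Injecting the clock is one way to keep the "after 60 seconds" case fast and deterministic. In a Vitest suite the same checks would likely be written as `expect` assertions, possibly with fake timers instead of a hand-rolled clock.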
Verification Strategies by Task Type
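One of the prompts in this section asks for a `validateEmail` function and lists four input/output pairs. As a rough sketch of what a verified result could look like, the example below pairs a deliberately simple implementation with those same cases in a table. The regular expression is an assumption chosen for illustration; it is not a complete email validator.

```typescript
// Hypothetical implementation: a simple pattern, not a full RFC 5322 validator.
function validateEmail(input: string): boolean {
  // Non-empty local part, "@", then at least two dot-separated non-empty labels.
  return /^[^\s@]+@[^\s@.]+(\.[^\s@.]+)+$/.test(input);
}

// The cases from the prompt, expressed as data the agent can run against.
const emailCases: Array<[string, boolean]> = [
  ["user@example.com", true],
  ["invalid", false],
  ["user@.com", false],
  ["@domain.com", false],
];

// Collect every case whose actual result differs from the expected one.
const emailFailures = emailCases.filter(
  ([input, expected]) => validateEmail(input) !== expected,
);

console.log(
  emailFailures.length === 0
    ? "all cases pass"
    : `failing cases: ${JSON.stringify(emailFailures)}`,
);
```

Listing the cases as data keeps the verification criteria next to the implementation, so adding a new edge case is a one-line change.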
Function logic:

    Write a validateEmail function.
    Test cases:
    - user@example.com → true
    - invalid → false
    - user@.com → false
    - @domain.com → false
    Run the tests after implementing.

UI implementation:

    [paste screenshot of target design]
    Implement this design for the dashboard header.
    Take a screenshot of the result and compare it to the original.
    List differences and fix them.

Build failure:

    The build fails with this error: [paste error]
    Fix it and verify the build succeeds.
    Address the root cause, don't suppress the error.
    Write a regression test that would catch this bug.

Refactoring:

    Refactor the OrderProcessor class to use the Strategy pattern.
    Before starting:
    1. Run the existing test suite and note all passing tests
    2. Make changes incrementally
    3. After each change, run tests and verify nothing broke
    4. Add tests for any new public interfaces

The Guardrail Stack
Verification works best as a layered system:
| Layer | Mechanism | When It Runs |
|---|---|---|
| 1. Type checking | tsc --noEmit | After every file edit (via hooks) |
| 2. Linting | eslint, biome | After every file edit (via hooks) |
| 3. Unit tests | vitest run <file> | After implementing each function |
| 4. Integration tests | vitest run --integration | After completing a feature |
| 5. Coverage check | Coverage threshold gate | Before committing |
| 6. E2E tests | playwright test | Before PR creation |
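The layering idea can be sketched as an ordered list of checks that stops at the first failure, so cheap checks gate expensive ones. This is a simplified illustration, not a prescribed setup: the `Layer` type, the `runGuardrails` function, and the injected `exec` callback are hypothetical names, and the sketch runs every layer in one pass rather than at the different trigger points listed in the table.

```typescript
// Hypothetical runner. The exec function is injected so the logic can be
// exercised without invoking real tools.
type Layer = { name: string; command: string };
type Exec = (command: string) => boolean; // true when the command succeeds

// A subset of the table's layers, ordered cheapest first.
const layers: Layer[] = [
  { name: "Type checking", command: "tsc --noEmit" },
  { name: "Linting", command: "eslint ." },
  { name: "Unit tests", command: "vitest run" },
  { name: "E2E tests", command: "playwright test" },
];

function runGuardrails(
  stack: Layer[],
  exec: Exec,
): { passed: string[]; failedAt: string | null } {
  const passed: string[] = [];
  for (const layer of stack) {
    // Stop at the first failing layer; later, slower layers are skipped.
    if (!exec(layer.command)) return { passed, failedAt: layer.name };
    passed.push(layer.name);
  }
  return { passed, failedAt: null };
}

// Simulate a lint failure: type checking passes, linting fails, tests never run.
const guardrailResult = runGuardrails(layers, (cmd) => !cmd.startsWith("eslint"));
console.log(guardrailResult);
```

In a real project, `exec` could wrap something like `spawnSync` from `node:child_process` and report whether the exit status was zero. That wiring is left out here to keep the sketch self-contained.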
Benchmarked Results
| Approach | Regressions | Resolution Rate |
|---|---|---|
| No verification | Baseline | Baseline |
| TDD prompting only | +9.94% regressions | — |
| TDD + contextual test targets | -70% regressions | +33% resolution |
| Full guardrail stack | -85% regressions | +45% resolution |
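The pattern is consistent: generic TDD prompting on its own made regressions worse, while pointing the agent at specific tests and layering guardrails cut them sharply. Verification is the foundation of reliable agentic development, not an optional extra.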
Key Takeaways
- Always provide verification criteria: tests, screenshots, expected outputs
- Use TDD naturally: write tests first, confirm they fail, implement to pass
- Don’t lecture agents on TDD methodology — tell them which tests to run
- Layer verification: types → lint → unit tests → integration → E2E
- Configure hooks for automatic verification after every edit
- Treat agents like fast junior engineers: give clear constraints, demand a plan, and enforce tests