Hook
Green tests can hide real risk.
Problem
Tests often validate ideal conditions, while real failures happen at edges: timeouts, bad inputs, and dependency outages.
Why it matters
Testing failure modes reveals weaknesses before production does. It also improves resilience by making recovery paths explicit.
Signals you are here
- Incidents come from scenarios no test covers
- Dependencies fail and the system collapses
- Load spikes cause unpredictable behavior
- Environment differences create surprises
Anti-patterns
- Only unit tests for happy paths
- No load, chaos, or dependency-failure tests
- Staging environments that do not match production
- Ignoring error handling in tests
Try this
- Add boundary and invalid-input tests
- Simulate dependency failures and timeouts
- Run load tests for critical paths
- Keep staging parity with production
- Exercise rollback and recovery flows
Example
A team added a database timeout test and discovered a request handler that hung indefinitely. They added timeouts and fallbacks before the next release.
Reflection prompt
Which failure mode would hurt most? Write a test for it this week.
More like this
Heuristic
Make Infrastructure Disposable
Cattle, not pets.
Heuristic
The Best Configuration Is No Configuration
Defaults beat decisions.
Heuristic
Work in Small Batches
Small batches make failure cheap and learning fast.
Heuristic
Your Deployment System Is a Product
Great deploys are built, not improvised.
Heuristic
Blame the Process, Not People
Fix the system.
Heuristic
Fail Closed, Log Everything, Recover Gracefully
Safe failure beats quiet failure.
