Hook
Silence is the worst failure.
Problem
Systems often fail in unsafe or invisible ways. Errors get swallowed, and partial failures spread while operators lack the data to respond.
Why it matters
Failing closed protects security and correctness. Clear logs and recovery paths reduce mean time to recovery and help prevent repeat incidents.
Signals you are here
- Errors are logged inconsistently or not at all
- Systems fail open and keep serving bad data
- Alerts are vague and lack context
- Recovery requires manual intervention each time
Anti-patterns
- Swallowing exceptions without logging
- Defaulting to allow when checks fail
- No structured logging or trace IDs
- Retries without backoff or limits
Try this
- Use structured logs with clear error codes
- Fail closed for security and data integrity
- Implement retries with backoff and circuit breakers
- Provide clear runbooks for recovery
- Test failure paths regularly
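As a minimal sketch of two of the items above (structured logs with error codes, and retries with backoff and a hard limit), assuming a Python service: the names `log_event`, `call_with_retries`, and the error codes `UPSTREAM_CALL_FAILED` and `RETRIES_EXHAUSTED` are hypothetical, chosen only for illustration. A circuit breaker is not shown here.

```python
import json
import logging
import random
import time

logger = logging.getLogger("failure_paths")


def log_event(level, code, **fields):
    """Emit one JSON log line with a stable error code and context fields."""
    logger.log(level, json.dumps({"code": code, **fields}, sort_keys=True))


def call_with_retries(operation, trace_id, max_attempts=3, base_delay=0.1,
                      sleep=time.sleep):
    """Run operation(); on failure, retry with exponential backoff up to a limit.

    Every failure is logged with the trace ID, and the final failure is
    re-raised so the caller never sees a silent error.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:  # broad on purpose for the sketch; narrow in real code
            log_event(logging.WARNING, "UPSTREAM_CALL_FAILED",
                      trace_id=trace_id, attempt=attempt, error=repr(exc))
            if attempt == max_attempts:
                log_event(logging.ERROR, "RETRIES_EXHAUSTED",
                          trace_id=trace_id, attempts=attempt)
                raise  # surface the failure instead of swallowing it
            # Exponential backoff (base, 2x, 4x, ...) plus jitter to avoid
            # many clients retrying in lockstep.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter is injectable so the failure path can be exercised in tests without waiting, which supports the last item in the list.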
Example
An auth service used to fail open during timeouts, granting access whenever the check could not complete. After switching to fail-closed behavior with clear logs and bounded retries, unauthorized access stopped and incident recovery became predictable.
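The fail-closed behavior in this example might look roughly like the following sketch. It is not the service's actual code: `is_authorized`, the `check_permission` callable, and the `AUTH_CHECK_FAILED` code are assumptions made for illustration.

```python
import logging

logger = logging.getLogger("auth")


def is_authorized(check_permission, user_id, resource, trace_id):
    """Return True only when the permission check explicitly succeeds.

    check_permission is a hypothetical callable that may raise (for
    example on a timeout). Any error results in a logged denial.
    """
    try:
        allowed = check_permission(user_id, resource)
    except Exception as exc:
        # Fail closed: a check that could not complete is treated as "deny",
        # and the failure is logged with enough context to investigate.
        logger.error(
            "code=AUTH_CHECK_FAILED trace_id=%s user_id=%s resource=%s error=%r",
            trace_id, user_id, resource, exc,
        )
        return False
    # Only an explicit True grants access; None or other values deny.
    return allowed is True
```

The design choice is that the default branch denies: access is granted on exactly one path, so a new kind of failure cannot accidentally allow a request.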
Reflection prompt
Where does your system fail silently today? Make one failure path explicit and safe.
