← Back to all heuristics

Fail Closed, Log Everything, Recover Gracefully

Safe failure beats quiet failure.

ReliabilitySecurityOperationsSecurity

Heuristic

Design for safe failure with clear signals and recovery paths.

Hook

Silence is the worst failure.

Problem

Systems often fail in unsafe or invisible ways. Errors get swallowed, and partial failures spread while operators lack the data to respond.

Why it matters

Failing closed protects security and correctness. Clear logs and recovery paths reduce mean time to recovery and prevent repeat incidents.

Signals you are here

  • Errors are logged inconsistently or not at all
  • Systems fail open to keep serving bad data
  • Alerts are vague and lack context
  • Recovery requires manual intervention each time

Anti-patterns

  • Swallowing exceptions without logging
  • Defaulting to allow when checks fail
  • No structured logging or trace IDs
  • Retries without backoff or limits

Try this

  • Use structured logs with clear error codes
  • Fail closed for security and data integrity
  • Implement retries with backoff and circuit breakers
  • Provide clear runbooks for recovery
  • Test failure paths regularly

Example

An auth service used to fail open during timeouts. After switching to fail-closed with clear logs and retries, unauthorized access stopped and incident recovery became predictable.

Reflection prompt

Where does your system fail silently today? Make one failure path explicit and safe.

More like this

Heuristic

Shift Security Left

Secure by default, not by exception.

SecurityReliabilitySecurity

Heuristic

You Cannot Rely on People Under Stress

Design for tired humans.

ReliabilityOperationsSRE

Heuristic

Increase Contrast, Not Volume

Prompt length does not guarantee novelty. Context contrast does.

ArchitectureOperations

Heuristic

Know What You Have

Inventory before security.

SecuritySecurity

Heuristic

Runbooks Are a Bridge Between Dev and Ops

Runbooks turn knowledge into action.

OperationsReliabilitySRE

Heuristic

Secrets Decay Faster Than Code

Secrets should expire.

SecuritySecurity