Failure Anticipation

Overview

Systematically identify the ways a plan could fail, assess the risk of each, and plan mitigations before execution begins.

Steps

Step 1: Decompose the plan

Break down the plan into analyzable components:

Identify all major steps or phases
List all inputs the plan depends on
List all outputs the plan must produce
Identify all external dependencies (people, systems, resources)
Note any timing constraints or deadlines
Map relationships between components (what depends on what)
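The dependency map from this step can be kept as a small graph. The sketch below is minimal and assumes each component simply lists the names of the components it depends on; all component names are hypothetical. Python's standard `graphlib.TopologicalSorter` produces an order in which every component follows its dependencies, and raises `CycleError` if the map is circular. A count of direct dependents shows which components many others rely on.

```python
from collections import Counter
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical plan: each component maps to the components it depends on.
plan = {
    "gather_requirements": [],
    "vendor_api": [],
    "build_service": ["gather_requirements", "vendor_api"],
    "load_test": ["build_service"],
    "launch": ["build_service", "load_test"],
}


def execution_order(deps):
    """Return an order in which every component comes after its dependencies.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())


def dependents_count(deps):
    """Count how many components depend directly on each component."""
    counts = Counter()
    for upstream in deps.values():
        counts.update(upstream)
    return counts


print(execution_order(plan))
print(dependents_count(plan).most_common(1))  # → [('build_service', 2)]
```

Components with many dependents (here the hypothetical `build_service`) are early candidates for the single-point-of-failure review in Step 4.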

Step 2: Generate failure modes by category

For each component, systematically consider failures in each category:

INPUT FAILURES:

What if expected inputs are missing?
What if inputs are wrong or corrupted?
What if input format changes?

PROCESS FAILURES:

What if the logic is flawed?
What if it takes too long?
What if resources are exhausted?

OUTPUT FAILURES:

What if outputs are wrong?
What if outputs are missing?
What if outputs are rejected?

RESOURCE FAILURES:

What if a required resource is unavailable?
What if capacity is insufficient?
What if costs exceed budget?

TIMING FAILURES:

What if a step takes longer than expected?
What if deadlines are missed?
What if things happen in the wrong order?

INTEGRATION FAILURES:

What if an API changes?
What if systems can’t communicate?
What if versions are incompatible?

EXTERNAL FAILURES:

What if a vendor fails?
What if market conditions change?
What if regulations change?

HUMAN FAILURES:

What if someone makes a mistake?
What if there’s a misunderstanding?
What if key people are unavailable?

CASCADE FAILURES:

What single points of failure exist?
What failures could trigger others?
What could cause systemic collapse?
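Checking every component against every category is a cross product, so the checklist itself can be generated mechanically. A minimal sketch, assuming components are identified by name (the names in the usage lines are hypothetical); the questions are taken from the lists above. The code only guarantees that no component/category pair is skipped. Answering each prompt still takes human judgment.

```python
from itertools import product

# Guiding questions for each category, taken from the lists above.
CATEGORIES = {
    "INPUT": ["What if expected inputs are missing?",
              "What if inputs are wrong or corrupted?",
              "What if input format changes?"],
    "PROCESS": ["What if the logic is flawed?",
                "What if it takes too long?",
                "What if resources are exhausted?"],
    "OUTPUT": ["What if outputs are wrong?",
               "What if outputs are missing?",
               "What if outputs are rejected?"],
    "RESOURCE": ["What if a required resource is unavailable?",
                 "What if capacity is insufficient?",
                 "What if costs exceed budget?"],
    "TIMING": ["What if a step takes longer than expected?",
               "What if deadlines are missed?",
               "What if things happen in the wrong order?"],
    "INTEGRATION": ["What if an API changes?",
                    "What if systems can't communicate?",
                    "What if versions are incompatible?"],
    "EXTERNAL": ["What if a vendor fails?",
                 "What if market conditions change?",
                 "What if regulations change?"],
    "HUMAN": ["What if someone makes a mistake?",
              "What if there's a misunderstanding?",
              "What if key people are unavailable?"],
    "CASCADE": ["What single points of failure exist?",
                "What failures could trigger others?",
                "What could cause systemic collapse?"],
}


def failure_prompts(components):
    """Return one (component, category, question) triple per combination,
    so that no component/category pair is skipped."""
    return [
        (component, category, question)
        for component, (category, questions) in product(components, CATEGORIES.items())
        for question in questions
    ]


# Hypothetical components: 2 components x 27 questions = 54 prompts.
prompts = failure_prompts(["vendor_api", "launch"])
print(len(prompts))  # → 54
print(prompts[0])    # → ('vendor_api', 'INPUT', 'What if expected inputs are missing?')
```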

Step 3: Score each failure mode

For each identified failure, assign FMEA (Failure Mode and Effects Analysis) scores on the following 1-10 scales:

OCCURRENCE (O) - How likely is this failure?
- 1-2: Remote (< 1 in 10,000)
- 3-4: Low (1 in 10,000 to 1 in 100)
- 5-6: Moderate (1 in 100 to 1 in 20)
- 7-8: High (1 in 20 to 1 in 5)
- 9-10: Very High (> 1 in 5)

SEVERITY (S) - How bad is the impact?
- 1-2: Negligible (minor inconvenience)
- 3-4: Minor (some rework, small delay)
- 5-6: Moderate (significant delay or cost)
- 7-8: Major (goal compromised)
- 9-10: Catastrophic (project failure, irreversible harm)

DETECTION (D) - How hard is the failure to detect before it causes damage?
- 1-2: Almost certain to detect early
- 3-4: High chance of detection
- 5-6: Moderate chance of detection
- 7-8: Low chance of detection
- 9-10: Almost impossible to detect
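When a failure rate can be estimated, the OCCURRENCE band can be looked up rather than guessed. This is a minimal sketch that assumes one estimated probability per failure mode. It uses each band's upper bound as the cutoff, so probabilities between 1 in 10,000 and 1 in 100 are placed in Low, and a value exactly on a boundary goes to the lower band. It returns the two-score band and leaves the choice within the band to judgment.

```python
from bisect import bisect_left

# Upper probability bound of the Remote, Low, Moderate and High bands;
# anything above 1 in 5 is Very High.
UPPER_BOUNDS = [1 / 10_000, 1 / 100, 1 / 20, 1 / 5]
BANDS = [(1, 2, "Remote"), (3, 4, "Low"), (5, 6, "Moderate"),
         (7, 8, "High"), (9, 10, "Very High")]


def occurrence_band(probability):
    """Map an estimated failure probability (0 to 1) to its O-score band.

    A probability exactly on a boundary is assigned to the lower band.
    """
    if not 0 <= probability <= 1:
        raise ValueError("probability must be between 0 and 1")
    return BANDS[bisect_left(UPPER_BOUNDS, probability)]


print(occurrence_band(0.03))  # → (5, 6, 'Moderate')
```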

Calculate the Risk Priority Number for each failure: RPN = O x S x D (possible values range from 1 to 1,000)
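A scored failure mode can be kept as a small record that rejects out-of-range scores before the RPN is computed. A minimal sketch; the class name and the example failure are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One failure mode with its FMEA scores, each an integer from 1 to 10."""
    name: str
    occurrence: int  # O
    severity: int    # S
    detection: int   # D

    def __post_init__(self):
        # Reject scores outside the 1-10 scales so a typo cannot skew the RPN.
        for label, value in (("O", self.occurrence), ("S", self.severity), ("D", self.detection)):
            if not (isinstance(value, int) and 1 <= value <= 10):
                raise ValueError(f"{label} must be an integer from 1 to 10, got {value!r}")

    @property
    def rpn(self):
        """Risk Priority Number: O x S x D, between 1 and 1000."""
        return self.occurrence * self.severity * self.detection


fm = FailureMode("vendor API changes without notice", occurrence=4, severity=7, detection=6)
print(fm.rpn)  # → 168
```

Because the RPN is a product, very different profiles can share a value: O=2, S=9, D=8 and O=6, S=4, D=6 both give 144. A severity-based rule is therefore useful alongside the RPN.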

Step 4: Prioritize and classify

Sort and classify failures by risk:

Sort by RPN descending (highest risk first)
Classify into tiers:
- Critical: RPN > 200 or S >= 9 (must mitigate)
- High: RPN 100-200 (should mitigate)
- Medium: RPN 50-99 (consider mitigating)
- Low: RPN < 50 (accept or monitor)
Group by category to identify systemic patterns
Identify single points of failure (high impact, single cause)
Compare the results against the agreed risk tolerance level
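The tier rules can be applied mechanically once scores exist. A minimal sketch; the register entries are hypothetical. Tiers are checked from Critical downward and the first match wins, so a failure with S >= 9 is Critical even when its RPN is modest.

```python
def classify(rpn, severity):
    """Assign a risk tier using the thresholds above, checked from the top down."""
    if rpn > 200 or severity >= 9:
        return "Critical"
    if rpn >= 100:
        return "High"
    if rpn >= 50:
        return "Medium"
    return "Low"


# Hypothetical register: (failure, O, S, D)
register = [
    ("key engineer unavailable", 3, 6, 2),
    ("vendor API changes without notice", 4, 7, 6),
    ("data corruption goes unnoticed", 2, 9, 8),
    ("deadline slips one week", 6, 4, 3),
]

# Sort by RPN, highest risk first.
ranked = sorted(
    ((name, o * s * d, s) for name, o, s, d in register),
    key=lambda row: row[1],
    reverse=True,
)
for name, rpn, severity in ranked:
    print(f"{classify(rpn, severity):8} RPN={rpn:4}  {name}")
```

With these numbers the data-corruption entry sorts below the vendor entry (RPN 144 vs 168) yet is the only Critical one, because its severity is 9. Sorting by RPN alone would understate it.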

Step 5: Develop mitigations

For each critical and high-priority failure, develop mitigation:

MITIGATION TYPES:

Prevention: Stop the failure from occurring (reduce O score)
Detection: Catch the failure early (reduce D score)
Reduction: Lessen the impact (reduce S score)
Transfer: Move risk to another party (insurance, contracts)
Acceptance: Acknowledge and prepare to handle

For each mitigation:

Describe the specific action
Estimate implementation effort
Project new O, S, D scores after mitigation
Calculate new RPN to verify improvement
Identify who is responsible for implementation
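The fields listed above map onto a simple record, which also makes the before/after RPN check and an impact/effort ranking explicit. This is a minimal sketch: the effort unit (days) and the example mitigation are assumptions for illustration.

```python
from dataclasses import dataclass


def rpn(scores):
    """RPN for an (O, S, D) tuple."""
    o, s, d = scores
    return o * s * d


@dataclass
class Mitigation:
    """One mitigation for one failure mode, with before/after (O, S, D) scores."""
    failure: str
    action: str          # the specific action
    kind: str            # Prevention, Detection, Reduction, Transfer, or Acceptance
    effort_days: float   # estimated implementation effort (unit is an assumption)
    owner: str           # who is responsible
    before: tuple        # (O, S, D) before mitigation
    after: tuple         # projected (O, S, D) after mitigation

    @property
    def improves(self):
        """True if the projected RPN is lower than the current one."""
        return rpn(self.after) < rpn(self.before)

    @property
    def value_per_effort(self):
        """RPN points removed per unit of effort, for ranking by impact/effort."""
        return (rpn(self.before) - rpn(self.after)) / self.effort_days


m = Mitigation(
    failure="vendor API changes without notice",
    action="Run contract tests against the vendor sandbox on every build",
    kind="Detection",
    effort_days=2,
    owner="integration team",
    before=(4, 7, 6),
    after=(4, 7, 2),  # a detection mitigation lowers only D
)
print(rpn(m.before), rpn(m.after), m.improves)  # → 168 56 True
print(m.value_per_effort)                        # → 56.0
```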

Step 6: Create contingency plans

For failures that can’t be fully prevented, create response plans:

For each critical failure:

Define trigger conditions (when is failure confirmed?)
Specify immediate response actions
Identify decision maker and escalation path
List resources needed for response
Define recovery steps to get back on track
Set acceptable recovery time

Also define:

Early warning indicators to monitor
Kill criteria (when to abort the plan entirely)
Communication plan for stakeholders
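Trigger conditions, early warnings, and kill criteria are all predicates over monitored values, so they can be checked together. This is a minimal sketch under several assumptions: current metrics arrive as a plain dict, and the metric names, thresholds, and plan details are hypothetical. The communication plan is left out because it is a document rather than a check. Kill criteria are evaluated first, then confirmed failures, then early warnings.

```python
from dataclasses import dataclass
from typing import Callable

Check = Callable[[dict], bool]  # takes current metrics, returns True if the condition holds


@dataclass
class ContingencyPlan:
    failure: str
    trigger: Check            # when is the failure confirmed?
    response: list            # immediate actions, in order
    decision_maker: str
    escalation: list          # who to escalate to, in order
    resources: list           # what the response needs
    recovery_hours: float     # acceptable recovery time


@dataclass
class Monitoring:
    early_warnings: dict      # name -> Check
    kill_criteria: dict       # name -> Check (any True means abort the plan)


def evaluate(metrics, plans, monitoring):
    """Return (status, names) for the current metrics, most severe status first."""
    killed = [n for n, check in monitoring.kill_criteria.items() if check(metrics)]
    if killed:
        return "ABORT", killed
    fired = [p.failure for p in plans if p.trigger(metrics)]
    if fired:
        return "RESPOND", fired
    warnings = [n for n, check in monitoring.early_warnings.items() if check(metrics)]
    if warnings:
        return "WATCH", warnings
    return "OK", []


plans = [ContingencyPlan(
    failure="vendor API outage",
    trigger=lambda m: m["vendor_error_rate"] > 0.20,
    response=["switch to cached data", "open a vendor support ticket"],
    decision_maker="integration lead",
    escalation=["engineering manager", "project sponsor"],
    resources=["on-call engineer", "cache warm-up script"],
    recovery_hours=24,
)]
monitoring = Monitoring(
    early_warnings={"vendor errors rising": lambda m: m["vendor_error_rate"] > 0.05},
    kill_criteria={"schedule unrecoverable": lambda m: m["days_behind"] > 30},
)
print(evaluate({"vendor_error_rate": 0.08, "days_behind": 3}, plans, monitoring))
# → ('WATCH', ['vendor errors rising'])
```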

Step 7: Compile final assessment

Create a comprehensive failure anticipation report:

Executive summary of risk profile
Critical failures requiring attention before proceeding
Mitigation actions prioritized by impact/effort
Contingency plans for unavoidable risks
Monitoring dashboard recommendations
Residual risks being accepted
Go/no-go recommendation based on risk tolerance
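The go/no-go line of the report can be derived from the scored register once a risk tolerance is stated. The exact rule here is an assumption, not one prescribed by this procedure. The recommendation is NO-GO if any critical failure (RPN > 200 or S >= 9) lacks a mitigation, or if any projected residual RPN exceeds the tolerance. The register entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AssessedFailure:
    name: str
    rpn: int            # RPN before mitigation
    severity: int       # S before mitigation
    residual_rpn: int   # projected RPN after mitigation (same as rpn if none planned)
    mitigated: bool     # is a mitigation plan in place?


def recommend(failures, risk_tolerance_rpn):
    """Return ("GO" or "NO-GO", reasons) under the assumed rule described above."""
    reasons = []
    for f in failures:
        critical = f.rpn > 200 or f.severity >= 9
        if critical and not f.mitigated:
            reasons.append(f"critical failure not mitigated: {f.name}")
        elif f.residual_rpn > risk_tolerance_rpn:
            reasons.append(f"residual RPN {f.residual_rpn} above tolerance: {f.name}")
    return ("NO-GO" if reasons else "GO"), reasons


failures = [
    AssessedFailure("vendor API changes", rpn=168, severity=7, residual_rpn=56, mitigated=True),
    AssessedFailure("data corruption unnoticed", rpn=144, severity=9, residual_rpn=36, mitigated=True),
    AssessedFailure("deadline slips", rpn=72, severity=4, residual_rpn=72, mitigated=False),
]
print(recommend(failures, risk_tolerance_rpn=100))    # → ('GO', [])
print(recommend(failures, risk_tolerance_rpn=50)[0])  # → NO-GO
```

The listed reasons double as the "critical failures requiring attention before proceeding" section of the report.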

When to Use

Before executing any significant plan or project
During the risk assessment phase of planning
When designing systems that must be reliable
Before making irreversible decisions or commitments
When stakes are high and failure is costly
During strategy selection, to compare the risk profiles of alternatives
Before deployment or launch of new systems
When inheriting or reviewing someone else’s plan
During post-mortem analysis to improve future anticipation
When entering unfamiliar territory with unknown risks

Verification

All nine failure categories were examined systematically
FMEA scores are justified, not arbitrary
Critical failures (RPN > 200 or S >= 9) have mitigation plans
Contingency plans are specific and actionable
Single points of failure are identified
Cascade failure potential is assessed
Monitoring indicators are measurable
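Some of these checks are mechanical and can be run against the failure register; others, such as whether contingency plans are specific and actionable, still need a reviewer. A minimal sketch, assuming each entry is a dict with the keys named in the docstring, and that every examined category is recorded with at least one entry. The sample entries are hypothetical.

```python
REQUIRED_CATEGORIES = {"INPUT", "PROCESS", "OUTPUT", "RESOURCE", "TIMING",
                       "INTEGRATION", "EXTERNAL", "HUMAN", "CASCADE"}


def verify(register):
    """Run the mechanical verification checks on a failure register.

    Each entry is a dict with keys: category, name, o, s, d,
    justification (str) and mitigation (str or None).
    Returns a list of problems; an empty list means these checks pass.
    """
    problems = []
    missing = REQUIRED_CATEGORIES - {entry["category"] for entry in register}
    if missing:
        problems.append(f"categories not examined: {', '.join(sorted(missing))}")
    for entry in register:
        rpn = entry["o"] * entry["s"] * entry["d"]
        if not entry.get("justification"):
            problems.append(f"scores not justified: {entry['name']}")
        if (rpn > 200 or entry["s"] >= 9) and not entry.get("mitigation"):
            problems.append(f"critical failure without mitigation: {entry['name']}")
    return problems


# Hypothetical register with gaps the checks should catch.
register = [
    {"category": "INTEGRATION", "name": "vendor API changes", "o": 4, "s": 7, "d": 6,
     "justification": "vendor changelog reviewed", "mitigation": "contract tests"},
    {"category": "HUMAN", "name": "data entry error corrupts records", "o": 3, "s": 9, "d": 5,
     "justification": "", "mitigation": None},
]
for problem in verify(register):
    print(problem)
```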

Input: $ARGUMENTS

Apply this procedure to the input provided.