Failure Anticipation
Overview
Systematically identify potential failures, assess their risk, and plan mitigations before execution.
Steps
Step 1: Decompose the plan
Break down the plan into analyzable components:
- Identify all major steps or phases
- List all inputs the plan depends on
- List all outputs the plan must produce
- Identify all external dependencies (people, systems, resources)
- Note any timing constraints or deadlines
- Map relationships between components (what depends on what)
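One way to record the "what depends on what" mapping is a plain dependency dictionary. The sketch below is illustrative only: the component names are hypothetical, and it assumes Python 3.9+ for the standard-library `graphlib` module. It derives a valid execution order and counts how many components directly rely on each one.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each component maps to the set of components it depends on.
depends_on = {
    "provision_db": set(),
    "migrate_schema": {"provision_db"},
    "deploy_api": {"migrate_schema"},
    "deploy_ui": {"deploy_api"},
    "announce_launch": {"deploy_api", "deploy_ui"},
}

# A valid execution order: every component appears after its dependencies.
# TopologicalSorter raises CycleError if the dependencies are circular.
order = list(TopologicalSorter(depends_on).static_order())

# Count direct dependents of each component; heavily relied-upon
# components deserve extra scrutiny in the later steps.
dependents = {name: 0 for name in depends_on}
for deps in depends_on.values():
    for dep in deps:
        dependents[dep] += 1

print(order)
print(dependents)
```

Components with many dependents are natural candidates for the single-point-of-failure review later in the procedure.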
Step 2: Generate failure modes by category
For each component, systematically consider failures in each category:
INPUT FAILURES:
- What if expected inputs are missing?
- What if inputs are wrong or corrupted?
- What if input format changes?
PROCESS FAILURES:
- What if the logic is flawed?
- What if it takes too long?
- What if resources are exhausted?
OUTPUT FAILURES:
- What if outputs are wrong?
- What if outputs are missing?
- What if outputs are rejected?
RESOURCE FAILURES:
- What if a required resource is unavailable?
- What if capacity is insufficient?
- What if costs exceed budget?
TIMING FAILURES:
- What if a step takes longer than expected?
- What if deadlines are missed?
- What if things happen in the wrong order?
INTEGRATION FAILURES:
- What if an API changes?
- What if systems can’t communicate?
- What if versions are incompatible?
EXTERNAL FAILURES:
- What if a vendor fails?
- What if market conditions change?
- What if regulations change?
HUMAN FAILURES:
- What if someone makes a mistake?
- What if there’s a misunderstanding?
- What if key people are unavailable?
CASCADE FAILURES:
- What single points of failure exist?
- What failures could trigger others?
- What could cause systemic collapse?
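To make sure no component/category pair is skipped, the categories can be crossed with the components to produce a checklist. This is a minimal sketch: the component names are hypothetical, and only the first question from each category above is included for brevity.

```python
from itertools import product

# One representative prompt per category, taken from the lists above.
CATEGORY_PROMPTS = {
    "INPUT": "What if expected inputs are missing?",
    "PROCESS": "What if the logic is flawed?",
    "OUTPUT": "What if outputs are wrong?",
    "RESOURCE": "What if a required resource is unavailable?",
    "TIMING": "What if a step takes longer than expected?",
    "INTEGRATION": "What if an API changes?",
    "EXTERNAL": "What if a vendor fails?",
    "HUMAN": "What if someone makes a mistake?",
    "CASCADE": "What single points of failure exist?",
}

# Hypothetical components from Step 1.
components = ["migrate_schema", "deploy_api"]

# Every component is paired with every category, so nothing is skipped.
checklist = [
    {"component": comp, "category": cat, "prompt": prompt}
    for comp, (cat, prompt) in product(components, CATEGORY_PROMPTS.items())
]

for item in checklist[:3]:
    print(f'{item["component"]} / {item["category"]}: {item["prompt"]}')
```

Each checklist entry is then answered with zero or more concrete failure modes, which feed into the scoring step.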
Step 3: Score each failure mode
For each identified failure, assign FMEA scores:
OCCURRENCE (O) - How likely is this failure?
- 1-2: Remote (< 1 in 10,000)
- 3-4: Low (1 in 10,000 to 1 in 100)
- 5-6: Moderate (1 in 100 to 1 in 20)
- 7-8: High (1 in 20 to 1 in 5)
- 9-10: Very High (> 1 in 5)
SEVERITY (S) - How bad is the impact?
- 1-2: Negligible (minor inconvenience)
- 3-4: Minor (some rework, small delay)
- 5-6: Moderate (significant delay or cost)
- 7-8: Major (goal compromised)
- 9-10: Catastrophic (project failure, irreversible harm)
DETECTION (D) - How hard is the failure to detect before it causes damage?
- 1-2: Almost certain to detect early
- 3-4: High chance of detection
- 5-6: Moderate chance of detection
- 7-8: Low chance of detection
- 9-10: Almost impossible to detect
Calculate the Risk Priority Number for each failure: RPN = O x S x D (range 1 to 1,000)
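The scoring arithmetic can be sketched as a small data class. The `FailureMode` name, its fields, and the example failure are assumptions made for illustration; only the 1-10 scales and the RPN formula come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """Hypothetical record for one scored failure mode."""
    description: str
    occurrence: int  # O, 1-10
    severity: int    # S, 1-10
    detection: int   # D, 1-10 (higher means harder to detect)

    def __post_init__(self):
        # Reject scores outside the 1-10 scale so bad data is caught early.
        for name in ("occurrence", "severity", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")

    @property
    def rpn(self) -> int:
        # RPN = O x S x D
        return self.occurrence * self.severity * self.detection


fm = FailureMode("Schema migration corrupts rows", occurrence=3, severity=9, detection=6)
print(fm.rpn)  # 3 * 9 * 6 = 162
```

Note that a high detection score means the failure is hard to detect, so all three factors push the RPN up as risk increases.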
Step 4: Prioritize and classify
Sort and classify failures by risk:
- Sort by RPN descending (highest risk first)
- Classify into tiers:
  - Critical: RPN > 200 or S >= 9 (must mitigate)
  - High: RPN 100-200 (should mitigate)
  - Medium: RPN 50-99 (consider mitigating)
  - Low: RPN < 50 (accept or monitor)
- Group by category to identify systemic patterns
- Identify single points of failure (high impact, single cause)
- Compare against risk tolerance level
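The sorting and tiering rules can be sketched as follows. The failure names and scores are made up for illustration, and the sketch assumes that an RPN of exactly 100 counts as High and exactly 50 as Medium.

```python
def classify(rpn: int, severity: int) -> str:
    """Map an RPN and severity score to a risk tier."""
    if rpn > 200 or severity >= 9:
        return "Critical"  # must mitigate
    if rpn >= 100:
        return "High"      # should mitigate
    if rpn >= 50:
        return "Medium"    # consider mitigating
    return "Low"           # accept or monitor


# Hypothetical failures as (name, O, S, D).
failures = [
    ("Vendor API outage", 4, 7, 5),               # RPN 140
    ("Schema migration corrupts rows", 3, 9, 6),  # RPN 162, S = 9
    ("Typo in release notes", 5, 2, 2),           # RPN 20
    ("Load test skipped", 6, 6, 7),               # RPN 252
]

# Sort by RPN descending so the highest risks come first.
ranked = sorted(
    ((name, o * s * d, classify(o * s * d, s)) for name, o, s, d in failures),
    key=lambda row: row[1],
    reverse=True,
)

for name, rpn, tier in ranked:
    print(f"{rpn:>4}  {tier:<8}  {name}")
```

The second entry shows why the severity override matters: an RPN of 162 alone would be High, but a severity of 9 promotes it to Critical.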
Step 5: Develop mitigations
For each critical and high-priority failure, develop a mitigation:
MITIGATION TYPES:
- Prevention: Stop the failure from occurring (reduce O score)
- Detection: Catch the failure early (reduce D score)
- Reduction: Lessen the impact (reduce S score)
- Transfer: Move risk to another party (insurance, contracts)
- Acceptance: Acknowledge and prepare to handle
For each mitigation:
- Describe the specific action
- Estimate implementation effort
- Project new O, S, D scores after mitigation
- Calculate new RPN to verify improvement
- Identify who is responsible for implementation
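Projecting the post-mitigation scores and recomputing the RPN can be sketched like this. The failure, the mitigation, and the projected scores are hypothetical; in practice the projected values are estimates that need the same justification as the original scores.

```python
def rpn(o: int, s: int, d: int) -> int:
    """RPN = O x S x D."""
    return o * s * d


# Hypothetical failure: "Load test skipped" scored O=6, S=6, D=7.
before = {"o": 6, "s": 6, "d": 7}

# Hypothetical detection-type mitigation: run an automated load test in CI.
# Only the detection score is projected to improve (D: 7 -> 3).
after = {**before, "d": 3}

rpn_before = rpn(**before)  # 6 * 6 * 7 = 252
rpn_after = rpn(**after)    # 6 * 6 * 3 = 108
reduction = 1 - rpn_after / rpn_before

print(f"RPN {rpn_before} -> {rpn_after} ({reduction:.0%} reduction)")
```

Here the mitigation moves the failure out of the Critical tier but leaves it in the High tier, which suggests pairing it with a prevention or reduction measure.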
Step 6: Create contingency plans
For failures that can’t be fully prevented, create response plans:
For each critical failure:
- Define trigger conditions (when is failure confirmed?)
- Specify immediate response actions
- Identify decision maker and escalation path
- List resources needed for response
- Define recovery steps to get back on track
- Set acceptable recovery time
Also define:
- Early warning indicators to monitor
- Kill criteria (when to abort the plan entirely)
- Communication plan for stakeholders
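A contingency plan is easier to act on when its trigger condition is stated as a testable rule. The sketch below shows one possible shape, assuming Python 3.9+; the `ContingencyPlan` class, its field names, the 5% error-rate threshold, and the metric values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ContingencyPlan:
    """Hypothetical record for one response plan."""
    failure: str
    trigger: Callable[[dict], bool]  # returns True when the failure is confirmed
    immediate_actions: list[str]
    decision_maker: str
    max_recovery_hours: float        # acceptable recovery time


plan = ContingencyPlan(
    failure="API error rate spikes after deploy",
    trigger=lambda metrics: metrics["error_rate"] > 0.05,
    immediate_actions=["Roll back to the previous release", "Notify the on-call lead"],
    decision_maker="release manager",
    max_recovery_hours=2.0,
)

# Illustrative monitoring sample, not real data.
metrics = {"error_rate": 0.08}

if plan.trigger(metrics):
    print(f"TRIGGERED: {plan.failure} (escalate to {plan.decision_maker})")
    for action in plan.immediate_actions:
        print(f"  - {action}")
```

Writing the trigger as a predicate over monitored metrics also forces the early warning indicators to be measurable.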
Step 7: Compile final assessment
Create a comprehensive failure anticipation report:
- Executive summary of risk profile
- Critical failures requiring attention before proceeding
- Mitigation actions prioritized by impact/effort
- Contingency plans for unavoidable risks
- Monitoring dashboard recommendations
- Residual risks being accepted
- Go/no-go recommendation based on risk tolerance
When to Use
- Before executing any significant plan or project
- During the risk assessment phase of planning
- When designing systems that must be reliable
- Before making irreversible decisions or commitments
- When stakes are high and failure is costly
- During strategy selection, to compare the risk profiles of candidate strategies
- Before deployment or launch of new systems
- When inheriting or reviewing someone else’s plan
- During post-mortem analysis to improve future anticipation
- When entering unfamiliar territory with unknown risks
Verification
- All nine failure categories were examined systematically
- FMEA scores are justified, not arbitrary
- Critical failures (RPN > 200 or S >= 9) have mitigation plans
- Contingency plans are specific and actionable
- Single points of failure are identified
- Cascade failure potential is assessed
- Monitoring indicators are measurable
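Some of these checks can be automated. As a minimal sketch, the snippet below flags critical failures that have no mitigation plan; the record layout and example entries are hypothetical, while the critical threshold is the one used in this checklist.

```python
# Hypothetical scored failures; "mitigation" is None when no plan exists yet.
failures = [
    {"name": "Load test skipped", "rpn": 252, "severity": 6,
     "mitigation": "Add load test to CI"},
    {"name": "Schema migration corrupts rows", "rpn": 162, "severity": 9,
     "mitigation": None},
    {"name": "Typo in release notes", "rpn": 20, "severity": 2,
     "mitigation": None},
]


def is_critical(failure: dict) -> bool:
    # Critical: RPN > 200 or S >= 9
    return failure["rpn"] > 200 or failure["severity"] >= 9


# Critical failures that still lack a mitigation plan fail verification.
unmitigated = [f["name"] for f in failures if is_critical(f) and not f["mitigation"]]

if unmitigated:
    print("Verification failed; critical failures without mitigation:")
    for name in unmitigated:
        print(f"  - {name}")
else:
    print("All critical failures have mitigation plans.")
```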
Input: $ARGUMENTS
Apply this procedure to the input provided.