4 Standard IT Disaster Scenarios you aren't prepared for
Planning for things to go wrong, for Dummy Programmers
In any system design, there are several scenarios that should be considered to prevent system failure. Each one describes a worst case that frames planning for catastrophic events.

Often, when describing the need for various emergency protocols, the presenter is faced with resistance in the form of “but we trust each other”. These forms of argument distract from the very real underlying risk that needs to be addressed.
These descriptions and their titles are meant to give a standardised response to the most common objections. Each scenario has a list of ways it presents in the real world. The titles are somewhat humorous to ease the tension, but the scenarios are serious and realistic.
The scenarios are also meant to be non-specific. Rather than planning for very specific events, general scenarios with general responses can be adapted to many different situations.
- Under the Bus
- Bump on the Head
- Spiked Drink
- Sword of God
- Dæmonic Possession (bonus: I added another later)
Under the Bus
The primary on a system got run over by a bus on the way to work, and has been hospitalised for an indeterminate amount of time.
Presentation
Any unavailability of the system experts, potentially combined with the need for action.
- Accident: sky-diving, home repair, car accident
- Vacation: phoning people while they are on vacation is rude
- Illness: myocarditis, kidney stones, hemorrhoids, common cold
- Arrest: sometimes people get detained; rightly or wrongly
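One way to make this scenario concrete is to ask, for each system, who is left if a given person becomes unavailable. The sketch below is only an illustration: the `EXPERTS` table, the people in it, and the `fragile_systems` helper are all hypothetical, and a real inventory would come from wherever your team records ownership.

```python
# Hypothetical inventory: which people are able to operate which systems.
# Every name below is invented for illustration.
EXPERTS = {
    "billing": {"alice"},
    "dns": {"alice", "bob"},
    "backups": {"bob", "carol", "dave"},
}


def fragile_systems(experts, unavailable, minimum=2):
    """Return systems left with fewer than `minimum` available experts.

    The result maps each fragile system to the sorted list of people
    who can still operate it (possibly an empty list).
    """
    missing = set(unavailable)
    report = {}
    for system, people in experts.items():
        remaining = people - missing
        if len(remaining) < minimum:
            report[system] = sorted(remaining)
    return report


if __name__ == "__main__":
    # Put each person "under the bus" in turn and list the systems
    # that would be left with nobody able to run them.
    everyone = set().union(*EXPERTS.values())
    for person in sorted(everyone):
        print(person, fragile_systems(EXPERTS, {person}, minimum=1))
```

Running the check with nobody missing and the default `minimum=2` is also useful: it lists systems that are already one accident away from having no expert at all.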
Objections
“That's a horrible thing to say”
If it makes you feel better, they are going to be OK; but accidents do happen in life. Do you really want to be the person who is phoning a colleague while they should be resting in hospital?
“We better make sure you don't do anything risky”
As a manager, did you just inform your employees that they are not to undertake any personal activities?
Bump on the Head
One of the trusted individuals has recently received a bump on the head and now has a brain injury that has drastically altered their personality. They can no longer be trusted. It is unclear for how long they were trusted when they should not have been.
Presentation
- An actual bump on the head: has caused people's personalities to dramatically change
- Blackmail: or possibly bribery, where an outside actor has altered the state of the trust relationship
- Poor trust evaluation: You shouldn't have trusted them in the first place
- External system breach: a trusted individual has had their digital identity compromised.
Objections
“It's OK, I trust you”
Stop exposing me to risk; it's unfair. The minute something does go wrong, employees should have evidence in place that they were acting within acceptable parameters, and that the managerial staff had accepted any risks associated with the action. If judgement calls were required, and bad things happened, employees need to have a clear line of approval in place that they can point to as having failed (justifying their taking action).
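A clear line of approval can be as simple as a record that names who acted, who approved, and which risk was accepted. The following is a minimal sketch under stated assumptions, not a prescribed process: `Approval`, `ApprovalError`, and `record_approval` are hypothetical names, and the in-memory list stands in for whatever durable, tamper-resistant log you would actually use.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Approval:
    """One approved action: who did it, who accepted the risk, and when."""
    action: str
    actor: str
    approver: str
    accepted_risk: str
    recorded_at: datetime


class ApprovalError(Exception):
    """Raised when an approval would not protect the person acting."""


def record_approval(log, action, actor, approver, accepted_risk):
    """Append an approval to `log`, refusing self-approval and blank risks."""
    if actor == approver:
        raise ApprovalError("actor cannot approve their own action")
    if not accepted_risk.strip():
        raise ApprovalError("the accepted risk must be written down")
    entry = Approval(action, actor, approver, accepted_risk,
                     datetime.now(timezone.utc))
    log.append(entry)
    return entry


if __name__ == "__main__":
    log = []  # assumption: a plain list standing in for a durable audit log
    record_approval(log, "rotate signing keys", "alice", "bob",
                    "brief outage while clients pick up the new key")
    print(log[0])
```

The design choice here is that "I trust you" is never enough on its own: the approver's name and the risk they accepted are written down before the action happens.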

Spiked Drink
The trusted individual stands up from lunch and realises they are feeling “wobbly”. Someone spiked their drink.
Presentation
Any scenario where the actor has a compromised capacity for judgement.
- Woken in the middle of the night
- Family emergencies
- Had a couple of drinks, heavy pain medication
- Compromised judgement results in the inability to judge yourself compromised.
- Snap decisions
Plan for individuals to be able to declare themselves incapacitated or compromised; plan for them to take action even when their judgement is compromised; plan to declare someone else's judgement as compromised. Have clear instructions in place to reduce the need for judgement (do your thinking in advance).
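The "declare yourself, or someone else, compromised" rule can be sketched as an on-call rota that skips anyone flagged as unfit. This is an illustration under assumptions, not a real paging system: `Responder`, `declare_compromised`, and `next_responder` are hypothetical names, and the rota is just an ordered list.

```python
from dataclasses import dataclass


@dataclass
class Responder:
    name: str
    compromised: bool = False  # may be set by the person or by a colleague
    declared_by: str = ""      # who made the declaration, for the record


def declare_compromised(responder, declared_by):
    """Mark a responder as unfit to make judgement calls right now."""
    responder.compromised = True
    responder.declared_by = declared_by


def next_responder(rota):
    """Return the first responder in rota order who is fit to act, or None."""
    for responder in rota:
        if not responder.compromised:
            return responder
    return None


if __name__ == "__main__":
    rota = [Responder("alice"), Responder("bob"), Responder("carol")]
    declare_compromised(rota[0], declared_by="alice")  # self-declared
    declare_compromised(rota[1], declared_by="carol")  # declared by a colleague
    fit = next_responder(rota)
    print(fit.name if fit else "nobody fit to respond: follow the written runbook")
```

Note the `None` case: if everyone on the rota is compromised, the only thing left is the thinking that was done in advance.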
Objections
“People aren't allowed to drink on duty”
Being on-call, or worse, being the second or third person on call, during an emergency can result in you being activated at unanticipated times. The only way to avoid this is to consider all staff on-call 24/365.
Sword of God
(aka Sodom and Gomorrah, Meteor Impact, Zombies)
The facility has just been hit by a meteor. Where there was a service center, there is now a crater.
If it makes you feel better, everyone in the region is OK but more than a little distracted.
Presentation
Any regional outage that results in the entire service being lost. Limited to no staff in the region able to respond.
- Power outage
- Natural Disaster (storm, tsunami, earthquake)
- Epidemic
- War
Objections
“Don't be over-dramatic”
During the 2005 Ice Storm in Montreal, a colleague's phone rang with a request for technical assistance from another company. Located in Montreal, they had been without power for two days. Generators had activated, and the facility was operational; however, due to the high demand for fuel replenishment and the state of infrastructure, they were unable to secure more diesel. Their three-day supply was about to run out.
A heroic effort was undertaken. Unfortunately, due to the massive disruption to infrastructure, we were unable to rebuild their services on our infrastructure before the fuel ran out … leaving hundreds of thousands of Canadians without service for weeks.
Dæmonic Possession
(aka Planetary Alignment, Plumb Bad Luck)
You've done everything perfectly, but there is a very small dæmon living inside your computer. As you type your solution, it waits inside for an inopportune moment and messes something up. Something important.
Presentation
Software systems are complex systems, and complex systems are just that … complex. Complexity leads to unpredictability, and that is basically “random” behaviour. This can present in all kinds of ways, none of them predictable.
- fat-fingering, typos
- stuff just stops working … nobody knows why
Objections
“If I can't predict it, how can I plan for it?”
This is fatalism, giving up, and that we must not do.
Preparing for bizarro land is not easy, but it is possible. Generally, this is done through constant testing and rehearsal (you are rehearsing disasters, aren't you?). This forces people to practise system failures, and general system recovery, under controlled circumstances.
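Rehearsals work best when nobody gets to pick the comfortable scenario. As a minimal sketch, assuming a hypothetical `plan_drills` helper and an invented list of systems, a drill schedule can be drawn at random from the scenarios in this article:

```python
import random

# The scenario titles come from this article; everything else is hypothetical.
SCENARIOS = [
    "Under the Bus",
    "Bump on the Head",
    "Spiked Drink",
    "Sword of God",
    "Dæmonic Possession",
]


def plan_drills(systems, rounds, seed=None):
    """Pair each rehearsal round with a randomly chosen scenario and system.

    Passing a seed makes the schedule reproducible, which helps when the
    plan has to be reviewed before the drill takes place.
    """
    rng = random.Random(seed)
    return [(rng.choice(SCENARIOS), rng.choice(systems)) for _ in range(rounds)]


if __name__ == "__main__":
    # Assumption: these system names are placeholders for your own inventory.
    for scenario, system in plan_drills(["billing", "dns", "backups"], rounds=4):
        print(f"Rehearse '{scenario}' against {system}")
```

The randomness is the point: a drill that was chosen because it is easy to pass does not tell you much about the dæmon.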
Conclusion
Originally published on my private consultancy website in 2013, this became something I wanted to preserve, share, and keep living. I have shared it with every company I have worked with, but I think it needs to be more widely distributed because I have yet to see a company that can handle any of these scenarios.
Once upon a time, this list was at least 6 items long. Expect more in the future.
Also, I feel dirty for having used a “numbered list title”.