description
When disaster strikes, be prepared!

Restoring Everything

Commonly held sys-admin wisdom from across the ages says that the very worst time to test your backups for the first time is when you absolutely need them to perform correctly.

Familiarize yourself with the procedures in this run-book in advance, hopefully long before you need to perform them, so you can have plenty of time to practice and hone your understanding.

Run-Book Structure

The procedures follow a check-list structure, with (Optional) notations marking steps that can be abbreviated when necessary. Decisions about which steps to take or leave can be made by an Incident Commander, or "Pilot In Command". We borrow that expression from aviation to stress that performing any Disaster Recovery exercise may carry significant risk.
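As a rough illustration of the check-list structure described above, a procedure can be modeled as a list of steps, some of them flagged optional. This is only a sketch: the `Step` class, the `steps_to_run` helper, and the step descriptions are hypothetical placeholders, not part of any procedure in this run-book.

```python
from dataclasses import dataclass


@dataclass
class Step:
    description: str
    optional: bool = False  # mirrors the "(Optional)" notation on a step


def steps_to_run(steps, abbreviated=False):
    """Return the steps to perform, dropping optional ones when abbreviating."""
    return [s for s in steps if not (abbreviated and s.optional)]


# Hypothetical check-list; the step names are placeholders, not real procedures.
checklist = [
    Step("Confirm who is Pilot In Command"),
    Step("Announce the drill in the status channel", optional=True),
    Step("Restore from the most recent backup"),
    Step("Verify the restored data"),
]

# The Pilot In Command decides whether the abbreviated form is appropriate.
for step in steps_to_run(checklist, abbreviated=True):
    print(step.description)
```

Required steps always remain in the list; only steps marked optional are skipped when the abbreviated form is chosen.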

Run-book tasks are structured as minimal but detailed how-to guides that cover everything you need to know about the task. The tasks are organized so that you can find what you need quickly and refer to it while carrying out a specific emergency preparedness procedure.

We think that you should practice the disaster recovery processes in as realistic a setting as you can afford! This can mean either the production setting itself or a clone of production.

Many

Refer to the run-book when testing the procedures during scheduled maintenance windows, or in the event of a real emergency, in order to ensure no critical step is forgotten.

Pilot In Command

The pilot in command (PIC) of an aircraft is the person aboard the aircraft who is ultimately responsible for its operation and safety during flight. Correct and timely performance of a disaster recovery procedure when critical failures occur can, without exaggeration, spell the difference between being "back in business" and going "out of business".

If you have formal designations such as "Data Steward", "Data Owner", or "Product Owner", those are all people who should be in the loop when making strategic decisions about the Disaster Recovery plan, how it is practiced, and what exactly a disaster recovery drill entails.

There is no substitute for experience in an emergency. If possible, be sure you know who the "incident commander" is before disaster strikes. If there is any ambiguity, make sure there is a clear chain of command, a protocol for communicating what actions are being taken, and a clear channel for status updates while an incident is in progress and when the trouble clears.