With the one-two punch of an earthquake in the Mid-Atlantic US followed closely by a hurricane potentially hitting the same region, Disaster Recovery is probably a popular discussion right now in the Washington DC area. Of course Business Continuity professionals write contingency plans for all types of disasters, not just ones caused by nature. Don’t you want to ask God, or Mother Nature, or the Flying Spaghetti Monster what else we should be planning for, and in what order or combination? Do you think DC Business Continuity professionals have planned for things like enemy countries parachuting in armies of robotic killer crabs with pompadours and lasers? What about zombies? And is there a real difference between planning for zombies or natural disasters? Well, that’s what I’d like to explore today: the nuances of disaster planning for the zombie apocalypse as it pertains to data protection.
I’d like you to treat this like watching a Sci-Fi film, so suspend your disbelief and just go with it. We all know that when the zombie apocalypse DOES happen, most companies won’t be recovering. After all, the majority of consumers will only be consuming one thing. Flesh. The rest of us will be chased through vast neutral-toned forests of cubicles until we trip over a branch of our network and become our adversary’s next hot lunch.
Let’s assume for the sake of this blog post, that your primary Data Center was burned to the ground by some sort of zombie activity. And… ACTION!
Wait, wait, wait. Hold it. Before this disaster happens, we should take some steps to mitigate future risk. We need to be running lean. It’s especially important in situations like this one, when you’re shorthanded. Do more with less, right? You’ve heard it all before. But it really takes on new meaning if Dave, your Oracle DBA and FORMER lunch buddy, now has a taste for brains (BRAINS!). Let’s face it: at this point your data protection team can be any of the following:
a) A zombie
b) Bit and will be a zombie at any moment
c) Trying to find their family and “hunker down” in that completely self-sustaining cabin/bunker/island they know about
d) Leading a group of survivors in guerilla-like hit & run missions from the back of a heavily fortified ice cream truck; they’re gonna make our streets safe again!
e) Working diligently with you on getting your corporation’s mission-critical data and systems operational
(Note: A and B may be the most inaccurate views of this entire blog post, considering tech guys will likely survive longer than anyone else if we’re overrun by zombies. D may be the most accurate. I don’t know about you, but I’ve been training for this since the day I got my first zit.)
Storage optimization and server consolidation, along with automated information management processes, can take days off of your recovery time. They also let you regularly review which data and services are mission critical and which are superfluous to core operation (read: what you have to restore, and how fast). Getting lean and staying lean is muy importante.
There are a few things you can do to keep your primary storage clean and running efficiently. Storage Resource Management tools will give you a good view of how efficiently or inefficiently you’re using what you have. Classification can tell you who made it, when they made it and what kind of data it is. Retention policies can manage storage tiering and data removal. Put them all together and you’ve got yourself a lean tier 1 storage environment for your mission critical workloads. Less data on primary storage means less to protect and ultimately less to recover while the undead are trying to chew off your face.
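To make the retention-policy idea concrete, here’s a minimal sketch of an age-based classification sweep. The thresholds and the three buckets (keep on tier 1, tier down, remove) are hypothetical policy choices, not anything a particular SRM product prescribes:

```python
import os
import time

# Hypothetical retention policy: files untouched for over a year become
# candidates to tier down off primary storage; over three years, to remove.
TIER_DOWN_AFTER = 365 * 24 * 3600
REMOVE_AFTER = 3 * 365 * 24 * 3600

def classify(root):
    """Walk a directory tree and bucket files by time since last access."""
    now = time.time()
    keep, tier_down, remove = [], [], []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            age = now - os.stat(path).st_atime
            if age > REMOVE_AFTER:
                remove.append(path)
            elif age > TIER_DOWN_AFTER:
                tier_down.append(path)
            else:
                keep.append(path)
    return keep, tier_down, remove
```

Real classification tools also look at ownership and content type, but even an age sweep like this shows you how much of tier 1 is dead weight.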
Don’t forget to enable deduplication in your backup application. Dedupe will help you efficiently store and move what data you WILL have to restore. What would be redundant copies of data sitting on traditional backup storage become just the unique pieces of data needed to restore everything. Now duplicate those chunks of data and the backup catalog over the wire to your DR site and as soon as you scoot on in to your bunker, you’re ready to start restoring what MUST be restored. More on dedupe later.
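The core dedupe mechanic is simple enough to sketch: split data into chunks, hash each chunk, and store each unique chunk exactly once alongside a per-backup “recipe” of hashes. This toy version uses fixed-size chunks and SHA-256; production dedupe engines typically use variable, content-defined chunk boundaries, but the storage savings come from the same trick:

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunking for simplicity

def dedupe(streams):
    """Store each unique chunk once; return the chunk store plus, per
    stream, the ordered list of chunk hashes needed to rebuild it."""
    store = {}    # hash -> chunk bytes (the unique pieces)
    recipes = []  # one hash list per input stream (the catalog)
    for data in streams:
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)  # keep only first copy
            recipe.append(digest)
        recipes.append(recipe)
    return store, recipes

def restore(store, recipe):
    """Reassemble a stream from its recipe of chunk hashes."""
    return b"".join(store[h] for h in recipe)
```

Two backups that share most of their blocks cost you one set of shared chunks plus whatever is unique to each, and that’s exactly what you ship over the wire to the DR site.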
Fewer physical servers make for faster and easier restores. How can we cut down on the physical assets we need to recover? Virtualization plays a huge part in this, but sophisticated traditional HA configurations can cut down on what is needed for clustering your remaining resource intensive workloads as well. Both of these also contribute to your overall “service mobility factor” (bam, just coined that one), or your ability to easily move critical services to different assets and locations.
Do any of you wanna have to restore the server that runs those “zombie head-shot turrets” planted outside the DR site? I don’t; I want that server to STAY UP! A great way to cut down on how much you’re restoring during a disaster is to restore less: automate that service coming back up on the flip side with High Availability.
Virtualization can be a key ingredient to successful HA. Don’t discount that layer of abstraction virtualization offers, which makes service mobility a much easier thing to accomplish. And since moving virtual workloads around based on application availability is easy to do these days, it’s a no-brainer (BRAINS!).
Traditional High Availability in physical environments through combinations of clustering and replication can be an extremely robust way to offer your largest, most resource intensive solutions a high degree of mobility. Plus, active/active configurations can help you cut down on the amount of physical machines needed to handle multi-tier applications, and multi-service clusters.
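Stripped of quorum, fencing, and shared-state handling, the failover half of HA boils down to a monitoring loop like the one below. The node names and health check are hypothetical; real clustering stacks handle all the hard edge cases this toy skips:

```python
class FailoverPair:
    """Toy active/standby pair: one monitoring pass promotes the
    standby whenever the active node fails its health check."""

    def __init__(self, primary, standby, health_check):
        self.active = primary
        self.standby = standby
        self.health_check = health_check  # callable: node name -> bool

    def tick(self):
        """One monitoring pass; returns a status message."""
        if not self.health_check(self.active):
            # Promote the standby; the dead node becomes the (ex-)standby.
            self.active, self.standby = self.standby, self.active
            return f"failover: {self.standby} -> {self.active}"
        return "healthy"
```

The point is that nobody has to restore anything for the turret service to come back: the standby simply takes over while you’re still sprinting for the bunker.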
So don’t get bit because you had to restore the “zombie head-shot turret” application. Instead, give each turret a name and personality and bet on which one will finish off the most baddies. My money’s on Yosemite Sam. I might even risk sneaking out to paint a large red moustache on him and top him off with a 10-gallon cowboy hat. Hey, it’s the little things.
Control Your Own Destiny Data
I advocate controlling your own destiny by controlling your own data. I don’t want to depend on an employee from a tape vaulting company to survive and then bring me my tapes while he’s busy keeping a tally on the reanimated corpses he’s struck with the rot slathered grill on his box truck. So tape is out. I’m going disk all the way. Dedupe for the win!
Dedupe means I can store all the backup data in a way that makes duplicating it from site to site more efficient. If I’m already keeping track of which data needs to be immediately available for restore at my DR site, I can select only that necessary data to duplicate, which is far more efficient than replicating everything, all the time.
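That selective duplication can be sketched in a few lines: given a dedupe chunk store at each site, ship only the chunks the DR site is missing, and only for data flagged as needed for immediate restore (the flagging policy here is a hypothetical input):

```python
def replicate_missing(primary_store, dr_store, needed_hashes):
    """Copy to the DR chunk store only those chunks it lacks, restricted
    to the hashes flagged as needed for immediate restore.
    Returns the number of chunks actually sent over the wire."""
    sent = 0
    for h in needed_hashes:
        if h not in dr_store:
            dr_store[h] = primary_store[h]
            sent += 1
    return sent
```

Chunks the DR site already holds from earlier replication cycles cost nothing to “send” again, which is why deduplicated replication stays cheap even with frequent backups.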
Another aspect of controlling your environment is Virtualization. If you have fewer physical assets to depend on, and the OSes are abstracted from the physical layer, recovering becomes easier and more straightforward. Standardizing on a platform makes training, knowledge sharing, and hiring easier. If you play your cards right, you can easily have all the knowledge and process details you need available in a recovery situation.
One option that’s crept up on us recently and is still fairly nebulous is Cloud. As service providers work to further define and implement their vision of a Cloud world, there could be some real benefits here. I’d hope that all of the Cloud providers’ datacenters would be fully zombie proof, similar to the first zombie-proof house you can find pictures of if you search for it.
Some Cloud providers are offering storage services, where long-term storage is done for you and your data can be made available through a portal or on demand. Depending on the method of transfer when you need your data back, this could be a good thing. There’s less data for you to worry about getting to your DR site, the data is still available over the wire if needed, and if you’re using this simply for long-term retention you probably won’t even need to do a recall for a little ole zombie revolution. This is an exception to my previous directive around controlling your own data, but since it also fits under “running lean,” I think it’s worth considering.
I’ve identified a few tactics that may help your company recover during the inevitable rise of the flesh-devouring undead, and weighed them against priorities that might shift slightly under the aforementioned hypothetical, yet entirely probable, circumstances. Virtualization, storage optimization, deduplication, HA; there are a number of activities and configurations that can help you restore normal operation while dodging the living dead. While this may not contrast greatly with what you’re doing today, the motive likely does. So if your company revisits their Business Continuity and Disaster Recovery plans citing recent events, REMEMBER THE ZOMBIES!
Call to Action
Please let me know how you might protect/recover your data during the zombie apocalypse. I think I can speak for everyone when I say that this subject has not yet been fully explored.