In my previous blog post I outlined the topic of IT service assurance and the five key things to consider if you are looking to transform your IT department, or simply to operate it more efficiently with existing assets.
In this post I will focus on IT continuity.
With Hurricane Sandy recently tearing through the north-east of the United States, businesses have once again been reaching for their business continuity plans (BCPs). The broad topic of business continuity management (BCM) covers every facet of minimising risk and ensuring the continual delivery of business services, with IT being one of the key elements.
Given the devastation that hurricanes and other natural disasters can cause, shouldn't all businesses build BCM plans to accommodate these situations? Well, yes: in an ideal world businesses would minimise or remove all risks, if money were no object. In reality, however, budgets are limited and there is often a reluctance to shore up defences against something that may never happen.
In larger organisations I commonly see alignment with best practice frameworks such as BS25999 (which has recently been superseded by the ISO22301 standard). In smaller organisations the approach is often more free-form, based on perceived priorities that are key to running the business. Whatever the approach, my discussions with customers focus on provable and appropriate IT continuity plans as part of their BCM strategy.
In my experience, by far the largest proportion of IT failures comes from accidental human and application errors. With this in mind, I always talk to clients about the following considerations.
Review or create a risk register
First of all, understand what your business's risk posture is in relation to failures in IT services. This requires consultation with the business owners before any IT solutions are developed. Group the business services, and their reliance on IT, into manageable categories with similar importance and recovery objectives. You now have the basis for IT to create appropriate IT continuity solutions without over- or under-investing.
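A risk register does not need to be elaborate to be useful. The sketch below is one way to group services into recovery categories; the service names, the recovery time objective (RTO) and recovery point objective (RPO) values, and the three-tier banding thresholds are all hypothetical placeholders for whatever you agree with your business owners. A service is placed in the most demanding tier that either of its objectives calls for.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessService:
    name: str
    rto_hours: float  # recovery time objective agreed with the business owner
    rpo_hours: float  # recovery point objective: maximum tolerable data loss


# Hypothetical banding: (tier, max RTO in hours, max RPO in hours).
# Replace these thresholds with the ones your business actually agrees to.
TIER_BANDS = [
    (1, 4.0, 1.0),
    (2, 24.0, 12.0),
    (3, float("inf"), float("inf")),
]


def assign_tier(service: BusinessService) -> int:
    """Return the most demanding tier required by either the RTO or the RPO."""
    for tier, max_rto, max_rpo in TIER_BANDS:
        if service.rto_hours <= max_rto or service.rpo_hours <= max_rpo:
            return tier
    raise ValueError(f"no tier fits {service.name}")


def build_register(services) -> dict:
    """Group service names by tier, ordered from most to least critical."""
    register = defaultdict(list)
    for service in services:
        register[assign_tier(service)].append(service.name)
    return dict(sorted(register.items()))


# Hypothetical example services.
services = [
    BusinessService("online-ordering", rto_hours=1, rpo_hours=0.25),
    BusinessService("payroll", rto_hours=48, rpo_hours=24),
    BusinessService("crm", rto_hours=8, rpo_hours=4),
]
print(build_register(services))
```

Each tier then becomes the unit IT designs a continuity solution for, which is what keeps investment proportionate to the risk.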
Establish mature operational disciplines
In general, prevention is better than cure. In other words, operate your production environment with mature operational disciplines and, where you can, reduce complexity by standardising infrastructure and software. Maintaining tight control of the production environment should help minimise human and application errors. Of course, this does not remove the need for IT continuity services (it is impossible to remove all risk), but such an approach should give you better control of known risks.
Frequent and realistic testing
I'm sure that anyone who has been involved in a disaster recovery (DR) test will appreciate the trepidation before the event. Will it work? How much of my weekend will this take? Will we have to come back next weekend? What will we have to change as a result, and what will be needed to do that? Full DR testing is an involved process, so why not maximise your confidence before these big days arrive?
Regular simulations that test processes and data integrity without impacting the production environment will build this confidence. Factor this into the equation when designing IT continuity solutions, so that you maximise your confidence and can demonstrate that your DR is provable.
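One inexpensive data-integrity check that can run between full DR tests is to compare a recovered copy against a checksum manifest taken from the source. The sketch below assumes the recovered copy is mounted as a directory on an isolated test host and that the manifest was captured from the same point-in-time snapshot; both the layout and the function names are hypothetical, not a specific product's feature.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_recovered_copy(source_manifest: dict, recovered_root: Path) -> dict:
    """Compare a recovered copy with the source manifest and report discrepancies."""
    recovered = manifest(recovered_root)
    common = source_manifest.keys() & recovered.keys()
    return {
        "missing": sorted(source_manifest.keys() - recovered.keys()),
        "unexpected": sorted(recovered.keys() - source_manifest.keys()),
        "corrupted": sorted(p for p in common if source_manifest[p] != recovered[p]),
    }
```

An empty result in all three lists is evidence, not proof, that the data side of recovery works; process steps still need exercising separately.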
Understand interactions between application tiers
Business services that rely on IT are often made up of multiple application tiers that interact together to provide the overall service.
Server virtualisation allows application workloads to be relocated seamlessly to other physical hosts, so where each tier is running can change over time. A clear understanding of these interactions, and of where each tier currently lives, is essential to recovering the failed components of a service. Trying to track these interactions manually is time-consuming and error-prone, and this is where automation can play a big part. Consider how you visualise and track the relationships between application tiers, and how you can automate recovery of not only the failed component but also any knock-on effects that failure has had up or down the stack.
For tier 1 applications with short recovery time objectives (RTOs), this is essential to meeting their service level agreements (SLAs).
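To make "knock-on effects up the stack" concrete, the sketch below models a service as a map of components to the components they depend on, finds everything affected by a failure, and orders recovery so each component comes back only after its dependencies. The component names and the hard-coded map are hypothetical; in practice the point above is that this map should be discovered and kept current by tooling, not maintained by hand. It uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical service map: each component -> the components it depends on.
DEPENDS_ON = {
    "storage": set(),
    "database": {"storage"},
    "app-server": {"database"},
    "web-frontend": {"app-server"},
    "reporting": {"database"},
}


def affected_by(failed: str, depends_on: dict) -> set:
    """Return the failed component plus everything that depends on it, directly or indirectly."""
    affected = {failed}
    changed = True
    while changed:
        changed = False
        for component, deps in depends_on.items():
            if component not in affected and deps & affected:
                affected.add(component)
                changed = True
    return affected


def recovery_plan(failed: str, depends_on: dict) -> list:
    """Order affected components so each is recovered after the components it relies on."""
    affected = affected_by(failed, depends_on)
    # Keep only dependencies that are themselves affected; healthy ones need no action.
    subgraph = {component: depends_on[component] & affected for component in affected}
    return list(TopologicalSorter(subgraph).static_order())


print(recovery_plan("database", DEPENDS_ON))
```

A database failure here yields a plan that starts with the database and brings the web front end back only after the application server, while unaffected storage is left alone.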
Review SLAs for commoditised and converged architectures
Moving towards commodity hardware and open source software has obvious cost benefits. Similarly, converged architectures (pre-defined vertically aligned hardware stacks) can offer improved simplicity and standardisation.
However, in my experience both need to be reviewed from an IT continuity perspective to ensure they meet the benchmark set by previous implementations. Also, how do they integrate with the existing assets and operational processes already in place? Do you need to create new processes and tools to ensure the IT continuity of the services running on these platforms?
Remove the weakest link in the chain
The final consideration is one that is often overlooked by IT continuity professionals, simply because it doesn't naturally fit the reactive recovery of services.
I recall a former customer who was focused on making sure they had a DR strategy in place. All the focus went on traditional techniques such as backup and recovery, replication, high availability and removing single points of failure from the infrastructure. All of these are key tools for building a DR strategy.
What this customer had overlooked, however, was another glaring hole in their IT strategy: information security. What happened? After they implemented their very well-defined DR strategy, their network was hit by a denial-of-service (DoS) attack, which rendered their customer-facing websites unavailable for two days and cost the organisation a significant amount of online business. The moral of the story is that the weakest link in the chain needs to be identified, and it might not always be where you first look.
If you are interested in how Symantec can help your organisation to more closely align business and IT via a short facilitated workshop please contact the author directly (firstname.lastname@example.org).