Unexpected disasters like 9/11 and Hurricane Katrina are large-scale examples of the need for federal, state, and local governments to have plans in place to keep information secure and available at all times. Every government agency should have a Continuity of Operations Plan (COOP) in place to ensure the agency stays up and running when planned or unplanned incidents occur. Having a plan in place is just one facet of the total solution. In accordance with Federal Preparedness Circular 65, an agency’s COOP plans must include regular testing of its implemented backup and recovery solutions. Even with the most robust disaster recovery infrastructure in place, an IT manager needs to know that the recovery plan will really work as it should when it’s needed.
The government’s information infrastructure is constantly subject to any number of threats:
- Data corruption or theft
- Storage changes (e.g., new volumes or mount points)
- Human error
- Failures: Component failure, application failure, power outage
- Configuration drift: a gradual process that steadily degrades protection as seemingly trivial configuration details (usernames, licensing, network paths) change, get lost, or are forgotten.
Although most agencies have a Continuity of Operations Plan, few test it regularly, even though testing is necessary to ensure that IT security controls will respond appropriately in the event of a disaster. Without frequent disaster recovery testing, it’s impossible to know whether changes such as configuration drift will affect availability during a real crisis.
According to Dave Jerome, a principal at Booz Allen Hamilton, agencies need to spend more time testing their existing COOP plans to guard against surprises in the midst of a real emergency. “Constantly make sure that people understand their responsibilities. Each time you test the plan you are going to find things that didn’t work exactly the way that you thought they would. COOP is a living plan that has to be updated on a periodic basis,” says Jerome.
So why aren’t more agencies testing their disaster recovery plans? Here are just a few of the common perceptions that IT managers have about disaster recovery testing:
- Disruptive to operations
- Not infallible
- Drain on resources
- Difficult to manage
Such perceptions are understandable, but to test how failover works, you must actually pull the plug and force a failover to occur. Agencies that do conduct tests generally do so just once a quarter or once a year, which is not frequent enough to account for changes in the computing environment between tests.
Symantec has a solution that allows agencies to test disaster recovery scenarios without putting the production environment in jeopardy. Veritas Cluster Server (VCS) is the first and only enterprise-class solution for testing and validating automatic disaster recovery. Only with VCS can your agency:
- Reduce disaster recovery costs
- Validate data servers and applications
- Have no impact on production
- Test anytime
VCS includes a Fire Drill feature that tests different scenarios to make sure the data remains intact. In virtual environments where server locations change frequently, Fire Drill helps monitor and track mobile servers, their configuration, and dependency links. There are two ways that Fire Drill works:
- Virtual Fire Drill: A virtual Fire Drill checks that configurations match up, catching any drift issues that may have cropped up between tests, flagging all possible issues, and alerting the administrator immediately.
- Physical Fire Drill: Isolated from production, a physical Fire Drill is the gold standard for testing. It goes beyond the virtual check and attempts to start a cloned version of all the applications and components at the disaster recovery site, alerting the administrator to any recovery problems. Once the test is done, VCS resets and resumes syncing until it’s time for another test.
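To make the idea of a configuration check concrete, here is a minimal Python sketch of comparing settings at the production site against those at the disaster recovery site. It is an illustration only and does not show how VCS or Fire Drill is implemented. The function name `find_drift`, the setting names, and the sample values are all hypothetical.

```python
from typing import Any

# Placeholder shown when a setting exists at one site but not the other.
MISSING = "<missing>"


def find_drift(primary: dict[str, Any], recovery: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return each setting whose value differs between the two sites.

    The result maps a setting name to a (primary_value, recovery_value) pair.
    """
    drift = {}
    # Walk the union of keys so settings missing from either site are caught.
    for key in sorted(primary.keys() | recovery.keys()):
        primary_value = primary.get(key, MISSING)
        recovery_value = recovery.get(key, MISSING)
        if primary_value != recovery_value:
            drift[key] = (primary_value, recovery_value)
    return drift


# Hypothetical settings of the kind that tend to drift between tests.
primary = {
    "mount_point": "/data01",
    "service_user": "appsvc",
    "license_key": "A1",
    "network_path": "//fs1/share",
}
recovery = {
    "mount_point": "/data01",
    "service_user": "appsvc_old",  # username changed in production only
    "network_path": "//fs1/share",
    # license_key was never installed at the recovery site
}

drift = find_drift(primary, recovery)
for name, (primary_value, recovery_value) in drift.items():
    print(f"DRIFT {name}: primary={primary_value!r} recovery={recovery_value!r}")
```

A check like this is cheap enough to run daily, which is the point of the virtual approach: it catches mismatches between full tests, though only a physical test confirms that the applications actually start.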
Government agencies must be more prepared for disaster than their private counterparts. Maintaining Continuity of Operations plans and, just as important, frequently testing the recovery procedures and technology in place are responsibilities that the government must take seriously. No level of government can afford to find out that its critical data didn’t replicate in times of crisis. Testing a disaster recovery plan with Veritas Cluster Server helps ensure that all systems will be there when they are most needed.