Disaster Recovery of SBS 2008 Failed
I have a client running SBS 2008 SP2 on a 2-year-old server, with Backup Exec (BE) 12.5 SP4 installed, backing up to a Quantum DLT tape drive.
Last week, 2 drives in a RAID 5 array failed and we lost the array. We replaced the drives, recreated the array, and followed Symantec document TECH87893 to the letter to recover the server.
When we rebooted after restoring the system partition, system state, and shadow copy components, the server blue screened with stop code 0x7B (INACCESSIBLE_BOOT_DEVICE). We repeated the procedure 2 more times, once with a different RAID driver and once with a different partition size. Each time, the server blue screened with the same stop code and gave no driver information.
I opened a ticket with Symantec support, who basically told me I was doing everything correctly and should call Microsoft to see if they could help. I then opened a ticket with Microsoft, who spent 4 hours changing registry settings to keep services and devices from loading at boot, with no success. Microsoft's response was that in most cases like this, it is best to build the server from scratch and deploy a new Active Directory domain.
At this point, I had reinstalled the SBS server 4 times, spent a total of 7 hours on the phone with support, and the client had been down for 3 days. I was left with only one option: rebuild the server, restore the data, bring the server back on site, and eat crow with the client. 2 techs spent most of the day on site joining 8 workstations to the new domain, copying profiles, and rebuilding scripts and security. Two things were in our favor: the data was recoverable, and there were only 8 workstations.
But what if this had not been the case? What if there had been 30 workstations and remote users? What if Exchange had been configured so that users had no local OST to export and reimport? We have many small business clients running different versions of SBS and BE, all set up to prevent scenarios like this one. These clients look to us for advice on how to recover quickly from these failures, and they expect us to protect them from lengthy downtime. We now face a serious dilemma: have we been advising our clients incorrectly?
We have read the documentation for various Symantec backup products and deployed them based on 30 years of experience in the industry. Yet in all that time, I have never read a document that says "This is how you should back up your particular environment so you can recover within your acceptable time frame." I know this is a reach, but is there a backup guru out there, somewhere in Symantec land, who can tell us whether we are doing this right or wrong? If change is needed, we will deploy it at our clients' sites.
Thanks for letting me vent, I'd appreciate any comments.