Tech Support Cracks the Case: Clearing the Problem Without Killing the Data
When just-in-time just isn't
A major equipment manufacturer was having trouble with a system that ran its just-in-time order chain. When a customer bought a piece of equipment, the purchase triggered a message through the just-in-time ordering application, which automatically ordered a replacement unit. If this system failed, orders could not be processed, and the company stood to lose millions.
The manufacturer used Veritas Cluster Server to manage the server cluster that hosted the application. Starting in February 2007, Volume Manager began flagging some of the cluster's disks as failed. Because Cluster Server was the first to report the problem, the manufacturer assumed that product was at fault, so its IT staff contacted Symantec support for assistance. Their contact was Ben, who has more than 10 years' experience in technical support and has focused on Veritas products since 2001.
A digital "No Trespassing" sign
The disks that had been marked as failed were actually working fine, Ben found. "It was a bit of a mystery," he recalls. After some investigation, he and his team found that Volume Manager could not access the disks because they carried a signature reserving them for a different host.
"There's an industry standard called SCSI III [version 3 of the Small Computer System Interface] that's intended to protect a disk from being grabbed by another host," Ben explains. Under this standard, a host server can place a "signature" (technically, a persistent reservation) on a specific disk, the digital equivalent of a "No Trespassing" sign. It turned out that the disks in question had previously been used in another vendor's system, and the signatures had never been removed. That was what prevented Volume Manager from using the disks.
But what to do about it?
This solved the mystery, but not the problem. The previous vendor, whose platforms are proprietary, had no way to remove the signature now that the disk was running in a different operating system environment. To make matters worse, the backups made from the affected system had turned out to be unusable.
After months of struggle, with orders held up during the company's busiest time of year, Ben ended up on a 2 a.m. call with tech support from the other vendor and some of the manufacturer's senior executives. The previous vendor's support staff offered to take the disk out, attach it to a system running their own operating system, clear the troublesome signature, and then return it. But they couldn't guarantee that the all-important data on the disk would survive intact. With no usable backups, losing that data was unacceptable to the customer.
"The other vendor said, 'Well, what else can we do?'" Ben recalls. "We said, 'Let's install VRTSfen.'" VRTSfen, the Veritas fencing module, ships with Cluster Server and is designed to resolve SCSI III issues. Because Cluster Server runs on all major platforms, the module could work in the manufacturer's environment, and Ben knew it could clear the previous vendor's signature without touching the data stored on the disk. With few other options, the manufacturer agreed.
Five minutes later...
"Once they made the decision, the process took only five minutes," Ben says. "The disk came right back up, with no delays and no data lost." The next morning, he was on another call, this time with the customer's CIO, explaining what had happened and how the problem was fixed. The manufacturer still has work ahead: many other disks have similar signature problems, and cleaning them all will be an ongoing process. But now, Ben says, "They are satisfied that they know how to fix it, which they do."