Storage & Clustering Community Blog

Should I Worry about Data Corruption? - Unfortunately Yes......

Created: 20 Feb 2013 • Updated: 11 Jun 2014
By bpascua

Recently one of my customers had a series of outages in the communications between their buildings. The upshot was that, because of the way they had deployed their clusters, they weren't protected against their cluster nodes losing communication with each other. I have seen mixed experiences from my customers when it comes to split brain. (Split brain is when nodes in a cluster begin writing to shared storage in an uncoordinated fashion because each believes it is the last node left in the cluster.)

I have seen customers running campus clusters with no split brain protection who have yet to see any problems. I have also seen other customers go belt and braces and use IO fencing, which is built into VERITAS Cluster Server. Some of them run into issues because of the way they handle the IO fencing devices, which are often not well understood. So what's it all about then? It's pretty simple.

For a cluster to work there needs to be a communication system between the nodes in the cluster to establish which systems are up and which are down. VERITAS Cluster Server uses the concept of heartbeats. These are isolated channels which, roughly once a second, pass a message between nodes saying "I am alive". Normally we recommend two heartbeats plus a low priority heartbeat. The low priority heartbeat runs over a public interface and is used only when the real full-time heartbeats fail. By falling back temporarily to a public interface in this way, we can stop the nodes from going into arbitration at all.
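To make the heartbeat idea concrete, here is a minimal sketch in Python of how a node might decide whether a peer is still alive. It is not how VCS implements it; the class name, the link names and the 16 second timeout are all assumptions chosen for illustration. The point is simply that a peer only counts as down once every link, including the low priority one, has been silent for the whole timeout.

```python
from dataclasses import dataclass, field


@dataclass
class PeerMonitor:
    """Tracks when each peer was last heard from on each heartbeat link."""

    # Assumed value: seconds of silence on ALL links before a peer is declared down.
    peer_timeout: float = 16.0
    last_seen: dict = field(default_factory=dict)

    def record_heartbeat(self, peer: str, link: str, now: float) -> None:
        # Remember the most recent heartbeat for this (peer, link) pair.
        self.last_seen[(peer, link)] = now

    def peer_is_alive(self, peer: str, now: float) -> bool:
        # The peer is alive if at least one link carried a heartbeat recently.
        return any(
            now - seen <= self.peer_timeout
            for (name, _link), seen in self.last_seen.items()
            if name == peer
        )
```

With a 16 second timeout, a peer last heard from at t=0 on both private links is still considered alive at t=10 and down at t=17, unless the low priority link has carried a heartbeat in the meantime.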

Let's say you have a two node cluster and I walk into your data centre and yank out the heartbeat cables between your nodes. After a specified checking interval, each node comes to the conclusion that it is the last node in the cluster. It will then attempt to force import the storage. Now consider that this could have been a genuine failure of one of the nodes in the cluster. In that scenario we want the remaining node to import the storage that was in use before the crash and start our applications (otherwise what's the point of high availability?). In our scenario, where we have not actually lost any nodes, only the communications between them, both systems will import the storage and begin writing to the filesystems. Time to get your backup tapes out or resync from this morning's hardware replica. This is what we call a split brain.

Symantec do have some good mechanisms to protect you from this. The first is a type of membership arbitration called IO fencing. This leverages SCSI3 reservations provided by the storage subsystem itself, which can forcibly stop a specific system from doing IO to a device. It involves having three coordination points (vote disks). When the cluster starts, each node joining the cluster registers keys on these vote disks. Now, in the scenario above where all communication is lost between cluster nodes, an arbitration race begins. Each node in the cluster races to gain control of the vote disks. Whichever node loses the race, by getting only a minority of the vote disks, is fenced out of the cluster and sent a panic request.
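The outcome of the race boils down to a majority count over the vote disks, which can be sketched in a few lines of Python. This is an illustration only: the function name and the idea of passing in "who reached each disk first" are my own simplifications, not the fencing driver's real interface.

```python
def fencing_race(grabs: dict, nodes: list) -> dict:
    """Decide the outcome of a race for an odd number of vote disks.

    grabs maps each vote disk (any hashable label) to the node that
    reached it first. A node survives only with a strict majority.
    """
    total = len(grabs)
    counts = {node: 0 for node in nodes}
    for winner in grabs.values():
        counts[winner] += 1
    return {
        node: "survives" if counts[node] > total // 2 else "panics"
        for node in nodes
    }


# Hypothetical two node cluster: nodeA reaches two of the three disks first.
result = fencing_race({"disk0": "nodeA", "disk1": "nodeA", "disk2": "nodeB"},
                      ["nodeA", "nodeB"])
```

Using an odd number of vote disks is what guarantees that a two-way race cannot end in a tie.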

So we are forcibly crashing the race loser to stop it writing to the shared disks. IO fencing is bullet proof and will also block IO from any third party hosts that mistakenly gain access to the shared disks. There is also the possibility that a system which has hung could, when it comes out of its hung state, flush IO down to the shared devices and cause corruption. SCSI3 reservations and IO fencing stop this. This is the recommended way to configure clusters, but it does come at the price of needing three vote disks for each cluster. Additionally, in virtualised environments SCSI3 reservations are often not supported, so this option is frequently not available there.

Symantec also have another clever arbitration method known as Coordination Point Server (CPS). It offers a solution for customers wishing to vastly reduce the possibility of split brain without needing the vote disks and SCSI3. Coordination point servers are used to judge independently which nodes are up in a cluster. As with the vote disks, three are needed to judge fairly, so three coordination point servers are required in the environment. These are effectively three single node VCS clusters which sit idle until there is a dispute. The difference here is that these three servers can arbitrate for many hundreds of clusters, as they are simply contacting the nodes over IP to see if they are alive. In my example above, when both systems believe they are the last remaining node, the following takes place. The three coordination point servers attempt to contact each system in the cluster, and whichever system gets the most votes is the winner and stays up. The losing node is sent a kill command and crashes. So this is split brain protection by taking out the other contenders who might want to write to the storage.
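Here is a rough Python sketch of why the same three servers can arbitrate for many clusters: each one only has to keep a small membership record per cluster. The class and method names are hypothetical, and I have modelled the vote as each node asking every server to confirm it, which is an assumption about the mechanics rather than a description of the real CPS protocol.

```python
class CoordinationPoint:
    """One arbiter; holds registrations for any number of clusters."""

    def __init__(self) -> None:
        self.members = {}  # cluster name -> set of registered node names

    def register(self, cluster: str, node: str) -> None:
        self.members.setdefault(cluster, set()).add(node)

    def claim(self, cluster: str, node: str) -> bool:
        # A node that is still registered evicts its rivals and gets the vote;
        # a node that has already been evicted gets nothing from this point.
        if node not in self.members.get(cluster, set()):
            return False
        self.members[cluster] = {node}
        return True


def wins_majority(points: list, cluster: str, node: str) -> bool:
    # Simplification: the node claims every point in one go, whereas a real
    # race would interleave claims from both sides.
    votes = sum(1 for point in points if point.claim(cluster, node))
    return votes > len(points) // 2


points = [CoordinationPoint() for _ in range(3)]
for point in points:
    for cluster, node in [("web", "a"), ("web", "b"), ("db", "x"), ("db", "y")]:
        point.register(cluster, node)
```

In this model, once node "a" has won the "web" cluster, a later claim by "b" fails on every server, while the "db" cluster's registrations on the same three servers are untouched.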

This raises an interesting scenario. Say I have a two node cluster with a production server and a test server acting as the standby node. If there is a loss of communications between the two and the arbitration process starts using the coordination point servers, what happens if your test server wins the race? You might have a red faced service manager shouting at you. The good news is that from VCS 6.0 onwards there is the concept of preferred fencing. This simply means you can weight the race in favour of either a system or a service group. This way, in the loss of communication scenario, you can ensure your test system is taken out of the equation instead of your production server.
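One way to picture a weighted race is that the less important side has to wait a moment before it is allowed to start. The Python sketch below models exactly that and nothing more. The weights, the one second delay and the function names are assumptions for illustration, not the product's actual settings or algorithm.

```python
def race_start_delays(weights: dict, delay: float = 1.0) -> dict:
    """Every node that is not the heaviest waits `delay` seconds before racing."""
    top = max(weights.values())
    return {node: 0.0 if weight == top else delay
            for node, weight in weights.items()}


def race_winner(weights: dict, grab_time: dict) -> str:
    """Return the node that finishes first.

    grab_time is the number of seconds each node needs to win a majority
    of coordination points once it has started racing.
    """
    delays = race_start_delays(weights)
    finish = {node: delays[node] + grab_time[node] for node in weights}
    return min(finish, key=finish.get)


# Hypothetical example: the test box is quicker, but production carries more weight.
weighted = race_winner({"prod": 100, "test": 10}, {"prod": 0.3, "test": 0.1})
unweighted = race_winner({"prod": 1, "test": 1}, {"prod": 0.3, "test": 0.1})
```

Here `weighted` is "prod" because the test server's head-start penalty outweighs its faster grab, while `unweighted` is "test", which is the red faced service manager scenario.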

So which is better? It's horses for courses, I'm afraid. SCSI3 offers bullet proof protection, of that there is no question. But it comes at the price of needing three vote disks per cluster, which soon adds up, and SCSI3 compliant storage. Coordination Point Server offers a best efforts approach to arbitration, and the cost in hardware and effort is almost negligible. But there will be corner cases, as mentioned, where you could face corruption if a hung system came back before it was killed and was able to flush its data buffers down to disk.

If the data centre was mine I would risk the second approach with the CPS servers. It's much better than having no arbitration and is a doddle to set up. Of course if I started seeing data corruption I could change my mind... and job.

Coordination Point Server is available from VCS 5.1SP1 onwards.