Jeopardy and split brain condition

Created: 27 Jun 2014 | 2 comments

Hi,

What needs to be done to resolve the following issues:

1) Jeopardy state

2) Split brain 

Thanks,

Gary

mikebounds

Jeopardy is when you have only 1 heartbeat link remaining:

  • To prevent, have more than 2 heartbeats - for example, have 2 private heartbeats and one low-pri on the public network
  • To resolve jeopardy - fix broken heartbeats so you have at least 2 heartbeats working
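
As a rough illustration of the rule above, here is a minimal Python sketch that classifies the state from the number of working heartbeat links. The function name `heartbeat_state` and the link names are hypothetical and only model the description in this post; they are not part of any VCS or LLT interface.

```python
def heartbeat_state(links_up: int) -> str:
    """Classify the cluster interconnect from the number of working heartbeat links."""
    if links_up < 0:
        raise ValueError("links_up cannot be negative")
    if links_up >= 2:
        return "normal"
    if links_up == 1:
        return "jeopardy"
    # All heartbeats lost: the nodes can no longer see each other.
    return "no heartbeat"


# Hypothetical layout: 2 private heartbeats plus one low-pri on the public network.
links = {"private1": True, "private2": True, "lowpri_public": True}
print(heartbeat_state(sum(links.values())))  # normal

links["private1"] = False
print(heartbeat_state(sum(links.values())))  # normal, 2 links are still working

links["private2"] = False
print(heartbeat_state(sum(links.values())))  # jeopardy
```

With three links configured, a single link failure leaves two working heartbeats, which is why the extra low-pri link keeps the cluster out of jeopardy.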

Split brain is when your cluster partitions so that you have 2 (or more) sub-clusters that are not communicating with each other (for example, in a 2-node cluster you could have a cluster partition of 2 single-node clusters, i.e. the 2 nodes have lost communication with each other on ALL heartbeats). In this scenario you risk data corruption, as the sub-clusters may try to write to the data at the same time.

  • To prevent, make sure heartbeats are truly independent and share no common components, so that no physical failure of a single component can cause both heartbeats to fail at the same time. You should also put heartbeats on separate VLANs, so that if you make an error when modifying a VLAN you don't affect both heartbeats (administration of the networks is the most common reason I have seen for customers losing multiple heartbeats at the same time). It is extremely unlikely that 2 independent hardware components will fail at the same time (well, actually fail within 15 seconds of each other), so having more than 2 heartbeats does not greatly reduce the chance of split-brain, unless your heartbeats are not truly independent.
    You can also configure fencing, but this does not actually prevent split-brain - it takes action in a split-brain scenario to take one subcluster down so you don't get data corruption.
  • To resolve split-brain, fix one or more heartbeats so the nodes start communicating again.
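
The independence requirement above can be checked on paper by listing what each heartbeat depends on and looking for overlaps. The following is a minimal Python sketch of that check; the component names (`nic0`, `switchA`, `vlan10` and so on) and the function `shared_components` are made up for illustration and do not come from any VCS tool.

```python
from itertools import combinations


def shared_components(heartbeats):
    """Return {(hb_a, hb_b): common components} for every pair that is not independent."""
    shared = {}
    for a, b in combinations(sorted(heartbeats), 2):
        common = set(heartbeats[a]) & set(heartbeats[b])
        if common:
            shared[(a, b)] = common
    return shared


# Hypothetical inventory: each heartbeat lists the NIC, switch and VLAN it uses.
heartbeats = {
    "hb1": {"nic0", "switchA", "vlan10"},
    "hb2": {"nic1", "switchA", "vlan11"},  # separate NIC and VLAN, but same switch
}
print(shared_components(heartbeats))  # {('hb1', 'hb2'): {'switchA'}}

# Moving hb2 to its own switch removes the common component.
heartbeats["hb2"] = {"nic1", "switchB", "vlan11"}
print(shared_components(heartbeats))  # {}
```

An empty result means no single listed component is used by two heartbeats. The check is only as good as the inventory, so anything shared that is left out (a VLAN, a patch panel, a power feed) will not be caught.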

Note there can be a conflict between protecting against jeopardy and protecting against split-brain, because having more than 2 heartbeats means you need more independent hardware, which is often not available. For example, the public network is usually configured with 2 NICs (bonding, teaming, port aggregation etc.), which requires 2 independent switches, and the 2 private heartbeats need to use independent switches as well, so if you configure a low-pri heartbeat on the public network you should use 4 independent switches and 4 independent NICs. However, I often see customers configuring low-pri heartbeats when everything is on 2 or 3 switches and/or fewer than 2 or 3 dual/quad NICs. This means that if you lose 1 dual/quad NIC or switch, you have 2 heartbeats remaining (one private and one low-pri) which are NOT independent, as losing another dual/quad NIC or switch will cause both remaining heartbeats to fail at the same time, causing split-brain.
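
To make the shared-hardware example concrete, here is a small Python sketch that models each heartbeat as one or more paths (a bonded public link has two) and asks which further single failure would take down every heartbeat that is still up. The layout, the names `nicA`, `nicB`, `sw1`, `sw2` and the helper functions are assumptions chosen to mirror the scenario described, not output from a real cluster.

```python
def is_up(paths, failed):
    """A heartbeat is up if at least one of its paths has no failed component."""
    return any(not (path & failed) for path in paths)


def fatal_next_failures(heartbeats, failed):
    """Return (heartbeats still up, components whose loss would take them all down)."""
    failed = set(failed)
    alive = [name for name, paths in heartbeats.items() if is_up(paths, failed)]
    if not alive:
        return alive, set()
    components = set().union(*(p for paths in heartbeats.values() for p in paths)) - failed
    fatal = {c for c in components
             if not any(is_up(heartbeats[name], failed | {c}) for name in alive)}
    return alive, fatal


# Assumed layout: everything on 2 dual-port NICs and 2 switches.
shared_hw = {
    "private1": [{"nicA", "sw1"}],
    "private2": [{"nicB", "sw2"}],
    "lowpri_public": [{"nicA", "sw1"}, {"nicB", "sw2"}],  # bonded over both NICs
}
alive, fatal = fatal_next_failures(shared_hw, {"nicA"})
print(alive, sorted(fatal))  # ['private2', 'lowpri_public'] ['nicB', 'sw2']

# Same heartbeats, but the public bond has its own 2 NICs and 2 switches.
independent_hw = dict(shared_hw, lowpri_public=[{"nicC", "sw3"}, {"nicD", "sw4"}])
alive, fatal = fatal_next_failures(independent_hw, {"nicA"})
print(alive, sorted(fatal))  # ['private2', 'lowpri_public'] []
```

In the shared layout, losing `nicA` still leaves 2 heartbeats, so the cluster is not in jeopardy, yet one more failure of `nicB` or `sw2` removes both at once. With 4 NICs and 4 switches no single further failure does that.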

Mike

UK Symantec Consultant in VCS, GCO, SF, VVR, VxAT on Solaris, AIX, HP-UX, Linux & Windows

Gaurav singh

At the time of installation, it asked about creating a virtual LLT link from the NIC. What is the use of that? Shall we create it?

Thanks,

Gary