Jeopardy and split brain condition

Created: 27 Jun 2014 | 2 comments


What needs to be done to resolve the following issues:

1) Jeopardy state

2) Split brain 




mikebounds

Jeopardy is when you have only 1 heartbeat link remaining:

  • To prevent, have more than 2 heartbeats - for example, have 2 private heartbeats and one low-pri heartbeat on the public network
  • To resolve jeopardy, fix the broken heartbeats so you have at least 2 heartbeats working

Split brain is when your cluster partitions so you have 2 (or more) sub-clusters that are not communicating with each other. For example, a 2-node cluster could partition into 2 single-node clusters, i.e. the 2 nodes have lost communication with each other on ALL heartbeats. In this scenario you risk data corruption, as the sub-clusters may try to write to the same data at the same time.

  • To prevent, make sure heartbeats are truly independent and share no common components, so that no physical failure of a single component can cause both heartbeats to fail at the same time. You should also put heartbeats on separate VLANs so that if you make an error when modifying a VLAN you don't affect both heartbeats (administration of the networks is the most common reason I have seen for customers losing multiple heartbeats at the same time). It is extremely unlikely that 2 independent hardware components will fail at the same time (well, actually fail within 15 seconds of each other), so having more than 2 heartbeats does not greatly reduce the chances of split-brain, unless your heartbeats are not truly independent.
    You can also configure fencing, but this does not actually prevent split-brain - it takes action in a split-brain scenario to take one subcluster down so you don't get data corruption.
  • To resolve split-brain, fix one or more heartbeats so the nodes start communicating again.
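
As a rough illustration of the two definitions above, here is a minimal Python sketch that classifies a node's heartbeat situation from the number of working links. It is only a model of the wording in this post, not how LLT/GAB is implemented; the names `HeartbeatState` and `classify` and the link labels are made up for the example.

```python
from enum import Enum


class HeartbeatState(Enum):
    HEALTHY = "healthy"          # 2 or more heartbeat links working
    JEOPARDY = "jeopardy"        # only 1 heartbeat link remaining
    PARTITIONED = "partitioned"  # no links left: nodes cannot see each other


def classify(links: dict[str, bool]) -> HeartbeatState:
    """Classify a node from a mapping of link name -> link is working.

    Hypothetical helper: it only counts working links, as described above.
    """
    working = sum(1 for up in links.values() if up)
    if working == 0:
        return HeartbeatState.PARTITIONED
    if working == 1:
        return HeartbeatState.JEOPARDY
    return HeartbeatState.HEALTHY


# Two private links down, low-pri link still up: the node is in jeopardy.
state = classify({"priv1": False, "priv2": False, "lowpri": True})
print(state.value)
```

With three configured links, a single failure still leaves the node healthy, which is the point of adding the low-pri heartbeat.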

Note there can be a conflict between protecting against jeopardy and protecting against split-brain, because having more than 2 heartbeats means you need more independent hardware, which is often not available. For example, the public network is usually configured with 2 NICs (bonding, teaming, port aggregation etc), which requires 2 independent switches, and the 2 private heartbeats need to use independent switches too. So if you configure a low-pri heartbeat on the public network, you should use 4 independent switches and 4 independent NICs. However, I often see customers configuring low-pri heartbeats when everything is on 2 or 3 switches and/or 2 or 3 (or fewer) dual/quad NICs. This means that if you lose 1 dual/quad NIC or switch, you have 2 heartbeats remaining (one private and one low-pri) which are NOT independent, as losing another dual/quad NIC or switch will cause both remaining heartbeats to fail at the same time, causing split-brain.
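
The shared-hardware problem described above can be checked on paper before cabling anything. The sketch below is a hypothetical Python helper (not a VCS tool): each heartbeat is listed with the NIC and switch it depends on, and the function reports every first failure that leaves 2 or more heartbeats alive which still share a component. All heartbeat and component names are invented for the example.

```python
def shared_after_first_failure(heartbeats: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each failed component to the components shared by ALL surviving heartbeats.

    Only cases where 2 or more heartbeats survive are reported, because those
    are the cases that look safe but where one more failure causes split-brain.
    """
    risky = {}
    components = set().union(*heartbeats.values())
    for failed in sorted(components):
        survivors = [parts for parts in heartbeats.values() if failed not in parts]
        if len(survivors) >= 2:
            shared = set.intersection(*survivors)
            if shared:
                risky[failed] = shared
    return risky


# Three heartbeats squeezed onto 2 dual-port NICs and 2 switches (assumed layout).
shared_hw = {
    "priv1": {"nicA", "sw1"},
    "priv2": {"nicB", "sw2"},
    "lowpri": {"nicB", "sw1"},
}
# Three heartbeats, each with its own NIC and switch.
independent_hw = {
    "priv1": {"nic1", "sw1"},
    "priv2": {"nic2", "sw2"},
    "lowpri": {"nic3", "sw3"},
}

print(shared_after_first_failure(shared_hw))       # nicA -> nicB, sw2 -> sw1
print(shared_after_first_failure(independent_hw))  # empty: no hidden dependency
```

In the first layout, losing nicA leaves priv2 and lowpri, which both depend on nicB, so the 2 remaining heartbeats are not independent.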


UK Symantec Consultant in VCS, GCO, SF, VVR, VxAT on Solaris, AIX, HP-ux, Linux & Windows


Gaurav singh

At the time of installation, it asked about creating a virtual LLT link from the NIC. What is the use of that? Shall we create it?