split brain

Created: 03 Jun 2014 • Updated: 03 Jun 2014 | 3 comments
This issue has been solved.

Hi,

I would like to hear your comments on this matter.

I have a two-node VCS cluster, and both private network (heartbeat) paths suddenly failed, so I am in a split-brain scenario. One service (application) runs on node A and another application runs on node B.

What steps should I perform in this scenario so that all users can keep running the applications safely?

Would the steps be as follows?

- Kill all user sessions on node A and node B.

- Shut down VCS on every node and leave the applications running (on node A and on node B).

- Workaround: edit llttab so that VCS uses a link-lowpri NIC or a heartbeat disk? Then start VCS on both nodes?
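
For reference, a low-priority link is declared in llttab with a link-lowpri line next to the regular link lines. The Python sketch below only renders the file text so the layout is visible; it assumes Linux-style device names (Solaris and AIX use different device paths), and the node name, cluster ID and interface names are hypothetical. Check the LLT documentation for your platform before editing the real file.

```python
from dataclasses import dataclass


@dataclass
class LltLink:
    tag: str                    # link tag, e.g. "eth1" (hypothetical name)
    device: str                 # network device the link binds to
    low_priority: bool = False  # True -> emitted as a link-lowpri line


def render_llttab(node: str, cluster_id: int, links: list[LltLink]) -> str:
    """Render llttab text: set-node and set-cluster, then one line per link.

    Assumes the Linux-style form
    '<keyword> <tag> <device> <node-range> <link-type> <SAP> <MTU>',
    where '-' keeps the default node range, SAP and MTU.
    """
    lines = [f"set-node {node}", f"set-cluster {cluster_id}"]
    for link in links:
        keyword = "link-lowpri" if link.low_priority else "link"
        lines.append(f"{keyword} {link.tag} {link.device} - ether - -")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Two dedicated private links plus the public NIC as a low-priority link.
    print(render_llttab("nodeA", 100, [
        LltLink("eth1", "eth1"),
        LltLink("eth2", "eth2"),
        LltLink("eth0", "eth0", low_priority=True),
    ]), end="")
```

Running it prints the set-node and set-cluster lines, two link lines for eth1 and eth2, and a final "link-lowpri eth0 eth0 - ether - -" line. While a high-priority link is up, LLT sends only heartbeats over the low-priority link, at a reduced rate, which is why the public NIC is a common choice for it.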

Thanks.

Comments

Gaurav Sangamnerkar

Hello,

Ideally, a low-priority link gives additional protection against split brain; however, once split brain has already happened, things are different.

1. If you have not implemented I/O fencing, there is a high chance of data corruption: node A will think that node B has gone, while node B will think that node A has gone, so both nodes will try to take full ownership.

The recommendation in this situation is to immediately and safely shut down one node and keep working on the remaining node, with all the groups imported there.

The ideal recommendation is to always use I/O fencing to protect against data corruption in split-brain situations.
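
To illustrate why fencing matters, here is a toy Python model. It is a simplification and not how VCS implements fencing: real I/O fencing races for coordinator disks using SCSI-3 persistent reservations, while here a coordinator disk is simply won by whichever node reaches it first. Without fencing both nodes keep the service groups online; with the race, only a node that wins a strict majority of the disks survives. The function names and node names are hypothetical.

```python
def survivors_without_fencing(nodes: list[str]) -> list[str]:
    # With every heartbeat lost, each node assumes the others are dead
    # and brings the service groups online itself: all nodes "survive".
    return list(nodes)


def survivors_with_fencing(arrival_order: list[list[str]]) -> list[str]:
    """Toy coordinator-disk race.

    arrival_order[i] lists the nodes in the order they reach coordinator
    disk i. The first node to reach a disk wins it; a node that wins a
    strict majority of the disks survives, every other node would halt.
    """
    wins: dict[str, int] = {}
    for order in arrival_order:
        winner = order[0]
        wins[winner] = wins.get(winner, 0) + 1
    majority = len(arrival_order) // 2 + 1
    return [node for node, count in wins.items() if count >= majority]


if __name__ == "__main__":
    print(survivors_without_fencing(["nodeA", "nodeB"]))  # ['nodeA', 'nodeB']
    # Three coordinator disks: nodeA wins two of them, nodeB wins one.
    print(survivors_with_fencing([
        ["nodeA", "nodeB"],
        ["nodeB", "nodeA"],
        ["nodeA", "nodeB"],
    ]))  # ['nodeA']
```

With an even number of disks the race can end in a tie (each node wins half and nobody reaches a majority), which is why an odd number of coordinator disks, typically three, is used.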

You can also tune GAB parameters, but again, that's a workaround.

 

G

SOLUTION
Marianne

The most important check of the heartbeats/interlinks is to ensure that completely separate hardware (NICs, switches, network paths) is used for each heartbeat.
If ANY individual component in the network path fails, it should not lead to loss of both heartbeats.
This kind of test is normally done on a new cluster before it goes into production.
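
One way to do that check on paper before the hardware test is to list every component each heartbeat passes through and look for anything common to all of them. A minimal Python sketch, assuming you describe each link as a set of component names (the names below are hypothetical):

```python
def single_points_of_failure(paths: dict[str, set[str]]) -> set[str]:
    """Return the components that every heartbeat path depends on.

    paths maps a heartbeat link name to the set of hardware components
    (NICs, switches, cables, ...) its traffic passes through. A component
    present in all paths is a SPOF: its failure drops every heartbeat.
    """
    if not paths:
        return set()
    return set.intersection(*paths.values())


if __name__ == "__main__":
    heartbeats = {
        # Both private links were cabled through the same switch.
        "link1": {"nodeA:eth1", "switch1", "nodeB:eth1"},
        "link2": {"nodeA:eth2", "switch1", "nodeB:eth2"},
    }
    print(single_points_of_failure(heartbeats))  # {'switch1'}
```

An empty result means no single component is shared by all heartbeats; the sketch only finds what you list, so a quad-port card carrying both links has to be entered as one shared component to be caught.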

Any kind of common infrastructure will be a SPOF (single point of failure). This is a bigger risk to your data than not having a cluster.

I have seen two split-brain scenarios at two different sites over the years. Not pretty...

After recovering data from tape, both customers implemented I/O fencing.

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows

tanislavm

Hi Marianne,

Your reply is very useful. Thank you.