
Proactive Checking of IO Fencing Keys

Updated: 28 Jun 2010 | 1 comment
Gaurav Sangamnerkar
Status: Implemented

In all versions of the Storage Foundation suite, Symantec provides the IO Fencing feature to prevent cluster split-brain situations and the resulting data corruption.

By design, IO Fencing is a passive module, i.e. it does not act unless there is a GAB port membership change. As long as the cluster keeps running fine, the IO Fencing module performs no checks at all.

It would be great if the IO Fencing module were made intelligent enough to check periodically that the coordinator disks have the appropriate registration keys and that the data disks have the appropriate registration and reservation keys.

There are many real-life occurrences where disks carry stale keys and this is noticed only much later, once a problem occurs. A very common issue in our environment is a diskgroup failing to import with the error "diskgroup has no valid configuration copies"; the cause we find is that the disks were left with stale keys, so VxVM cannot access the private region of the disks. Sometimes the problem can be much more severe, with a huge impact on the business.

If the IO Fencing module were made intelligent, it could do the following:

a) Verify that the registrations on the coordinator disks are correct.
b) Verify that the registrations and reservations on the data disks are correct.
c) If any unwanted keys are detected, take action (of course with user intervention) to delete them.
d) Log the results of these checks to a log file.
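
To make the idea concrete, here is a rough sketch of the comparison such a check would perform. It is only an illustration under stated assumptions, not how the fencing module works: the function names, the key strings, and the `read_keys` callable are all hypothetical, and actually reading keys from the disks (for example by wrapping the fencing admin utility) is deliberately left out. It only reports and logs; deleting keys stays a manual step, as in point c).

```python
import logging
from dataclasses import dataclass, field

log = logging.getLogger("fence_key_check")


@dataclass
class KeyReport:
    """Result of comparing expected and observed keys on one disk."""
    disk: str
    missing: set = field(default_factory=set)  # expected but not on the disk
    stale: set = field(default_factory=set)    # on the disk but not expected

    @property
    def ok(self):
        return not self.missing and not self.stale


def check_disk(disk, expected_keys, observed_keys):
    """Compare the keys one disk should carry with the keys it carries."""
    expected, observed = set(expected_keys), set(observed_keys)
    return KeyReport(disk, missing=expected - observed, stale=observed - expected)


def check_all(expected_by_disk, read_keys):
    """Check every disk and log the outcome (points a, b and d).

    read_keys is a hypothetical callable that returns the keys currently
    present on a disk; how it obtains them is outside this sketch.
    """
    reports = []
    for disk, expected in sorted(expected_by_disk.items()):
        report = check_disk(disk, expected, read_keys(disk))
        if report.ok:
            log.info("%s: keys are as expected", disk)
        else:
            # Only report; removing stale keys needs an administrator (point c).
            log.warning("%s: missing=%s stale=%s", disk,
                        sorted(report.missing), sorted(report.stale))
        reports.append(report)
    return reports


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Made-up disks and keys, standing in for a two-node cluster.
    expected = {"disk1": {"KEY_NODE0", "KEY_NODE1"},
                "disk2": {"KEY_NODE0", "KEY_NODE1"}}
    observed = {"disk1": ["KEY_NODE0", "KEY_NODE1"],
                "disk2": ["KEY_NODE0", "KEY_OLD"]}
    check_all(expected, observed.__getitem__)
```

Run periodically (from cron, say), something along these lines would flag a stale key on a disk long before a diskgroup import fails because of it.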

I am sure this can prevent many issues.

Thanks

Gaurav

Comments

Jawahar Mohan, 27 Apr 2010

Feature added in 5.1 CPS

I think 5.1 CPS supports this feature. The CP agent alerts on the loss of CPS registrations and allows a registration lost due to admin error (or some other problem) to be re-registered without stopping the cluster.