Failure to (re)import a single shared diskgroup during Cluster Volume Manager master takeover results in all shared diskgroups being disabled and node eviction from cluster

Article:TECH193261  |  Created: 2012-07-18  |  Updated: 2012-10-13  |  Article URL http://www.symantec.com/docs/TECH193261
Article Type
Technical Solution

Product(s)

Issue



During Cluster Volume Manager (CVM) master takeover, all shared diskgroups are re-imported on the new master. If a single shared diskgroup fails to import, Volume Manager (VxVM) disables all shared diskgroups (dgdisable), and the node attempting to become the new master leaves the cluster.

NOTE: If the cause of the diskgroup import failure is common to all nodes in the cluster, it can cause cascading master takeover failures that end in a cluster-wide failure.
Example: a 3-node cluster (nodeA, nodeB and nodeC) with shared diskgroups dgA and dgB
- nodeA is shut down.
- nodeB attempts master takeover.
- On nodeB, if the re-import of shared diskgroup dgA fails with the error messages below, all shared diskgroups are disabled and nodeB is evicted:
vxvm:vxconfigd: [ID 702911 daemon.warning] V-5-1-16066 da_dg_reimport: disk <disk id> not found
vxvm:vxconfigd: [ID 702911 daemon.error] V-5-1-0 dg_import_master: failed to import dg <diskgroup> , error 183
vxvm:vxconfigd: [ID 702911 daemon.error] V-5-1-0 master_takeover: error in disk group reimport: Disk for disk group not found, errno 0
 
- Like nodeB, nodeC encounters the same failure and is evicted, resulting in a total cluster outage.
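
The cascade in the example above can be modelled with a short Python sketch. It is a toy model of the pre-fix behaviour described in this article, not VxVM code: the function names and the per-node visibility table are hypothetical, and the only rule taken from the article is that one failed shared diskgroup import fails the whole takeover and evicts the node.

```python
# Toy model of pre-fix CVM master takeover (hypothetical names, not VxVM code).

def takeover_succeeds(importable):
    """Pre-fix rule: one failed shared diskgroup import fails the takeover."""
    return all(importable.values())


def simulate_cascade(candidates, view):
    """Try each surviving node as the new master, in order.

    Returns (new_master, evicted_nodes). new_master is None when every
    candidate is evicted, which corresponds to a cluster-wide outage.
    """
    evicted = []
    for node in candidates:
        if takeover_succeeds(view[node]):
            return node, evicted
        # All shared diskgroups are disabled and the node leaves the cluster.
        evicted.append(node)
    return None, evicted


# nodeA has shut down; a disk of dgA cannot be found from any surviving node.
view = {
    "nodeB": {"dgA": False, "dgB": True},
    "nodeC": {"dgA": False, "dgB": True},
}
master, evicted = simulate_cascade(["nodeB", "nodeC"], view)
print(master, evicted)  # None ['nodeB', 'nodeC']
```

Because the cause of the failure is the same on nodeB and nodeC, no candidate survives and the model ends with no master.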

Error



Aug 11 22:38:35 hostname vxvm:vxconfigd: [ID 702911 daemon.error] V-5-1-7934 Disk group dgname1: Disabled by errors
Aug 11 22:38:35 hostname vxvm:vxconfigd: [ID 702911 daemon.error] V-5-1-7934 Disk group dgname2: Disabled by errors
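
As a rough aid for spotting this condition, the V-5-1-7934 messages shown above can be extracted from syslog text with a small Python sketch. The regular expression is based only on the two sample lines in this article; the function name is hypothetical, and other VxVM releases may format the message differently.

```python
import re

# Pattern derived from the sample V-5-1-7934 lines in this article (assumption:
# the diskgroup name contains no whitespace).
DISABLED_RE = re.compile(r"V-5-1-7934 Disk group (\S+): Disabled by errors")


def disabled_diskgroups(lines):
    """Return the diskgroup names reported as disabled, in first-seen order."""
    seen = []
    for line in lines:
        match = DISABLED_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


sample = [
    "Aug 11 22:38:35 hostname vxvm:vxconfigd: [ID 702911 daemon.error] "
    "V-5-1-7934 Disk group dgname1: Disabled by errors",
    "Aug 11 22:38:35 hostname vxvm:vxconfigd: [ID 702911 daemon.error] "
    "V-5-1-7934 Disk group dgname2: Disabled by errors",
]
print(disabled_diskgroups(sample))
```

More than one shared diskgroup reported as disabled at the same timestamp during a master takeover is consistent with the issue described here, although it does not prove it.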
 


Environment



PLATFORMS: ALL (Solaris, HP-UX, Linux and AIX)

VxVM versions: 5.0.x, 5.1.x and 6.0.x


Cause



A bug in the code path traversed during CVM master takeover caused all shared diskgroups to be disabled when a single shared diskgroup failed to import.


Solution



The code was changed to disable only the shared diskgroup that failed to import. The re-import now skips the failed diskgroup and continues importing the remaining shared diskgroups, which prevents the node from being evicted from the cluster.
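
The changed behaviour can be illustrated with a minimal Python sketch. This is not the VxVM implementation; `reimport_shared_diskgroups` and the `try_import` callback are hypothetical names that stand in for the real import path.

```python
def reimport_shared_diskgroups(diskgroups, try_import):
    """Model of the post-fix re-import loop (hypothetical, not VxVM code).

    try_import(dg) returns True when the diskgroup imports cleanly.
    A failure disables only that diskgroup; the loop then continues.
    """
    imported, disabled = [], []
    for dg in diskgroups:
        if try_import(dg):
            imported.append(dg)
        else:
            disabled.append(dg)  # skip the failed diskgroup and carry on
    return imported, disabled


# dgA has a missing disk; dgB is healthy.
imported, disabled = reimport_shared_diskgroups(
    ["dgA", "dgB"], lambda dg: dg != "dgA"
)
print("imported:", imported, "disabled:", disabled)
```

In this model the takeover completes with dgB imported and only dgA disabled, so the new master has no reason to leave the cluster.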

WORKAROUND: NONE 

FIX INTEGRATED IN THE FOLLOWING PATCH(ES)/VERSIONS:

-6.0RP1HF1

-5.1SP1RP3 

 

Patch links for SFHA 5.1SP1RP3

AIX

https://sort.symantec.com/patch/detail/6806

AIX 7.1

https://sort.symantec.com/patch/detail/6807


Solaris SPARC

https://sort.symantec.com/patch/detail/6816

https://sort.symantec.com/patch/detail/6817


Solaris x64

https://sort.symantec.com/patch/detail/6818

https://sort.symantec.com/patch/detail/6819


RHEL5 x86_64

https://sort.symantec.com/patch/detail/6808

https://sort.symantec.com/patch/detail/6809


RHEL6 x86_64

https://sort.symantec.com/patch/detail/6814

https://sort.symantec.com/patch/detail/6815


SLES10 x86_64

https://sort.symantec.com/patch/detail/6811


SLES11 x86_64

https://sort.symantec.com/patch/detail/6813
 


Supplemental Materials

Value: 2688308
Description: Do not disable other DGs when a re-import of a DG fails during master take-over





