One node won't join after master panic in a four node VERITAS Cluster Volume Manager cluster

Article:TECH16284  |  Created: 2001-01-21  |  Updated: 2004-01-21  |  Article URL http://www.symantec.com/docs/TECH16284
Article Type
Technical Solution

Product(s)

Environment

Problem



One node cannot join the cluster after the master node panics in a four-node VERITAS Cluster Volume Manager cluster.

Error



vxvm:vxconfigd: NOTICE: CVM_VOLD_CHANGE command received
vxvm:vxconfigd: NOTICE: establishing cluster
vxvm:vxconfigd: ERROR: -1 returned from volcvm_establish
vxvm:vxconfigd: ERROR: cluster_establish: error 230
vxvm:vxconfigd: ERROR: kernel_fail_join() : master_takeover is 0
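
To check whether a captured vxconfigd log contains this error sequence, the three ERROR messages can be searched for in order. The following is a minimal Python sketch. The function name and the in-order matching rule are assumptions made for illustration; only the message texts come from this article.

```python
# Message fragments quoted in this article. How they are matched is an assumption.
SIGNATURE = (
    "-1 returned from volcvm_establish",
    "cluster_establish: error 230",
    "kernel_fail_join() : master_takeover is 0",
)


def has_join_failure_signature(log_lines):
    """Return True if every signature fragment appears, in order, in log_lines."""
    remaining = list(SIGNATURE)
    for line in log_lines:
        # Consume the next expected fragment when the current line contains it.
        if remaining and remaining[0] in line:
            remaining.pop(0)
    return not remaining


sample_log = [
    "vxvm:vxconfigd: NOTICE: CVM_VOLD_CHANGE command received",
    "vxvm:vxconfigd: NOTICE: establishing cluster",
    "vxvm:vxconfigd: ERROR: -1 returned from volcvm_establish",
    "vxvm:vxconfigd: ERROR: cluster_establish: error 230",
    "vxvm:vxconfigd: ERROR: kernel_fail_join() : master_takeover is 0",
]

full_match = has_join_failure_signature(sample_log)
partial_match = has_join_failure_signature(sample_log[:3])
```

A match only indicates that the log resembles the sequence shown here. It does not confirm the cause described under Solution.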

Solution



This problem happens under the following circumstances:

1. A node leaves the cluster while some shared mirrored volumes with a Dirty Region Log (DRL) are "dirty".

2. All nodes remaining in the cluster set the recovery bit for the node that left.

3. The master node starts to recover the volumes. This stage consists of several steps: reading the DRL map of the node that left, putting it into the accumulator map, clearing the DRL map for the node in step 1, and starting the recovery on the volume.

4. If this recovery completes and the master then dies, and none of the remaining nodes (including the master) had the shared volumes dirty, then no recovery happens on the new master. However, the DRL bitmap on the new master still has the recovery bit set for the node in step 1.

5. The node that left in step 1 is not able to join the cluster.

Here is a possible workaround:

Make the volume(s) that were marked dirty in step 1 and recovered in step 3 dirty again (that is, write to them from any node). Then have that node leave the cluster. This causes another recovery to happen, which clears the recovery bit of all nodes that had this volume dirty. Any node can now join the cluster.
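
The failure sequence and the workaround can be illustrated with a toy state model. The Python sketch below is not CVM code. The class, the node names n1 to n4, the master takeover rule, and the rule that a recovery pass clears only the bits held by the current master are all assumptions chosen to match the steps described above.

```python
class Cluster:
    """Toy model of per-node recovery bits. Hypothetical, not actual CVM internals."""

    def __init__(self, names):
        self.master = names[0]
        self.dirty = {n: False for n in names}  # node has dirty DRL regions
        self.bits = {n: set() for n in names}   # recovery bits held by each node

    def write(self, node):
        # A write to a shared mirrored volume marks the writer's DRL map dirty.
        self.dirty[node] = True

    def leave(self, node):
        was_dirty = self.dirty.pop(node)
        self.bits.pop(node)
        if node == self.master:
            # Assumed takeover rule: lowest remaining name becomes master.
            self.master = sorted(self.bits)[0]
        if was_dirty:
            # Step 2: every remaining node sets the recovery bit for the leaver.
            for held in self.bits.values():
                held.add(node)
            # Step 3: the master recovers and clears the bits it holds (assumed).
            self.bits[self.master].clear()
        # A clean leaver triggers no recovery, so stale bits survive (step 4).

    def can_join(self, node):
        # Assumed rule: the master refuses a node whose recovery bit is still set.
        return node not in self.bits[self.master]


cluster = Cluster(["n1", "n2", "n3", "n4"])

cluster.write("n4")
cluster.leave("n4")   # steps 1 to 3: n1 (master) recovers, n2 and n3 keep the bit
cluster.leave("n1")   # step 4: master dies clean, n2 takes over with a stale bit
blocked = not cluster.can_join("n4")   # step 5

cluster.write("n3")   # workaround: dirty the volume again from any node
cluster.leave("n3")   # that node leaves, forcing a new recovery on n2
unblocked = cluster.can_join("n4") and cluster.can_join("n3")
```

In this model the stale bit on the new master is what blocks the join, and only a fresh recovery pass removes it, which is why the workaround forces one.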




Legacy ID



240443



