One node won't join after master panic in a four node VERITAS Cluster Volume Manager cluster
Article: TECH16284 | Created: 2001-01-21 | Updated: 2004-01-21 | Article URL: http://www.symantec.com/docs/TECH16284
Problem
After the master node panics in a four-node VERITAS Cluster Volume Manager cluster, one node cannot join the cluster.
Error
vxvm:vxconfigd: NOTICE: CVM_VOLD_CHANGE command received
vxvm:vxconfigd: NOTICE: establishing cluster
vxvm:vxconfigd: ERROR: -1 returned from volcvm_establish
vxvm:vxconfigd: ERROR: cluster_establish: error 230
vxvm:vxconfigd: ERROR: kernel_fail_join() : master_takeover is 0
Solution
This problem occurs under the following circumstances:
1. A node leaves the cluster while some shared mirrored volumes with a Dirty Region Log (DRL) are "dirty".
2. All nodes remaining in the cluster set the recovery bit for the node that left.
3. The master node starts to recover the volumes. This stage consists of several steps: reading the DRL map of the node that left; putting it in the accumulator map; clearing the DRL map for the node in step 1; and starting the recovery on the volume.
4. If the recovery is complete AND the master then dies, and none of the remaining nodes (including the master) had the shared volumes dirty, then NO recovery happens on the new master. However, the DRL bitmap on the new master still has the recovery bit set for the node in step 1.
5. The node that left in step 1 is unable to join.
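The sequence above can be sketched as a small toy model. This is not VxVM code: the class ToyCluster, its methods, the node names A to D, and the volume name vol01 are all hypothetical, and the rule that a join is refused while the current master still holds a recovery bit for the joining node is an assumption drawn from steps 4 and 5.

```python
class ToyCluster:
    """Toy model of per-node recovery bits; hypothetical, not VxVM internals."""

    def __init__(self, nodes, master):
        self.members = list(nodes)
        self.master = master
        # bits[n][vol] = set of departed nodes that node n has flagged on vol
        self.bits = {n: {} for n in nodes}

    def leave(self, node, dirty_volumes=()):
        """Node leaves; every remaining node flags it on each dirty volume."""
        self.members.remove(node)
        for peer in self.members:
            for vol in dirty_volumes:
                self.bits[peer].setdefault(vol, set()).add(node)
        if node == self.master:
            # Assumption: the first remaining member takes over as master
            # and recovers only the volumes the old master left dirty.
            self.master = self.members[0]
            for vol in dirty_volumes:
                self.recover(vol)

    def recover(self, vol):
        """Recovery on the current master clears only that master's bits."""
        self.bits[self.master].pop(vol, None)

    def can_join(self, node):
        """Assumed rule: join fails while the master still flags the node."""
        flagged_sets = self.bits[self.master].values()
        return all(node not in flagged for flagged in flagged_sets)


cluster = ToyCluster(["A", "B", "C", "D"], master="B")
cluster.leave("A", dirty_volumes=["vol01"])  # steps 1 and 2
cluster.recover("vol01")                     # step 3: master B recovers
cluster.leave("B")                           # step 4: master dies, nothing dirty
blocked = not cluster.can_join("A")          # step 5: stale bit on new master C
print("new master:", cluster.master, "| A blocked:", blocked)
```

In this sketch the stale bit lives only on the surviving nodes, which is why the problem appears only after the original master is gone.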
Here is a possible workaround:
Make the volume(s) that were marked dirty in step 1 and recovered in step 3 dirty again (i.e., write to them from any node). Then have the node that performed the write leave the cluster. This causes another recovery to happen, and the recovery bit of ALL nodes that had this volume dirty is cleared. Any node can then join the cluster.
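The effect of the workaround can be illustrated with a minimal sketch of the new master's recovery bits. The function names, node names (A, D), and volume name (vol01) are hypothetical, and the join rule (a node is refused while the master still flags it) is an assumption rather than documented VxVM behavior.

```python
# Assumed state of the new master after the failure:
# a stale recovery bit for node A on volume vol01.
master_bits = {"vol01": {"A"}}


def node_leaves(bits, node, dirty_volumes):
    """The master flags a departing node on every volume it left dirty."""
    for vol in dirty_volumes:
        bits.setdefault(vol, set()).add(node)


def recover(bits, vol):
    """Recovering a volume clears every recovery bit recorded for it."""
    return bits.pop(vol, set())


def can_join(bits, node):
    """Assumed rule: a node may join only if the master holds no bit for it."""
    return all(node not in flagged for flagged in bits.values())


before = can_join(master_bits, "A")        # stale bit still blocks A
node_leaves(master_bits, "D", ["vol01"])   # D wrote to vol01, then left
cleared = recover(master_bits, "vol01")    # second recovery clears A and D
after = can_join(master_bits, "A") and can_join(master_bits, "D")
print("before:", before, "| cleared:", sorted(cleared), "| after:", after)
```

The point of the sketch is that the second recovery acts on the whole volume, so it also clears the bit left over from the first departure.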








