Symptom of Veritas Cluster Server needing to be restarted: error: VCS WARNING V-16-1-10367 Dump already in progress
Article: TECH178472 | Created: 2012-01-07 | Updated: 2012-01-10 | Article URL: http://www.symantec.com/docs/TECH178472
Problem
No entries were being written to the engine log on one or more nodes. Attempts to dump the configuration failed with the error shown below. Rebooted nodes did not rejoin the cluster, and 'hastop -local -force' hung. Recovery required stopping had, unconfiguring GAB, and re-forming the cluster.
Error
# haconf -dump
VCS WARNING V-16-1-10367 Dump already in progress
After a reboot, the node remained in the following state:
adelscott SysState CURRENT_DISCOVER_WAIT
(as reported by 'hasys -state' on another node of the cluster and in the engine log)
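To spot quickly which systems are stuck in a state such as the one above, the output of 'hasys -state' can be filtered. The following is a minimal sketch, assuming the usual three-column output (system, attribute, value) with a header line that begins with '#'. The function name non_running_systems is hypothetical and not part of VCS.

```shell
# Hypothetical helper (not part of VCS): reads 'hasys -state' output on stdin
# and prints the name and state of every system whose SysState is not RUNNING.
non_running_systems() {
    awk '$1 !~ /^#/ && $2 == "SysState" && $3 != "RUNNING" { print $1, $3 }'
}
```

On a healthy node of the cluster it could be used as 'hasys -state | non_running_systems'. Empty output would mean that every system reports RUNNING.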
Environment
A failover cluster running Veritas Cluster Server (VCS) version 5.0MP1RP5 on Solaris 10 systems.
Similar symptoms of commands hanging and no logging taking place have been reported for other VCS versions and other supported Unix Operating Systems.
Cause
Unknown
Solution
1) Use 'ps -aef' to find the process IDs (PIDs) of the had and hashadow processes. Repeat steps 1 and 2 on every node in the cluster.
# ps -aef|grep ha
root 4135 1 0 14:24:57 ? 0:00 /opt/VRTSvcs/bin/hashadow
root 4019 1 0 14:24:55 ? 0:08 /opt/VRTSvcs/bin/had
root 4283 1 0 14:25:05 ? 0:08 /opt/VRTSvcs/bin/Phantom/PhantomAgent -type Phantom
root 5527 2459 0 14:26:10 ? 0:02 /opt/VRTSsfmh/bin/hareg -all -group -resource -clus -sys -rclus -rsys -rgroup -
2) Kill both PIDs on one command line so that neither process can restart the other.
(This aborts the VCS engine but leaves production services running.)
# kill 4135 4019
Use 'ps -aef|grep ha' to verify that both processes have been stopped.
3) Determine whether I/O fencing is running and, if it is, unconfigure it on all nodes of the cluster.
# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 286101 membership 01
Port b gen 286105 membership 01 <===
Port h gen 286104 membership 01
("01" in the last column lists the IDs of the nodes, here nodes 0 and 1, on which the port is active. Port b, marked with the arrow, is I/O fencing.)
# vxfenconfig -U
Run 'gabconfig -a' to validate that port b has been dropped from the output.
4) Unconfigure gab on all nodes of the cluster
# gabconfig -U
Run 'gabconfig -a' to validate that no ports are listed in the output.
5) Restart gab on all nodes.
# gabconfig -c -n<# of nodes>
(For example, 'gabconfig -c -n2' for the two-node cluster shown in this article.)
After all nodes have been seeded, validate that gab has started on all nodes.
# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 286101 membership 01
6) Restart I/O fencing on all nodes if it was determined to be configured in step 3.
# vxfenconfig -c
After starting I/O fencing on all nodes, validate that it has started on all nodes.
# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 286101 membership 01
Port b gen 286109 membership 01
7) Restart had (VCS engine) on all nodes
# hastart
After starting had on all nodes, validate that it has started on all nodes.
# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 286101 membership 01
Port b gen 286109 membership 01
Port h gen 286106 membership 01
After the cluster and service groups have started and been processed, use 'hastatus -sum' to view a summary of the cluster status.
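The manual checks in the steps above (finding the had and hashadow PIDs, and confirming which GAB ports are listed) can be scripted. The following is a minimal sketch, not a Symantec-supplied tool. The function names are hypothetical, and the code assumes a POSIX shell, awk and grep; on Solaris 10 the versions under /usr/xpg4/bin may be needed. It also assumes the 'ps -aef' and 'gabconfig -a' output formats shown in this article.

```shell
# Hypothetical helpers (not part of VCS); each reads command output on stdin.

# Print the PIDs of had and hashadow from 'ps -aef' output.
# The match is on the full binary path, so agents such as PhantomAgent
# and unrelated processes such as hareg are not selected.
vcs_engine_pids() {
    awk '$0 ~ "/opt/VRTSvcs/bin/(had|hashadow)( |$)" { print $2 }'
}

# Print the letters of the GAB ports listed by 'gabconfig -a', one per line.
gab_ports() {
    awk '$1 == "Port" && $3 == "gen" { print $2 }'
}

# Succeed only if every port named as an argument appears in the
# 'gabconfig -a' output on stdin, for example: gab_has_ports a b h
gab_has_ports() {
    listed=$(gab_ports)
    for port in "$@"; do
        printf '%s\n' "$listed" | grep -qx "$port" || return 1
    done
    return 0
}
```

On a live node, 'ps -aef | vcs_engine_pids' prints the two PIDs to pass to a single kill command in step 2. 'gabconfig -a | gab_ports' should print nothing once GAB has been unconfigured in step 4, and 'gabconfig -a | gab_has_ports a b h' should succeed after step 7 on a cluster that uses I/O fencing (omit b otherwise).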