Symptom of Veritas Cluster Server needing to be restarted: VCS WARNING V-16-1-10367 Dump already in progress

Article:TECH178472  |  Created: 2012-01-06  |  Updated: 2012-01-10  |  Article URL http://www.symantec.com/docs/TECH178472
Article Type
Technical Solution

Issue



No entries were being written to the engine log on one or more nodes, and dumping the configuration returned an error. Rebooted nodes would not rejoin the cluster, and 'hastop -local -force' hung. Recovery required stopping had, unconfiguring gab and re-forming the cluster.


Error



# haconf -dump
VCS WARNING V-16-1-10367 Dump already in progress

 

After a reboot, the node was seen in the following state:
 

adelscott  SysState           CURRENT_DISCOVER_WAIT

(seen in the 'hasys -state' output on another node of the cluster and in the engine log)
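
To spot a node stuck in this state from a healthy node, the 'hasys -state' output can be filtered for any system that is not RUNNING. The following is a minimal sketch, assuming the three-column layout shown above (system, attribute, value) and a POSIX awk; the helper name non_running_systems is hypothetical and not part of VCS.

```shell
#!/bin/sh
# Hypothetical helper: reads `hasys -state` output on stdin and prints
# every system whose SysState is anything other than RUNNING.
# Assumes three whitespace-separated columns: system, attribute, value.
non_running_systems() {
    awk '$2 == "SysState" && $3 != "RUNNING" { print $1 ": " $3 }'
}

# Example use on a live cluster node (not executed here):
#   hasys -state | non_running_systems
```

An empty result means every listed system reports RUNNING.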


Environment



A failover cluster running Veritas Cluster Server (VCS) version 5.0MP1RP5 on Solaris 10 systems.

 

Similar symptoms (commands hanging and no logging taking place) have been reported for other VCS versions and other supported UNIX operating systems.


Cause



Unknown


Solution



1)  Use 'ps -aef' to find the process IDs (PIDs) of the had and hashadow processes. Perform steps 1 and 2 on every node in the cluster.

 

# ps -aef|grep ha
    root  4135     1   0 14:24:57 ?      0:00 /opt/VRTSvcs/bin/hashadow
    root  4019     1   0 14:24:55 ?      0:08 /opt/VRTSvcs/bin/had
    root  4283     1   0 14:25:05 ?      0:08 /opt/VRTSvcs/bin/Phantom/PhantomAgent -type Phantom
    root  5527  2459   0 14:26:10 ?      0:02 /opt/VRTSsfmh/bin/hareg -all -group -resource -clus -sys -rclus -rsys -rgroup -
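
Picking the two PIDs out of the listing by eye is error-prone because agents and other tools also match 'grep ha'. The following is a minimal sketch that extracts only the had and hashadow PIDs, assuming the binary paths shown above and a POSIX awk (on Solaris 10 this is typically nawk or /usr/xpg4/bin/awk); the helper name vcs_engine_pids is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `ps -aef` output on stdin and prints the PIDs
# (second column) of the had and hashadow processes, one per line.
# The pattern requires the binary name to end at a space or end of line,
# so agents such as PhantomAgent and tools such as hareg are not matched.
vcs_engine_pids() {
    awk '/\/VRTSvcs\/bin\/(had|hashadow)( |$)/ { print $2 }'
}

# Example use on a cluster node (not executed here):
#   ps -aef | vcs_engine_pids
```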

2)  Kill both PIDs on a single command line so that neither process can restart the other.

(this aborts the VCS engine but leaves production services running)

 

# kill 4135 4019

 

Use 'ps -aef|grep ha' to verify that both processes have been stopped.
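
As an alternative to re-reading the ps listing, the two PIDs can be probed directly. The following is a minimal sketch using 'kill -0', which sends no signal and only tests whether the process can be signalled. It assumes it is run as root, since a permission failure would also be reported as "not running"; the helper name pids_still_running is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints each PID given as an argument that still
# refers to a live process. `kill -0` delivers no signal; it only reports
# whether the target process exists and can be signalled.
pids_still_running() {
    for pid in "$@"; do
        if kill -0 "$pid" 2>/dev/null; then
            echo "$pid"
        fi
    done
}

# Example use with the PIDs from step 1 (not executed here):
#   pids_still_running 4135 4019
```

No output means both processes have exited.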

 

3)  Determine whether I/O fencing is running (port b in the 'gabconfig -a' output) and, if it is, unconfigure it on all nodes of the cluster.

 

# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   286101 membership 01
Port b gen   286105 membership 01    <===
Port h gen   286104 membership 01

("01" in the last column lists the IDs of the nodes, here 0 and 1, on which the port has membership; port b is I/O fencing.)

 

# vxfenconfig -U

 

Run 'gabconfig -a' to validate that port b has been dropped from the output.
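
The port b check in step 3 can be scripted so that the same test serves both to detect fencing and to confirm that it has been unconfigured. The following is a minimal sketch, assuming the 'gabconfig -a' layout shown above and a POSIX awk; the helper name gab_has_port is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `gabconfig -a` output on stdin and exits 0
# if the given single-letter port has a membership line, 1 otherwise.
# $1 is the port letter, for example b for I/O fencing.
gab_has_port() {
    awk -v port="$1" '
        $1 == "Port" && $2 == port { found = 1 }
        END { exit !found }
    '
}

# Example use on a cluster node (not executed here):
#   if gabconfig -a | gab_has_port b; then
#       echo "I/O fencing is configured"
#   fi
```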

 

4)  Unconfigure gab on all nodes of the cluster

 

# gabconfig -U

 

Run 'gabconfig -a' to validate that no ports are listed in the output.

 

5)  Restart gab on all nodes.

 

# gabconfig -c -n<# of nodes>

 

After all nodes have been seeded, validate that gab has started on all nodes.

 

# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   286101 membership 01
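
The value given to -n in step 5 is the number of nodes in the cluster. The following is a minimal sketch that derives it, assuming /etc/llthosts lists one "node-ID node-name" pair per line for every cluster node; verify the count against the actual cluster before using it. The helper name gab_seed_command is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads llthosts-style lines ("<node id> <node name>")
# on stdin and prints the gabconfig seeding command for that node count.
# Only lines whose first field is a numeric node ID are counted, so blank
# lines and comments are ignored.
gab_seed_command() {
    awk '$1 ~ /^[0-9]+$/ { n++ } END { printf "gabconfig -c -n%d\n", n }'
}

# Example use on a cluster node (not executed here):
#   gab_seed_command < /etc/llthosts
```

The helper only prints the command; it does not run it, so the result can be reviewed first.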

 

6)  Restart I/O fencing on all nodes if it was determined to be configured in step 3.

 

# vxfenconfig -c

 

After starting I/O fencing on all nodes, validate that it has started on all nodes.

 

# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   286101 membership 01
Port b gen   286109 membership 01

 

7)  Restart had (VCS engine) on all nodes

 

# hastart

 

After starting had on all nodes, validate that it has started on all nodes.

 

# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   286101 membership 01
Port b gen   286109 membership 01
Port h gen   286106 membership 01
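
The validation after steps 4 through 7 comes down to comparing the set of ports listed by 'gabconfig -a' with the set expected at that point: none after step 4, a after step 5, a and b after step 6 if fencing is used, and a, b and h after step 7. The following is a minimal sketch of that comparison, assuming the output layout shown above and a POSIX awk; the helper name gab_ports is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `gabconfig -a` output on stdin and prints the
# port letters that have a membership line, space-separated on one line.
# Prints an empty line when no ports are configured.
gab_ports() {
    awk '$1 == "Port" { printf "%s%s", sep, $2; sep = " " } END { print "" }'
}

# Example use after step 7 on a fenced cluster (not executed here):
#   [ "$(gabconfig -a | gab_ports)" = "a b h" ] && echo "all ports up"
```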

 

After the cluster and service groups have started and been processed, use 'hastatus -sum' to view a summary of the cluster status.



