
Nodes in ADMIN_WAIT_STATE & LEAVING after reboot

Created: 15 Aug 2013 • Updated: 25 Sep 2013 | 7 comments
This issue has been solved. See solution.

Hi

I have only a little knowledge of VCS. After a reboot of a server, I found one node in the ADMIN_WAIT state and the other node in the LEAVING state. This is a 2-node cluster.

Did this occur due to an incorrect configuration file? Correct me if I am wrong.

Please explain in which situations we see these errors.

Also, in which situations do we find a .stale file / STALE error related to main.cf?

Thanks and Regards

Comments (7)

g_lee:

Moved to a new discussion, as this is an entirely new/different/unrelated issue from a different user.

If this post has helped you, please vote or mark as solution

g_lee:

You haven't mentioned the OS version, VCS version, or any other pertinent details. Here is some generic documentation that may assist; if you need more specific guidance, you will need to provide more information.

- TECH6072: After a reboot, a node in a VERITAS Cluster Server (VCS) environment is in an ADMIN_WAIT state or in a STALE_ADMIN_WAIT state

http://www.symantec.com/business/support/index?pag...

- Veritas Cluster Server 5.1 (Solaris) Administrator's Guide -> Appendixes -> Cluster and system states -> System States

https://sort.symantec.com/public/documents/vcs/5.1...

extract:
----------
State: ADMIN_WAIT
Definition:
The running configuration was lost. A system transitions into this state for the following reasons:
• The last system in the running configuration leaves the cluster before another system takes a snapshot of its configuration and transitions to the running state.
• A system in local_build state tries to build the configuration from disk and receives an unexpected error from hacf indicating the configuration is invalid.

[...]

State: LEAVING
Definition:
The system is leaving the cluster gracefully. When the agents have been stopped, and when the current configuration is written to disk, the system transitions to exiting. 
----------

Re: the .stale file - this was removed from VCS 5.0 onwards; see here:

https://www-secure.symantec.com/connect/forums/how...


kittu_pandu:

Hi Lee

It was Solaris 10 and the VCS version is 5.1.

Can you please explain how to correct the configuration file?

Is hacf a daemon?

When we replace a NIC card, we update the MultiNICA service group with the new NIC - either by modifying main.cf or by running a command to make the change take effect. After modifying, we stop the cluster and reboot. If during the reboot we find the nodes in the ADMIN_WAIT and LEAVING states, how do we resolve that?

Could this be caused by faulted resources? If so, should we have modified any attributes?

Please let me know.

Thanks and Regards

Daniel Matheus:

You can edit the MultiNICA resource online using the hares command; please see this TN for details: http://www.symantec.com/business/support/index?pag....

Please note that this is for MultiNICA resources.
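As a rough illustration only (the resource name, node name, device names, and base IP below are hypothetical, and the exact syntax can vary by VCS version, so verify against the TN above and your Bundled Agents guide), an online change to a MultiNICA Device list might look like:

```shell
# Hypothetical sketch: swap NIC ce0 for ce2 in a MultiNICA resource "mnic_res"
# on node "node1". Requires a running VCS cluster; names here are made up.
haconf -makerw                                         # open the config read-write
hares -modify mnic_res Device -delete ce0 -sys node1   # remove the old NIC entry
hares -modify mnic_res Device -add ce2 10.10.10.11 -sys node1  # add new NIC + base IP
haconf -dump -makero                                   # write main.cf to disk, close config
```

This avoids hand-editing main.cf while the cluster is up, so there is no .stale/invalid-config risk on the next start.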

 

LEAVING just means that the node is about to leave the cluster gracefully, probably because hastop or a reboot/shutdown was initiated.

During this time it has to stop all service groups running on the node and stop the agents before VCS can leave the cluster. Depending on your applications and other resources, such as mount points and volumes, this can sometimes take quite a while.

You should check with the hastatus command whether resources are being shut down.

 

hacf is a program that checks main.cf for syntax errors; it is not a daemon.

For any further help we would need the engine_A.log from when you try to start VCS.

 

Thanks,
Dan

 


kittu_pandu:

Hi Daniel

After the nodes went into the ADMIN_WAIT and LEAVING states, I tried to run commands to bring the service groups online, but nothing responded because the node was not in the RUNNING state.

When we are stopping VCS, if some resources are in a failed state we should resolve them first, right? Otherwise, if we stop VCS without troubleshooting, will we face issues like nodes not reaching the RUNNING state? And how do we bring the resources online - can it be sorted out by changing attributes of the resources?

Can you please explain the important attributes of resources/service groups?

How exactly does the hacf program work? To run it, should we cd to /etc/VRTSvcs/conf/config? If we run it there, what output will we see and what does it do?

If we find a .stale file in /etc/VRTSvcs/conf/config, will it cause these node states? Should we remove the .stale file? When do we see a .stale file, and what does it mean?

What can we do with the engine_A.log file? What does it contain, in which cases do we use it, and where can we find it?

Thanks

Kittu 

 

Daniel Matheus:

Hi Kittu,

You should consider getting proper training on VCS to fully understand the product.

Here on the forum we can give you hints on specific issues/questions, but this does not replace training.

 

hacf:

You just point hacf at the directory that contains the main.cf and types.cf files, e.g.:

#hacf -verify /etc/VRTSvcs/conf/config
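A small sketch of how to read its result (assuming hacf is in your PATH, typically under /opt/VRTSvcs/bin): on a valid configuration it prints nothing, and on a syntax error it reports the problem in main.cf. You can key off the exit status:

```shell
# Verify the on-disk configuration; the argument is the config directory, not a file.
# hacf prints nothing when the syntax is valid, otherwise it describes the error.
if hacf -verify /etc/VRTSvcs/conf/config; then
    echo "main.cf syntax OK"
else
    echo "main.cf has syntax errors - see the message above"
fi
```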

 

When we are stopping VCS, if some resources are in a failed state we should resolve them first, right? Otherwise, if we stop VCS without troubleshooting, will we face issues like nodes not reaching the RUNNING state? And how do we bring the resources online - can it be sorted out by changing attributes of the resources?

Please see the Bundled Agents Reference Guide for details about resource attributes (choose the one for your version of VCS). Mandatory attributes must be set; optional attributes may be set.

https://sort.symantec.com/search?q=bundled%20agent...

 

VCS logs every action and output in the engine_A.log file, so if your resources are in failed state this is a good place to start troubleshooting:

/var/VRTSvcs/log/engine_A.log
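A quick way to inspect it while reproducing the problem (plain tail/grep; the path is the standard one on Solaris and may differ on other platforms):

```shell
# Follow the engine log live while starting VCS on this node
tail -f /var/VRTSvcs/log/engine_A.log

# Or pull just the most recent error lines after a failed start
grep -i "ERROR" /var/VRTSvcs/log/engine_A.log | tail -20
```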

 

After the nodes went into the ADMIN_WAIT and LEAVING states, I tried to run commands to bring the service groups online, but nothing responded because the node was not in the RUNNING state.

If a node is LEAVING, you cannot bring any resource online on it. If a 5.x or later node is in ADMIN_WAIT, there is most probably an issue with your configuration. To bring any resource online on a node, the node needs to be in the RUNNING state.

 

You can check that with:

#hastatus -sum

#hasys -state <nodename>

 

 


SOLUTION
Alok Sontakke:

STALE_ADMIN_WAIT indicates that the on-disk configuration file (main.cf) is invalid and there is no other node with a valid configuration.

ADMIN_WAIT indicates that the local configuration file is valid; however, another node which possibly has a more recent version of the configuration cannot provide the configuration snapshot - in this case because that node is in the LEAVING state.

This is not a VCS configuration issue. Once the other node has rebooted, it will also go into the ADMIN_WAIT state.

You can use the hasys -force command to bootstrap the cluster configuration from a node which has the recent configuration.
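As a hedged sketch (the node name is hypothetical, and you should first confirm which node really holds the latest good configuration, e.g. per TECH6072 above), the recovery sequence might look like:

```shell
# On the node whose main.cf you trust (here called "node1"):
hasys -force node1      # force this node to build the cluster configuration
                        # from its local main.cf and transition to RUNNING

# Once node1 reaches the RUNNING state, start VCS on the peer so it
# takes a snapshot of the running configuration instead of its own copy:
hastart                 # run on the other node
```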