Troubleshooting when an application hangs on a node within VCS

Created: 04 Jul 2014 • Updated: 04 Aug 2014 | 5 comments

Hi, if an application hangs on a node within VCS, I'd like to check how I should troubleshoot it. Should I use the stop script in main.cf to cleanly stop the application, or kill -9 the application processes? And then start the group with hagrp -online? Thanks so much.

mikebounds

If VCS detects that the application is hung, VCS will call clean, which forcibly stops the application, and then takes further action depending on how you have configured VCS. For instance, if you have set RestartLimit on the resource type, VCS will restart the application; in other configurations it will fail the group over to another system.
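A minimal sketch of setting RestartLimit at the type level from the command line, assuming the application is configured as a resource of the bundled Application type (substitute your own type) and that a limit of 1 is wanted. The `run` helper and `set_type_restart_limit` function are hypothetical wrappers that only print the VCS commands unless DRY_RUN=0 is exported, so the sequence can be reviewed before it touches the cluster.

```sh
#!/bin/sh
# Hypothetical wrapper: print each command (default) or run it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Set RestartLimit for every resource of the given type.
set_type_restart_limit() {
    type_name=$1
    limit=$2
    run haconf -makerw                                   # make the cluster config writable
    run hatype -modify "$type_name" RestartLimit "$limit"
    run haconf -dump -makero                             # save main.cf and make it read-only again
}

# Example: assumes the application is an "Application"-type resource.
set_type_restart_limit Application 1
```

Keep in mind that a type-level change applies to every resource of that type in the cluster, not just the hung application.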

If VCS does NOT detect that the application is hung and you have set RestartLimit on the resource type, you could kill -9 the application processes and VCS will restart the application. If RestartLimit is not set and you don't want the application to fail over to another system, you could take the resource offline using VCS (hares -offline or the GUI); this will try to stop the application gracefully, and if that doesn't work, VCS will call clean. Alternatively, you could freeze the service group (hagrp -freeze or the GUI) and kill -9 the application processes (freezing the group means VCS will not take action when it sees the processes die), and then restart the application manually or using VCS.
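The two manual paths above could be scripted roughly as follows. This is a sketch only: the resource appres, group appsg, system node1, process pattern myapp and start script /opt/myapp/bin/start are hypothetical placeholders, and the same print-by-default `run` wrapper is assumed. Since hares -offline returns before the resource has actually stopped, the sketch checks the resource state afterwards instead of immediately onlining it again.

```sh
#!/bin/sh
# Hypothetical wrapper: print each command (default) or run it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Path 1: ask VCS to stop the resource gracefully (VCS calls clean if the
# offline fails), then show its state so you can online it once it is OFFLINE.
offline_via_vcs() {
    res=$1
    sys=$2
    run hares -offline "$res" -sys "$sys"
    run hares -state "$res" -sys "$sys"
}

# Path 2: freeze the group so VCS ignores the processes dying, kill the hung
# processes, then restart the application by hand.
freeze_kill_restart() {
    grp=$1
    pattern=$2
    start_cmd=$3
    run hagrp -freeze "$grp"
    run pkill -9 -f "$pattern"     # match the pattern against full command lines
    run "$start_cmd"               # hypothetical start script for the application
}

# Hand control back to VCS; call once hares -state shows the resource ONLINE.
unfreeze_group() {
    run hagrp -unfreeze "$1"
}

offline_via_vcs appres node1
freeze_kill_restart appsg myapp /opt/myapp/bin/start
unfreeze_group appsg
```

Choose the pkill pattern carefully: with -f it matches anywhere in a process's command line, so a short pattern can kill unrelated processes.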

Mike

UK Symantec Consultant in VCS, GCO, SF, VVR, VxAT on Solaris, AIX, HP-ux, Linux & Windows

tanislavm

Hi Mike,

Thanks so much. Could I safely start the application using hagrp -online if the other resources in the group are online? Or is it better to stop the whole group with hagrp -offline and then start the group?

Gaurav Sangamnerkar

Hi,

You would run "hagrp -online" if the group is either offline or partially online. If the application resource had an issue and faulted, and that resource was not critical, the rest of the resources in the service group will still be online. In this state, the service group is partially online. If you are sure the application fault has been fixed, you can run hagrp -online to start the application resource. VCS will detect that the rest of the resources in the service group are already online and that only the application resource needs to be started.
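A rough command sequence for the partially online case, assuming the same hypothetical names (group appsg, application resource appres, system node1) and a print-by-default `run` wrapper. As far as I know, VCS will not online a resource that is still marked FAULTED on that system, so the sketch clears the fault before onlining the group.

```sh
#!/bin/sh
# Hypothetical wrapper: print each command (default) or run it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Bring a partially online group back after the application fault is fixed.
recover_partial_group() {
    grp=$1
    res=$2
    sys=$3
    run hares -clear "$res" -sys "$sys"      # clear the FAULTED state first
    run hagrp -online "$grp" -sys "$sys"     # only offline resources get started
    run hagrp -state "$grp"                  # confirm the group is fully ONLINE
}

recover_partial_group appsg appres node1
```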

To answer your original question, I agree with Mike's approach: freeze the service group, troubleshoot the application, then unfreeze the group.

G


Gaurav singh

@Mike, I have a small doubt. Could you please clear it up for me?

a) 1st case: What configuration is required to fail the application over to the second node if the application hangs on a node?

b) 2nd case: What RestartLimit do I need to set so that VCS restarts the application?

Kindly assist.

Thanks,

Gary

mikebounds

If VCS detects the application issue, then if the resource has a Critical attribute of 1 (or any resource that depends on it is critical), the fault will cause the group to fail over; otherwise it will not.
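A minimal sketch of checking and setting the Critical attribute on the application resource so that a detected fault leads to failover. The resource name appres is a hypothetical placeholder, and the print-by-default `run` wrapper is an assumption for reviewing the commands before running them.

```sh
#!/bin/sh
# Hypothetical wrapper: print each command (default) or run it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mark a resource critical so that its fault takes the group down for failover.
make_resource_critical() {
    res=$1
    run hares -value "$res" Critical         # show the current setting
    run haconf -makerw                       # make the cluster config writable
    run hares -modify "$res" Critical 1
    run haconf -dump -makero                 # save main.cf and make it read-only again
}

make_resource_critical appres
```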

If VCS does not detect the application issue and you only need to kill the application once to fix it, you could use a RestartLimit of 1; if you need to restart the application more than once, set RestartLimit accordingly. But I would freeze the service group (see first post) and then kill the application, rather than rely on RestartLimit, as this is what freezing a group is really designed for.
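If only one resource should get a non-default RestartLimit, a sketch like the following overrides the type-level value for that resource alone. The resource name appres and the limit of 2 are hypothetical, and the print-by-default `run` wrapper is assumed as before.

```sh
#!/bin/sh
# Hypothetical wrapper: print each command (default) or run it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Give a single resource its own RestartLimit instead of the type default.
set_resource_restart_limit() {
    res=$1
    limit=$2
    run haconf -makerw                              # make the cluster config writable
    run hares -override "$res" RestartLimit         # make the type attribute resource-specific
    run hares -modify "$res" RestartLimit "$limit"
    run haconf -dump -makero                        # save main.cf and make it read-only again
}

set_resource_restart_limit appres 2
```

Overriding at the resource level avoids changing the restart behaviour of every other resource of the same type.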

Mike
