
Resource Fault question

Created: 28 Jan 2014 • Updated: 07 Feb 2014 | 7 comments
mokkan's picture
This issue has been solved. See solution.

When a resource goes offline unexpectedly, the agent monitoring the resource runs the clean entry point to bring the resource fully offline and puts it into the Faulted state. My question is: before the resource is put into the Faulted state, can we restart it?


Gaurav Sangamnerkar's picture

Hi,

You very well can, but would you be able to catch the specific time window in which to restart the resource? Detecting the resource as offline and running clean will happen within a matter of minutes (depending on how MonitorInterval, MonitorTimeout and RestartLimit are set).

If a resource is in a transitioning state (onlining or offlining), you can flush the service group (hagrp -flush) so that the transitioning of resources stops, and then you can take manual action.

G

PS: If you are happy with the answer provided, please mark the post as solution. You can do so by clicking link "Mark as Solution" below the answer provided.
 

SOLUTION
Marianne's picture

I agree with Gaurav.

Increase RestartLimit. The default for most resource types is 0.

You may want to read through this section in VCS Admin Guide:

Controlling VCS behavior at the resource level

Extract:

About the RestartLimit attribute
The RestartLimit attribute defines whether VCS attempts to restart a failed
resource before informing the engine of the fault.
If the RestartLimit attribute is set to a non-zero value, the agent attempts to
restart the resource before declaring the resource as faulted. When restarting a
failed resource, the agent framework calls the Clean function before calling the
Online function. However, setting the ManageFaults attribute to NONE prevents
the Clean function from being called and prevents the Online function from being
retried.
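
The ordering described in this extract can be sketched as a toy model. This is only an illustration in Python, not VCS code: the function name, the step labels and the "UNMANAGED" placeholder state are assumptions made for the example. It assumes that every detected unexpected offline triggers Clean, that Online is retried only while restart attempts remain, and that the resource is reported as Faulted once those attempts are used up.

```python
def handle_unexpected_offline(restart_limit, online_results, manage_faults="ALL"):
    """Toy model of what happens after a resource goes offline unexpectedly.

    restart_limit  -- value of the RestartLimit attribute (0 means no restart)
    online_results -- one bool per Online attempt (True = resource came up)
    manage_faults  -- "ALL" or "NONE", mirroring the ManageFaults attribute

    Returns (steps, final_state). The labels are hypothetical names used
    only in this sketch.
    """
    steps = []
    if manage_faults == "NONE":
        # Per the extract: Clean is not called and Online is not retried.
        # "UNMANAGED" is a placeholder, not a real VCS state name.
        return steps, "UNMANAGED"

    results = iter(online_results)
    restarts_used = 0
    while True:
        # Clean always runs first, to make sure the resource is fully down.
        steps.append("clean")
        if restarts_used >= restart_limit:
            # No restart attempts left: the fault is reported.
            return steps, "FAULTED"
        steps.append("online")
        restarts_used += 1
        if next(results, False):
            return steps, "ONLINE"


if __name__ == "__main__":
    print(handle_unexpected_offline(0, []))
    print(handle_unexpected_offline(2, [False, True]))
```

With RestartLimit left at 0 the model goes straight from Clean to Faulted; with a non-zero value, Clean is followed by Online, and the fault is only reported if every restart attempt fails.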
 
(VCS Admin Guide and other manuals can be found here: http://sort.symantec.com/documents )

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links

SOLUTION
mikebounds's picture

I'm not sure if you are asking if "you" can start it or if "VCS" can restart it:

If "you" restart the resource before VCS detects that it is down, the resource will not be marked as faulted. But if you are restarting it intentionally, you should "freeze" the service group so that VCS does not interfere with your restart.

If RestartLimit is set to greater than zero, VCS will restart the resource and will not mark it as faulted unless all restarts fail.
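
For reference, the command lines behind the options mentioned in the replies so far can be collected in a small helper. This is a minimal sketch that only builds and prints the commands; it does not run them. The group name "app_sg", the system name "node1" and the resource type "Application" are placeholder values, the helper function names are made up for this example, and the exact syntax should be checked against the VCS Admin Guide for your version.

```python
import shlex


def restart_limit_commands(resource_type, limit):
    """Commands to set RestartLimit for a resource type (not executed)."""
    return [
        ["haconf", "-makerw"],  # open the configuration for writing
        ["hatype", "-modify", resource_type, "RestartLimit", str(limit)],
        ["haconf", "-dump", "-makero"],  # save and close the configuration
    ]


def manual_restart_commands(group):
    """Freeze the group, restart the application by hand, then unfreeze."""
    return [
        ["hagrp", "-freeze", group],
        # ... restart the application manually at this point ...
        ["hagrp", "-unfreeze", group],
    ]


def flush_commands(group, system):
    """Stop a group that is stuck onlining or offlining on one system."""
    return [["hagrp", "-flush", group, "-sys", system]]


if __name__ == "__main__":
    commands = (
        restart_limit_commands("Application", 2)  # placeholder type and value
        + manual_restart_commands("app_sg")       # placeholder group name
        + flush_commands("app_sg", "node1")       # placeholder system name
    )
    for argv in commands:
        print(shlex.join(argv))
```

Keeping the commands as argument lists makes it easy to review them before handing them to something like subprocess.run on a cluster node.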

Mike

UK Symantec Consultant in VCS, GCO, SF, VVR, VxAT on Solaris, AIX, HP-ux, Linux & Windows

If this post has answered your question then please click on "Mark as solution" link below

SOLUTION
mokkan's picture

Thank you very much for all of your input. Sorry for asking a stupid basic question.

When a resource goes offline unexpectedly, the agent calls the clean function to take it offline. If we set RestartLimit to a non-zero value, which one will be called first: the clean action or the restart? What I am trying to understand is whether the agent marks the resource as faulted first and only then calls the restart.

Setu Gupta's picture

First the agent will call the clean entry point to ensure that the resource is completely offline. After that, the agent will call the online entry point to restart the resource, as per the RestartLimit attribute.

This is also mentioned in the description of RestartLimit attribute pasted by Marianne above.

SOLUTION
Marianne's picture

Thank you very much for all of your input. Sorry for asking a stupid basic question.

We don't mind basic questions - all of us were new at one stage and back then there was no Symantec Connect to ask. So, we had to read manuals.

We do hope that you will read manuals when we point out the name of a manual and the relevant section.

You will see that I quoted from the manual 2 days ago:

If the RestartLimit attribute is set to a non-zero value, the agent attempts to
restart the resource before declaring the resource as faulted. When restarting a
failed resource, the agent framework calls the Clean function before calling the
Online function.
 
This means that when a resource 'goes offline unexpectedly' (normally because someone has killed or stopped the process manually outside of the cluster), the agent will run the Clean function (to be 100% sure processes are down) and then run the Online function.
 
Best to educate DBAs, users, etc. to use ha commands to offline resources.


SOLUTION
mokkan's picture

Thank you very much all of you.