cluster behavior needed, which cfg vars to modify

Created: 01 Mar 2013 • Updated: 07 Mar 2013 | 1 comment
This issue has been solved. See solution.

Hello,

 

I wish to have the following behavior from a Veritas cluster, monitoring a resource (app):

When the resource fails, first attempt to restart it on the same node; if that does not work, migrate it to the second node.

However, is there another setting that forces the resource to migrate directly if it fails too many times in a given timeframe, instead of starting it again on the same node?

When testing, I see different behavior depending on how long I wait between manually killing the app, and I do not know exactly which settings I have to edit. Basically, the question is: how much time must pass between manual failures of the resource for the cluster to restart it again on the _same_ node?

 

Configuration so far: ToleranceLimit = 0, RestartLimit = 1, OnlineTimeout = 300.

 

1 comment

mikebounds:

The attribute you are missing is

ConfInterval

 

When a resource has remained online for the specified time (in
seconds), previous faults and restart attempts are ignored by
the agent. (See ToleranceLimit and RestartLimit attributes for
details.)
■ Type and dimension: integer-scalar
■ Default: 600 seconds

So with the default ConfInterval of 600 seconds (10 minutes):

RestartLimit=1: a resource will be restarted once. If it fails again within 10 minutes, that causes a failover; if it fails after more than 10 minutes, it will be restarted again.

ToleranceLimit=1: a failure will be ignored the first time. If it fails again within 10 minutes, that causes a failover; if it fails after more than 10 minutes, it will be ignored again.
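
To make the RestartLimit / ConfInterval interaction concrete, here is a small Python sketch of the behavior described above. It is only a simplified model, not the VCS agent's actual logic: the names `RestartPolicy` and `simulate` are made up for illustration, restarts are assumed to be instantaneous, and the exact boundary (whether a fault at precisely ConfInterval seconds counts as "within" the interval) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class RestartPolicy:
    # Hypothetical container mirroring the two VCS attributes discussed above.
    restart_limit: int = 1    # RestartLimit
    conf_interval: int = 600  # ConfInterval, in seconds


def simulate(fault_times, policy):
    """Return the action taken for each fault time (seconds since first online).

    Simplified model: a restart brings the resource online immediately, and
    the restart counter is cleared once the resource has stayed online for
    at least conf_interval seconds (boundary handling is an assumption).
    """
    actions = []
    restarts_used = 0
    last_online = 0  # time the resource last came online
    for t in fault_times:
        # Online long enough: earlier restart attempts are forgotten.
        if t - last_online >= policy.conf_interval:
            restarts_used = 0
        if restarts_used < policy.restart_limit:
            restarts_used += 1
            actions.append("restart")
            last_online = t
        else:
            # Restart budget exhausted within the interval: fail over and stop.
            actions.append("failover")
            break
    return actions


# Second kill 200 s after the restart: inside ConfInterval, so failover.
print(simulate([100, 300], RestartPolicy()))
# Second kill 700 s after the restart: outside ConfInterval, so restart again.
print(simulate([100, 800], RestartPolicy()))
```

In this model, with RestartLimit = 1 and the default ConfInterval, you would need to wait at least 10 minutes after the resource comes back online before killing it again if you want to see another restart on the same node rather than a failover.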

Mike

UK Symantec Consultant in VCS, GCO, SF, VVR, VxAT on Solaris, AIX, HP-ux, Linux & Windows


SOLUTION