
Where is 'MonitorTimeout' in Windows?

Created: 30 May 2011 • Updated: 01 Jun 2011 | 7 comments
This issue has been solved. See solution.

Hi,

 

In one of the threads it was mentioned that the 'MonitorTimeout' value needs to be increased to solve a specific issue (a service taking more than 60 seconds to come online). Does anyone know where the 'MonitorTimeout' value is in VCS for Windows?

 

Regards

Ramesh


mikebounds

You can view and set MonitorTimeout and OnlineTimeout using "hatype -display" and "hatype -modify" from the Command Prompt. However, I believe the reason you don't see them in the Windows Java GUI is that these entry point timeouts are not supported on Windows. See this extract from the agent dev guide:

 
The following entry point timeouts do not apply for agents on Windows:
■ ActionTimeout
■ AttrChangedTimeout
■ CloseTimeout
■ CleanTimeout
■ FaultOnMonitorTimeout
■ InfoTimeout
■ MonitorTimeout,
■ OfflineTimeout
■ OnlineTimeout
■ OpenTimeout.
On the Windows platform, agents do not execute entry point timeouts, but wait
until the entry points complete. For information concerning the timeouts listed
above, see the Veritas Cluster Server Administrator’s Guide.
 
 
However, when you look at the VCS Admin guide, the only references to the attributes named above (like OfflineTimeout) are in the sections below. These sections mention "functions" as opposed to "entry points", but this change in terminology only happened in 5.1SP1. The 5.1 VCS admin guide used the term "entry point", so it looks as though "entry point" was replaced with "function" in SP1. For example, the 5.1 VCS admin guide says:
The attributes MonitorTimeout, OnlineTimeOut, and
OfflineTimeout indicate the maximum time (in seconds) within which the
monitor, online, and offline entry points must complete or else be terminated
 
and the  5.1SP1 VCS admin guide says:
The attributes MonitorTimeout, OnlineTimeOut, and
OfflineTimeout indicate the maximum time (in seconds) within which the
monitor, online, and offline functions must complete or else be terminated
 

Other references in the 5.1 VCS admin guide are:

 

Ref 2: On page 486:

About the FaultOnMonitorTimeouts attribute

The FaultOnMonitorTimeouts attribute defines whether VCS interprets a
Monitor function timeout as a resource fault.
If the attribute is set to 0, VCS does not treat Monitor timeouts as resource
faults. If the attribute is set to 1, VCS interprets the timeout as a resource fault
and the agent calls the Clean function to shut the resource down.
By default, the FaultOnMonitorTimeouts attribute is set to 4. This means that
the Monitor function must time out four times in a row before the resource is
marked faulted.
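
As a rough illustration of the counting rule quoted above, here is a toy Python model. It is not VCS code: the function name, the list of per-cycle flags, and the assumption that a successful monitor cycle resets the count (one reading of "in a row") are all illustrative.

```python
from typing import Iterable, Optional


def first_fault_index(timed_out: Iterable[bool], threshold: int = 4) -> Optional[int]:
    """Return the index of the monitor cycle at which a fault would be declared.

    Toy model only: `timed_out` holds one flag per monitor cycle (True means
    that cycle timed out) and `threshold` plays the role of
    FaultOnMonitorTimeouts. Returns None if no fault is declared.
    """
    if threshold <= 0:
        return None  # 0 means timeouts are never treated as faults
    consecutive = 0
    for index, flag in enumerate(timed_out):
        # Assumption: a cycle that completes in time resets the run.
        consecutive = consecutive + 1 if flag else 0
        if consecutive >= threshold:
            return index
    return None


history = [True, True, False, True, True, True, True]
print(first_fault_index(history))               # default threshold of 4
print(first_fault_index(history, threshold=1))  # fault on the first timeout
print(first_fault_index(history, threshold=0))  # never fault
```

With the default of 4, the two early timeouts in `history` do not count towards a fault because the third cycle completes in time.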
 
Ref 3: The next page then describes how this timeout is used, but nowhere does it say that it is not applicable on Windows.
 
Ref 4: On page 662 there is a whole paragraph about setting timeouts:
You can also adjust how often VCS monitors various functions by modifying
their associated attributes. The attributes MonitorTimeout, OnlineTimeOut, and
OfflineTimeout indicate the maximum time (in seconds) within which the
monitor, online, and offline functions must complete or else be terminated. The
default for the MonitorTimeout attribute is 60 seconds. The defaults for the
OnlineTimeout and OfflineTimeout attributes is 300 seconds. For best results,
Symantec recommends measuring the time it takes to bring a resource online,
take it offline, and monitor before modifying the defaults. Issue an online or
offline command to measure the time it takes for each action. To measure how
long it takes to monitor a resource, fault the resource and issue a probe, or bring
the resource online outside of VCS control and issue a probe.
 
Ref 5: Page 664 says:
It may take multiple
monitor intervals before a database server is reported online. When this occurs,
it is important to have the correct values configured for the OnlineTimeout
 
Ref 6 (actually multiple references): Pages 752 to 762 have a table that describes several timeouts that are applicable on Windows.

 

 

I raised this discrepancy with Symantec Support in March this year and this was their response:

 

 

Entrypoint timeouts are not the same as resource timeouts. All the timeouts you mention do take effect on Windows. I find it a bit hard to understand why anyone would think they didn’t.

Entrypoint timeout is a timeout internal to the instruction sent and only of interest to those developing agents for use on Windows (which is why it’s only in the dev guide). It’s just a case of what processes the timeout. The entrypoint itself will not return a code stating e.g. “I have timed out” to the agent, but the agent has a timeout internally after which point it can declare the online/offline/monitor etc. to have timed out. The effect is the same, the mechanism is different.

If you need to set a timeout, please just set it.

 

I didn't understand this response from Support, so I replied with the following, but never got any further reply:

 

So are you saying, for example, that there is an OfflineTimeout for an offline entry point which does NOT apply to Windows, and an identically named OfflineTimeout for an offline function which does apply on Windows? If so, is this something that changed in 5.1SP1, or was it also true in 5.1?
 
Let's clarify this further. OfflineTimeout is a resource type attribute, not a resource attribute (I know you can override a type attribute and set it at resource level, but let's not complicate things even further). The OfflineTimeout attribute is set to 300 by default, but it cannot be seen in the Java GUI and can only be seen or set with the "hatype" command (or by editing the types files). This attribute can be set for ALL resource types.
 
A resource attribute is something completely different. Resource-level timeouts are not available to every agent, so only a few resources have them, such as the ServiceMonitor agent, which has a MonitorProgTimeout attribute that obviously should work.
 
My experience is that setting timeouts like OnlineTimeout and MonitorTimeout using hatype has no effect, so the Agent Dev guide is correct and the VCS Admin guide is wrong.
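
The distinction above between a type-level attribute and an optional resource-level override can be pictured as a simple lookup. This is only a sketch in Python, not how VCS stores its configuration: the dictionaries, the function, and the resource names `slow_service` and `other_service` are hypothetical, while the default values are the ones quoted from the admin guide.

```python
from typing import Dict

# Type-level values (defaults as quoted from the admin guide).
TYPE_ATTRS: Dict[str, Dict[str, int]] = {
    "GenericService": {
        "MonitorTimeout": 60,
        "OnlineTimeout": 300,
        "OfflineTimeout": 300,
    },
}

# Hypothetical per-resource overrides of type-level attributes.
RESOURCE_OVERRIDES: Dict[str, Dict[str, int]] = {
    "slow_service": {"OnlineTimeout": 600},
}


def effective_value(resource: str, res_type: str, attr: str) -> int:
    """Return the override for this resource if one exists, else the type value."""
    override = RESOURCE_OVERRIDES.get(resource, {})
    if attr in override:
        return override[attr]
    return TYPE_ATTRS[res_type][attr]


print(effective_value("slow_service", "GenericService", "OnlineTimeout"))
print(effective_value("other_service", "GenericService", "OnlineTimeout"))
```

Whether the agent on Windows actually honours the resulting value is a separate question, and it is the one this thread is about.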
 
Mike

 

 

 

UK Symantec Consultant in VCS, GCO, SF, VVR, VxAT on Solaris, AIX, HP-ux, Linux & Windows

If this post has helped you, please vote or mark as solution

rameshsa

Mike, thanks a lot for the detailed explanation. Here is the problem; I hope you can suggest a way to resolve it.

I have a generic service resource that is taking time to come online. I want to increase the timeout of the agent associated with this resource type so that the service resource has sufficient time to start. Say the service takes 2 minutes (120 seconds) to come up; I would then need to change the monitor cycle to 120 seconds.

 

Regards

Ramesh

mikebounds

If the service is taking a long time to come online, then it is OnlineTimeout you need to set, not MonitorInterval or MonitorTimeout. You can try changing it, but as I said in my last post, I don't know if the attribute is actually used. To set it, use:

haconf -makerw
hatype -modify GenericService OnlineTimeout 120
haconf -dump -makero

However, your problem might be that you need to increase the DelayAfterOnline resource attribute (10 seconds by default). You can set this to a higher value in the GUI, or from the command line use:

 

 

haconf -makerw
hares -modify resource_name DelayAfterOnline 20
haconf -dump -makero
 

 

Mike


rameshsa

The OnlineTimeout value is 300, which is set on the service group, and I guess it will override whatever is set on the individual resource. If so, then OnlineTimeout may not be the solution.

We monitored the service startup delay, which is no more than 2 minutes and is within the OnlineTimeout limit. Thanks for the suggestion.

'DelayAfterOnline' has been set to 10 sec. Do you think the 'DelayAfterOnline' parameter has any effect, given that clean was called for the resource after 21 sec?

2011/05/28 11:10:37 VCS NOTICE V-16-1-10301 Initiating Online of Resource xxxx-xxxx-xxxx (Owner: unknown, Group: xxx) on System xxxxxx

2011/05/28 11:12:51 VCS ERROR V-16-2-13066 (xxxxx) Agent is calling clean for resource(xxxx-xxxx-xxxx) because the resource is not up even after online completed.

2011/05/28 11:12:51 VCS ERROR V-16-2-13069 (xxxxx) Resource(xxxx-xxxx-xxxx) - clean failed.

Marianne

Have another look at this extract that Mike posted above:

Ref 4: On page 662 there is a whole paragraph about setting timeouts:
 
You can also adjust how often VCS monitors various functions by modifying
their associated attributes. The attributes MonitorTimeout, OnlineTimeOut, and
OfflineTimeout indicate the maximum time (in seconds) within which the
monitor, online, and offline functions must complete or else be terminated. The
default for the MonitorTimeout attribute is 60 seconds. The defaults for the
OnlineTimeout and OfflineTimeout attributes is 300 seconds.
 
For best results, Symantec recommends measuring the time it takes to bring a resource online,
take it offline, and monitor before modifying the defaults. Issue an online or
offline command to measure the time it takes for each action. To measure how
long it takes to monitor a resource, fault the resource and issue a probe, or bring
the resource online outside of VCS control and issue a probe.

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows

mikebounds

There could be a problem with the monitor routine, so what I would try is to start the service under Windows, then probe the resource (or wait 5 minutes) and see if VCS marks the resource as online.

Looking at the timing, VCS MAY be doing something like:

2011/05/28 11:10:37 VCS NOTICE V-16-1-10301 Initiating Online of Resource xxxx-xxxx-xxxx (Owner: unknown, Group: xxx) on System xxxxxx

11:12:37:  2 mins later agent times out, but waits 10 seconds before running monitor

11:12:47:  Monitor routine runs and takes 4 seconds to determine resource is not online

2011/05/28 11:12:51 VCS ERROR V-16-2-13066 (xxxxx) Agent is calling clean for resource(xxxx-xxxx-xxxx) because the resource is not up even after online completed.

 

To see if this is the case, you could try increasing DelayAfterOnline and OnlineTimeout and see what effect that has.
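
The arithmetic behind the timeline above can be checked with a few lines of Python. This is only a sanity check on the two logged timestamps: the three durations are the guesses from the timeline, not measured values, and the variable names are illustrative.

```python
from datetime import datetime

LOG_FORMAT = "%Y/%m/%d %H:%M:%S"  # timestamp layout used in the engine log lines


def seconds_between(start: str, end: str) -> int:
    """Return the whole seconds elapsed between two log timestamps."""
    delta = datetime.strptime(end, LOG_FORMAT) - datetime.strptime(start, LOG_FORMAT)
    return int(delta.total_seconds())


online_started = "2011/05/28 11:10:37"  # V-16-1-10301 Initiating Online
clean_called = "2011/05/28 11:12:51"    # V-16-2-13066 Agent is calling clean

elapsed = seconds_between(online_started, clean_called)

# Guessed components of the timeline (assumptions, not measurements).
online_wait = 120        # "2 mins later agent times out"
delay_before_monitor = 10  # wait before the monitor runs
monitor_run = 4          # time the monitor routine is assumed to take

print(elapsed)                                           # 134
print(online_wait + delay_before_monitor + monitor_run)  # 134
```

The gap between the two log entries is 134 seconds, which matches 120 + 10 + 4, so the guessed sequence is at least consistent with the log.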

Mike


SOLUTION
rameshsa

Thanks a lot, I will try out those options.

 

Regards

Ramesh