
Why does VCS retry online 2 minutes later?

Created: 02 Apr 2013 • Updated: 08 Apr 2013 | 7 comments
This issue has been solved. See solution.

 

Hello all
 
I use SF 6.0.1 on SUSE 11. I have an IP resource managed by VCS. In some situations the IP resource can't be brought online on the first attempt, so I set OnlineRetryLimit of IP to 3, hoping VCS would bring it online via a retry.
 
I ran my test. The IP can be brought online after a retry, but the interval between the first online attempt and the second (the retry) is about 2 minutes. Here is engine_A.log:
 
2013/04/03 01:42:31 VCS WARNING V-16-10031-4604 IP:MyappVip:online:Address 2001:1b70:0200:1026:0000:0000:0135:0084 already exists: Res MyappVip will not go online.
2013/04/03 01:44:32 VCS ERROR V-16-2-13066 Agent is calling clean for resource(MyappVip) because the resource is not up even after online completed.
2013/04/03 01:44:33 VCS INFO V-16-2-13716 Resource(MyappVip): Output of the completed operation (clean) 
==============================================
RTNETLINK answers: Cannot assign requested address
==============================================
2013/04/03 01:44:33 VCS INFO V-16-2-13068 Resource(MyappVip) - clean completed successfully.
2013/04/03 01:44:33 VCS INFO V-16-2-13072 Resource(MyappVip): Agent is retrying online (attempt number 1 of 3).
 
 
I would like VCS to retry online after 4 seconds, but I cannot find a suitable parameter to decrease the interval, so my question is:
 
How can I make VCS retry online after 4 seconds?
 

Daniel Matheus's picture

Hello kongzzz,

 

I guess your OnlineTimeout has been set to 120 seconds.

VCS will wait until this time has passed and then call the clean function before it retries the online procedure. When the OnlineRetryLimit is set to a non-zero value, the agent framework calls the Clean function before rerunning the Online function.

 

So your total wait time until the next online attempt is OnlineTimeout + time taken by clean function.

 

Please also note that setting OnlineTimeout too low might lead to false alarms and/or the resource not being able to come online at all.

For the IP resource, for example, the IP agent does an IP online and 2 ARP requests, which take 2-5 seconds each.

Bringing an IP online using VCS will take roughly 10 seconds. On a heavily loaded system, or when you bring several IPs online at the same time, you need even more time (the ARP requests are not sent in parallel at the moment; an enhancement for this will be included in a later version of the IP agent).

I'd suggest you perform some more tests to find out the maximum time needed to bring the resources online (especially if you start all service groups at once on a system). Based on the time needed, you can adjust OnlineTimeout using the commands below:

 

#hatype -display IP | grep OnlineTimeout

#haconf -makerw

#hatype -modify IP OnlineTimeout <new timeout>

#hatype -display IP | grep OnlineTimeout

#haconf -dump -makero

 

Thanks,
Dan

 

 

 

If this post has helped you, please vote or mark as solution

starflyfly's picture

Hello kongzzz,

 

You can try changing OnlineWaitLimit to 1 to see if that improves things.

 

#haconf -makerw

#hatype -modify IP OnlineWaitLimit 1

#hatype -display IP | grep OnlineWaitLimit

#haconf -dump -makero

 

 

Regards

 

If the answer has helped you, please mark as Solution.

Paresh Bafna's picture

Hi kongzzzz,

The 2-minute delay you are seeing is because of the OnlineWaitLimit default value of 2. The agent waits for 2 monitor cycles after online has completed. If the resource is not online even after 2 monitor cycles, clean is called and online is retried based on the OnlineRetryLimit value.

You can modify OnlineWaitLimit for the IP agent from the default value of 2 to 0. This results in the following agent behavior:

If the monitor entry point run after the online entry point reports the resource as offline, clean is scheduled immediately (without any delay in between). Once clean completes successfully, online is retried immediately. The only delay in between is the time the entry points take to execute.

Hope this helps your use case.
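
Following the command pattern used earlier in this thread for OnlineWaitLimit 1, the change to 0 might be applied roughly as follows. This is only a sketch: it assumes it is run as root on a cluster node, and the guard that checks for hatype is an addition for illustration, so the snippet does nothing when pasted elsewhere. Note that hatype -modify changes the value for the IP type as a whole, not for a single resource.

```shell
#!/bin/sh
# Sketch: set OnlineWaitLimit for the IP type to 0 (value taken from this thread).
NEW_LIMIT=0

if command -v hatype >/dev/null 2>&1; then
    haconf -makerw                                  # open the configuration for writing
    hatype -modify IP OnlineWaitLimit "$NEW_LIMIT"  # change the type-level value
    hatype -display IP | grep OnlineWaitLimit       # confirm the new value
    haconf -dump -makero                            # save and return to read-only
else
    # Guard added for illustration; not part of the original command sequence.
    echo "hatype not found: run this on a VCS cluster node" >&2
fi
```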

Thanks and Regards,

Paresh Bafna


SOLUTION
kongzzzz's picture

Hello Daniel and starflyfly

 

Thanks for your help.

OnlineTimeout was previously 300. As you suggested, I changed it to 30 and tested my case again, but clean was still called 2 minutes later.

I have not tried OnlineWaitLimit yet, but I checked the current value, which is 2. That is already a small value, so I guess changing it will not help. Anyway, I will try it later.

Additional info: I also tried OnlineRetryInterval (set to 4), but that did not help either.

Is there any other parameter that can decrease the 120 seconds?

kongzzzz's picture

Hello Paresh

Many thanks for your professional explanation. I will try OnlineWaitLimit later.

 

Thanks and Regards

kongzzzz

mikebounds's picture

The default MonitorInterval is 60 seconds, so the delay before retrying is MonitorInterval multiplied by 2 (the OnlineWaitLimit). Changing OnlineWaitLimit to 1 should therefore reduce the delay from 120 to 60 seconds, and you could reduce it further by, say, changing MonitorInterval to 30. I wouldn't set it any lower, as otherwise the monitor entry point will run too frequently in normal operation. You should also set MonitorTimeout to the same value as MonitorInterval (or less) so you don't get overlapping monitor entry points.
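
As a rough illustration of the delay discussed in this thread, the wait before clean is called can be approximated as OnlineWaitLimit monitor cycles of MonitorInterval seconds each. The function name below is hypothetical, and the estimate deliberately ignores the time the online, monitor, and clean entry points themselves take to run, so treat it as a sketch rather than an exact model of the agent framework.

```shell
#!/bin/sh
# Approximate wait (in seconds) before the agent calls clean and retries online.
# Assumption: delay ~= OnlineWaitLimit * MonitorInterval; entry point run time is ignored.
estimate_retry_delay() {
    online_wait_limit=$1
    monitor_interval=$2
    echo $((online_wait_limit * monitor_interval))
}

estimate_retry_delay 2 60   # default values mentioned in this thread
estimate_retry_delay 1 60   # OnlineWaitLimit reduced to 1
estimate_retry_delay 1 30   # MonitorInterval also reduced to 30
estimate_retry_delay 0 60   # OnlineWaitLimit set to 0
```

With the defaults this gives 120 seconds, which is close to the gap between 01:42:31 and 01:44:32 in the engine_A.log excerpt at the top of the thread.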

Mike

UK Symantec Consultant in VCS, GCO, SF, VVR, VxAT on Solaris, AIX, HP-ux, Linux & Windows


kongzzzz's picture

Thanks to all of you for your professional support. VCS triggers the next online attempt of the IP immediately after changing OnlineWaitLimit to 0. That works fine.