
Heartbeat timeout value

Created: 20 Feb 2012 | 4 comments

Hi,

Recently one of our cluster nodes rebooted because all of its heartbeat networks went down (due to some changes on a switch; the outage lasted approximately 60 seconds).

We reported the reboot to the network team, and they in turn suggested changing the heartbeat timeout value to 60 seconds.

Requesting your help: is it advisable to change the heartbeat timeout value to 60 seconds?

I think the default value is 15 seconds. If we change the value from the default, what are the consequences?

Please advise.

Divakar

4 Comments

Wally_Heim

Hi Divakar,

It is possible to set the heartbeat timeout to 60 seconds. However, on the Windows platform we don't recommend setting the heartbeat value above 30 seconds.

If you are concerned about the reboot, there are several switches in the heartbeat configuration that control whether the node reboots in certain situations. It sounds like you have one or more of these switches set. You could check whether disabling the reboot option is closer to what you are looking for.

Thanks,

Wally

Gaurav Sangamnerkar

I wouldn't really recommend that value, for a couple of reasons:

1. Manually increasing the timeout value increases the time the cluster takes to detect a fault, which means delayed fault detection and delayed corrective action. The business may not permit that; if the running applications are mission critical, even 30 seconds can matter.

2. We are talking about 30 seconds on the heartbeat, so in a split-brain situation you are intentionally delaying the cluster from taking action, which could be serious (I hope you have I/O fencing in place).

LLT, or the heartbeat, is a crucial part of the cluster. In a running cluster, heartbeats are exchanged every second to track the status of the other nodes. The 15 seconds of LLT timeout plus 15 seconds of GAB timeout give 30 seconds of failure detection, which I believe is a sound balance between stability and resilience.
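As a rough illustration of the arithmetic above, here is a minimal Python sketch. It assumes the detection window is simply the LLT timeout plus the GAB timeout, using the 15 s + 15 s figures quoted in this post (not verified defaults), and the function names are hypothetical. It only shows how the window compares with a network outage of a given length; it does not model what the cluster actually does afterwards.

```python
def detection_seconds(llt_timeout_s: float, gab_timeout_s: float) -> float:
    """Time before a silent peer is declared down (simplified additive model)."""
    return llt_timeout_s + gab_timeout_s


def outage_triggers_detection(outage_s: float,
                              llt_timeout_s: float,
                              gab_timeout_s: float) -> bool:
    """True if heartbeats stay lost at least as long as the detection window."""
    return outage_s >= detection_seconds(llt_timeout_s, gab_timeout_s)


if __name__ == "__main__":
    # Figures quoted in this thread (assumptions, not verified defaults).
    print(detection_seconds(15, 15))
    # A 60-second switch outage outlasts the 30-second window.
    print(outage_triggers_detection(60, 15, 15))
    # Raising the LLT timeout to 60 s rides out the same outage, but a real
    # node failure would then also go undetected for 75 seconds.
    print(outage_triggers_detection(60, 60, 15))
    print(detection_seconds(60, 15))
```

The last two lines show the trade-off: the longer window hides the switch outage, at the price of a correspondingly slower reaction to a genuine failure.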

In my opinion, it would not be a wise idea.

Gaurav

PS: If you are happy with the answer provided, please mark the post as the solution. You can do so by clicking the "Mark as Solution" link below the answer.
 

Divakar SK

Hi Gaurav,

Thanks for your input.

Because of a spanning tree problem, the network engineer asked us to change the value to 60 seconds to avoid a cluster failover.

The customer also does not want this failover :-(

They say the spanning tree issue may take 45 to 60 seconds to resolve.

Could you please confirm what the default heartbeat timeout value is: 15 seconds or 30 seconds?

Thanks,

Divakar

Anoop_Kumar

I agree with the comments above that increasing the timeout value is not advisable.

- To avoid failover, freezing the service group (SG) is an option.

- However, if LLT goes down completely, the node will go down.

If you are using network switches between the LLT links, there should be two switches for the two high-priority LLT links. It is also advisable to make changes on only one switch at a time.

Having a single switch for all LLT links is again a risk, because it is a single point of failure for the LLT links.
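To make the single-point-of-failure argument concrete, here is a small Python sketch. The link and switch names are hypothetical, and the model is deliberately simple: an LLT link is considered up only if the switch it passes through is up.

```python
def surviving_links(link_to_switch: dict, switches_down: set) -> list:
    """Return the LLT links whose switch is still up."""
    return [link for link, switch in link_to_switch.items()
            if switch not in switches_down]


def loses_all_heartbeats(link_to_switch: dict, switches_down: set) -> bool:
    """True if no LLT link survives the given switch outage."""
    return not surviving_links(link_to_switch, switches_down)


# Hypothetical layouts: both links on one switch vs. one link per switch.
single_switch = {"llt_link1": "switchA", "llt_link2": "switchA"}
dual_switch = {"llt_link1": "switchA", "llt_link2": "switchB"}

if __name__ == "__main__":
    # Maintenance on switchA alone:
    print(loses_all_heartbeats(single_switch, {"switchA"}))  # all links lost
    print(loses_all_heartbeats(dual_switch, {"switchA"}))    # one link survives
    # Changing both switches at the same time defeats the redundancy:
    print(loses_all_heartbeats(dual_switch, {"switchA", "switchB"}))
```

With one link per switch, a change on a single switch leaves a heartbeat path intact, which is why changing one switch at a time matters.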

~Anoop