Node is not able to join cluster, HAD daemon getting killed

Article:TECH72052  |  Created: 2009-01-02  |  Updated: 2012-08-21  |  Article URL http://www.symantec.com/docs/TECH72052
Article Type
Technical Solution


Environment

Issue



A node is unable to join the cluster, and the "had" daemon is killed on nodes that are already cluster members.


Solution



In Linux environments, nodes sometimes have trouble joining the cluster. For example, in a multi-node cluster, a couple of nodes form the cluster, but when other nodes try to join, the "had" daemon is killed on the nodes that were already part of the cluster.
 
Nodes that try to join later can also get stuck in the "REMOTE_BUILD" state. They get stuck because the node that was providing them with the snapshot of main.cf leaves the cluster membership. In this case, the following error message appears in engine_A.log:
 
V-16-1-10468 Node providing snapshot has left the cluster
 
The following GAB-related messages also appear in engine_A.log:
 
Jun 24 16:40:13 <hostname> Had[16151]: VCS ERROR V-16-1-10119 GabHandle::push returned = 12, gh_src = 0, gh_gen = 0, gh_size = 16358
 
Jun 24 16:40:13 <hostname> Had[16151]: VCS ERROR V-16-1-11103 VCS exited. It will restart
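
To confirm that a node is hitting this pattern, the log can be searched for the three message IDs quoted above. The following is a minimal Python sketch, not a Symantec tool: the function name scan_engine_log and the regular expression are illustrative assumptions, and only the message IDs and the "GabHandle::push returned" text come from this article. If the value returned by GabHandle::push is a standard errno (an assumption), 12 is ENOMEM on Linux, which would be consistent with a memory shortage.

```python
import errno
import re

# Message IDs quoted in this article.
MESSAGE_IDS = ("V-16-1-10468", "V-16-1-10119", "V-16-1-11103")

# Captures the numeric value from "GabHandle::push returned = 12".
PUSH_RE = re.compile(r"GabHandle::push returned = (\d+)")


def scan_engine_log(lines):
    """Collect lines containing the message IDs and any push return codes.

    `lines` is any iterable of strings, for example an open file object.
    Returns (hits, push_codes), where hits maps each message ID to the
    list of matching lines and push_codes is a list of integers.
    """
    hits = {mid: [] for mid in MESSAGE_IDS}
    push_codes = []
    for line in lines:
        for mid in MESSAGE_IDS:
            if mid in line:
                hits[mid].append(line.rstrip("\n"))
        match = PUSH_RE.search(line)
        if match:
            push_codes.append(int(match.group(1)))
    return hits, push_codes


def describe_code(code):
    """Name the code as an errno, ASSUMING it is a standard errno value."""
    return errno.errorcode.get(code, "unknown")
```

To use it, open engine_A.log from wherever your installation keeps it (the path depends on your setup) and pass the file object to scan_engine_log. A non-empty list for V-16-1-10119 together with a push code of 12 matches the symptom described here.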
 
Stopping and restarting "had", GAB, and LLT does not help.
 
The main cause found for this behavior is a lack of kernel memory.
As a workaround, reboot the server. This frees kernel memory for the time being, but the issue is likely to recur.
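
Because the issue can return after a reboot, it may help to record kernel memory figures over time. The sketch below parses text in the format of /proc/meminfo on Linux. The choice of fields (MemFree, LowFree, Slab, SUnreclaim) is an assumption made for illustration; this article does not state which counters to monitor or what thresholds indicate a problem, and LowFree is only present on some kernels.

```python
# Fields chosen for illustration only; the article does not name any.
FIELDS = ("MemFree", "LowFree", "Slab", "SUnreclaim")


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field_name: value} dict.

    Values are the leading integer on each line (kB for most fields).
    Lines that do not look like "Name:   123 kB" are skipped.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def kernel_memory_summary(text, fields=FIELDS):
    """Return only the selected fields that are present in the text."""
    info = parse_meminfo(text)
    return {name: info[name] for name in fields if name in info}
```

For example, reading /proc/meminfo into a string and passing it to kernel_memory_summary returns a small dictionary that can be logged periodically and compared with the values seen when "had" is killed.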
 
 

 

 


Legacy ID



327368



