VCS node will not join the cluster if the LLT private link MTU size differs from the LLT MTU of the existing cluster nodes

Article:TECH159051  |  Created: 2011-04-28  |  Updated: 2011-04-29  |  Article URL http://www.symantec.com/docs/TECH159051
Article Type
Technical Solution


Issue



A VCS node will not join the cluster if the LLT private link MTU size has somehow been modified so that it differs from the LLT MTU size of the existing cluster nodes.

Although lltstat -vvn shows healthy LLT links with no obvious LLT problems, and gabconfig -a shows the correct Port a/Port h membership, the VCS HAD daemon fails to complete its startup on the joining node.


Error



In the engine_A.log of the joining node we will see the following message:

VCS NOTICE V-16-1-10465 Getting snapshot.  snapped_membership: 0x3 current_membership: 0x3 current_jeopardy_membership: 0x0

In the engine_A.log of the existing cluster nodes we will see the following message:

VCS INFO V-16-1-10455 Sending snapshot to node membership: 0x2


Environment



VCS Cluster for Unix


Cause



The LLT MTU size on the joining node differs from that of the existing active cluster nodes, so although the LLT links are reported as up, there is still a communication problem between the nodes.

In the example below the node uses the default NIC MTU of 1500 (lltstat reports the slightly smaller value 1452 because LLT subtracts its own header overhead):

bash-3.00# lltstat -c
LLT configuration information:
    node: 1
    name: sun07
    cluster: 33
      Supported Protocol Version(s)     : 5.0
    nodes: 0 - 63
    max nodes: 64
    max ports: 32
    links: 2
    mtu: 1452 <<<<<<<<<<<<
    max sdu: 66560
.....

   
and here we have a node with a different value (jumbo frames enabled on the NIC):

bash-3.00# lltstat -c
LLT configuration information:
    node: 0
    name: sun06
    cluster: 33
      Supported Protocol Version(s)     : 5.0
    nodes: 0 - 63
    max nodes: 64
    max ports: 32
    links: 2
    mtu: 9146 <<<<<<<<<<<<<<<
    max sdu: 66560
...

When LLT starts, it gets the NIC MTU size information from the OS kernel. If the NIC MTU size value has been modified to, for instance, start using jumbo frames, the LLT private links will start with this new MTU size setting.
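The MTU that LLT actually picked up can be read back from the lltstat -c output shown above. As a rough sketch (the awk filter is ours, not part of VCS), the value can be extracted so it is easy to compare across nodes:

```shell
# Hypothetical one-liner to pull the mtu value out of `lltstat -c`
# output. Here we run it against a captured sample line; on a live
# node you would pipe the real command output through the same filter:
#   lltstat -c | awk '$1 == "mtu:" { print $2 }'
sample='    mtu: 1452'
mtu=$(printf '%s\n' "$sample" | awk '$1 == "mtu:" { print $2 }')
echo "LLT mtu: $mtu"
```

Running the same filter on every node (for example over ssh) and comparing the values quickly shows whether a node has drifted from the cluster's MTU.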

A symptom of the problem is the OS messages file constantly logging LLT trouble/active transitions, such as:

LLT INFO V-14-1-10205 link 0 (ce0) node 0 in trouble
LLT INFO V-14-1-10024 link 0 (ce0) node 0 active

Another symptom is a high rate of retransmitted data packets on the existing active cluster node that is sending the snapshot to the joining node:

bash-3.00# lltstat
LLT statistics:
    185        Snd data packets
    388981     Snd retransmit data
.........
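To put those counters in perspective, a rough calculation (plain integer arithmetic on the sample numbers above, not an LLT command) shows how abnormal the retransmit rate is:

```shell
# Counters taken from the sample `lltstat` output above.
snd_data=185
snd_retransmit=388981
# On a healthy link this ratio is near zero; here each data packet
# is retransmitted roughly two thousand times on average.
ratio=$((snd_retransmit / snd_data))
echo "retransmits per sent data packet: $ratio"
```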
 


Solution



Make sure the LLT links use the same MTU size on all cluster nodes.

Because the NIC properties used by LLT can be accidentally modified, it is good practice to set the LLT MTU size explicitly in the /etc/llttab LLT configuration file. Here we set it to the usual default value of 1500:

bash-3.00# cat /etc/llttab
set-node sun06
set-cluster 33
link ce0 /dev/ce:0 - ether - 1500
link ce1 /dev/ce:1 - ether - 1500
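A quick sanity check (our own helper script, not a VCS tool) is to confirm that every "link" line in llttab carries the same explicit MTU in its last field. Here the script checks a sample copy of the file; on a real node you would point it at /etc/llttab:

```shell
# Write a sample llttab (assumption: same content as the example above).
f=$(mktemp)
cat > "$f" <<'EOF'
set-node sun06
set-cluster 33
link ce0 /dev/ce:0 - ether - 1500
link ce1 /dev/ce:1 - ether - 1500
EOF

# Field 7 of each "link" line is the MTU; sort -u leaves one value
# if all links agree.
mtus=$(awk '$1 == "link" { print $7 }' "$f" | sort -u)
if [ "$(printf '%s\n' "$mtus" | wc -l)" -eq 1 ]; then
  echo "OK: all links use MTU $mtus"
else
  echo "WARNING: mixed MTU values: $mtus"
fi
rm -f "$f"
```

After editing /etc/llttab, LLT must be restarted on that node (stop HAD and GAB first, then unconfigure and reconfigure LLT) before the new MTU takes effect; check the exact restart procedure for your VCS version.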



