Storage Foundation Cluster File System for Oracle RAC set-dbg-minlinks option in /etc/llttab

Article:TECH62994  |  Created: 2008-01-02  |  Updated: 2010-12-15  |  Article URL http://www.symantec.com/docs/TECH62994
Article Type
Technical Solution


Environment

Issue



Storage Foundation Cluster File System for Oracle RAC set-dbg-minlinks option in /etc/llttab


Solution




set-dbg-minlinks option in /etc/llttab

Low Latency Transport (LLT) is the high-performance, low-latency protocol used for all cluster communication. Configure LLT with two or more private networks so that cluster communication is reliable. LLT carries heartbeats and cluster traffic over these private networks. When LLT on a system no longer receives heartbeat messages from another system on any of the configured LLT interfaces, Group Membership Services/Atomic Broadcast (GAB) reports a change in membership. When a system has only one interconnect link remaining to the cluster, GAB can no longer reliably discriminate between loss of a system and loss of the network, so the reliability of that system's membership is considered at risk. A special membership category, called jeopardy membership, takes effect in this situation. This provides the best possible split-brain protection without membership arbitration and SCSI-3 capable devices.
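
The link-count rule described above can be pictured with a small model. The following Python sketch is illustrative only: the Membership enum and the classify_peer function are hypothetical names, not part of LLT or GAB, and the real GAB logic depends on timers and state that are not modeled here.

```python
from enum import Enum


class Membership(Enum):
    REGULAR = "regular"
    JEOPARDY = "jeopardy"
    NOT_A_MEMBER = "not a member"


def classify_peer(links_with_heartbeat: int) -> Membership:
    """Classify a peer by the number of LLT links still carrying its heartbeats.

    Simplified model of the rule in the text:
      0 links -> the peer drops out of the membership
      1 link  -> jeopardy membership
      2+ links -> regular membership
    """
    if links_with_heartbeat < 0:
        raise ValueError("link count cannot be negative")
    if links_with_heartbeat == 0:
        return Membership.NOT_A_MEMBER
    if links_with_heartbeat == 1:
        return Membership.JEOPARDY
    return Membership.REGULAR


if __name__ == "__main__":
    for count in (2, 1, 0):
        print(count, "link(s):", classify_peer(count).value)
```

The model deliberately has no way to tell why a link stopped carrying heartbeats, which is the same ambiguity (system loss versus network loss) that jeopardy membership exists to flag.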

From the Veritas Cluster Server (VCS) point of view, placing a system in jeopardy membership does not affect service group states: the groups remain online or offline as they were. However, if the cluster is already in jeopardy membership and a node then loses connectivity on its last interconnect link, or a node that is in jeopardy membership is lost, all the service groups that were online on the node or nodes in jeopardy membership are marked with a special "autodisabled" state in the cluster.

From the Veritas Cluster File System (CFS) point of view, if jeopardy membership is followed by node loss and Symantec I/O Fencing is not configured in enabled mode, CFS on the other nodes has no indication of whether:
->    the peer nodes on the other side of the network partition have crashed, or
->    there is a split-brain situation.

Because of this uncertainty, CFS behaves as follows, depending on the release:
(a) Disables the shared mount points on all the nodes in "SFCFS for Oracle RAC 5.0MP2 on Linux".
(b) Does not disable the shared mount points on all the nodes in "SFCFS for Oracle RAC 5.0MP3 on Linux".

On the other hand, if all the private links go down simultaneously (network partition or split-brain), the behavior of VCS and CFS changes. In such a situation:
(a) VCS fails over the service group from the faulted node to one of the other running nodes.
(b) CFS performs normal operations (performs the recovery).

Note: Both cases above assume that the Symantec I/O Fencing module is present and has already brought down nodes so that only one side of the network partition survives before VCS and CFS start their recovery or other operations. In a split-brain situation, CFS must not start its recovery before the other side of the partition is down; otherwise CFS corruption may result. SFCFS for Oracle RAC does not support Symantec's implementation of SCSI-3 PGR based I/O fencing, and Oracle Clusterware (CRS) is expected to handle split-brain situations. It is therefore mandatory that VCS failover and CFS recovery start only after CRS has evicted nodes so that only one side of the network partition is up. Once CRS has completed its decision and only one side of the partition is up, VCS and CFS recovery operations can proceed. Refer to the following technote for more details:
http://www.symantec.com/docs/TECH61848


GAB reports jeopardy membership when only one cluster interconnect remains for LLT to communicate with the cluster nodes. This behavior is a limitation when link aggregation is used for LLT links. With link aggregation, two or more interconnects are combined into a single interface that is exposed to the system. When a bonded interface (with two or more slave interfaces) is specified in /etc/llttab, LLT assumes that only one interconnect is available for cluster heartbeat and communication, although there is more than one interconnect underneath. GAB therefore forms a jeopardy membership even though more than one interconnect is actually available to LLT.
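
The counting problem can be sketched as follows. This Python fragment is a simplified illustration, not LLT code: the function name llt_visible_links and its input format (a mapping from each llttab link name to the up/down state of the physical NICs beneath it) are assumptions made for the example.

```python
def llt_visible_links(interfaces):
    """Count the links LLT can see, given what lies beneath each one.

    `interfaces` maps a link name from /etc/llttab to a list of booleans,
    one per underlying physical NIC (True = NIC is up). A plain NIC has a
    one-element list; a bonded interface has one element per slave.

    A bonded interface counts as a single link for as long as at least
    one of its slaves is up, however many slaves it has.
    """
    return sum(1 for slaves in interfaces.values() if any(slaves))


if __name__ == "__main__":
    two_plain_nics = {"eth1": [True], "eth2": [True]}
    one_bond_two_slaves = {"bond1": [True, True]}
    print("two plain NICs:", llt_visible_links(two_plain_nics))
    print("one bond, two slaves:", llt_visible_links(one_bond_two_slaves))
```

Both sample configurations have two physical interconnects, but the bonded one presents a single link to LLT, which is why GAB treats it as a jeopardy condition.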

When one aggregated (bonded) interface is used for LLT, GAB's jeopardy membership reporting can be turned off in either of the following ways:

(a) With set-dbg-minlinks parameter:
------------------------------------
A hidden LLT parameter, set-dbg-minlinks, can be specified in /etc/llttab. It cannot be set dynamically while LLT is running.
The following example shows how to use set-dbg-minlinks with an aggregated link in the LLT configuration:

# cat /etc/llttab

set-node node1
set-cluster 1234
link bond1 bond1 - ether - -
set-dbg-minlinks 2

In the above example, bond1 is an aggregated (bonded) link with two or more slave links underneath it. When set-dbg-minlinks is set to 2, GAB's reporting of the jeopardy condition is turned off and GAB does not form a jeopardy membership. If one of the slave links fails, LLT does not report any loss of link to GAB, because the heartbeat continues on the other slave link(s) of the bonded link. LLT reports a link loss to GAB only when all the slave links of the bonded link go down (because of a switch failure or for some other reason). GAB then forms a new membership and sends it to all the GAB clients, such as VCS and CFS. On receiving the new membership (with node loss), VCS and CFS start their failover and recovery operations.

(b) Without set-dbg-minlinks parameter:
---------------------------------------
A low priority link can be used in /etc/llttab instead of the set-dbg-minlinks parameter. While the other private links are up, LLT sends only heartbeat messages to the peer nodes over the low priority link, and all cluster communication takes place over the private links. Only when all the private links are down does LLT send cluster communication messages over the low priority link along with the heartbeat messages. In that case all cluster communication traffic travels over the low priority network (which may be the same as the public network), increasing the traffic on that link.
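
The traffic rule for a low priority link can be summarized in a few lines. The Python sketch below is only a model of the behavior described above; the function links_for and the message kinds "heartbeat" and "data" are hypothetical names chosen for the example, not LLT interfaces.

```python
def links_for(message_kind, private_up, lowpri_up):
    """Return the links a message of the given kind may travel on.

    private_up -- names of private (high priority) links that are up
    lowpri_up  -- names of low priority links that are up

    Heartbeats go over every link that is up. Cluster communication
    ("data") stays on the private links and moves to the low priority
    links only when no private link is left.
    """
    if message_kind == "heartbeat":
        return list(private_up) + list(lowpri_up)
    if message_kind == "data":
        return list(private_up) if private_up else list(lowpri_up)
    raise ValueError("unknown message kind: %r" % (message_kind,))


if __name__ == "__main__":
    # Normal operation: the bonded private link is up.
    print(links_for("data", ["bond1"], ["eth4"]))
    # All private links down: data falls back to the low priority link.
    print(links_for("data", [], ["eth4"]))
```

The second call shows the drawback noted above: once the private links are gone, the low priority network carries the full cluster traffic.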

The following example shows how to use a low priority link in the LLT configuration:

# cat /etc/llttab

set-node node1
set-cluster 1234
link bond1 bond1 - ether - -
link-lowpri eth4 eth4 - ether - -

The above example uses one low priority link. In this case GAB does not report jeopardy membership, because LLT now has two links (one bonded link and one low priority link). The set-dbg-minlinks parameter is not needed here.
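
An existing configuration can be checked against options (a) and (b) with a short script. The Python sketch below is a hedged illustration: parse_llttab and jeopardy_reported_when_healthy are hypothetical helpers, the parser recognizes only the directives shown in this article, and treating lines that start with "#" as comments is an assumption. The rule that "set-dbg-minlinks 2" turns off jeopardy reporting is taken from section (a), not from the parameter's internals.

```python
def parse_llttab(text):
    """Collect the link-related directives from llttab content."""
    cfg = {"links": [], "lowpri_links": [], "minlinks": None}
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and '#' lines carry no directives.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        directive, value = fields[0], fields[1]
        if directive == "link":
            cfg["links"].append(value)
        elif directive == "link-lowpri":
            cfg["lowpri_links"].append(value)
        elif directive == "set-dbg-minlinks":
            cfg["minlinks"] = int(value)
    return cfg


def jeopardy_reported_when_healthy(cfg):
    """True if, per this article, GAB would report jeopardy with all links up."""
    if cfg["minlinks"] == 2:
        # Section (a): jeopardy reporting is turned off.
        return False
    # Otherwise LLT needs at least two links (section (b) counts the
    # low priority link as one of them).
    return len(cfg["links"]) + len(cfg["lowpri_links"]) < 2


if __name__ == "__main__":
    sample = (
        "set-node node1\n"
        "set-cluster 1234\n"
        "link bond1 bond1 - ether - -\n"
    )
    cfg = parse_llttab(sample)
    print(cfg)
    print("jeopardy reported:", jeopardy_reported_when_healthy(cfg))
```

To check a live system, the content of /etc/llttab could be read from disk and passed to parse_llttab in place of the sample string.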

Note: A low priority link is not recommended for LLT configuration where Symantec's implementation of SCSI-3 PGR based I/O fencing is supported (as in SF Oracle RAC). If SCSI-3 PGR based I/O fencing is available and LLT is configured with only one bonded link, use the "set-dbg-minlinks 2" parameter to turn off GAB's jeopardy membership reporting, as described in section (a). During a network partition, Symantec's I/O fencing module resolves the split-brain among the sub-clusters, and ultimately only one side of the network partition survives.


Summary

In short, choose one of the following configurations according to the setup:

1. With Symantec's I/O fencing disabled and only one bonded interface for LLT:
-> Append the line "set-dbg-minlinks 2" to /etc/llttab and restart LLT
-> GAB does not report jeopardy membership to its clients
-> The AutoDisable behavior of VCS does not come into play
-> CFS does not disable shared mount points on LxRT SFCFS for Oracle RAC 5.0MP2. This does not apply to LxRT SFCFS for Oracle RAC 5.0MP3, because the jeopardy handling feature is not present in that release.
-> CRS takes care of split-brain cases
-> Make sure that CFS starts its recovery only after CRS has brought down the nodes and only one side of the network partition survives. Refer to the following technote for more details:
http://entsupport.symantec.com/docs/306411

2. With Symantec's I/O fencing disabled and only one bonded interface for LLT (alternative using a low priority link):
-> A low priority link can be specified in /etc/llttab for the LLT configuration. With one low priority link and one bonded link, GAB does not report jeopardy membership to its clients.
-> Use of a low priority link is not recommended, because when all the private links are down LLT sends heartbeat messages and all cluster communication traffic over the low priority link (which may be the same as the public network), increasing the traffic on that link.

3. With Symantec's I/O fencing enabled and only one bonded interface for LLT:
-> Append the line "set-dbg-minlinks 2" to /etc/llttab and restart LLT
-> GAB does not report jeopardy membership to its clients
-> Symantec's I/O fencing takes care of split-brain cases before CRS can perform any operations




Legacy ID



308107

