LLT connections mismatch

Created: 20 Jan 2013 • Updated: 07 Mar 2013 | 8 comments
This issue has been solved. See solution.

My cluster was configured some time ago with two dedicated LLT links and one low-priority LLT link.
About a week ago I found that one of the dedicated links was down. It turned out that the ports on the switch had been disabled.
I asked for them to be enabled, and the OS on both nodes now sees the interfaces as running. But the cluster still shows one link as down.
The lltstat output and llttab files from both nodes are included below.

As you can see, each node sees all three of its own links but only two from the other node. I thought that restarting the cluster, and perhaps the node, would help, but I am not sure. I also do not want to restart the application if it is not necessary.

My question is: will it help to stop the cluster with hastop -all -force and then restart or reconfigure the LLT links, or do I really have to stop everything, fix the LLT links, and then start the application? Only one node of the cluster is active; the other does not have any application services running.

node-ora01:root ~ # lltstat -n
LLT node information:
    Node                 State    Links
   * 0 node-ora01   OPEN        3
     1 node-ora02   OPEN        2

node-ora01:root ~ # cat /etc/llttab
set-node node-ora01
set-cluster 3415
link eth1 eth-9c:8e:99:fa:21:0a - ether - -
link eth3 eth-9c:8e:99:fa:21:0e - ether - -
link-lowpri bond0 bond0 - ether - -
 

node-ora02:root ~ # lltstat -n
LLT node information:
    Node                 State    Links
     0 node-ora01   OPEN        2
   * 1 node-ora02   OPEN        3

node-ora02:root ~ # cat /etc/llttab
set-node node-ora02
set-cluster 3415
link eth1 eth-9c:8e:99:f9:ec:bc - ether - -
link eth3 eth-9c:8e:99:f9:ec:c0 - ether - -
link-lowpri bond0 bond0 - ether - -

 

mikebounds

Can you post the output of "lltstat -nvv" from each node? It sounds as though the connection is broken between a pair of the interfaces. You can test this by plumbing IPs onto the broken link to see if you can ping.

Mike

UK Symantec Consultant in VCS, GCO, SF, VVR, VxAT on Solaris, AIX, HP-ux, Linux & Windows

If this post has answered your question then please click on "Mark as solution" link below

vostrushka

Please find below the lltstat -nvv output:

 

Node 1
---------
LLT node information:
    Node                 State    Link  Status  Address
   * 0 node-ora01   OPEN    
                                  eth1   UP      9C:8E:99:FA:21:0A
                                  eth3   UP      9C:8E:99:FA:21:0E
                                  bond0   UP      9C:8E:99:FA:21:08
     1 node-ora02   OPEN    
                                  eth1   UP      9C:8E:99:F9:EC:BC
                                  eth3   DOWN    
                                  bond0   UP      9C:8E:99:F9:EC:BA
     2                   CONNWAIT
                                  eth1   DOWN    
                                  eth3   DOWN    
                                  bond0   DOWN    
 
Node 2
---------
LLT node information:
    Node                 State    Link  Status  Address
     0 node-ora01   OPEN    
                                  eth1   UP      9C:8E:99:FA:21:0A
                                  eth3   DOWN    
                                  bond0   UP      9C:8E:99:FA:21:08
   * 1 node-ora02   OPEN    
                                  eth1   UP      9C:8E:99:F9:EC:BC
                                  eth3   UP      9C:8E:99:F9:EC:C0
                                  bond0   UP      9C:8E:99:F9:EC:BA
     2                   CONNWAIT
                                  eth1   DOWN    
                                  eth3   DOWN    
                                  bond0   DOWN    
 
mikebounds

This output means the connection between the eth3 interfaces is down. The output from node-ora02 should be interpreted as follows:

 

     0 node-ora01   OPEN
                                  eth1   UP      9C:8E:99:FA:21:0A   Can see interface on other node
                                  eth3   DOWN                        Can NOT see interface on other node
                                  bond0  UP      9C:8E:99:FA:21:08   Can see interface on other node
   * 1 node-ora02   OPEN
                                  eth1   UP      9C:8E:99:F9:EC:BC   Local interface is UP
                                  eth3   UP      9C:8E:99:F9:EC:C0   Local interface is UP
                                  bond0  UP      9C:8E:99:F9:EC:BA   Local interface is UP
 
So the local interfaces are OK, but the connection is broken for eth3.
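
For anyone who wants to check this automatically, here is a minimal Python sketch that applies the reading above to saved "lltstat -nvv" text. It is only an illustration: the function name peer_links_down and both regular expressions are my own assumptions, based solely on the layout of the output posted in this thread, and other LLT versions may print something different.

```python
import re

# Sample text copied from the node-ora02 output posted in this thread.
SAMPLE = """\
LLT node information:
    Node                 State    Link  Status  Address
     0 node-ora01   OPEN
                                  eth1   UP      9C:8E:99:FA:21:0A
                                  eth3   DOWN
                                  bond0   UP      9C:8E:99:FA:21:08
   * 1 node-ora02   OPEN
                                  eth1   UP      9C:8E:99:F9:EC:BC
                                  eth3   UP      9C:8E:99:F9:EC:C0
                                  bond0   UP      9C:8E:99:F9:EC:BA
     2                   CONNWAIT
                                  eth1   DOWN
                                  eth3   DOWN
                                  bond0   DOWN
"""

# Assumed layout: optional '*' (local node), node id, optional name, state.
NODE_RE = re.compile(r"^\s*(\*)?\s*(\d+)\s+(?:(\S+)\s+)?([A-Z]+)\s*$")
# Assumed layout: indented link name followed by UP or DOWN.
LINK_RE = re.compile(r"^\s+(\S+)\s+(UP|DOWN)\b")


def peer_links_down(text):
    """Return {peer_name: [links reported DOWN]} for named peer nodes."""
    result = {}
    current = None  # peer whose link lines are being read, or None to skip
    for line in text.splitlines():
        node = NODE_RE.match(line)
        if node:
            is_local, _node_id, name, _state = node.groups()
            # Skip the local node ('*') and unnamed slots such as node 2.
            current = name if (name and not is_local) else None
            if current:
                result[current] = []
            continue
        link = LINK_RE.match(line)
        if link and current:
            link_name, status = link.groups()
            if status == "DOWN":
                result[current].append(link_name)
    return result


print(peer_links_down(SAMPLE))
```

An empty list for a peer means every link to that peer is seen as UP from the node where the output was taken.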
 
Mike

 


vostrushka

Yes, I understand that.

The way I tried to restart it this morning did not help. Perhaps I did not follow the right sequence.

I'll try it on my test cluster first and then try again.

Leonid

mikebounds

Perhaps I wasn't clear enough in my last post - this is not an issue with the cluster or the node - it is an issue with the cables or the switch.  You do not need to restart anything on the host for it to see a network connection that was previously broken.  The only possible problem with the host is if the eth3 interfaces on each machine are running at different speeds, but if it was working previously and you have not changed anything, then this is very unlikely.  As I said earlier, to verify that the link is down, I would plumb IPs on the interfaces - example:

First test that eth1 works with IPs (i.e. you are checking that there is no firewall that allows LLT but not ping):

plumb 1.1.1.1, mask 255.255.255.0 on eth1, node-ora01 

plumb 1.1.1.2, mask 255.255.255.0 on eth1, node-ora02

Then test that you can ping 1.1.1.2 from node-ora01. If it doesn't work, you could try ssh, traceroute or "telnet 1.1.1.2 port" to see if other ports work.  You can test the connection the other way too.

Once you have verified this works - then test eth3:

 

plumb 1.1.3.1, mask 255.255.255.0 on eth3, node-ora01 

plumb 1.1.3.2, mask 255.255.255.0 on eth3, node-ora02

Repeat connection tests for 1.1.3.2 from node-ora01
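
As a rough illustration of the steps above on Linux, the Python sketch below only builds the commands for one side of the test; it does not run anything. The helper name link_test_commands is hypothetical, and the use of the iproute2 "ip addr" command and "ping -I" is my assumption about how the IPs would be plumbed on these nodes; the addresses and mask are the example values from this post.

```python
import ipaddress


def link_test_commands(iface, local_ip, peer_ip, netmask="255.255.255.0"):
    """Build (not run) the commands for a temporary-IP ping test on one link."""
    # Convert the dotted netmask to a prefix length, e.g. 255.255.255.0 -> 24.
    prefix = ipaddress.ip_network(f"{local_ip}/{netmask}", strict=False).prefixlen
    return [
        f"ip addr add {local_ip}/{prefix} dev {iface}",  # plumb the temporary IP
        f"ping -c 3 -I {iface} {peer_ip}",               # test the peer over this link
        f"ip addr del {local_ip}/{prefix} dev {iface}",  # remove the temporary IP
    ]


# Commands for node-ora01; swap the two addresses for node-ora02.
for cmd in link_test_commands("eth3", "1.1.3.1", "1.1.3.2"):
    print(cmd)
```

The address must be added on both nodes before the ping is attempted, and the commands need root privileges when run by hand.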

Mike


vostrushka

I see. No, I am not giving up. ;-)

I am just taking a step back to see what I can do. I will do some testing with IP addresses and check the speed.

Then I will try to reproduce it on my test cluster. I'll report back in a couple of days on what the solution turns out to be.

Leonid

avsrini

Hi Leonid,

If you are only running VCS with GAB ports a and h, then yes, you can force-stop VCS with the applications running and then restart GAB/LLT for eth3 to come up. But as others mentioned, make sure eth3 is connected between the nodes by pinging a temporary IP.

 

Regards

Srini

 

vostrushka

I did get downtime for the cluster, but it turned out that something is wrong with the ports on the switch or the cables.

So far only the node itself sees its NIC as up and running; the other node does not.

I also found how to disable and re-enable an LLT link without restarting the whole stack or the cluster:

lltconfig -u eth3

lltconfig -t eth3 -d eth3

Thank you all.

Leonid

 

SOLUTION