MultiNIC resource faults before switching the base + virtual IP over to another interface
Created: 11 Sep 2012 | Updated: 18 Oct 2012 | 21 comments
This issue has been solved. See solution.
Expected Environment
Solaris 10
HA 6.0 (Single Node Cluster)
one built-in interface on solaris machine (bge0)
four port interface card installed (qfe0, qfe1, qfe2, qfe3)
I configured a MultiNIC resource with qfe0 and qfe1, using base IP 192.168.253.140 on both, and an IPMultiNIC resource with virtual IP 192.168.253.250.
I unplugged the network cable from qfe0. The MultiNIC and IPMultiNIC resources faulted; after a while both came online again, and then the base and virtual IPs switched over.
My question: why do the resources fault and then come back on the new interface? Why don't the IPs simply switch over without a fault? Am I missing anything?
Make sure you have other IPs on the 192.168.253 subnet on other servers and routers as VCS will ping out to check network.
A few other points:
Mike
UK Symantec Consultant in VCS, GCO, SF, VVR, VxAT on Solaris, AIX, HP-ux, Linux & Windows
If this post has helped you, please vote or mark as solution
Yes I have one IP assigned on a server 192.168.253.1
1.) That was the testing phase; I have now configured bge0 and qfe0.
2.) Right now I want to use MultiNIC.
3.) Yes, this attribute is set.
But I still have the same problem. When I unplugged the cable of the active NIC (bge0), the MultiNIC resource faulted; after a while it came back up and the IPs were assigned to the backup NIC (qfe0).
I don't know why the MultiNIC resource is faulting.
Any comment will be appreciated. Mark as Solution if your query is resolved
__________________
Thanks in Advance
Zahid Haseeb
zahidhaseeb.wordpress.com
I may have done something wrong. I created a new service group, added the MultiNIC and IPMultiNIC resources, and linked them. See the main.cf file below for reference. But now when I unplug the LAN cable, no service group faults and neither IP (base or virtual) switches to the backup NIC. Solaris reports the IPs as alive, but they are not pingable from outside.
""""base IP and virtual IP""""
( # ping 192.168.253.254
192.168.253.254 is alive
# ping 192.168.253.250
192.168.253.254 is alive
)
""""main.cf file""""
bash-3.00# cat /etc/VRTSvcs/conf/config/main.cf
include "OracleASMTypes.cf"
include "types.cf"
include "Db2udbTypes.cf"
include "OracleTypes.cf"
include "SybaseTypes.cf"
cluster CLUSTERA (
        UserNames = { admin = bIJbIDiFJeJJhRJdIG }
        Administrators = { admin }
        )
system solaris (
        )
group CLUSTERA-SG (
        SystemList = { solaris = 0 }
        )
        IPMultiNIC IPMnic (
                Address = "192.168.253.250"
                NetMask = "255.255.0.0"
                MultiNICResName = MultiNIC
                )
        MultiNICA MultiNIC (
                Device @solaris = { bge0 = "192.168.253.254",
                         qfe0 = "192.168.253.254" }
                NetMask = "255.255.0.0"
                NetworkHosts = { "192.168.253.1" }
                )
        IPMnic requires MultiNIC
// resource dependency tree
//
// group CLUSTERA-SG
// {
// IPMultiNIC IPMnic
// {
// MultiNICA MultiNIC
// }
// }
""""ifconfig -a""""
# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
qfe0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 52
inet 0.0.0.0 netmask ff000000
ether 8:0:20:ea:2:2c
qfe1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 47
inet 192.168.253.240 netmask ffff0000 broadcast 192.168.255.255
ether 8:0:20:ea:2:2d
bge0: flags=1000803<UP,BROADCAST,MULTICAST,IPv4> mtu 1500 index 51
inet 192.168.253.254 netmask ffff0000 broadcast 192.168.255.255
ether 0:3:ba:78:61:75
bge0:1: flags=1000803<UP,BROADCAST,MULTICAST,IPv4> mtu 1500 index 51
inet 192.168.253.250 netmask ffff0000 broadcast 192.168.255.255
The base and virtual IPs stayed on bge0 and did not fail over.
Zahid Haseeb
From the VCS 6.0 Bundled Agents reference guide
http://www.symantec.com/business/support/resources...
----------
MultiNICA notes
[...]
The MultiNICA agent supports only one active interface on one IP subnet; the agent does not work with multiple active interfaces on the same subnet.
On Solaris, for example, you have two active NICs, hme0 (10.128.2.5) and qfe0 (10.128.2.8). You configure a third NIC, qfe1, as the backup NIC to hme0. The agent does not fail over from hme0 to qfe1 because all ping tests are redirected through qfe0 on the same subnet. The redirect makes the MultiNICA monitor return an online status. Note that using ping -i does not enable the use of multiple active NICs.
[...]
----------
In your ifconfig output, you still have qfe1 configured on the same subnet, so the ping tests go out through that interface and the agent doesn't consider the MultiNIC group (bge0/qfe0) offline.
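One way to spot this kind of conflict is to compare every configured address against the MultiNICA subnet. The following is a minimal Python sketch, not part of VCS: the function name `conflicting_interfaces` and the hand-copied interface table are assumptions made for illustration, with the values taken from the ifconfig output and main.cf posted above.

```python
import ipaddress


def conflicting_interfaces(interfaces, group_devices, base_ip, netmask):
    """Return interfaces outside the MultiNICA group that sit on the group's subnet.

    interfaces: dict of interface name -> IPv4 address (hypothetical input format).
    group_devices: device names listed in the MultiNICA Device attribute.
    """
    group_net = ipaddress.ip_network(f"{base_ip}/{netmask}", strict=False)
    conflicts = []
    for name, addr in interfaces.items():
        physical = name.split(":")[0]  # a logical interface such as bge0:1 belongs to bge0
        if physical in group_devices:
            continue  # managed by MultiNICA, so not a conflict
        if addr == "0.0.0.0":
            continue  # plumbed but carrying no address
        if ipaddress.ip_address(addr) in group_net:
            conflicts.append(name)
    return conflicts


# Addresses copied from the ifconfig -a output posted above.
interfaces = {
    "lo0": "127.0.0.1",
    "qfe0": "0.0.0.0",
    "qfe1": "192.168.253.240",
    "bge0": "192.168.253.254",
    "bge0:1": "192.168.253.250",
}

print(conflicting_interfaces(interfaces, {"bge0", "qfe0"},
                             "192.168.253.254", "255.255.0.0"))  # prints ['qfe1']
```

Note that with the 255.255.0.0 netmask from main.cf the subnet is 192.168.0.0/16, so any 192.168.x.x address on qfe1 would still count as the same subnet.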
What may be the suggested subnet in this scenario?
Zahid Haseeb
???????????
If you actually read the document, it's saying don't put other interfaces (not in the MultiNIC group) on the same subnet.
So take qfe1 down, or give it an IP address on another subnet (i.e. not 192.168.253.x; with your 255.255.0.0 netmask that means outside 192.168.x.x), so that MultiNIC can manage qfe0 and bge0 properly.
Resource faulted. After a while base ip + virtual ip failed over to qfe0
bash-3.00# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
qfe0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 54
inet 192.168.253.254 netmask ffff0000 broadcast 192.168.255.255
ether 8:0:20:ea:2:2c
qfe0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 54
inet 192.168.253.250 netmask ffff0000 broadcast 192.168.255.255
qfe1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 47
inet 10.0.0.1 netmask ff000000 broadcast 10.255.255.255
ether 8:0:20:ea:2:2d
==========
I have to plumb bge0 again: it is not in the ifconfig -a output after the failover.
Zahid Haseeb
As in my earlier post, "VCS will ping out to check network", and as Grace says, VCS thinks MultiNIC is up because it can ping 192.168.253.1 (via qfe1). I'm not sure what you mean by "What may be the suggested subnet in this scenario?" Why do you need two interfaces on the same subnet?
Another thing: you may want to configure "IfconfigTwice". Without it, many routers will not learn the new MAC address for the IP when it fails over, which stops you from being able to ping it from outside.
Mike
So is it all working now?
From what I can remember (it has been a long time since I used MultiNICA, as most customers use MultiNICB), it is normal for the unused interface to be unplumbed.
Mike
No, Mike. Please see my last post above.
Zahid Haseeb
I'm not clear what's wrong: the base IP + virtual IP are failing over and the unused interface is being unplumbed, which is what is supposed to happen.
Mike
Yes, this is what is supposed to happen, Mike, but not with the resource faulting first.
Zahid Haseeb
Sorry Zahid, you are right, the resource should not fault. It may have faulted because qfe0 was plumbed in, which I think it is not supposed to be. Now that bge0 is not plumbed in, can you repeat the test, pulling bge0, and if the resource faults, give an extract from engine_A.log and MultiNICA_A.log from when the failure occurs?
Mike
Debug logs to help debug issue
Hi Zahid,
You can also enable debug logs for the MultiNICA and IPMultiNIC agents while performing the test. This will help debug the issue and narrow down the root cause.
Debug logs can be enabled for the MultiNICA and IPMultiNIC agents using the following commands:
# hatype -modify MultiNICA LogDbg 1 2 3 4 5
# hatype -modify IPMultiNIC LogDbg 1 2 3 4 5
Thanks and Regards,
Paresh Bafna
@ Mike and Paresh
Thanks for the kind replies and quick follow-up; I appreciate it. Let me do this and I will share the result.
Zahid Haseeb
@ Paresh and mike: As suggested
# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
qfe0: flags=1000842<BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
ether 8:0:20:ea:2:2c
qfe1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
inet 10.0.0.1 netmask ff000000 broadcast 10.255.255.255
ether 8:0:20:ea:2:2d
bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 7
inet 192.168.253.254 netmask ffff0000 broadcast 192.168.255.255
ether 0:3:ba:78:61:75
bge0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 7
inet 192.168.253.140 netmask ffff0000 broadcast 192.168.255.255
ifconfig result after unplugging the LAN cable:
bash-3.00# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
qfe0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 9
inet 192.168.253.254 netmask ffff0000 broadcast 192.168.255.255
ether 8:0:20:ea:2:2c
qfe1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
inet 10.0.0.1 netmask ff000000 broadcast 10.255.255.255
ether 8:0:20:ea:2:2d
The engine and IPMultiNIC logs are attached; the MultiNIC log doesn't contain anything.
Zahid Haseeb
Attachment is just an empty logs directory - attachment should be bigger than 109 bytes.
Mike
Kindly check attachment again.
Zahid Haseeb
Hi Zahid,
Thanks for the logs.
As per my understanding, below is the sequence of events that may have taken place during your testing:
#1
MultiNICA resource is online with bge0 as active interface.
# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
qfe0: flags=1000842<BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
ether 8:0:20:ea:2:2c
qfe1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
inet 10.0.0.1 netmask ff000000 broadcast 10.255.255.255
ether 8:0:20:ea:2:2d
bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 7
inet 192.168.253.254 netmask ffff0000 broadcast 192.168.255.255
ether 0:3:ba:78:61:75
bge0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 7
inet 192.168.253.140 netmask ffff0000 broadcast 192.168.255.255
The point to note here is that even though bge0 is the active interface, the other interface, qfe0, is also plumbed, with address 0.0.0.0.
#2
Cable pull performed on bge0.
MultiNICA agent detects cable pull and marks bge0 as failed interface.
2012/09/14 14:17:27 VCS WARNING V-16-10001-6004 (solaris) MultiNICA:MultiNIC:monitor:Device bge0 FAILED
#3
Now the MultiNICA agent tries to bring up the qfe0 interface by plumbing it and assigning the base IP address ("192.168.253.254") to it. The agent does this by executing a command similar to the one below:
# ifconfig qfe0 inet plumb 192.168.253.254 netmask 255.255.0.0 up
#4
When the MultiNICA agent executes the above command, it fails because qfe0 is already plumbed. This can be seen in the following log:
2012/09/14 14:17:28 VCS ERROR V-16-10001-6018 (solaris) MultiNICA:MultiNIC:monitor:Error in 'ifconfig' command execution:ifconfig: SIOCSLIFNAME for ip: qfe0: already exist
2012/09/14 14:17:28 VCS WARNING V-16-10001-6019 (solaris) MultiNICA:MultiNIC:monitor:Device qfe0 could not be brought up
This error causes the agent to clear (unplumb) the qfe0 interface.
#5
As the attempt to bring up qfe0 failed and bge0 is not connected, the agent reports all interfaces as down.
2012/09/14 14:17:28 VCS ERROR V-16-10001-6014 (solaris) MultiNICA:MultiNIC:monitor:No more Devices configured. All devices are down. Returning OFFLINE
Resource faults at this point.
#6
The agent retries bringing the interfaces online in the next monitor cycle.
2012/09/14 14:18:30 VCS WARNING V-16-10001-6004 (solaris) MultiNICA:MultiNIC:monitor:Device FAILED
2012/09/14 14:18:30 VCS WARNING V-16-10001-6005 (solaris) MultiNICA:MultiNIC:monitor:Acquired a WRITE Lock
2012/09/14 14:18:30 VCS WARNING V-16-10001-6006 (solaris) MultiNICA:MultiNIC:monitor:Bringing down IP addresses
2012/09/14 14:18:31 VCS WARNING V-16-10001-6007 (solaris) MultiNICA:MultiNIC:monitor:Trying to online Device bge0
2012/09/14 14:18:32 VCS INFO V-16-10001-6008 (solaris) MultiNICA:MultiNIC:monitor:Sleeping 5 seconds
2012/09/14 14:18:37 VCS WARNING V-16-10001-6010 (solaris) MultiNICA:MultiNIC:monitor:Pinging 192.168.253.1 with Device bge0 configured: iteration 1
2012/09/14 14:18:42 VCS INFO V-16-10001-6008 (solaris) MultiNICA:MultiNIC:monitor:Sleeping 5 seconds
2012/09/14 14:18:47 VCS WARNING V-16-10001-6009 (solaris) MultiNICA:MultiNIC:monitor:Pinging Broadcast address 192.168.255.255 on Device bge0, iteration 2
2012/09/14 14:18:57 VCS WARNING V-16-10001-6011 (solaris) MultiNICA:MultiNIC:monitor:Tried the PingTest 2 times
2012/09/14 14:18:57 VCS WARNING V-16-10001-6012 (solaris) MultiNICA:MultiNIC:monitor:The network did not respond
2012/09/14 14:18:57 VCS WARNING V-16-10001-6013 (solaris) MultiNICA:MultiNIC:monitor:Giving up on NIC bge0
2012/09/14 14:18:57 VCS WARNING V-16-10001-6015 (solaris) MultiNICA:MultiNIC:monitor:Trying the next NIC qfe0 in the list
2012/09/14 14:18:57 VCS WARNING V-16-10001-6007 (solaris) MultiNICA:MultiNIC:monitor:Trying to online Device qfe0
2012/09/14 14:18:58 VCS INFO V-16-10001-6008 (solaris) MultiNICA:MultiNIC:monitor:Sleeping 5 seconds
2012/09/14 14:19:03 VCS WARNING V-16-10001-6010 (solaris) MultiNICA:MultiNIC:monitor:Pinging 192.168.253.1 with Device qfe0 configured: iteration 1
2012/09/14 14:19:04 VCS WARNING V-16-10001-6016 (solaris) MultiNICA:MultiNIC:monitor:Migrated to Device qfe0
2012/09/14 14:19:04 VCS WARNING V-16-10001-6017 (solaris) MultiNICA:MultiNIC:monitor:Releasing Lock
2012/09/14 14:19:05 VCS INFO V-16-1-10299 Resource MultiNIC (Owner: Unspecified, Group: CLUSTERA-SG) is online on solaris (Not initiated by VCS)
The retry in the subsequent monitor cycle succeeds because the agent had cleared (unplumbed) the qfe0 interface during the first failed attempt.
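To pull the same sequence out of a longer log, the key lines can be filtered by message ID. This is a minimal Python sketch under stated assumptions: the regular expression only covers the line format quoted in this thread, the names `parse_line`, `key_events` and `KEY_EVENTS` are made up for illustration, and the labels attached to each message ID are read off the log excerpts above rather than taken from any official message catalog.

```python
import re

# Matches lines in the format quoted above, for example:
# 2012/09/14 14:17:27 VCS WARNING V-16-10001-6004 (solaris) MultiNICA:MultiNIC:monitor:Device bge0 FAILED
LINE_RE = re.compile(
    r"^(?P<date>\d{4}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) VCS "
    r"(?P<severity>[A-Z]+) (?P<msgid>V-\d+-\d+-\d+) "
    r"(?:\((?P<system>[^)]+)\) )?(?P<message>.*)$"
)

# Message IDs and labels as inferred from the excerpts in this thread (assumption).
KEY_EVENTS = {
    "V-16-10001-6004": "device failed",
    "V-16-10001-6018": "ifconfig error",
    "V-16-10001-6019": "device could not be brought up",
    "V-16-10001-6014": "all devices down, resource OFFLINE",
    "V-16-10001-6016": "migrated to new device",
}


def parse_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def key_events(lines):
    """Return (time, label, detail) tuples for the lines that explain a failover."""
    events = []
    for line in lines:
        fields = parse_line(line)
        if fields and fields["msgid"] in KEY_EVENTS:
            # Drop the "MultiNICA:MultiNIC:monitor:" prefix from the message text.
            detail = fields["message"].split(":", 3)[-1]
            events.append((fields["time"], KEY_EVENTS[fields["msgid"]], detail))
    return events


# Lines copied from the log excerpts above.
sample = [
    "2012/09/14 14:17:27 VCS WARNING V-16-10001-6004 (solaris) MultiNICA:MultiNIC:monitor:Device bge0 FAILED",
    "2012/09/14 14:17:28 VCS ERROR V-16-10001-6018 (solaris) MultiNICA:MultiNIC:monitor:Error in 'ifconfig' command execution:ifconfig: SIOCSLIFNAME for ip: qfe0: already exist",
    "2012/09/14 14:17:28 VCS WARNING V-16-10001-6019 (solaris) MultiNICA:MultiNIC:monitor:Device qfe0 could not be brought up",
    "2012/09/14 14:17:28 VCS ERROR V-16-10001-6014 (solaris) MultiNICA:MultiNIC:monitor:No more Devices configured. All devices are down. Returning OFFLINE",
    "2012/09/14 14:19:04 VCS WARNING V-16-10001-6016 (solaris) MultiNICA:MultiNIC:monitor:Migrated to Device qfe0",
]

for event in key_events(sample):
    print(event)
```

Reading the filtered events in order shows the gap between the OFFLINE report at 14:17:28 and the migration at 14:19:04, which is the window in which the resource was faulted.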
My understanding is that if you keep only one interface plumbed at any moment, the agent should take care of failover in the cable pull scenario.
Please test the following scenario:
- Connect cables to both interfaces, bge0 and qfe0
- Wait until the MultiNICA agent detects/brings up the base IP address on one of the interfaces, say bge0
- Make sure the other interface, qfe0, is not plumbed on the system
- Perform the cable pull test on bge0
- The MultiNICA agent should fail over the base as well as any virtual IP addresses on bge0 to the qfe0 interface.
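Before pulling the cable, the "only one interface plumbed" condition can be checked from the `ifconfig -a` output, since that output lists only plumbed interfaces. The sketch below is a rough Python illustration, not an agent feature: the helper names `plumbed_addresses` and `plumbed_standbys` are assumptions, and the parser only handles the output layout pasted in this thread.

```python
import re

HEADER_RE = re.compile(r"^(\S+): flags=")   # e.g. "qfe0: flags=1000842<...> mtu 1500 index 3"
INET_RE = re.compile(r"^\s*inet (\S+)")     # e.g. "inet 0.0.0.0 netmask ff000000 ..."


def plumbed_addresses(ifconfig_text):
    """Map each interface listed by `ifconfig -a` to its IPv4 address (or None)."""
    result = {}
    current = None
    for line in ifconfig_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            result[current] = None
            continue
        inet = INET_RE.match(line)
        if inet and current is not None:
            result[current] = inet.group(1)
    return result


def plumbed_standbys(ifconfig_text, group_devices, base_ip):
    """Return MultiNICA group devices that are plumbed but do not carry the base IP."""
    addresses = plumbed_addresses(ifconfig_text)
    return sorted(
        dev for dev in group_devices
        if dev in addresses and addresses[dev] != base_ip
    )


# Output copied from step #1 above: bge0 is active, qfe0 is plumbed with 0.0.0.0.
before_cable_pull = """\
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
qfe0: flags=1000842<BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
ether 8:0:20:ea:2:2c
qfe1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
inet 10.0.0.1 netmask ff000000 broadcast 10.255.255.255
ether 8:0:20:ea:2:2d
bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 7
inet 192.168.253.254 netmask ffff0000 broadcast 192.168.255.255
ether 0:3:ba:78:61:75
bge0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 7
inet 192.168.253.140 netmask ffff0000 broadcast 192.168.255.255
"""

print(plumbed_standbys(before_cable_pull, {"bge0", "qfe0"}, "192.168.253.254"))  # prints ['qfe0']
```

An empty list would mean no standby device is plumbed, which is the state the test scenario above asks for.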
Please let us know if this solves your issue.
Thanks and Regards,
Paresh Bafna
As I said in earlier post:
and Paresh has confirmed this, so can you test with just one interface plumbed in? Paresh suggests not plumbing any in, but the normal sequence of events is that you plumb in the preferred interface at boot (i.e. if you want qfe0 to be plumbed in, you create an /etc/hostname.qfe0 file) and leave the other interface unplumbed.
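To check which of the MultiNICA devices will be plumbed at boot, one can look for their /etc/hostname.<device> files. A small Python sketch, assuming the device names from the main.cf earlier in the thread; the function name `boot_plumbed_devices` is hypothetical.

```python
import os


def boot_plumbed_devices(etc_dir, group_devices):
    """Return the group devices that have a hostname.<device> file in etc_dir."""
    return sorted(
        dev for dev in group_devices
        if os.path.exists(os.path.join(etc_dir, f"hostname.{dev}"))
    )


# On the cluster node this would be run against /etc; a result with exactly one
# device matches the "one preferred interface plumbed at boot" setup described above.
print(boot_plumbed_devices("/etc", ["bge0", "qfe0"]))
```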
Mike
@ Mike and Paresh
Fantastic effort by both of you; I am so thankful. I'm giving you both a thumbs up, but I'm confused about whose post to mark as the solution. A request to the admin, please.
Thanks glee too
Zahid Haseeb