VCS MultiNICB resource sees frequent link UP / DOWN messages

Article:TECH190435  |  Created: 2012-06-05  |  Updated: 2013-03-19  |  Article URL http://www.symantec.com/docs/TECH190435
Article Type
Technical Solution


Issue



The VCS MultiNICB resource logs frequent link Up / Down messages.


Error



From VCS engine_A.log:

2012/04/27 01:19:33 VCS INFO V-16-10001-6557 (n933) MultiNICB:Network_MultiNICB:monitor:Device: igb0 went from Up to Down
2012/04/27 01:19:42 VCS INFO V-16-10001-6556 (n933) MultiNICB:Network_MultiNICB:monitor:Device: igb0 went from Down to Up

2012/04/27 18:23:26 VCS INFO V-16-10001-6557 (n933) MultiNICB:Network_MultiNICB:monitor:Device: igb0 went from Up to Down
2012/04/27 18:23:36 VCS INFO V-16-10001-6556 (n933) MultiNICB:Network_MultiNICB:monitor:Device: igb0 went from Down to Up

 

From MultiNICB agent (debug) log:

2012/05/29 03:01:41 VCS DBG_4 V-16-50-0 MultiNICB:Network_MultiNICB:monitor: In checkStatus:970 haping status for igb0 = 100
MultiNICB.C:checkStatus[970]
2012/05/29 03:01:42 VCS DBG_4 V-16-50-0 MultiNICB:Network_MultiNICB:monitor: In checkStatus:970 haping status for igb1 = 111
MultiNICB.C:checkStatus[970]
2012/05/29 03:01:42 VCS INFO V-16-10001-6557 MultiNICB:Network_MultiNICB:monitor:Device: igb1 went from Up to Down

 


Environment



Solaris 10

VCS 5.1SP1

MultiNICB resource configured in Base Mode (UseMpathd = 0)

 


Cause



The VCS MultiNICB agent uses haping (/opt/VRTSvcs/bin/MultiNICB/haping) to check link health. haping sends ICMP request packets to the configured NetworkHosts and waits up to the NetworkTimeout interval (default 100 ms) for a reply. If a reply arrives within that interval, haping returns 100, meaning the link is up; otherwise it returns an error code.

The error code that haping returns depends on the type of failure.

In this case haping returns 111, which indicates a timeout. A timeout can have several causes:
- The network host that the agent is trying to reach takes too long to reply, so haping times out.
- The reply is delayed by heavy network traffic.
- Network fluctuations cause request or reply packets to be dropped.
- The network host itself is down.
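
To see which interfaces are affected and how often, the haping status lines in the agent debug log can be tallied. The following is a minimal sketch, not a Symantec-supplied tool: the function name tally_haping_failures is hypothetical, and it assumes debug lines in the format shown in this article ("haping status for <interface> = <code>").

```shell
# Hypothetical helper: count haping results other than 100 (link up)
# per interface. Reads the named log files, or standard input if none given.
# Assumes debug lines ending in "haping status for <iface> = <code>".
tally_haping_failures() {
    awk '/haping status for / {
        iface = $(NF - 2)      # interface name, e.g. igb1
        code  = $NF            # haping status, e.g. 111
        if (code != 100) count[iface " " code]++
    }
    END {
        # One output line per interface/code pair: "<iface> <code> <count>"
        for (key in count) print key, count[key]
    }' "$@" | sort
}
```

For example, tally_haping_failures /var/VRTSvcs/log/MultiNICB_A.log scans the agent log (the path is an assumption; adjust it to where the MultiNICB agent log is kept). Each output line has the form "<interface> <code> <count>".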

When haping reports a timeout for a link, the MultiNICB agent retries up to OfflineTestRepeatCount (default 3) times before reporting the link as down; that is, haping is invoked 3 times for the same interface. The agent reports the link as down only if haping returns an error all 3 times.

The following agent debug log excerpt shows haping being called again on igb1 after a timeout:

2012/05/29 03:01:41 VCS DBG_4 V-16-50-0 MultiNICB:Network_MultiNICB:monitor: In checkStatus:970 haping status for igb0 = 100
MultiNICB.C:checkStatus[970]
2012/05/29 03:01:42 VCS DBG_4 V-16-50-0 MultiNICB:Network_MultiNICB:monitor: In checkStatus:970 haping status for igb1 = 111
MultiNICB.C:checkStatus[970]
2012/05/29 03:01:42 VCS INFO V-16-10001-6557 MultiNICB:Network_MultiNICB:monitor:Device: igb1 went from Up to Down
2012/05/29 03:01:42 VCS DBG_4 V-16-50-0 MultiNICB:Network_MultiNICB:monitor: In checkStatus:956 calling haping on igb1
MultiNICB.C:checkStatus[956]
2012/05/29 03:01:43 VCS DBG_4 V-16-50-0 MultiNICB:Network_MultiNICB:monitor: In checkStatus:970 haping status for igb1 = 111
MultiNICB.C:checkStatus[970]
2012/05/29 03:01:43 VCS DBG_4 V-16-50-0 MultiNICB:Network_MultiNICB:monitor: In checkStatus:956 calling haping on igb1
MultiNICB.C:checkStatus[956]
2012/05/29 03:01:43 VCS DBG_4 V-16-50-0 MultiNICB:Network_MultiNICB:monitor: In checkStatus:970 haping status for igb1 = 100
MultiNICB.C:checkStatus[970]
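
The retry behaviour described above can be illustrated with a short shell sketch. It is only a model of the described logic, not the agent's implementation. The names link_state, haping_probe and PROBE are hypothetical, and the sketch assumes that the status the agent logs (100 or 111) is haping's exit status. It needs a POSIX shell such as ksh or bash.

```shell
# Illustrative model of the retry logic; not the MultiNICB agent's code.
OFFLINE_TEST_REPEAT_COUNT=${OFFLINE_TEST_REPEAT_COUNT:-3}   # agent default

# Default probe: haping as invoked in this article.
# Assumption: haping's exit status is the value the agent logs (100 = up).
haping_probe() {
    /opt/VRTSvcs/bin/MultiNICB/haping -v -g "$1"
}
PROBE=${PROBE:-haping_probe}

# Print "up" as soon as one probe returns 100; print "down" only after
# OFFLINE_TEST_REPEAT_COUNT probes in a row have all returned something else.
link_state() {
    iface=$1
    attempt=1
    while [ "$attempt" -le "$OFFLINE_TEST_REPEAT_COUNT" ]; do
        rc=0
        "$PROBE" "$iface" >/dev/null 2>&1 || rc=$?
        if [ "$rc" -eq 100 ]; then
            echo up
            return 0
        fi
        attempt=$((attempt + 1))
    done
    echo down
}
```

In this model a single delayed reply does not mark the link down; only a run of OfflineTestRepeatCount failed probes does.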

 

 


Solution



This condition is most often caused by network fluctuations or a flood of network traffic.

To confirm this, run the haping command as shown and collect its output. The data must be collected while the issue is occurring, that is, while haping is timing out (haping return value = 111).


# /opt/VRTSvcs/bin/MultiNICB/haping -v -g <interface>

For example:

# /opt/VRTSvcs/bin/MultiNICB/haping -v -g igb1  

Also check the output of ping to the default router or to the configured NetworkHosts:

# ping -s 10 <NetworkHosts> 10
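
Because the data is only useful when it is captured during a timeout, it can help to leave a small loop running that records each haping result with a timestamp. This is a minimal sketch under stated assumptions: probe_once, capture and HAPING are hypothetical names, the logged status is assumed to be haping's exit status, and a POSIX shell such as ksh or bash is required.

```shell
# Path taken from this article; can be overridden for testing.
HAPING=${HAPING:-/opt/VRTSvcs/bin/MultiNICB/haping}

# Run haping once on an interface and print a timestamped status line.
# The verbose haping output is kept (indented) only when the status is not 100.
probe_once() {
    iface=$1
    rc=0
    out=$("$HAPING" -v -g "$iface" 2>&1) || rc=$?
    printf '%s iface=%s status=%s\n' "$(date '+%Y/%m/%d %H:%M:%S')" "$iface" "$rc"
    if [ "$rc" -ne 100 ]; then
        printf '%s\n' "$out" | sed 's/^/    /'
    fi
}

# Probe an interface <count> times, sleeping <interval> seconds between probes.
capture() {
    iface=$1
    count=$2
    interval=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        probe_once "$iface"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, capture igb1 600 1 >> /var/tmp/haping_igb1.log (an arbitrary output file) probes igb1 about once a second for roughly ten minutes, so the verbose output from any timeout is on hand when the Down message appears in engine_A.log.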

One possible workaround is to increase the NetworkTimeout value from the default of 100 ms to, for example, 1000 ms:

# haconf -makerw
# hares -modify Network_MultiNICB  NetworkTimeout 1000
# haconf -dump -makero
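
After the change, the effective value can be read back with hares -value. The helper below is a hedged sketch: check_timeout and HARES are hypothetical names, and it assumes that hares -value prints only the attribute value.

```shell
# hares command; can be overridden for testing.
HARES=${HARES:-hares}

# Exit 0 if the resource's NetworkTimeout equals the expected value,
# 1 if it differs, 2 if the value could not be read.
check_timeout() {
    resource=$1
    expected=$2
    actual=$("$HARES" -value "$resource" NetworkTimeout) || return 2
    [ "$actual" = "$expected" ]
}
```

For example: check_timeout Network_MultiNICB 1000 && echo "NetworkTimeout is 1000"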

Supplemental Materials

Source: ETrack
Value: 2804288
Description: Seeing frequent link UP / DOWN messages for MultiNICB resource


