On a system running AIX, the monitor of a VCS IP resource reports the resource online even after it has been taken offline and its service group has failed over to the second node.

Article:TECH197358  |  Created: 2012-09-26  |  Updated: 2012-09-26  |  Article URL http://www.symantec.com/docs/TECH197358
Article Type
Technical Solution

Issue



On a system running the AIX operating system (OS), the monitor function of a Veritas Cluster Server (VCS) IP resource reports the resource as online even after the resource has been taken offline and the service group has failed over to the second node. This causes a concurrency violation for the service group.


Error



From engine_A.log:

==> The IP resource goes offline unexpectedly on Node1 and the service group faults:

2012/08/01 01:08:17 VCS ERROR V-16-2-13067 (Node1) Agent is calling clean for resource(rvg_ip_1) because the resource became OFFLINE unexpectedly, on its own.
2012/08/01 01:08:18 VCS WARNING V-16-10011-3304 (Node1) IP:rvg_ip_1:clean:The value of NetMask attribute and netmask configured for interface [en9] does not match.
2012/08/01 01:08:18 VCS INFO V-16-2-13068 (Node1) Resource(rvg_ip_1) - clean completed successfully.
2012/08/01 01:08:18 VCS INFO V-16-1-10307 Resource rvg_ip_1 (Owner: Unspecified, Group: RVG_LOGOWNER) is offline on Node1 (Not initiated by VCS)
2012/08/01 01:08:18 VCS NOTICE V-16-1-10300 Initiating Offline of Resource rvg_logowner_1 (Owner: Unspecified, Group: RVG_LOGOWNER) on System Node1
2012/08/01 01:08:19 VCS INFO V-16-6-15015 (Node1) hatrigger:/opt/VRTSvcs/bin/triggers/resfault is not a trigger scripts directory or can not be executed
2012/08/01 01:08:21 VCS INFO V-16-1-10305 Resource rvg_logowner_1 (Owner: Unspecified, Group: RVG_LOGOWNER) is offline on Node1 (VCS initiated)
2012/08/01 01:08:21 VCS ERROR V-16-1-10205 Group RVG_LOGOWNER is faulted on system Node1
2012/08/01 01:08:21 VCS NOTICE V-16-1-10446 Group RVG_LOGOWNER is offline on system Node1


==> The resource is then brought online by VCS on Node2:

2012/08/01 01:08:21 VCS NOTICE V-16-1-10301 Initiating Online of Resource rvg_ip_1 (Owner: Unspecified, Group: RVG_LOGOWNER) on System Node2
2012/08/01 01:08:21 VCS INFO V-16-6-15002 (Node1) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/postoffline Node1 RVG_LOGOWNER   successfully
2012/08/01 01:08:26 VCS INFO V-16-10011-0 (Node2) IP:rvg_ip_1:online:tcpdump is running with pid [66912374].
2012/08/01 01:08:27 VCS INFO V-16-1-10298 Resource rvg_ip_1 (Owner: Unspecified, Group: RVG_LOGOWNER) is online on Node2 (VCS initiated)


==> About ten minutes later, VCS detects the resource online again on Node1, which causes a concurrency violation for the service group:

2012/08/01 01:18:19 VCS INFO V-16-1-10299 Resource rvg_ip_1 (Owner: Unspecified, Group: RVG_LOGOWNER) is online on Node1 (Not initiated by VCS)
2012/08/01 01:18:19 VCS ERROR V-16-1-10214 Concurrency Violation:CurrentCount increased above 1 for failover group RVG_LOGOWNER


==> Later, when the group is manually taken offline on Node1, the ifconfig command fails because the address is not configured on the interface, yet the agent still finds the resource up:

2012/08/01 01:53:41 VCS NOTICE V-16-1-10167 Initiating manual offline of group RVG_LOGOWNER on system Node1
2012/08/01 01:53:41 VCS NOTICE V-16-1-10300 Initiating Offline of Resource rvg_ip_1 (Owner: Unspecified, Group: RVG_LOGOWNER) on System Node1
2012/08/01 01:53:41 VCS INFO V-16-6-15002 (Node1) hatrigger:hatrigger executed /opt/VRTSvcs/bin/internal_triggers/violation Node1 RVG_LOGOWNER  successfully
2012/08/01 01:53:41 VCS WARNING V-16-10011-3304 (Node1) IP:rvg_ip_1:offline:The value of NetMask attribute and netmask configured for interface [en9] does not match.
2012/08/01 01:53:43 VCS INFO V-16-2-13716 (Node1) Resource(rvg_ip_1): Output of the completed operation (offline)
==============================================
ifconfig: ifconfig: ioctl (SIOCDIFADDR): Can't assign requested address
==============================================

2012/08/01 01:53:43 VCS ERROR V-16-2-13064 (Node1) Agent is calling clean for resource(rvg_ip_1) because the resource is up even after offline completed.
2012/08/01 01:53:44 VCS INFO V-16-2-13068 (Node1) Resource(rvg_ip_1) - clean completed successfully.
2012/08/01 01:53:44 VCS INFO V-16-1-10305 Resource rvg_ip_1 (Owner: Unspecified, Group: RVG_LOGOWNER) is offline on Node1 (VCS initiated)
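When the full engine_A.log is long, the sequence above can be pulled out with a simple filter on the resource name plus the concurrency-violation message ID (V-16-1-10214) quoted above. This is a minimal sketch, not a Symantec tool: the function name ip_failover_events is hypothetical, and the default path /var/VRTSvcs/log/engine_A.log is an assumption about a standard installation.

```shell
#!/bin/sh
# Sketch: extract the messages discussed in this article from a VCS engine log.
# Assumption: the engine log is in the usual location unless LOG is set.
LOG=${LOG:-/var/VRTSvcs/log/engine_A.log}

# Hypothetical helper: print lines from stdin that mention the resource name
# or carry the concurrency-violation message ID V-16-1-10214.
ip_failover_events() {
    # -F matches the patterns as fixed strings, so characters in resource
    # names are not interpreted as regular-expression syntax.
    grep -F -e "$1" -e "V-16-1-10214"
}

# Read the log only when called with exactly one argument (the resource name).
if [ $# -eq 1 ]; then
    ip_failover_events "$1" < "$LOG"
fi
```

Saved as a script and run with the resource name as its only argument, for example rvg_ip_1, it prints the matching lines from the log. Because the match is a plain substring, a resource such as rvg_ip_10 would also match rvg_ip_1.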



Environment



This issue applies only to systems running VCS on the AIX operating system.


Cause



The VCS IP agent pings the assigned IP address to determine whether the resource is online. Further debugging confirmed that the ping succeeds even when the IP address is unplumbed: the reply to the ping for the IP address comes from another IP address. The problem remained reproducible after clearing the ARP cache, as suggested by IBM. The issue was also reproduced without VCS involved, which shows that it is related to the AIX operating system (OS) itself.
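The symptom described above can be checked by hand on the node where the address was removed: confirm that the address is no longer listed on the interface, then ping it. The sketch below is a hypothetical diagnostic (the helper names ip_is_plumbed and verdict are illustrative, not part of VCS or AIX) and assumes AIX ifconfig output lists each address on a line beginning with `inet`. Run it only while the address is not online on any other node, for example with VCS out of the picture as in the reproduction above; otherwise a successful ping is expected.

```shell
#!/bin/sh
# Hypothetical diagnostic: compare whether an address is configured on an
# interface with whether it still answers ping.

# Succeeds if the address in $1 appears as an "inet" entry in the ifconfig
# output read from stdin (lines such as: inet 10.0.0.5 netmask 0xffffff00 ...).
ip_is_plumbed() {
    awk -v a="$1" '$1 == "inet" && $2 == a { found = 1 } END { exit !found }'
}

# Prints a verdict from two yes/no answers: is it plumbed, does it answer ping.
verdict() {
    if [ "$1" = no ] && [ "$2" = yes ]; then
        echo "MISMATCH: address is not plumbed but still answers ping"
    else
        echo "OK: plumbed=$1 ping=$2"
    fi
}

# Live check only when called as: script ADDRESS INTERFACE
if [ $# -eq 2 ]; then
    if ifconfig "$2" | ip_is_plumbed "$1"; then plumbed=yes; else plumbed=no; fi
    if ping -c 1 "$1" >/dev/null 2>&1; then pings=yes; else pings=no; fi
    verdict "$plumbed" "$pings"
fi
```

Run with the address of rvg_ip_1 and the interface en9 after the address has been removed, it would print the MISMATCH line when the behavior described in this article is present.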


Solution



The problem lies in the AIX operating system, where a ping to an IP address succeeds because the reply comes from another IP address. Customers need to contact IBM to resolve this issue. On the VCS side, it is recommended to increase the OnlineRetryLimit attribute of the IP resource type to 1 (if it is set to the default value of 0):

# hatype -modify IP OnlineRetryLimit 1
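The hatype -modify command is rejected while the VCS configuration is read-only, so in practice the change is made between haconf -makerw and haconf -dump -makero. A minimal sketch of that sequence follows, wrapped in a hypothetical function apply_ip_online_retry_limit so that a failed step still closes the configuration; it ends by printing the value with hatype -value to confirm the change. Because OnlineRetryLimit is modified at the type level here, the new value applies to every IP resource in the cluster.

```shell
#!/bin/sh
# Sketch (function name is hypothetical): apply the type-level change from
# the Solution section with the configuration opened and saved around it.
apply_ip_online_retry_limit() {
    haconf -makerw || return 1                  # make the configuration writable
    if ! hatype -modify IP OnlineRetryLimit 1; then
        haconf -dump -makero                    # close the configuration again before failing
        return 1
    fi
    haconf -dump -makero || return 1            # save the configuration and make it read-only
    hatype -value IP OnlineRetryLimit           # print the value now in effect
}
```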



Supplemental Materials

Source: ETrack
Value: 2869021
Description: Need assistance with VCS configuration to help alleviate performance issue


