On HP-UX 11i v3, device discovery commands such as vxdisk scandisks or vxdctl enable can hit a race condition in multi-threaded vxconfigd and trigger removal of the DMP device tree.

Article:TECH200183  |  Created: 2012-11-27  |  Updated: 2014-06-06  |  Article URL http://www.symantec.com/docs/TECH200183
Article Type
Technical Solution

Issue



When device discovery commands (vxdctl enable or vxdisk scandisks) are run during BCV operations, SAN migrations and similar activity, vxconfigd queries the operating system device tree using the HP libIO library (io_init(), io_search(), io_end(), etc.). Occasionally io_search() returns NULL. With VxVM 5.0.1 or higher, DDL/DMP then concludes that all the devices have disappeared and removes all the arrays and DMP devices (a.k.a. dmpnodes) that are visible to the host. This leads to VxVM I/O errors and file systems being disabled.

The impact depends on the configuration. Where VxVM manages the root disk(s), a system hang results. In a VCS environment, this can trigger monitor timeouts and possibly a service group fault. In an HP Serviceguard/SGeRAC environment integrated with CVM and/or CFS, the VxVM I/O failures typically lead to a Serviceguard INIT and/or a CRS TOC (if the voting disks sit on VxVM volumes).


Error



For VxVM 5.0.1, /etc/vx/dmpevents.log will show the following messages, indicating the removal of dmpnodes and arrays:

Tue Aug 21 02:13:09.000: Reconfiguration is in progress
Tue Aug 21 02:13:09.000: Reconfiguration has finished
Tue Aug 21 02:13:07.677: Removed Dmpnode 3/16
Tue Aug 21 02:13:07.677: Removed Dmpnode 3/32
Tue Aug 21 02:13:07.677: Removed Dmpnode 3/80

..
Tue Aug 21 02:13:07.677: Disabled Disk array emc0
Tue Aug 21 02:13:07.677: Disabled Disk array p2000g3_fc1
Tue Aug 21 02:13:07.687: Disabled Disk array emc1
Tue Aug 21 02:13:07.687: Disabled Disk array p2000g3_fc0
Tue Aug 21 02:13:07.787: I/O error occured (errno=0x6) on Dmpnode 3/2656
Tue Aug 21 02:13:07.787: I/O error occured (errno=0x6) on Dmpnode 3/1776
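
As a rough aid for spotting this signature, the following Python sketch tallies the removal and disable messages from dmpevents.log lines. It is only an illustration: the function name summarize_dmp_events and the regular expressions are assumptions derived from the sample messages above, and other VxVM releases may word these messages differently.

```python
import re

# Patterns inferred from the sample messages in this article (assumption).
REMOVED = re.compile(r"Removed Dmpnode (\d+/\d+)")
DISABLED = re.compile(r"Disabled Disk array (\S+)")


def summarize_dmp_events(lines):
    """Collect removed dmpnodes and disabled arrays from dmpevents.log lines."""
    removed, disabled = [], []
    for line in lines:
        match = REMOVED.search(line)
        if match:
            removed.append(match.group(1))
            continue
        match = DISABLED.search(line)
        if match:
            disabled.append(match.group(1))
    return {"removed_dmpnodes": removed, "disabled_arrays": disabled}


# A few lines copied from the sample log above.
sample = [
    "Tue Aug 21 02:13:09.000: Reconfiguration is in progress",
    "Tue Aug 21 02:13:07.677: Removed Dmpnode 3/16",
    "Tue Aug 21 02:13:07.677: Removed Dmpnode 3/32",
    "Tue Aug 21 02:13:07.677: Disabled Disk array emc0",
    "Tue Aug 21 02:13:07.687: Disabled Disk array emc1",
]
summary = summarize_dmp_events(sample)
print(summary)
```

A large number of removed dmpnodes together with every array being disabled within the same second, as in the sample, is the pattern to look for. A few removals on their own can be a legitimate reconfiguration.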


For VxVM 5.1SP1, /var/adm/vx/ddl.log* shows the dmpnode removal with the following messages:

START TIME = Wed Aug 8 14:30:51 2012
ddl_reconfigure_all: Start of original tree
------------ Start of dmp tree -------------
...
------------------------------------------
Found 2620 paths in the dmp tree
------------ End of dmp tree -------------
ddl_reconfigure_all: End of original tree
ddl_reconfigure_all: Start of temporary tree
ddl_reconfigure_all: End of temporary tree
Printing tree after migration is done
---- Start of DMP instruction buffer ----
DESTROY_DMPNODE:
0x3000010 dmpnode is to be destroyed/freed
DESTROY_DMPNODE:
0x3000020 dmpnodDESTROY_DMPNODE:
0x3000d30 dmpnode is to be destroyed/freed
DESTROY_DMPNODE:
0x3000d40 dmpnode is to be is to be destroyed/freed
...
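
The hexadecimal values in the DESTROY_DMPNODE entries can be related to the major/minor dmpnode numbers seen in dmpevents.log. The Python sketch below assumes that each value is a device number with an 8-bit major number in the top bits and a 24-bit minor number; this is consistent with 0x3000010 and the "3/16" dmpnode in the 5.0.1 sample, but it is an assumption and is not stated in the log itself. The function names are hypothetical, and only well-formed lines are read because the excerpt above is truncated in places.

```python
import re

# Matches well-formed lines such as "0x3000010 dmpnode is to be destroyed/freed".
DESTROY_ID = re.compile(r"^(0x[0-9a-fA-F]+)\s+dmpnode\b")


def decode_dev(value):
    """Split a device number into (major, minor).

    Assumes an 8-bit major and a 24-bit minor number (not confirmed by the article).
    """
    return value >> 24, value & 0xFFFFFF


def destroyed_dmpnodes(lines):
    """Return 'major/minor' strings for each dmpnode marked for destruction."""
    nodes = []
    for line in lines:
        match = DESTROY_ID.match(line.strip())
        if match:
            major, minor = decode_dev(int(match.group(1), 16))
            nodes.append(f"{major}/{minor}")
    return nodes


# Well-formed lines copied from the sample ddl.log excerpt above.
sample = [
    "DESTROY_DMPNODE:",
    "0x3000010 dmpnode is to be destroyed/freed",
    "DESTROY_DMPNODE:",
    "0x3000d30 dmpnode is to be destroyed/freed",
]
nodes = destroyed_dmpnodes(sample)
print(nodes)
```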
 


Environment



HP-UX 11.31
VxVM 5.0.1 and higher versions.


Cause



As part of device discovery, vxconfigd queries the OS device tree using HP libIO library functions. The libIO library uses the /dev/config driver to access information in the kernel I/O data structures. From the libIO man page:

io_init(): Opens the /dev/config device special file, which causes an open(2) of the dev_config driver. io_init() must be called before calling any other routine in the libIO library.

io_end(): Causes a close(2) of the dev_config driver. io_end() must be called after the use of the libIO library routine(s).

In VxVM versions 5.0 and later (namely 5.0, 5.0.1 and 5.1SP1, where the multi-threaded vxconfigd started using the libIO APIs), use of the thread-unsafe version of the libIO APIs can result in a race condition when opening and closing the /dev/config driver. Specifically, /dev/config opened by one thread (say T1) in an earlier call to io_init() may be closed inadvertently by another thread (say T2) calling io_end() at the same time. When T1 then issues an io_search() after T2 has closed /dev/config, the io_search() call returns NULL (a failure) with io_errno set to IO_E_DCONF_OPEN. vxconfigd does not detect this as an io_search() failure but instead assumes that a NULL return means an empty I/O tree.
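
The interleaving can be illustrated with a toy model in Python. This is a sketch under stated assumptions, not libIO or vxconfigd code: SharedHandleLib is a hypothetical class with a single process-wide "open" flag standing in for /dev/config, and None stands in for a failed search. Events are used to force the T1/T2 ordering described above so that the outcome is repeatable.

```python
import threading


class SharedHandleLib:
    """Toy model (hypothetical, not libIO): one 'open' flag shared by all threads."""

    def __init__(self, devices):
        self._devices = list(devices)
        self._open = False

    def init(self):
        self._open = True       # stands in for opening /dev/config

    def end(self):
        self._open = False      # stands in for closing /dev/config

    def search(self):
        # None models a failed search (handle closed); a list models the device tree.
        return list(self._devices) if self._open else None


def replay_race(lib):
    """Force the interleaving: T1 init, T2 init/search/end, then T1 search."""
    t1_opened = threading.Event()
    t2_closed = threading.Event()
    result = {}

    def t1():
        lib.init()
        t1_opened.set()
        t2_closed.wait()        # T2's end() lands between T1's init() and search()
        result["T1"] = lib.search()
        lib.end()

    def t2():
        t1_opened.wait()
        lib.init()
        result["T2"] = lib.search()
        lib.end()
        t2_closed.set()

    threads = [threading.Thread(target=t1), threading.Thread(target=t2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


def replay_serialized(lib):
    """Same two callers, but each init/search/end sequence runs under one lock."""
    lock = threading.Lock()
    result = {}

    def worker(name):
        with lock:
            lib.init()
            result[name] = lib.search()
            lib.end()

    threads = [threading.Thread(target=worker, args=(n,)) for n in ("T1", "T2")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


devices = ["disk0", "disk1"]
raced = replay_race(SharedHandleLib(devices))
serialized = replay_serialized(SharedHandleLib(devices))
print(raced)
print(serialized)
```

In the forced interleaving, T1's search fails even though the device list has not changed, which is the condition vxconfigd misreads as an empty tree. Serializing the callers is shown only as one generic way to avoid the interleaving; the article does not describe how the thread-safe library achieves this.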

Although the possibility of this race condition exists in 5.0, the dmpnodes are not removed when the issue is hit. In versions 5.0.1 and above, DMP proceeds to delete all the dmpnodes and arrays.

The root cause of this bug has been identified as the multi-threaded vxconfigd using the non-thread-safe libIO APIs, including io_init(), io_end() and io_search(), rather than HP's thread-safe libIOmt library. HP and Symantec are working in collaboration to resolve the issue as a priority with the best possible solution.


Solution



** Update 03/13/2014 **

The final resolution for VxVM 5.0.1 and VxVM 5.1SP1 has now been released by both Symantec and HP.

The fix has four main components. ALL components need to be installed for the complete solution.

Symantec Components:

1. VxVM patches:

For release 5.1SP1:

PHCO_43824 - 11.31 VRTS 5.1 SP1RP3P1 VRTSvxvm Command Patch
PHKL_43779 - 11.31 VRTS 5.1 SP1RP3P1 VRTSvxvm Kernel Patch

https://sort.symantec.com/patch/detail/8419

For release 5.0.1:

PHCO_43579 - 11.31 VRTS 5.0.1 RP3P5 VRTSvxvm Command Patch
PHKL_43580 - 11.31 VRTS 5.0.1 RP3P5 VRTSvxvm Kernel Patch

https://sort.symantec.com/patch/detail/8274

2. VRTSaslapm 5.1.103.100 or above

Note: this applies only to 5.1 SP1.

https://sort.symantec.com/asl/latest

In 5.1SP1 environments where Clariion arrays are present, it is extremely important that the latest VRTSaslapm package is installed, especially when the 5.1SP1RP3 and 5.1SP1RP3P1 VxVM patches are installed. This avoids unnecessary vxconfigd core dumps, DPCA situations and/or vxconfigd going into a non-running state as a result of the ATYPE changes implemented in the VxVM patches and the VRTSaslapm package. Running these versions/patch levels without the latest VRTSaslapm package can create an incompatibility that leads to those situations. As VxVM 5.0.1 ships with an embedded VRTSaslapm, this concern regarding Clariion arrays does not arise on that version.

HP Components:

3. Thread-safe libIO(3X) APIs

This library, namely libIOmt, was introduced with HP patch PHCO_38066.

4. Thread-safe SNIA libraries for I/O drivers on HP-UX 11.31

These drivers are required by vxesd(1M) and have been delivered with the latest I/O driver bundles in November 2013. The GVSD driver for HPVM guests will be shipped in the March 2014 HP-UX Release Update.

This component is required (but not enforced) for 5.0.1 as well.

The HP components are available for download at itrc.hp.com by following these quick links:

https://h20392.www2.hp.com/portal/swdepot/displayProductInfo.do?productNumber=CommonIO
https://h20392.www2.hp.com/portal/swdepot/displayProductInfo.do?productNumber=FibrChanl-01
https://h20392.www2.hp.com/portal/swdepot/displayProductInfo.do?productNumber=FibrChanl-02
https://h20392.www2.hp.com/portal/swdepot/displayProductInfo.do?productNumber=FibrChanl-03

NOTE: The final fix for VxVM 6.x releases is not yet available. If you are running VxVM 6.x, please continue to use the workaround of running vxconfigd in single-threaded mode to avoid the race condition. This technote will be updated once the fix for VxVM 6.x is released.

** End update 03/13/2014 **

Previous updates

** Update 10/16/2013 **

Symantec has now released a remedial fix for this issue in version 5.1SP1RP3 via the following patches:

· PHCO_43526 - 11.31 VRTS 5.1 SP1RP3 VRTSvxvm Command Patch
· PHKL_43527 - 11.31 VRTS 5.1 SP1RP3 VRTSvxvm Kernel Patch

These patches can be downloaded from our SORT page.
While this patch is not the complete multi-threaded fix that HP and Symantec are working on, it allows customers to run vxconfigd in multi-threaded mode without the risk of an outage when the race condition is encountered. Specifically, if the race is encountered, the DMP tree is retained and messages are logged to syslog alerting the administrator that device discovery has failed and requesting a manual restart of vxconfigd. Here is an example of a message you can expect to see in syslog:

"Failed to obtain the OS device information. Restoring back to old configuration. DMP database may not be up to date. Please restart vxconfigd as soon as possible to resolve potential discrepancy between existing known configuration and actual configuration."
 
This remedy avoids the deletion of the DMP tree and also allows vxconfigd to run in multi-threaded mode. The administrator must restart vxconfigd manually at this point.

This fix is ONLY available for 5.1SP1 at this time. If a customer has the workaround installed via hotfix UNOF_5.1SP1RP2HF1, they must remove it prior to installing 5.1SP1RP3.
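
The behaviour described for the remedial fix amounts to treating a failed device query differently from an empty result. The following is a minimal Python sketch of that idea, not the actual patch: reconcile_tree and its arguments are hypothetical names, and None is used here to stand for a failed OS device query.

```python
import logging

log = logging.getLogger("dmp_sketch")


def reconcile_tree(current_tree, discovered):
    """Decide which device tree to keep after a discovery pass.

    discovered is None when the OS device query failed (hypothetical convention),
    or a list of devices, possibly empty, when the query succeeded.
    """
    if discovered is None:
        # A failed query says nothing about the devices, so keep the old tree.
        log.error("Failed to obtain the OS device information. "
                  "Restoring back to old configuration.")
        return list(current_tree)
    # A successful query is authoritative, even if it returns no devices.
    return list(discovered)


old_tree = ["emc0_disk1", "emc0_disk2"]
after_failure = reconcile_tree(old_tree, None)
after_success = reconcile_tree(old_tree, ["emc0_disk1"])
print(after_failure, after_success)
```

Keeping the old tree after a failure means it may be out of date, which is why the logged message asks for vxconfigd to be restarted.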

Customers running 5.0.1 or 6.0 must continue to use the “nothreads” workaround to avoid the outage until the final fix is available.

** End update 10/16/2013 **

Original content:

It has been confirmed that running vxconfigd with multithreading disabled will effectively avoid the issue in a standalone VxVM configuration. In addition, recent testing has revealed that there should be little or no performance impact with multithreading disabled. A hotfix is available for the 5.0.1 and 5.1SP1 releases. For version 6.0+, a workaround is available. Please contact Symantec Technical Services for the workaround or the hotfix patch.

 


Supplemental Materials

Source: ETrack
Value: 2977178
Description: vxconfigd multithreading in HP 11.31 causes DMP database deletion as part of device discovery commands.



