On HP-UX 11i v3, device discovery commands such as vxdisk scandisks or vxdctl enable, run by multi-threaded vxconfigd, can hit a race condition and trigger DMP device tree removal.

Article:TECH200183  |  Created: 2012-11-28  |  Updated: 2013-01-16  |  Article URL http://www.symantec.com/docs/TECH200183
Article Type
Technical Solution

Problem



When device discovery commands (vxdctl enable or vxdisk scandisks) are run during BCV operations, SAN migration and similar activities, vxconfigd queries the operating system device tree using the HP libIO library (io_init, io_search, io_end, etc.). Occasionally io_search() returns NULL. With VxVM 5.0.1 or higher, DDL/DMP then concludes that all the devices have disappeared and removes all the arrays and DMP devices (dmpnodes) that are visible to the host.

This leads to VxVM I/O errors and disabled file systems. Where VxVM manages the root disk(s), the system hangs. In a VCS environment, this can trigger monitor timeouts and possibly a service group fault. In an HP Serviceguard/SGeRAC environment integrated with CVM and/or CFS, the VxVM I/O failures typically lead to a Serviceguard INIT and/or a CRS TOC (if the voting disks sit on VxVM volumes).


Error



For VxVM 5.0.1, /etc/vx/dmpevents.log will show the following messages, indicating the removal of dmpnodes and arrays:

Tue Aug 21 02:13:09.000: Reconfiguration is in progress
Tue Aug 21 02:13:09.000: Reconfiguration has finished
Tue Aug 21 02:13:07.677: Removed Dmpnode 3/16
Tue Aug 21 02:13:07.677: Removed Dmpnode 3/32
Tue Aug 21 02:13:07.677: Removed Dmpnode 3/80

..
Tue Aug 21 02:13:07.677: Disabled Disk array emc0
Tue Aug 21 02:13:07.677: Disabled Disk array p2000g3_fc1
Tue Aug 21 02:13:07.687: Disabled Disk array emc1
Tue Aug 21 02:13:07.687: Disabled Disk array p2000g3_fc0
Tue Aug 21 02:13:07.787: I/O error occured (errno=0x6) on Dmpnode 3/2656
Tue Aug 21 02:13:07.787: I/O error occured (errno=0x6) on Dmpnode 3/1776
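
A burst of removal messages like the one above can be spotted by counting the relevant lines in the log text. The following is a minimal sketch, assuming the log has already been read into a string; the struct and function names are hypothetical and are not part of VxVM.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary type; not part of any VxVM interface. */
struct dmp_event_counts {
    size_t removed_dmpnodes;
    size_t disabled_arrays;
};

/* Counts non-overlapping occurrences of needle in text. */
static size_t count_occurrences(const char *text, const char *needle)
{
    size_t n = 0;
    size_t len = strlen(needle);
    for (const char *p = strstr(text, needle); p != NULL;
         p = strstr(p + len, needle))
        n++;
    return n;
}

/* Summarizes the two message types shown in the dmpevents.log sample. */
struct dmp_event_counts summarize_dmp_events(const char *log_text)
{
    struct dmp_event_counts c;
    assert(log_text != NULL);
    c.removed_dmpnodes = count_occurrences(log_text, "Removed Dmpnode");
    c.disabled_arrays = count_occurrences(log_text, "Disabled Disk array");
    return c;
}
```

Many removals and disabled arrays sharing nearly the same timestamp, right after a discovery command, would be consistent with the symptom described here. The counts alone do not prove this specific cause.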


For VxVM 5.1SP1, /var/adm/vx/ddl.log* shows the dmpnode removal with the following messages:

START TIME = Wed Aug 8 14:30:51 2012
ddl_reconfigure_all: Start of original tree
------------ Start of dmp tree -------------
...
------------------------------------------
Found 2620 paths in the dmp tree
------------ End of dmp tree -------------
ddl_reconfigure_all: End of original tree
ddl_reconfigure_all: Start of temporary tree
ddl_reconfigure_all: End of temporary tree
Printing tree after migration is done
---- Start of DMP instruction buffer ----
DESTROY_DMPNODE:
0x3000010 dmpnode is to be destroyed/freed
DESTROY_DMPNODE:
0x3000020 dmpnode is to be destroyed/freed
...
DESTROY_DMPNODE:
0x3000d30 dmpnode is to be destroyed/freed
DESTROY_DMPNODE:
0x3000d40 dmpnode is to be destroyed/freed
...
 


Environment



HP-UX 11.31
VxVM 5.0.1 and higher versions.


Cause



As part of device discovery, vxconfigd queries the OS device tree using HP libIO library functions. The libIO library uses the /dev/config driver to access information in the kernel I/O data structures. From the man page of libIO,

io_init()    Opens the /dev/config device special file, which causes an open(2) of the dev_config driver. io_init() must be called before calling any other routine in the libIO library.

io_end()     Causes a close(2) of the dev_config driver. io_end() must be called after the use of the libIO library routine(s).

In VxVM 5.0 and later (namely 5.0, 5.0.1 and 5.1SP1, where multithreaded vxconfigd started using the libIO APIs), use of the thread-unsafe version of the libIO APIs can cause a race condition in opening and closing the /dev/config driver. Specifically, /dev/config opened by one thread (say T1) in an earlier call to io_init() may be closed inadvertently by another thread (say T2) calling io_end() at the same time. When T1 then issues an io_search() after T2 has closed the /dev/config file, the io_search() call returns NULL (a failure) with io_errno set to IO_E_DCONF_OPEN. vxconfigd does not treat this as an io_search() failure; it assumes that a NULL return means an empty I/O tree.
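
The interleaving can be replayed deterministically with a small mock. This is only a sketch under stated assumptions: the mock_* functions and MOCK_E_NOT_OPEN are hypothetical stand-ins that imitate the behaviour described above. They are not the real libIO API, whose signatures are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the libIO calls; NOT the real HP API. */
enum mock_err { MOCK_OK = 0, MOCK_E_NOT_OPEN = 1 };

static int mock_config_open = 0;            /* one handle shared by all threads */
static enum mock_err mock_errno = MOCK_OK;  /* plays the role of io_errno */
static const char *mock_tree[] = { "path0", "path1", NULL };

void mock_io_init(void) { mock_config_open = 1; }
void mock_io_end(void)  { mock_config_open = 0; }

/* Returns the device list, or NULL with mock_errno set if the handle is closed. */
const char **mock_io_search(void)
{
    if (!mock_config_open) {
        mock_errno = MOCK_E_NOT_OPEN;
        return NULL;
    }
    mock_errno = MOCK_OK;
    return mock_tree;
}

enum discovery_result { TREE_FOUND, TREE_EMPTY, SEARCH_FAILED };

/* Flawed caller: treats every NULL as an empty tree. */
enum discovery_result discover_naive(void)
{
    return mock_io_search() ? TREE_FOUND : TREE_EMPTY;
}

/* Safer caller: consults the error code before trusting a NULL. */
enum discovery_result discover_checked(void)
{
    const char **tree = mock_io_search();
    if (tree != NULL)
        return TREE_FOUND;
    return (mock_errno != MOCK_OK) ? SEARCH_FAILED : TREE_EMPTY;
}

/* Replays the problematic interleaving of T1 and T2 in a fixed order. */
void replay_race(enum discovery_result *naive, enum discovery_result *checked)
{
    assert(naive != NULL && checked != NULL);
    mock_io_init();                 /* T1 opens the shared handle */
    mock_io_init();                 /* T2 opens it as well */
    mock_io_end();                  /* T2 finishes and closes it for everyone */
    *naive = discover_naive();      /* T1 searches: NULL read as "no devices" */
    *checked = discover_checked();  /* T1 searches: NULL read as a failure */
}
```

In this replay the naive caller reports an empty tree, which corresponds to the case where every dmpnode is removed, while the checked caller reports a search failure and could leave the existing tree alone.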

Although the race condition can also occur in 5.0, the dmpnodes are not removed when it is hit. In versions 5.0.1 and above, DMP proceeds to delete all the dmpnodes and arrays.

The root cause is that multi-threaded vxconfigd does not use HP's thread-safe libIOmt library. Instead, it calls the non-thread-safe versions of the libIO APIs, including io_init(), io_end() and io_search(), in a multithreaded context. HP and Symantec are working together to resolve the issue as a priority.
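
One general way to keep a shared handle open while any thread still needs it is a mutex-protected reference count. The sketch below illustrates that generic technique only. It is not the vendor fix, it does not use libIOmt, and raw_open(), raw_close() and the guarded_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for opening and closing the shared driver handle. */
static int driver_open = 0;
static void raw_open(void)  { driver_open = 1; }
static void raw_close(void) { driver_open = 0; }

static pthread_mutex_t guard = PTHREAD_MUTEX_INITIALIZER;
static int users = 0;  /* number of threads currently holding the handle */

/* The first caller opens the handle; later callers only take a reference. */
void guarded_init(void)
{
    pthread_mutex_lock(&guard);
    if (users++ == 0)
        raw_open();
    pthread_mutex_unlock(&guard);
}

/* Only the last caller to finish actually closes the handle. */
void guarded_end(void)
{
    pthread_mutex_lock(&guard);
    assert(users > 0);
    if (--users == 0)
        raw_close();
    pthread_mutex_unlock(&guard);
}

/* Reports whether the handle is open, for inspection in tests. */
int guarded_is_open(void)
{
    int state;
    pthread_mutex_lock(&guard);
    state = driver_open;
    pthread_mutex_unlock(&guard);
    return state;
}
```

With this pattern, a thread that calls guarded_end() while another thread still holds a reference leaves the handle open, so the second thread's later search would not see a closed driver.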


Solution



Running vxconfigd with multithreading disabled has been confirmed to avoid the issue in a standalone VxVM configuration. In addition, recent testing indicates that there should be little or no performance impact with multithreading disabled. A hotfix is available for the 5.0.1 and 5.1SP1 releases. For versions 6.0 and later, a workaround is available. Please contact Symantec Technical Services for the workaround or the hotfix patch.

 


Supplemental Materials

Source: ETrack
Value: 2977178
Description: vxconfigd multithreading on HP-UX 11.31 causes DMP database deletion as part of device discovery commands.



