AIX: SRDF-R2 devices in read-write mode can cause diskgroup import failures with tunable dmp_cache_open=ON

Article:TECH156258  |  Created: 2011-03-22  |  Updated: 2013-10-17  |  Article URL http://www.symantec.com/docs/TECH156258
Article Type
Technical Solution

Issue



When SRDF-R2 devices transition from write-disabled to read/write mode, disk group imports can fail if the DMP tunable dmp_cache_open is set to ON.

 


Error



After installing VxVM 5.1_SP1_RP1, the symptoms and errors are as follows:

From errpt_a:

Detail Data
DESCRIPTION
WARNING VxVM vxio V-5-3-0 voldio: Disk hdisk10 is write-protected, disallow write <<<<< NOTE
---------------------------------------------------------------------------
 

 more vxdisk_list
DEVICE       TYPE            DISK         GROUP        STATUS
hdisk0       auto:LVM        -            -            LVM
hdisk2       auto:aixdisk    oraappdg0101  oraappdg01   online
hdisk3       auto:aixdisk    oraappdg0102  oraappdg01   online
hdisk6       auto:aixdisk    oraappdg0103  oraappdg01   online
hdisk10      auto:aixdisk    -            -            online                           <<<<<<<<<<<NOTE

 

Output from vxdisk_list_devicename:


devicetag: hdisk10
type:      auto
hostid:
disk:      name= id=1294938151.91.oraqpdb01
group:     name=HA_dataraw01 id=1294938164.95.oraqpdb01
info:      format=aixdisk,privoffset=256
flags:     online ready private autoconfig
pubpaths:  block=/dev/vx/dmp/hdisk10 char=/dev/vx/rdmp/hdisk10
guid:      -
udid:      EMC%5FSYMMETRIX%5F000192601936%5F3600535000
site:      -
version:   2.1
iosize:    min=512 (bytes) max=512 (blocks)
public:    slice=0 offset=66048 len=105866562 disk_offset=0
private:   slice=0 offset=256 len=65536 disk_offset=0
update:    time=1300727495 seqno=0.253
ssb:       actual_seqno=0.0
headers:   0 248
configs:   count=1 len=48346
logs:      count=1 len=7325
Defined regions:
 config   priv 000017-000247[000231]: copy=01 offset=000000 enabled
 config   priv 000249-048363[048115]: copy=01 offset=000231 enabled
 log      priv 048364-055688[007325]: copy=01 offset=000000 enabled
Annotations:
 tag      udid_asl=EMC%5FSYMMETRIX%5F000192601685%5F85013FE008
Multipathing information:
numpaths:   4
 

 

Snippet from engine_log (the disk group still cannot be imported):

2011/03/21 11:20:49 VCS WARNING V-16-10011-715 (oraqpdb21) DiskGroup:db-ora-HA_tools01-dg:online:Diskgroups will be imported without reservations
2011/03/21 11:20:53 VCS WARNING V-16-10011-702 (oraqpdb21) DiskGroup:db-ora-HA_tools01-dg:online:vxdg import (clear flag) failed. Trying force import
2011/03/21 11:20:53 VCS ERROR V-16-10011-703 (oraqpdb21) DiskGroup:db-ora-HA_tools01-dg:online:** ERROR: vxdg import (force) failed on Disk Group HA_tools01
2011/03/21 11:20:57 VCS ERROR V-16-10011-705 (oraqpdb21) DiskGroup:db-ora-HA_tools01-dg:online:** ERROR: vxdg import failed on Disk Group HA_tools01 after vxdctl enable
 

 

2012/06/26 17:39:58 VCS INFO V-16-2-13716 (rdgpow5aix02) Resource(DG_TEST_RES): Output of the completed operation (online)
==============================================
VxVM vxdg ERROR V-5-1-10978 Disk group maxdg: import failed:
No valid disk found containing disk group
VxVM vxdg ERROR V-5-1-10978 Disk group maxdg: import failed:
No valid disk found containing disk group

 

Or, if I/O fencing is in use, the following error appears when VCS attempts to write the keys to the LUN:

2012/04/10 23:18:50 VCS INFO V-16-2-13716 ( rdgpow5aix02 ) Resource( DG_TEST_RES ): Output of the completed operation (actions)
==============================================
VxVM vxdg ERROR V-5-1-10978 Disk group maxdg: import failed:
SCSI-3 PR operation failed
VxVM vxdg ERROR V-5-1-10978 Disk group maxdg: import failed:
SCSI-3 PR operation failed
VxVM vxdg ERROR V-5-1-10978 Disk group maxdg: import failed:
SCSI-3 PR operation failed

You can differentiate this from other SCSI-3 PR key failures by putting vxconfigd into debug mode. The debug log will contain an error similar to the following:

prdev_open(/dev/vx/rdmp/emc0001_123d): open failure: 47

The "open failure: 47" message indicates write-disabled media.
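The two signatures above (the errpt warning 'voldio: Disk <name> is write-protected' and the vxconfigd debug line 'prdev_open(<path>): open failure: 47') can be pulled out of collected logs with a small filter. This is a minimal sketch: find_write_disabled_devices is a hypothetical helper, not a VxVM command, and it assumes the message text matches the examples quoted in this article.

```shell
# Hypothetical helper (not part of VxVM): read log text on stdin and print
# the device name or path from each line carrying one of the two
# write-disabled signatures quoted in this article.
find_write_disabled_devices() {
    sed -n \
        -e 's/.*voldio: Disk \([^ ]*\) is write-protected.*/\1/p' \
        -e 's/.*prdev_open(\([^)]*\)): open failure: 47[^0-9]*$/\1/p'
}
```

For example, 'errpt -a | find_write_disabled_devices' lists the hdisks flagged by vxio, and feeding it the vxconfigd debug log lists the DMP paths whose opens failed with error 47.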

 


Environment



Configurations involving AIX and EMC SRDF devices with Volume Manager 5.1_SP1_RP1 are susceptible to this problem.
 


Cause



SRDF-R2 devices in read-write mode, with dmp_cache_open=ON, contribute to disk group import failures. The behavior can be explained as follows:

[About the Volume Manager tunable 'dmp_cache_open': If this parameter is set to on, the first open of a device that is performed by an array support library (ASL) is cached. This caching enhances the performance of device discovery by minimizing the overhead caused by subsequent opens by ASLs. If this parameter is set to off, caching is not performed. The default value is on.]

With dmp_cache_open set to ON (the default), when an application (or vxconfigd) issues an open on a subpath, the open is issued to the device only once and then cached. Subsequent opens use the cached entry and only increment its reference count.

The SRDF-R2 devices are originally in write-disabled mode. With dmp_cache_open=ON, that read-only (write-disabled) mode is cached, so later opens of the device in read-write mode are disallowed even after the R2 devices become read-write. During a disk group import, updating the configuration copies requires the device to be opened successfully in write mode. Because that write-mode open fails, the disk group import fails.
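The caching behavior described above can be illustrated with a toy model. This is NOT VxVM code: model_open, CACHE_ENABLED and the "wd"/"rw" states are hypothetical names, chosen only to show how a first open made while an R2 device is write-disabled can pin that state for later write opens.

```shell
# Toy model (NOT VxVM source code) of a cached first open on one path.
# CACHE_ENABLED stands in for the dmp_cache_open tunable.
CACHE_ENABLED=on
cached_state=""   # device state captured by the first (cached) open
refcount=0
open_result=""

# model_open DEVICE_STATE REQUESTED_MODE
#   DEVICE_STATE:   current array-side state, "wd" (write-disabled) or "rw"
#   REQUESTED_MODE: "read" or "write"
# Sets the global open_result to "ok" or "fail" (a global rather than
# echo, so refcount updates are not lost in a command-substitution subshell).
model_open() {
    seen_state=$1
    if [ "$CACHE_ENABLED" = on ] && [ -n "$cached_state" ]; then
        # Cache hit: reuse the recorded state and bump the reference
        # count; the device itself is not re-probed.
        refcount=$((refcount + 1))
        seen_state=$cached_state
    elif [ "$CACHE_ENABLED" = on ]; then
        # First open: record the state the device has right now.
        cached_state=$1
        refcount=1
    fi
    # With caching off, neither branch runs and the live state is used.
    if [ "$2" = write ] && [ "$seen_state" = wd ]; then
        open_result=fail
    else
        open_result=ok
    fi
}
```

In the SRDF case, the first open happens while the R2 device is write-disabled, so the model caches "wd"; after the R2 becomes read-write, the import's write-mode open still reuses the cached "wd" entry and fails. With CACHE_ENABLED=off, the same write open sees the live "rw" state and succeeds.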

After installing 5.1_SP1_RP1, the devices are shown in the "online" state. However, because of the behavior explained above, disk group imports fail with 5.1_SP1_RP1.


Solution



In a Veritas Cluster Server (VCS) environment, a fix is now available in the latest SRDF agent (the Q2 2012 release, version 5.0.14.0). This version of the agent automatically runs 'vxdisk rm <daname>' on SRDF LUNs when their device state changes. This clears the DMP open cache for the SRDF LUNs only; non-SRDF LUNs are left alone and still benefit from DMP open caching.

There is a known issue in the current version of the agent: the 'vxdisk rm <daname>' command is run only when the SwapRoles attribute is enabled. If SwapRoles is turned off, the workaround is never applied. This is scheduled to be fixed in an upcoming version of the agent.
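Because of this known issue, it can be worth checking the SwapRoles attribute of each SRDF resource. A minimal sketch, assuming SwapRoles is read with 'hares -value <resource> SwapRoles' and reports 0 when disabled; check_swaproles and the HARES override are hypothetical.

```shell
# Hypothetical check (not part of the SRDF agent): warn when an SRDF
# resource has SwapRoles disabled, because the agent version described
# above only runs 'vxdisk rm' on its LUNs when SwapRoles is enabled.
# HARES defaults to the VCS hares command; it may be overridden for tests.
check_swaproles() {
    hares_cmd=${HARES:-hares}
    # Exit status 2: the attribute could not be read at all.
    val=$($hares_cmd -value "$1" SwapRoles) || return 2
    if [ "$val" = 0 ]; then
        echo "WARNING: $1 has SwapRoles=0; the agent will not run vxdisk rm" >&2
        return 1
    fi
    return 0
}
```

Usage: 'check_swaproles <srdf_resource>', where <srdf_resource> is a placeholder for the name of the SRDF resource in the service group.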

Workarounds are:

In an SRDF environment, dmp_cache_open can be turned off briefly after the SRDF R2-to-R1 transition and before the disk group import, and then turned back on after the import completes.
 
1) Before failover, disable dmp_cache_open; re-enable it after failback.
 
 OR
 
2) Before importing a disk group that consists of SRDF devices, run the following sequence of commands:
 
# for d in `vxdisk -e list | grep srdf-r2 | awk '{ print $1 }'` ; do vxdisk rm $d ; done ; vxdisk scandisks
 
NOTE: 'vxdisk rm <DA>' closes the paths completely and does not keep any cached open, even when dmp_cache_open is enabled.
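The one-liner above can also be written as two small functions, which makes it easy to review the selected devices before anything is removed. This is a sketch: the function names and the VXDISK override are hypothetical additions, and it assumes, as the one-liner does, that the device name is the first column of 'vxdisk -e list' and that SRDF-R2 devices carry the string srdf-r2 on their line.

```shell
# Hypothetical helpers wrapping the one-liner above.
# srdf_r2_devices: read 'vxdisk -e list' output on stdin and print the
# first column of every line that mentions srdf-r2.
srdf_r2_devices() {
    awk '/srdf-r2/ { print $1 }'
}

# clear_srdf_r2_open_cache: run 'vxdisk rm' on each SRDF-R2 device, then
# rescan so the devices are rediscovered with fresh opens.
# VXDISK defaults to the real command; it may be overridden for tests.
clear_srdf_r2_open_cache() {
    vxdisk_cmd=${VXDISK:-/usr/sbin/vxdisk}
    for d in $($vxdisk_cmd -e list | srdf_r2_devices); do
        $vxdisk_cmd rm "$d" || echo "vxdisk rm $d failed" >&2
    done
    $vxdisk_cmd scandisks
}
```

To preview which devices would be removed, run 'vxdisk -e list | srdf_r2_devices' first; then run clear_srdf_r2_open_cache before the import.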

 

TO TURN OFF (DISABLE) dmp_cache_open:

# vxdmpadm gettune all | grep cache
dmp_cache_open                           on               on
# vxdmpadm settune dmp_cache_open=off
Tunable value will be changed immediately
 

Check that the change is in effect:

# vxdmpadm gettune all | grep cache
dmp_cache_open                          off               on
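Disabling the tunable only around the import, as described in the workarounds, can be scripted by saving the current value, setting it to off, importing, and then restoring the saved value. A minimal sketch, assuming a manual import with 'vxdg import' and the 'vxdmpadm gettune all' output format shown above (tunable name in the first column, current value in the second); the function names and the VXDMPADM/VXDG overrides are hypothetical.

```shell
# cache_open_value: read 'vxdmpadm gettune ...' output on stdin and print
# the current value (second column) of dmp_cache_open, e.g. "on" or "off".
cache_open_value() {
    awk '$1 == "dmp_cache_open" { print $2; exit }'
}

# import_with_cache_open_off DG: turn dmp_cache_open off, import disk
# group DG, then restore whatever value was in effect before.
# VXDMPADM/VXDG default to the real commands; they may be overridden.
import_with_cache_open_off() {
    dmpadm=${VXDMPADM:-vxdmpadm}
    vxdg_cmd=${VXDG:-vxdg}
    prev=$($dmpadm gettune all | cache_open_value)
    $dmpadm settune dmp_cache_open=off || return 1
    $vxdg_cmd import "$1"
    rc=$?
    # Restore the previous setting even if the import failed.
    [ -n "$prev" ] && $dmpadm settune dmp_cache_open="$prev"
    return $rc
}
```

Restoring the saved value, rather than always setting it to on, leaves a host that was already running with dmp_cache_open=off unchanged. Example: 'import_with_cache_open_off HA_tools01'.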

An entry in /etc/vx/dmppolicy.info makes the tunable setting persistent:

# cat /etc/vx/dmppolicy.info
arraytype
#
arrayname
#
enclosure
#
Tunables
dmp_cache_open=off
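To confirm that the persistent entry is present, a check along these lines can be used. It is a sketch that assumes the file layout shown in the example above (a "Tunables" line followed by tunable=value lines); cache_open_off_persisted is a hypothetical helper name.

```shell
# Hypothetical check (not a VxVM command): succeed if a dmppolicy.info-style
# file contains dmp_cache_open=off somewhere after its "Tunables" line.
# The path defaults to /etc/vx/dmppolicy.info; pass another path to check
# a copy.
cache_open_off_persisted() {
    awk '
        $1 == "Tunables" && NF == 1             { in_tunables = 1; next }
        in_tunables && $1 == "dmp_cache_open=off" { found = 1 }
        END                                     { exit (found ? 0 : 1) }
    ' "${1:-/etc/vx/dmppolicy.info}"
}
```

Example: 'cache_open_off_persisted && echo "dmp_cache_open=off is persistent"'.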

 

If 'vxdisk scandisks' takes a long time, another workaround is:

======================================

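#!/usr/bin/sh
# Take each SRDF-R2 device offline and back online, as a lighter-weight
# alternative to rescanning every disk with 'vxdisk scandisks'.
set -x

disk_list=`/usr/sbin/vxdisk -e list | grep -i srdf-r2 | awk '{ print $1 }'`
for disk in $disk_list
do
        /usr/sbin/vxdisk offline "$disk"
        /usr/sbin/vxdisk online "$disk"
done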
=====================================


 

Refer to TECH157574 for a related issue.


Supplemental Materials

SourceETrack
Value 2334711
Description

 

SRDF Agent needs modification to prevent dg import failures due to R2 devices' cached information mode being write-disabled




