Unresponsive system (hang) or possible data loss due to an adverse interoperability issue between Qlogic 2GB and 4GB HBAs and Veritas Volume Manager versions: 4.1 MP1 with 122059-02, 4.1 MP2, 5.0, or 5.0 MP1

Article:TECH54291  |  Created: 2008-01-12  |  Updated: 2008-01-12  |  Article URL http://www.symantec.com/docs/TECH54291
Article Type
Technical Solution

Issue



A Solaris system can become unresponsive (hang), or data loss can occur, because of an adverse interoperability issue between Qlogic 2G and 4G HBAs and Veritas Volume Manager 4.1 MP1 with patch 122059-02, 4.1 MP2, 5.0, or 5.0 MP1.

Error



qlc: [ID 262021 kern.warning] WARNING: qlc(0): isr, Internal Parity/Pause Error - hccr=0h, stat=428113h, count=710644882
-OR-
WARNING: /pci@3,700000/SUNW,qlc@0,1/fp@0,0/ssd@w50060e800327572c,5a (ssd138): undecodable sense information: 0x0 0x0 0x0 0x1 0x0 0x0 0x0 0x2 0xff 0xff 0xff 0xff 0x13 0x7c 0
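
To check whether a host has already logged either message, the system log can be scanned for the two signatures. The following is a minimal sketch, not a supported tool: the function name count_qlc_signatures is hypothetical, and it assumes the messages are written to a plain-text log such as /var/adm/messages.

```shell
# Hypothetical helper: count log lines that match either signature quoted above.
# Usage: count_qlc_signatures /var/adm/messages
count_qlc_signatures() {
    # Signature 1: qlc interrupt handler reports an Internal Parity/Pause Error.
    # Signature 2: ssd reports undecodable sense information.
    awk '/Internal Parity\/Pause Error/ || /undecodable sense information/ { n++ }
         END { print n + 0 }' "$1"
}
```

A result of 0 means that neither signature appears in that file. It does not show that the system is unaffected, because a hang can occur before anything is logged.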

Solution



This issue applies only to systems that have Qlogic 2G or 4G host bus adapters (HBAs) on Solaris 8, 9, or 10 together with one of the following releases of Veritas Volume Manager (VxVM):
  • Veritas Volume Manager 4.1 MP1 with patch 122059-02
  • Veritas Volume Manager 4.1 MP2 or later
  • Veritas Volume Manager 5.0 or later

Detailed Description:
Because of defects in the DMP Fast Recovery code, the interaction between Qlogic 2G and 4G HBAs and VxVM can cause Solaris systems to become unresponsive (hang) under heavy load while dynamic multipathing (DMP) Fast Recovery is analyzing IO failures.
DMP Fast Recovery was introduced in the 5.0 release and back-ported to the 4.1 release through Patch 122059-02 as well as the Maintenance Patch (MP2) patchset (117080-07).
DMP Fast Recovery functionality greatly enhances IO failure analysis by communicating directly with the HBA driver, bypassing the SCSI disk (SD) driver which handles normal IO traffic.  By communicating directly with the HBA, failure analysis can be conducted much more efficiently without suffering through backlogged SD driver queues that typically accompany IO path failures during heavy load.
Incident e1123248 documents two defects:
  • Incorrect Command Descriptor Block (CDB) tagging.
  • Failure to reset b_resid (number of bytes not transferred) back to zero upon subsequent attempts to resubmit a given IO.
The resulting behavior is HBA driver specific.  Only Qlogic 2G or 4G HBAs have been found to exhibit this adverse behavior.

Resolution for 4.1 MP (x):
A binary hot fix is available for 4.1 MP2 to fix this issue.  The 4.1 MP2 patch (117080-07) is a prerequisite for the binary hot fix.
A patchadd patch (128045-01) is available for VxVM 4.1 MP2 RP2.  The 4.1 MP2 (117080-07) plus RP2 (124358-04) patches are prerequisites for this patchadd solution.
Please contact Symantec Support to obtain either of these patches, referencing this TechNote 292445.
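
Before requesting the patchadd patch, it can help to confirm that both prerequisites are installed. The sketch below reads the output of `showrev -p` on standard input and reports on 117080-07 and 124358-04. The function names are hypothetical, the code assumes the usual `Patch: <id>-<rev> ...` line format, and it treats a later revision of the same patch as satisfying the prerequisite, which is an assumption rather than something this article states. It needs a POSIX shell (ksh or bash) and an awk that supports -v (on Solaris, nawk may be needed).

```shell
# Hypothetical helper: succeed if patch BASE at revision MINREV or later
# appears in `showrev -p` output read from stdin.
# Assumption: a later revision satisfies the prerequisite.
has_patch() {
    awk -v base="$1" -v minrev="$2" '
        $1 == "Patch:" {
            split($2, id, "-")
            if (id[1] == base && id[2] + 0 >= minrev + 0) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Hypothetical helper: report on both prerequisites named in this article.
check_rp2_prereqs() {
    patches=$(cat)    # capture the showrev -p output once
    for want in 117080-07 124358-04; do
        if printf '%s\n' "$patches" | has_patch "${want%-*}" "${want#*-}"; then
            echo "$want: present"
        else
            echo "$want: MISSING"
        fi
    done
}
```

A typical invocation would be `showrev -p | check_rp2_prereqs`.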

Availability of the DMP_Fast_Recovery tunable on 4.1 MP2:

The 4.1 MP2 Release Notes (http://support.veritas.com/docs/287682) describe a tunable to disable the DMP Fast Recovery functionality:
"The dmp_fast_recovery tunable controls whether DMP should attempt to obtain SCSI error information directly from the HBA interface. Setting the value to on can potentially provide faster error recovery, provided that the HBA interface supports the error enquiry feature. If set to off, the HBA interface is not used. The default setting is off. Before enabling this tunable, make sure the HBA firmware level is supported in the HCL. Enabling this tunable with unsupported HBA firmware levels may result in a system panic."
That quote from the MP2 Release Notes contains three inaccuracies:
  • While the DMP Fast Recovery feature was included in 4.1 MP2, the dmp_fast_recovery tunable was not exposed.
  • DMP Fast Recovery is "on" by default, not "off".
  • The last two sentences, which refer to HBA firmware and the risk of a system panic, actually apply to the 'monitor_fabric' tunable. That tunable is 'off' by default in 4.1 MP2, specifically to protect users against those risks.
In addition to repairing the two defects outlined above, the fix for Incident e1123248 also exposes the dmp_fast_recovery tunable on 4.1 MP2, as documented in the 4.1 MP2 Release Notes. The 5.0 release already includes this tunable. The default value for this tunable remains "on" in both releases.


Resolution for 5.x:

The next rolling patch for 5.0 MP1 will include a permanent fix for these issues. This patch is tentatively scheduled for release by the end of January 2008. Until the patch is released, you can disable DMP Fast Recovery as a temporary workaround, as described below:



Workaround for 5.x:

1. Install VxVM 5.0 MP1:

 http://support.veritas.com/docs/288505

2. Set dmp_fast_recovery=off:

  root# vxdmpadm gettune all |grep fast_recovery
  dmp_fast_recovery              on               on
  root#
  root# vxdmpadm settune dmp_fast_recovery=off
  Tunable value will be changed immediately
  root#
  root# vxdmpadm gettune all |grep fast_recovery
  dmp_fast_recovery             off               on
  root#
  root# cat /etc/vx/dmppolicy.info
  arraytype
  #
  arrayname
  #
  enclosure
  #
  Tunables
  dmp_fast_recovery=off
  #
  root#
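
The check in the transcript above can be scripted, for example to confirm the setting on several hosts. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and it assumes that the second column of the gettune output is the current value and the third is the default, which is inferred from the transcript (only the second column changes after settune).

```shell
# Hypothetical helper: print the current dmp_fast_recovery value from
# `vxdmpadm gettune` output read on stdin.
# Assumption: column 2 is the current value, column 3 the default.
fast_recovery_current() {
    awk '$1 == "dmp_fast_recovery" { print $2 }'
}

# Hypothetical helper: succeed if FILE (for example /etc/vx/dmppolicy.info)
# contains a dmp_fast_recovery=off entry, as shown in the transcript above.
fast_recovery_persisted() {
    awk '$1 == "dmp_fast_recovery=off" { found = 1 }
         END { exit(found ? 0 : 1) }' "$1"
}
```

Typical use would be `vxdmpadm gettune all | fast_recovery_current`, which should print off once the workaround is in place, followed by `fast_recovery_persisted /etc/vx/dmppolicy.info` to confirm that the entry is recorded in that file.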


Supplemental Materials

Source: ETrack
Value: 1123248
Description: dmp_fast_recovery defects affecting Qlogic HBAs


Legacy ID



292445



