Cumulative Hotfix to address all known MSCS related issues when running Storage Foundation for Windows (SFW) 4.3 in a Microsoft Cluster (MSCS) on Windows Server 2003.

Article:TECH73450  |  Created: 2009-01-12  |  Updated: 2013-11-04  |  Article URL http://www.symantec.com/docs/TECH73450
Article Type
Technical Solution


Environment

Issue



This cumulative Hotfix addresses all known MSCS-related issues when running Storage Foundation for Windows (SFW) 4.3 in a Microsoft Cluster (MSCS) on Windows Server 2003. The issues addressed are faults of the Volume Manager Disk Group (VMDg) resource (LooksAlive/IsAlive) and unexpected terminations of the MSCS Resource Monitor (Resrcmon).


Solution



A cumulative Hotfix has been created to resolve all known issues with Storage Foundation for Windows (SFW) 4.3 running in a Microsoft Cluster (MSCS). Specifically, it addresses several causes of Volume Manager Disk Group (VMDg) resource faults and of unexpected terminations of the MSCS Resource Monitor (Resrcmon).

This private fix addresses problems in the communication between SFW and MSCS that result in the issues described below. Note: Maintenance Pack 2 (MP2) must be installed for SFW 4.3 before this Hotfix is applied. See the 'Related Documents' section below for links to the MP2 downloads (32-bit and 64-bit).

Note: This fix may also resolve symptoms that are not described below. Any MSCS cluster experiencing Resrcmon crashes or VMDg resource faults with SFW 4.3 should have this Hotfix applied.

Issue 1
The Volume Manager Disk Group (VMDg) resource fails the LooksAlive/IsAlive monitor, resulting in a resource fault in MSCS. The connection between the VMDg resource in the cluster and Volume Manager is lost, and the VMDg resource faults with one of the following errors:

Cluster.log (%clusterlog%)
ERR Volume Manager Disk Group <VMDgResourceName>: LDM_RESLooksAlive: *** FAILED for VMDgResourceName, status = 0, res = 3240099962, dg_state = 0

ERR Volume Manager Disk Group <VMDgResourceName>: LDM_RESLooksAlive: *** FAILED for VMDgResourceName, status = 0, res = 234, dg_state = 33

ERR Volume Manager Disk Group <VMDgResourceName>: LDM_RESIsAlive: *** FAILED for VMDgResourceName, status = 0, res = 4294967295, dg_state = 0

ERR Volume Manager Disk Group <VMDgResourceName>: LDM_RESIsAlive: *** FAILED for VMDgResourceName, status = 0, res = 20, dg_state = 0

ERR Volume Manager Disk Group <VMDgResourceName>: LDM_RESIsAlive: *** FAILED for VMDgResourceName:, status = 0, res = 87, dg_state = 33

ERR Volume Manager Disk Group <VMDgResourceName>: LDM_RESIsAlive: *** FAILED for VMDgResourceName:, status = 0, res = 24, dg_state = 35
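
When reviewing a large cluster.log, it can help to pull out the LooksAlive/IsAlive failure lines together with their res and dg_state values. The following Python sketch is not part of the Hotfix or of any SFW tool. The regular expression is modelled only on the sample lines shown above, and the helper name parse_vmdg_failure is hypothetical, chosen for illustration.

```python
import re

# Pattern modelled on the sample cluster.log lines quoted in this article.
# It is an assumption that real log lines follow exactly this layout.
FAILED_RE = re.compile(
    r"LDM_RES(?P<check>LooksAlive|IsAlive): \*\*\* FAILED for (?P<resource>.+?):?,"
    r"\s*status = (?P<status>\d+), res = (?P<res>\d+), dg_state = (?P<dg_state>\d+)"
)


def parse_vmdg_failure(line):
    """Return a dict describing a VMDg monitor failure, or None if the line does not match."""
    match = FAILED_RE.search(line)
    if match is None:
        return None
    return {
        "check": match.group("check"),          # LooksAlive or IsAlive
        "resource": match.group("resource"),    # resource name as logged
        "status": int(match.group("status")),
        "res": int(match.group("res")),
        "dg_state": int(match.group("dg_state")),
    }


# Sample line copied from the Issue 1 examples above.
sample = (
    "ERR Volume Manager Disk Group <VMDgResourceName>: LDM_RESIsAlive: "
    "*** FAILED for VMDgResourceName:, status = 0, res = 87, dg_state = 33"
)
print(parse_vmdg_failure(sample))
```

Feeding each line of the cluster.log through such a helper and keeping the non-None results gives a quick list of the monitor failures to compare against the signatures in this article.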

Issue 2
During a path failover in a multipathing solution (e.g. EMC PowerPath, IBM SDD, SFW DMP), the Volume Manager Disk Group (VMDg) resource faults with the following error in the cluster.log (%clusterlog%):

ERR  Volume Manager Disk Group <VMDgResourceName>: LDM_RESLooksAlive: *** FAILED for VMDgResourceName, status = 0, res = 20, dg_state = 0


NOTE: Issues 3 - 6 (listed below) deal with unexpected terminations of the MSCS Resource Monitor (Resrcmon). In all of these cases, the following WARNING is logged to the Windows System Event Log:

WARNING 1146(0x0000047a) ClusSvc <Server_Name> The cluster resource monitor died unexpectedly, an attempt will be made to restart it.

Issue 3
MSCS Resource Monitor (Resrcmon) terminates unexpectedly. This is caused by a Microsoft serialization issue and results in the VMDg resource faulting with the following error reported in the cluster.log (%clusterlog%):

LDM_RESLooksAlive: *** FAILED for <VMDgResourceName>, status = 0, res = 234, dg_state = 33

The stack trace from the Resrcmon crash dump shows references to vxres:

0836f970 7c876c0a 00081000 0eb35000 0836f994 ntdll!RtlpDphIsNormalHeapBlock+0x84
0836f9a0 7c876e7c 00081000 00180000 01000002 ntdll!RtlpDphNormalHeapFree+0x21
0836f9f8 7c879d53 00080000 01000002 0eb35000 ntdll!RtlpDebugPageHeapFree+0x146
0836fa60 7c85391a 00080000 01000002 0eb35000 ntdll!RtlDebugFreeHeap+0x2c
0836fb38 7c83e5d0 00080000 01000002 0eb35000 ntdll!RtlFreeHeapSlowly+0x37
0836fc1c 77e62444 00080000 01000002 0eb35000 ntdll!RtlFreeHeap+0x11a
WARNING: Stack unwind information not available. Following frames may be wrong.
0836fc64 31007358 0eb35000 00000000 0724eef0 kernel32!LocalFree+0x2b
0836feac 310062f6 0ce30fb0 0724eef0 0836ffb8 vxres!GetMountVolumeInfo+0x2ac g:\genesis\src\mscs\vsmres\vmrequest.c @ 603
0836ff90 74ef2b90 0724ef44 0724eef0 00000000 vxres!LDM_RESOnlineThread+0x2ad g:\genesis\src\mscs\vsmres\vxres.c @ 1268
0836ffb8 77e64829 1371eff0 00000000 00000000 resutils!ClusWorkerStart+0x27
0836ffec 00000000 74ef2b69 1371eff0 00000000 kernel32!GetModuleHandleA+0xdf

The cluster.log (%clusterlog%) will report the following VMDg resource failure:

ERR Volume Manager Disk Group <VMDgResourceName>: LDM_RESLooksAlive: *** FAILED for VMDgResourceName, status = 0, res = 234, dg_state = 33

Issue 4
MSCS Resource Monitor (Resrcmon) terminates unexpectedly due to Disk Group GUID and Disk Group Name initialization failures. The resource monitor crash dump shows the failure on the following call:

LDM_RESResourceControl: CLUSCTL_RESOURCE_STORAGE_GET_DISK_INFO

The cluster.log (%clusterlog%) will report the following:

ERR Volume Manager Disk Group <VMDgResourceName>: Dg-Guid is Null
ERR Volume Manager Disk Group <VMDgResourceName>: LDM_RESOnlineThread: can't get dgid

Issue 5
MSCS Resource Monitor (Resrcmon) terminates unexpectedly due to heap corruption. A stack trace from the Resrcmon crash dump shows references to cluscmd. Below is an example in which the GetDgReservationStatusW call failed, leading to the unexpected termination of Resrcmon.

00000000`02e0fea0 00000000`3100846b : 00000000`00000000 00000000`000d0000 00000000`00000000 00000000`000d6c00 : cluscmd64!GetDgReservationStatusW+0xa7 g:\genesis\src\clitools\cluscmd64\cluscmd64.c @ 491
00000000`02e0ff20 00000000`7898b6da : 00000000`000d6c00 00000000`00000000 00000000`00000000 00000000`31008310 : vxres!vm_req_thread+0x15b g:\genesis\src\mscs\vsmres\vmrequest.c @ 215
00000000`02e0ff80 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!

Issue 6
MSCS Resource Monitor (Resrcmon) terminates unexpectedly due to an access violation in cluscmd.dll. The stack trace shows the following:

FAULTING_IP:
ntdll!DbgBreakPoint+0

00e6fac0 7c8508fb 00351000 00e6fb70 00e6fb4c ntdll!DbgBreakPoint
00e6fad0 7c87242e 00000006 7c87271c 00351000 ntdll!RtlpPageHeapStop+0x72
00e6fb4c 7c87366d 00351000 00000004 00e6fe90 ntdll!RtlpDphReportCorruptedBlock+0x199
00e6fb7c 7c87382b 00351000 00450000 01001002 ntdll!RtlpDphNormalHeapFree+0x32
00e6fbd4 7c876a21 00350000 01001002 00e6fe90 ntdll!RtlpDebugPageHeapFree+0x146
00e6fc3c 7c86054e 00350000 01001002 00e6fe90 ntdll!RtlDebugFreeHeap+0x1ed
00e6fd14 7c81d6dd 00350000 01001002 00e6fe90 ntdll!RtlFreeHeapSlowly+0x37
00e6fdf8 77bbcef6 00350000 01001002 00e6fe90 ntdll!RtlFreeHeap+0x11a
00e6fe40 2610af96 00e6fe90 001bfe28 026d9320 msvcrt!free+0xc3
00e6fe58 2610d42d 00e6fe90 001bfde4 001bfe28 cluscmd!SetDgGuidParam+0x67 g:\genesis\src\clitools\cluscmd\cluscmd.cpp @ 2376
00e6fedc 31005a7e 026d9320 001bfe28 001bfde4 cluscmd!ArbitrateDgW+0x6f g:\genesis\src\clitools\cluscmd\cluscmd.cpp @ 1662
00e6ffb8 77e6608b 001bfd60 00000000 00000000 vxres!LDM_RESArbitrateThread+0x197 g:\genesis\src\mscs\vsmres\vxres.c @ 913
00e6ffec 00000000 310058e7 001bfd60 00000000 kernel32!BaseThreadStart+0x34
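
The stack traces for Issues 3, 5 and 6 all contain frames from the SFW modules vxres, cluscmd or cluscmd64. As a rough aid when comparing a Resrcmon crash dump against this article, the Python sketch below lists such frames from pasted debugger output. It assumes frames appear in the module!symbol+offset form shown above; the helper name sfw_frames is hypothetical.

```python
import re

# Matches the first module!symbol (optionally +0xOFFSET) token on a line.
FRAME_RE = re.compile(r"(?P<module>\w+)!(?P<symbol>\w+)(?:\+0x(?P<offset>[0-9a-fA-F]+))?")

# Module names taken from the stack traces in this article.
SFW_MODULES = {"vxres", "cluscmd", "cluscmd64"}


def sfw_frames(stack_text):
    """Return (module, symbol) pairs for stack frames that belong to SFW modules."""
    frames = []
    for line in stack_text.splitlines():
        match = FRAME_RE.search(line)
        if match and match.group("module").lower() in SFW_MODULES:
            frames.append((match.group("module"), match.group("symbol")))
    return frames


# Three frames copied from the Issue 6 trace above.
SAMPLE_STACK = r"""
00e6fe40 2610af96 00e6fe90 001bfe28 026d9320 msvcrt!free+0xc3
00e6fe58 2610d42d 00e6fe90 001bfde4 001bfe28 cluscmd!SetDgGuidParam+0x67 g:\genesis\src\clitools\cluscmd\cluscmd.cpp @ 2376
00e6ffb8 77e6608b 001bfd60 00000000 00000000 vxres!LDM_RESArbitrateThread+0x197 g:\genesis\src\mscs\vsmres\vxres.c @ 913
"""
print(sfw_frames(SAMPLE_STACK))
```

Frames from ntdll, kernel32 or msvcrt are skipped, so an empty result suggests the crash does not match the traces documented here.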

Solution:
The Hotfix attached below addresses all of the issues described above.

The Hotfix addresses these issues by allowing the VMDg resource DLL to communicate directly with the low-level vxio driver. The resource no longer relies on the user-level VEA service to handle LooksAlive/IsAlive requests, which can be affected by server load and performance issues.

Note: In order to install this Hotfix, SFW 4.3 MP2 must be installed. Please see the Related Documents section below for links to the Maintenance Pack 2 release for SFW 4.3.

Download the Hotfix from the link below. After downloading and extracting the files, review the included readme_1471983.txt for detailed installation instructions.

This Hotfix contains the following files and their versions:
 
File Name     Windows 2003 (x86)   Windows 2003 (x64)   Windows 2003 (IA64)
cluscmd.dll   4.3.2049.360         4.3.2049.360         4.3.2049.360
vxio.sys      4.3.2049.360         4.3.2049.360         4.3.2049.360
vxres.dll     4.3.2049.360         4.3.2049.360         4.3.2049.360
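
To compare the file versions found on a node against the table above, a dotted version string can be split into integers and compared as a tuple. The Python sketch below is a minimal illustration. It assumes the installed versions have already been collected by the administrator (for example from each file's Properties dialog), and it treats any version lower than 4.3.2049.360 as below the Hotfix level, which is an assumption rather than a documented rule. The function names and the sample version 4.3.2049.100 are hypothetical.

```python
# Values taken from the file table in this article.
EXPECTED_VERSION = "4.3.2049.360"
HOTFIX_FILES = ("cluscmd.dll", "vxio.sys", "vxres.dll")


def parse_version(text):
    """Convert a dotted version string such as '4.3.2049.360' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def files_below_hotfix_level(installed):
    """Return the Hotfix files that are missing from 'installed' or older than expected.

    'installed' maps a file name to its version string, as collected by hand.
    """
    expected = parse_version(EXPECTED_VERSION)
    outdated = []
    for name in HOTFIX_FILES:
        version = installed.get(name)
        if version is None or parse_version(version) < expected:
            outdated.append(name)
    return outdated


# Hypothetical input: the second version number is made up for illustration.
installed_versions = {
    "cluscmd.dll": "4.3.2049.360",
    "vxio.sys": "4.3.2049.100",
}
print(files_below_hotfix_level(installed_versions))
```

Comparing integer tuples avoids the pitfalls of comparing version strings as text, where "4.3.2049.99" would sort after "4.3.2049.360".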

Attachments

1471983_329717.EXE (1.5 MBytes)

Supplemental Materials

Value: 1094278
Description: MSCS Resrcmon crashes as a result of DG GUID and DG Name initialization failure.

Value: 1086531
Description: VMDg resource faults with res = 3240099962, dg_state = 33

Value: 1395938
Description: The Volume Manager Disk Group (VMDg) resource fails the looksalive/isalive check, resulting in a resource fault in MSCS.

Value: 1290044
Description: Error 'unable to create MountVolumeInfo' is reported in the MSCS cluster log when attempting to online a VMDg resource.

Value: 1153216
Description: MSCS Resrcmon crashes with access violation in vxres.dll.

Value: 1107050
Description: MSCS Resrcmon crashes due to heap corruption.

Value: 1160045
Description: MSCS resource monitor (Resrcmon) issues - Resource monitor terminates, Heap Corruption, Access Violation

Value: 1471983
Description: IA64 version of Private Fix 1086531.

Value: 1532174
Description: VxRes.dll is unable to get the dgid as reported in the cluster.log

Value: 1240516
Description: Path failover with PowerPath results in VMDG looksalive/isalive failing in an MSCS cluster

Value: 1046783
Description: MSCS Resource Monitor (Resrcmon) terminates unexpectedly due to access violation in cluscmd.dll



Legacy ID



329717

