CFS hang due to threads waiting in GLM to take an mdele lock on an extent map

Article:TECH198905  |  Created: 2012-10-24  |  Updated: 2012-10-24  |  Article URL http://www.symantec.com/docs/TECH198905
Article Type
Technical Solution


Issue



Description:
In CFS, a node can request the mdele lock for one extent map while it is
already holding the mdele lock for a different extent map. This can result in a
deadlock between nodes in the cluster.
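To make the failure mode concrete, here is a minimal Python sketch of how two nodes that each hold one lock and request the other's end up waiting on each other. It is an illustration only, not GLM or CFS code: the node names, the emap names, and the find_deadlock helper are hypothetical.

```python
def find_deadlock(holds, wants):
    """Return the nodes that form a wait-for cycle, or None if there is none.

    holds: dict mapping node -> the lock it currently holds
    wants: dict mapping node -> the lock it is blocked waiting for
    (Simplification for this sketch: each node holds at most one lock.)
    """
    owner = {lock: node for node, lock in holds.items()}
    for start in wants:
        path = []
        node = start
        while node is not None and node not in path:
            path.append(node)
            # A blocked node waits on whichever node owns the lock it wants.
            node = owner.get(wants.get(node))
        if node is not None:
            # We came back to a node already on the path: that is a cycle.
            return path[path.index(node):]
    return None


# Hypothetical scenario: each node holds one emap lock and requests the other.
holds = {"node1": "emap_A", "node2": "emap_B"}
wants = {"node1": "emap_B", "node2": "emap_A"}
cycle = find_deadlock(holds, wants)
print(cycle)
```

In this scenario neither request can be granted until the other node releases its lock, so both threads sleep indefinitely.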


Error



#0 [ffff8103525d1600] schedule at ffffffff80062f90
#1 [ffff8103525d16d8] vxg_svar_sleep_unlock at ffffffff88cd2225
#2 [ffff8103525d1728] vxg_grant_sleep at ffffffff88cdfee5
#3 [ffff8103525d1758] vxg_cmn_lock at ffffffff88cd3040
#4 [ffff8103525d17a8] vxg_api_lock at ffffffff88cd3f21
#5 [ffff8103525d17e8] vx_glm_lock at ffffffff884bbdad
#6 [ffff8103525d1808] vx_mdele_hold at ffffffff8847109d
#7 [ffff8103525d1838] vx_extfree1 at ffffffff883fc373
#8 [ffff8103525d18f8] vx_exttrunc at ffffffff8841b7d6
#9 [ffff8103525d1968] vx_trunc_ext4 at ffffffff8841df80
#10 [ffff8103525d1b08] vx_trunc_tran2 at ffffffff884ed3c4
#11 [ffff8103525d1bf8] vx_trunc_tran at ffffffff884ee1ae
#12 [ffff8103525d1c88] vx_cfs_trunc at ffffffff8845e570
#13 [ffff8103525d1d18] vx_trunc at ffffffff884edf84
#14 [ffff8103525d1d68] vx_inactive_remove at ffffffff884e0ed9
#15 [ffff8103525d1de8] vx_inactive_tran at ffffffff884ce075
#16 [ffff8103525d1e58] vx_cinactive_list at ffffffff884455a8
#17 [ffff8103525d1ea8] vx_workitem_process at ffffffff884cc892
#18 [ffff8103525d1eb8] vx_worklist_process at ffffffff884cca54
#19 [ffff8103525d1ef8] vx_worklist_thread at ffffffff884d013b
#20 [ffff8103525d1f08] vx_kthread_init at ffffffff8851c474
#21 [ffff8103525d1f48] kernel_thread at ffffffff8005dfb1
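When reviewing many thread backtraces from a dump, it can help to check each one for the vx_extfree1 -> vx_mdele_hold -> vx_glm_lock call chain shown above. The following Python sketch does that for text in the same frame format. The helper names, the regular expression, and the choice of those three frames as the signature are assumptions made for this example, not part of any Symantec tool.

```python
import re

# Matches one backtrace frame in the format shown above, for example:
#   #6 [ffff8103525d1808] vx_mdele_hold at ffffffff8847109d
FRAME_RE = re.compile(r"^\s*#\d+\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+\s*$")

# Assumed signature: innermost-first call chain taken from the trace above.
SIGNATURE = ["vx_glm_lock", "vx_mdele_hold", "vx_extfree1"]


def frame_functions(bt_text):
    """Return the function names in a backtrace, innermost frame first."""
    names = []
    for line in bt_text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            names.append(m.group(1))
    return names


def matches_signature(bt_text):
    """True if the backtrace contains SIGNATURE as consecutive frames."""
    names = frame_functions(bt_text)
    n = len(SIGNATURE)
    return any(names[i:i + n] == SIGNATURE for i in range(len(names) - n + 1))


# A few frames copied from the trace above.
sample = """\
#4 [ffff8103525d17a8] vxg_api_lock at ffffffff88cd3f21
#5 [ffff8103525d17e8] vx_glm_lock at ffffffff884bbdad
#6 [ffff8103525d1808] vx_mdele_hold at ffffffff8847109d
#7 [ffff8103525d1838] vx_extfree1 at ffffffff883fc373
#8 [ffff8103525d18f8] vx_exttrunc at ffffffff8841b7d6
"""
print(matches_signature(sample))
```

A match only shows that a thread is blocked in this code path. Confirming the deadlock still requires checking which locks the other nodes hold.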

 


Environment



RHEL5 x86-64

 


Cause



The deadlock originates in vx_extfree1, which tries to take the mdele lock on
one extent map (emap) while already holding the mdele lock for another emap.
This can happen in two cases:

1. The current extent begins in the middle of an allocation unit (AU) and ends
in the middle of a different AU.

2. The transaction that is freeing extents still holds an emap from freeing a
previous extent, and the current extent is smaller than an AU and starts at the
beginning of an AU.
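The two cases can be sketched as a small Python check. This is an illustration only, not VxFS code. It assumes one emap per AU; the AU size of 32768 blocks, the block numbers, and the helpers aus_touched and needs_second_emap are hypothetical values and names chosen for the example.

```python
AU_SIZE = 32768  # assumed AU size in blocks, for illustration only


def aus_touched(start_block, length, au_size=AU_SIZE):
    """Return the indices of every AU that the extent overlaps."""
    first = start_block // au_size
    last = (start_block + length - 1) // au_size
    return list(range(first, last + 1))


def needs_second_emap(held_au, start_block, length, au_size=AU_SIZE):
    """True if freeing the extent needs an emap lock while another is held.

    held_au is the AU whose emap the transaction already holds, or None.
    """
    touched = aus_touched(start_block, length, au_size)
    if held_au is None:
        # Nothing held yet: a second emap is needed only if the extent
        # spans more than one AU.
        return len(touched) > 1
    return any(au != held_au for au in touched)


# Case 1: extent starts mid-AU 0 and ends mid-AU 1.
case1 = needs_second_emap(None, start_block=16384, length=32768)

# Case 2: emap for AU 0 is still held; a small extent starts exactly at AU 1.
case2 = needs_second_emap(0, start_block=32768, length=1024)

# Counter-example: the extent lies entirely inside the AU already held.
safe = needs_second_emap(0, start_block=100, length=10)

print(case1, case2, safe)
```

In both cases the thread requests a second emap lock without releasing the first, which is the precondition for the cross-node deadlock.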

 


Solution



The code has been changed to prevent the deadlock described above. Contact
Support to obtain the hotfix; its README is at the following location:

 

ftp://ftpvault.veritas.com/release_train/linux/5.1SP1/patch_central/HF/fs/rhel5_x86_64/5.1SP1RP2P2HF1/README


 


Supplemental Materials

Source: ETrack
Value: 2899907



