SF4.1 VxDMP disables dmpnode on single path failure

Created: 12 Jun 2012 • Updated: 14 Jun 2012 | 4 comments
This issue has been solved. See solution.

This is more of an informational question, since I do not expect anyone to have a solution, but just in case, I would be thankful for some enlightenment:

I am forced to use an old version, SF4.1MP4, on Linux SLES9 in this case. For whatever reason, DMP does not work with the JBOD I have added. The JBOD (Promise VTrak 610fD) is ALUA, so half of all the available paths are always in standby, and that is OK. But when DMP in 4.1 sees that one of the 4 paths is not working, it disables the whole DMP node, rendering the disk unusable:

Jun 12 15:35:01 kernel: VxVM vxdmp V-5-0-148 i/o error occured on path 8/0x70 belonging to dmpnode 201/0x10
<5>VxVM vxdmp V-5-0-148 i/o error analysis done on path 8/0x70 belonging to dmpnode 201/0x10
<5>VxVM vxdmp V-5-0-0 SCSI error opcode=0x28 returned status=0x1 key=0x2 asc=0x4 ascq=0xb on path 8/0x70
Jun 12 15:35:01 kernel: VxVM vxdmp V-5-0-112 disabled path 8/0x50 belonging to the dmpnode 201/0x10
Jun 12 15:35:01 kernel: VxVM vxdmp V-5-0-112 disabled path 8/0x70 belonging to the dmpnode 201/0x10
Jun 12 15:35:01 kernel: VxVM vxdmp V-5-0-112 disabled path 8/0x10 belonging to the dmpnode 201/0x10
Jun 12 15:35:01 kernel: VxVM vxdmp V-5-0-112 disabled path 8/0x30 belonging to the dmpnode 201/0x10
Jun 12 15:35:01 kernel: VxVM vxdmp V-5-0-111 disabled dmpnode 201/0x10
Jun 12 15:35:01 kernel: Buffer I/O error on device VxDMP2, logical block 0
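
The SCSI error in the log above can be decoded by hand. In the standard SCSI tables, opcode 0x28 is READ(10), sense key 0x2 is NOT READY, and ASC/ASCQ 0x04/0x0B is "logical unit not accessible, target port in standby state", which is what an ALUA standby path is expected to return for a read. A minimal Python sketch of that lookup follows; the function name decode_sense and the deliberately partial tables are illustrative assumptions, not part of VxVM or any other tool.

```python
import re

# Partial lookup tables: only a few well-known SCSI values are listed.
OPCODES = {0x00: "TEST UNIT READY", 0x28: "READ(10)", 0x2A: "WRITE(10)"}
SENSE_KEYS = {0x2: "NOT READY", 0x5: "ILLEGAL REQUEST", 0x6: "UNIT ATTENTION"}
ASC_ASCQ = {
    (0x04, 0x0A): "LOGICAL UNIT NOT ACCESSIBLE, ASYMMETRIC ACCESS STATE TRANSITION",
    (0x04, 0x0B): "LOGICAL UNIT NOT ACCESSIBLE, TARGET PORT IN STANDBY STATE",
    (0x04, 0x0C): "LOGICAL UNIT NOT ACCESSIBLE, TARGET PORT IN UNAVAILABLE STATE",
}

# Matches the "SCSI error" part of the vxdmp message quoted above.
SENSE_RE = re.compile(
    r"opcode=(0x[0-9a-f]+) returned status=(0x[0-9a-f]+) "
    r"key=(0x[0-9a-f]+) asc=(0x[0-9a-f]+) ascq=(0x[0-9a-f]+)",
    re.IGNORECASE,
)


def decode_sense(message):
    """Return a readable summary of the sense data, or None if none is found."""
    match = SENSE_RE.search(message)
    if match is None:
        return None
    # The status field is parsed but not interpreted here.
    opcode, _status, key, asc, ascq = (int(g, 16) for g in match.groups())
    return {
        "opcode": OPCODES.get(opcode, hex(opcode)),
        "sense_key": SENSE_KEYS.get(key, hex(key)),
        "additional": ASC_ASCQ.get((asc, ascq), f"asc={asc:#x} ascq={ascq:#x}"),
    }


if __name__ == "__main__":
    line = ("VxVM vxdmp V-5-0-0 SCSI error opcode=0x28 returned status=0x1 "
            "key=0x2 asc=0x4 ascq=0xb on path 8/0x70")
    print(decode_sense(line))
```

Read this way, the logged error by itself describes a standby path rather than a broken one; why DMP 4.1 then disables the other three paths is not explained by the sense data.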
 

Currently my only solution seems to be to stick with Linux DM-Multipathing and add the disks as foreign devices.

AlanTLR's picture

Ursi,

  I'm not too familiar with 4.1 (we have 3.5 and 5.1 here), but it sounds like one of the policies may not be set the way you want it.  You can set the I/O policy to specify how to balance load across the paths.  From the way you describe the behavior, it would seem (and I'm just guessing) that it's trying to do active/active, and when it sees one path down, it assumes all are down.  Have you tried different I/O policies?  If not, maybe try starting with singleactive and work up from there to see if you get different behavior.

 

--Alan

ursi's picture

Hi Alan,

thanks for the hint. I already tried singleactive and all the other policies. I even tried adding the array as A/P and setting some paths to standby. Same result every time.

I already suspected it might have something to do with Subpath Failover Groups (SFG), so that DMP maybe misinterprets the fabric paths, but the tunable (dmp_sfg_threshold) is not in SF4.1, or at least not visible.

But please do not bother too much with that -- I just this very moment switched back to dm-multipath :)

\ursi

Gaurav Sangamnerkar's picture

Hello Ursi,

I have a slightly different opinion. Two things:

1. When you said you have tried other I/O policies, did you try anything other than preferred or singleactive? In both of those cases the chances of the above scenario occurring may be higher. Did you try using minimumq as the iopolicy?

This policy sends I/O on the paths that have the minimum number of outstanding I/O requests in the queue for a LUN. It is suitable for low-end disks or JBODs where a significant track cache does not exist.

2. The above scenario may happen if the I/O was writing to the private region; since the I/O was interrupted, the private region may be inconsistent, which resulted in disabling the entire dmpnode. Are you sure that this was not the case? Does this happen every time, or was it just a one-off occurrence?
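
The minimumq policy described in point 1 amounts to picking, for each request, the enabled path with the fewest outstanding I/Os. A minimal Python sketch of that selection rule is below. It is only an illustration of the idea, not DMP's implementation; the Path class, its fields, and pick_minimumq are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str          # hypothetical label, e.g. a device name
    outstanding: int   # I/O requests currently queued on this path
    enabled: bool = True


def pick_minimumq(paths):
    """Return the enabled path with the fewest outstanding requests, or None."""
    candidates = [p for p in paths if p.enabled]
    if not candidates:
        return None
    # min() returns the first path encountered when several are tied.
    return min(candidates, key=lambda p: p.outstanding)


if __name__ == "__main__":
    paths = [
        Path("sdb", outstanding=4),
        Path("sdf", outstanding=1),
        Path("sdd", outstanding=0, enabled=False),  # e.g. a standby path
    ]
    print(pick_minimumq(paths).name)
```

Note that a rule like this only helps if standby paths are excluded from the candidate list in the first place; it does not by itself change how path errors are handled.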

 

G

 

PS: If you are happy with the answer provided, please mark the post as solution. You can do so by clicking link "Mark as Solution" below the answer provided.
 

ursi's picture

Hi G,

yes, I even did a retry since I assumed I had overlooked something, but it is a steady bug, not a transient one. VxVM did not even have a chance to write to the private region or initialize the disks, since VxDMP does a path probe before writing (or probes idle LUNs anyway), and as soon as it hit those inactive ALUA paths it disabled the whole dmpnode.

So I had no chance to use any of the LUNs at all. A complete failure.

But I do now run with dm-multipath and use the LUNs as foreign disks and it works like a charm. Since we are in production on Friday I will keep it like this:

riser5:~ # vxdisk list isar2_sas_2
Device:    isar2_sas_2
devicetag: isar2_sas_2
type:      simple
hostid:    bla
disk:      name=disk id=1339397620.4.bla
group:     name=xxxdg id=1339445883.17.bla
flags:     online ready private foreign autoimport imported
pubpaths:  block=/dev/disk/by-name/isar2_sas_2 char=/dev/disk/by-name/isar2_sas_2
version:   2.1
iosize:    min=512 (bytes) max=1024 (blocks)
public:    slice=0 offset=2049 len=33552383 disk_offset=0
private:   slice=0 offset=1 len=2048 disk_offset=0
update:    time=1339535796 seqno=0.15
ssb:       actual_seqno=0.0
headers:   0 248
configs:   count=1 len=1481
logs:      count=1 len=224
Defined regions:
 config   priv 000017-000247[000231]: copy=01 offset=000000 enabled
 config   priv 000249-001498[001250]: copy=01 offset=000231 enabled
 log      priv 001499-001722[000224]: copy=01 offset=000000 enabled
 

# multipath -ll
isar2_sas_2 dm-1 Promise,VTrak E610f
[size=16G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=100][enabled]
 \_ 0:0:0:12 sdb 8:16  [active][ready]
 \_ 1:0:0:12 sdf 8:80  [active][ready]
\_ round-robin 0 [prio=2][enabled]
 \_ 0:0:1:12 sdd 8:48  [active][ready]
 \_ 1:0:1:12 sdh 8:112 [active][ready]
isar2_test_1 dm-0 Promise,VTrak E610f
[size=1.0G][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=100][enabled]
 \_ 0:0:0:9  sda 8:0   [active][ready]
 \_ 1:0:0:9  sde 8:64  [active][ready]
\_ round-robin 0 [prio=2][enabled]
 \_ 0:0:1:9  sdc 8:32  [active][ready]
 \_ 1:0:1:9  sdg 8:96  [active][ready]
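
The path identifiers in the vxdmp messages at the top of the thread (8/0x70, 8/0x50, 8/0x10, 8/0x30) look like a device major number followed by a hexadecimal minor number. Under that assumption, and assuming the device numbering did not change between the log and the listing above, they can be matched to the major:minor column of the multipath -ll output. A small Python sketch of the conversion; dmp_path_to_device is a hypothetical helper, and the sd naming rule is only applied to major 8, where Linux assigns 16 minors per disk.

```python
def dmp_path_to_device(path):
    """Convert 'major/0xminor' to (major, minor, sd name or None)."""
    major_text, minor_text = path.split("/")
    major = int(major_text)
    minor = int(minor_text, 16)
    name = None
    if major == 8:
        # First sd major: 16 minors per disk, covering sda..sdp.
        name = "sd" + chr(ord("a") + minor // 16)
        partition = minor % 16
        if partition:
            name += str(partition)
    return major, minor, name


if __name__ == "__main__":
    for path in ("8/0x50", "8/0x70", "8/0x10", "8/0x30"):
        major, minor, name = dmp_path_to_device(path)
        print(f"{path} -> {major}:{minor} {name}")
```

With these assumptions, the four disabled paths map to sdf, sdh, sdb, and sdd, the four paths of isar2_sas_2, and the path that reported the I/O error (8/0x70, i.e. 8:112, sdh) sits in the prio=2 group.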
 

 

And of course thank you for your effort!

Ursi

SOLUTION