HOWTO: Recover from a serial split brain
| Article:TECH33020 | | | Created: 2009-01-28 | | | Updated: 2010-11-28 | | | Article URL http://www.symantec.com/docs/TECH33020 |
Problem
HOWTO: Recover from a serial split brain
Error
VxVM vxdg ERROR V-5-1-10127 associating disk-media
Serial Split Brain detected. Run vxsplitlines
Solution
Background:
The Serial Split Brain condition arises because VERITAS Volume Manager (tm) increments the serial ID in the disk media record of each imported disk in all the disk group configurations on those disks. A new serial (SSB) ID has been included as part of the new disk group version=110 in Volume Manager 4 to assist with recovery of the disk group from this condition. The value that is stored in the configuration database represents the serial ID that the disk group expects a disk to have. The serial ID that is stored in a disk's private region is considered to be its actual value.
If some disks went missing from the disk group (due to physical disconnection or power failure) and those disks were imported by another host, the serial IDs for the disks in their copies of the configuration database, and also in each disk's private region, are updated separately on that host. When the disks are subsequently reimported into the original shared disk group, the actual serial IDs on the disks do not agree with the expected values from the configuration copies on other disks in the disk group.
The disk group cannot be reimported because the databases do not agree on the actual and expected serial IDs. You must choose which configuration database to use. This is a true serial split brain condition, which Volume Manager cannot correct automatically. In this case, the disk group import fails, and the vxdg utility outputs error messages similar to the following before exiting:
VxVM vxconfigd NOTICE V-5-0-33 Split Brain. da id is 0.1, while dm id is 0.0 for DM <dg name> VxVM vxdg ERROR V-5-1-587 Disk group <dg name>: import failed: Serial Split Brain detected. Run vxsplitlines
The import does not succeed even if you specify the -f flag to vxdg.
Although it is usually possible to resolve this conflict by choosing the version of the configuration database with the highest valued configuration ID (shown as config_tid in the output from the vxprivutil dumpconfig <device>), this may not be the correct thing to do in all circumstances.
To resolve conflicting configuration information, you must decide which disk contains the correct version of the disk group configuration database. To assist you in doing this, you can run the vxsplitlines command to show the actual serial ID on each disk in the disk group and the serial ID that was expected from the configuration database. For each disk, the command also shows the vxdg command that you must run to select the configuration database copy on that disk as being the definitive copy to use for importing the disk group.
The following example shows the result of JBOD losing access to one of the four disks in the disk group:
# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
c2t1d0s2 auto:cdsdisk - (dgD280silo1) online
c2t2d0s2 auto:cdsdisk d2 dgD280silo1 online
c2t3d0s2 auto:cdsdisk d3 dgD280silo1 online
c2t9d0s2 auto:cdsdisk d4 dgD280silo1 online
- - d1 dgD280silo1 failed was:c2t1d0s2
# vxreattach -c c2t1d0s2
dgD280silo1 d1
# vxreattach -br c2t1d0s2
VxVM vxdg ERROR V-5-1-10127 associating disk-media d1 with c2t1d0s2:
Serial Split Brain detected. Run vxsplitlines
# vxsplitlines -g dgD280silo1
VxVM vxsplitlines NOTICE V-5-2-2708 There are 1 pools.
The Following are the disks in each pool. Each disk in the same pool
has config copies that are similar.
VxVM vxsplitlines INFO V-5-2-2707 Pool 0.
c2t1d0s2 d1
To see the configuration copy from this disk issue /etc/vx/diag.d/vxprivutil dumpconfig /dev/vx/dmp/c2t1d0s2
To import the diskgroup with config copy from this disk use the following command;
# /usr/sbin/vxdg -o selectcp=1092974296.21.gopal import dgD280silo1
The following are the disks whose serial split brain (SSB) IDs don't match in this configuration copy:
d2
At this stage, you need to gain confidence prior to running the recommended command by generating the following outputs :
c2t1d0s2 auto:cdsdisk - (dgD280silo1) online
c2t2d0s2 auto:cdsdisk d2 dgD280silo1 online
c2t3d0s2 auto:cdsdisk d3 dgD280silo1 online
c2t9d0s2 auto:cdsdisk d4 dgD280silo1 online
- - d1 dgD280silo1 failed was:c2t1d0s2
# vxreattach -c c2t1d0s2
dgD280silo1 d1
# vxreattach -br c2t1d0s2
VxVM vxdg ERROR V-5-1-10127 associating disk-media d1 with c2t1d0s2:
Serial Split Brain detected. Run vxsplitlines
# vxsplitlines -g dgD280silo1
VxVM vxsplitlines NOTICE V-5-2-2708 There are 1 pools.
The Following are the disks in each pool. Each disk in the same pool
has config copies that are similar.
VxVM vxsplitlines INFO V-5-2-2707 Pool 0.
c2t1d0s2 d1
To see the configuration copy from this disk issue /etc/vx/diag.d/vxprivutil dumpconfig /dev/vx/dmp/c2t1d0s2
To import the diskgroup with config copy from this disk use the following command;
# /usr/sbin/vxdg -o selectcp=1092974296.21.gopal import dgD280silo1
The following are the disks whose serial split brain (SSB) IDs don't match in this configuration copy:
d2
At this stage, you need to gain confidence prior to running the recommended command by generating the following outputs :
In this example, the disk group split so that one disk (d1) appears to be on one side of the split. You can specify the -c option to vxsplitlines to print detailed information about each of the disk IDs from the configuration copy on a disk specified by its disk access name:
# vxsplitlines -g dgD280silo1 -c c2t3d0s2
VxVM vxsplitlines INFO V-5-2-2701 DANAME(DMNAME) || Actual SSB || Expected SSB
VxVM vxsplitlines INFO V-5-2-2700 c2t1d0s2( d1 ) || 0.0 || 0.0 ssb ids match
VxVM vxsplitlines INFO V-5-2-2700 c2t2d0s2( d2 ) || 0.1 || 0.0 ssb ids don't match
VxVM vxsplitlines INFO V-5-2-2700 c2t3d0s2( d3 ) || 0.1 || 0.0 ssb ids don't match
VxVM vxsplitlines INFO V-5-2-2700 c2t9d0s2( d4 ) || 0.1 || 0.0 ssb ids don't match
VxVM vxsplitlines INFO V-5-2-2706
This output can be verified by using vxdisk list on each disk. A summary is shown below:
|
# vxdisk list c2t1d0s2
|
# vxdisk list c2t3d0s2
|
|
Device: c2t1d0s2
|
Device: c2t3d0s2
|
|
disk: name= id=1092974296.21.gopal
|
disk: name=d3 id=1092974311.23.gopal
|
|
group: name=dgD280silo1 id=1095738111.20.gopal
|
group: name=dgD280silo1 id=1095738111.20.gopal
|
|
ssb: actual_seqno=0.0
|
ssb: actual_seqno=0.1
|
|
# vxdisk list c2t2d0s2
|
# vxdisk list c2t9d0s2
|
|
Device: c2t2d0s2
|
Device: c2t9d0s2
|
|
disk: name=d2 id=1092974302.22.gopal
|
disk: name=d4 id=1092974318.24.gopal
|
|
group: name=dgD280silo1 id=1095738111.20.gopal
|
group: name=dgD280silo1 id=1095738111.20.gopal
|
|
ssb: actual_seqno=0.1
|
ssb: actual_seqno=0.1
|
Note that though some disks SSB IDs might match that does not necessarily mean that those disks' config copies have all the changes. From some other configuration copies, those disks' SSB IDs might not match. To see the configuration from this disk, run
/etc/vx/diag.d/vxprivutil dumpconfig /dev/rdsk/c2t3d0s2 > dumpconfig_c2t3d0s2
If the other disks in the disk group were not imported on another host, Volume Manager resolves the conflicting values of the serial IDs by using the version of the configuration database from the disk with the greatest value for the updated ID (shown as update_tid in the output from /etc/vx/diag.d/vxprivutil dumpconfig /dev/rdsk/<device>).
In this example , looking through the dumpconfig, there are the following update_tid and ssbid values:
|
dumpconfig c2t3d0s2
|
dumpconfig c2t9d0s2
|
|
config:tid=0.1058
|
Config:tid=0.1059
|
|
dm d1
|
dm d1
|
|
update_tid=0.1038
|
Update_tid=0.1059
|
|
ssbid=0.0
|
ssbid=0.0
|
|
dm d2
|
dm d2
|
|
update_tid=0.1038
|
Update_tid=0.1038
|
|
ssbid=0.0
|
ssbid=0.0
|
|
dm d3
|
dm d3
|
|
update_tid=0.1053
|
Update_tid=0.1053
|
|
ssbid=0.0
|
ssbid=0.0
|
|
dm d4
|
dm d4
|
|
update_tid=0.1053
|
Update_tid=0.1059
|
|
ssbid=0.0
|
ssbid=0.1
|
Using the output from the dumpconfig for each disk determines which config output to use by running the command:
# cat dumpconfig_c2t3d0s2 | vxprint -D - -ht
Before deciding on which option to use for import, ensure the disk group is currently in a valid deport state:
# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
c2t1d0s2 auto:cdsdisk - (dgD280silo1) online
c2t2d0s2 auto:cdsdisk - (dgD280silo1) online
c2t3d0s2 auto:cdsdisk - (dgD280silo1) online
c2t9d0s2 auto:cdsdisk - (dgD280silo1) online
At this stage, your knowledge of how the serial split brain condition came about may be a little clearer and you should have chosen a configuration from one disk to be used to import the disk group. In this example, the following command imports the disk group using the configuration copy from d2:
# /usr/sbin/vxdg -o selectcp=1092974302.22.gopal import dgD280silo1
Once the disk group has been imported, Volume Manager resets the serial IDs to 0 for the imported disks. The actual and expected serial IDs for any disks in the disk group that are not imported at this time remain unchanged.
# vxprint -htg dgD280silo1
dg dgD280silo1 default default 26000 1095738111.20.gopal
dm d1 c2t1d0s2 auto 2048 35838448 -
dm d2 c2t2d0s2 auto 2048 35838448 -
dm d3 c2t3d0s2 auto 2048 35838448 -
dm d4 c2t9d0s2 auto 2048 35838448 -
v SNAP-vol_db2silo1.1 - DISABLED ACTIVE 1024000 SELECT - fsgen
pl SNAP-vol_db2silo1.1-01 SNAP-vol_db2silo1.1 DISABLED ACTIVE 1024000 STRIPE 2/1024 RW
sd d3-01 SNAP-vol_db2silo1.1-01 d3 0 512000 0/0 c2t3d0 ENA
sd d4-01 SNAP-vol_db2silo1.1-01 d4 0 512000 1/0 c2t9d0 ENA
dc SNAP-vol_db2silo1.1_dco SNAP-vol_db2silo1.1 SNAP-vol_db2silo1.1_dcl
v SNAP-vol_db2silo1.1_dcl - DISABLED ACTIVE 544 SELECT - gen
pl SNAP-vol_db2silo1.1_dcl-01 SNAP-vol_db2silo1.1_dcl DISABLED ACTIVE 544 CONCAT - RW
sd d3-02 SNAP-vol_db2silo1.1_dcl-01 d3 512000 544 0 c2t3d0 ENA
v orgvol - DISABLED ACTIVE 1024000 SELECT - fsgen
pl orgvol-01 orgvol DISABLED ACTIVE 1024000 STRIPE 2/128 RW
sd d1-01 orgvol-01 d1 0 512000 0/0 c2t1d0 ENA
sd d2-01 orgvol-01 d2 0 512000 1/0 c2t2d0 ENA
# vxrecover -g dgD280silo1 -sb
# mount -F vxfs /dev/vx/dsk/dgD280silo1/orgvol /orgvol
UX:vxfs mount: ERROR: V-3-21268: /dev/vx/dsk/dgD280silo1/orgvol is corrupted. needs checking
# fsck -F vxfs /dev/vx/rdsk/dgD280silo1/orgvol
log replay in progress
replay complete - marking super-block as CLEAN
# mount -F vxfs /dev/vx/dsk/dgD280silo1/orgvol /orgvol
# df /orgvol
/orgvol (/dev/vx/dsk/dgD280silo1/orgvol): 1019102 blocks 127386 files
# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
c2t1d0s2 auto:cdsdisk d1 dgD280silo1 online
c2t2d0s2 auto:cdsdisk d2 dgD280silo1 online
c2t3d0s2 auto:cdsdisk d3 dgD280silo1 online
c2t9d0s2 auto:cdsdisk d4 dgD280silo1 online
# vxprint -htg dgD280silo1
dg dgD280silo1 default default 26000 1095738111.20.gopal
dm d1 c2t1d0s2 auto 2048 35838448 -
dm d2 c2t2d0s2 auto 2048 35838448 -
dm d3 c2t3d0s2 auto 2048 35838448 -
dm d4 c2t9d0s2 auto 2048 35838448 -
v SNAP-vol_db2silo1.1 - ENABLED ACTIVE 1024000 SELECT SNAP-vol_db2silo1.1-01 fsgen
pl SNAP-vol_db2silo1.1-01 SNAP-vol_db2silo1.1 ENABLED ACTIVE 1024000 STRIPE 2/1024 RW
sd d3-01 SNAP-vol_db2silo1.1-01 d3 0 512000 0/0 c2t3d0 ENA
sd d4-01 SNAP-vol_db2silo1.1-01 d4 0 512000 1/0 c2t9d0 ENA
dc SNAP-vol_db2silo1.1_dco SNAP-vol_db2silo1.1 SNAP-vol_db2silo1.1_dcl
v SNAP-vol_db2silo1.1_dcl - ENABLED ACTIVE 544 SELECT - gen
pl SNAP-vol_db2silo1.1_dcl-01 SNAP-vol_db2silo1.1_dcl ENABLED ACTIVE 544 CONCAT - RW
sd d3-02 SNAP-vol_db2silo1.1_dcl-01 d3 512000 544 0 c2t3d0 ENA
v orgvol - ENABLED ACTIVE 1024000 SELECT orgvol-01 fsgen
pl orgvol-01 orgvol ENABLED ACTIVE 1024000 STRIPE 2/128 RW
sd d1-01 orgvol-01 d1 0 512000 0/0 c2t1d0 ENA
sd d2-01 orgvol-01 d2 0 512000 1/0 c2t2d0 ENA
dm d2 c2t2d0s2 auto 2048 35838448 -
dm d3 c2t3d0s2 auto 2048 35838448 -
dm d4 c2t9d0s2 auto 2048 35838448 -
v SNAP-vol_db2silo1.1 - DISABLED ACTIVE 1024000 SELECT - fsgen
pl SNAP-vol_db2silo1.1-01 SNAP-vol_db2silo1.1 DISABLED ACTIVE 1024000 STRIPE 2/1024 RW
sd d3-01 SNAP-vol_db2silo1.1-01 d3 0 512000 0/0 c2t3d0 ENA
sd d4-01 SNAP-vol_db2silo1.1-01 d4 0 512000 1/0 c2t9d0 ENA
dc SNAP-vol_db2silo1.1_dco SNAP-vol_db2silo1.1 SNAP-vol_db2silo1.1_dcl
v SNAP-vol_db2silo1.1_dcl - DISABLED ACTIVE 544 SELECT - gen
pl SNAP-vol_db2silo1.1_dcl-01 SNAP-vol_db2silo1.1_dcl DISABLED ACTIVE 544 CONCAT - RW
sd d3-02 SNAP-vol_db2silo1.1_dcl-01 d3 512000 544 0 c2t3d0 ENA
v orgvol - DISABLED ACTIVE 1024000 SELECT - fsgen
pl orgvol-01 orgvol DISABLED ACTIVE 1024000 STRIPE 2/128 RW
sd d1-01 orgvol-01 d1 0 512000 0/0 c2t1d0 ENA
sd d2-01 orgvol-01 d2 0 512000 1/0 c2t2d0 ENA
# vxrecover -g dgD280silo1 -sb
# mount -F vxfs /dev/vx/dsk/dgD280silo1/orgvol /orgvol
UX:vxfs mount: ERROR: V-3-21268: /dev/vx/dsk/dgD280silo1/orgvol is corrupted. needs checking
# fsck -F vxfs /dev/vx/rdsk/dgD280silo1/orgvol
log replay in progress
replay complete - marking super-block as CLEAN
# mount -F vxfs /dev/vx/dsk/dgD280silo1/orgvol /orgvol
# df /orgvol
/orgvol (/dev/vx/dsk/dgD280silo1/orgvol): 1019102 blocks 127386 files
# vxdisk -o alldgs list
DEVICE TYPE DISK GROUP STATUS
c2t1d0s2 auto:cdsdisk d1 dgD280silo1 online
c2t2d0s2 auto:cdsdisk d2 dgD280silo1 online
c2t3d0s2 auto:cdsdisk d3 dgD280silo1 online
c2t9d0s2 auto:cdsdisk d4 dgD280silo1 online
# vxprint -htg dgD280silo1
dg dgD280silo1 default default 26000 1095738111.20.gopal
dm d1 c2t1d0s2 auto 2048 35838448 -
dm d2 c2t2d0s2 auto 2048 35838448 -
dm d3 c2t3d0s2 auto 2048 35838448 -
dm d4 c2t9d0s2 auto 2048 35838448 -
v SNAP-vol_db2silo1.1 - ENABLED ACTIVE 1024000 SELECT SNAP-vol_db2silo1.1-01 fsgen
pl SNAP-vol_db2silo1.1-01 SNAP-vol_db2silo1.1 ENABLED ACTIVE 1024000 STRIPE 2/1024 RW
sd d3-01 SNAP-vol_db2silo1.1-01 d3 0 512000 0/0 c2t3d0 ENA
sd d4-01 SNAP-vol_db2silo1.1-01 d4 0 512000 1/0 c2t9d0 ENA
dc SNAP-vol_db2silo1.1_dco SNAP-vol_db2silo1.1 SNAP-vol_db2silo1.1_dcl
v SNAP-vol_db2silo1.1_dcl - ENABLED ACTIVE 544 SELECT - gen
pl SNAP-vol_db2silo1.1_dcl-01 SNAP-vol_db2silo1.1_dcl ENABLED ACTIVE 544 CONCAT - RW
sd d3-02 SNAP-vol_db2silo1.1_dcl-01 d3 512000 544 0 c2t3d0 ENA
v orgvol - ENABLED ACTIVE 1024000 SELECT orgvol-01 fsgen
pl orgvol-01 orgvol ENABLED ACTIVE 1024000 STRIPE 2/128 RW
sd d1-01 orgvol-01 d1 0 512000 0/0 c2t1d0 ENA
sd d2-01 orgvol-01 d2 0 512000 1/0 c2t2d0 ENA
|
|
Related Articles
Legacy ID
269233
Article URL http://www.symantec.com/docs/TECH33020
Terms of use for this information are found in Legal Notices









Thank you.