HOWTO: Recover from a serial split brain

Article:TECH33020  |  Created: 2009-01-28  |  Updated: 2010-11-28  |  Article URL http://www.symantec.com/docs/TECH33020
Article Type
Technical Solution



Problem



HOWTO: Recover from a serial split brain


Error



VxVM vxdg ERROR V-5-1-10127 associating disk-media with :
Serial Split Brain detected. Run vxsplitlines


Solution



Background:
 
The serial split brain condition arises because VERITAS Volume Manager (tm) increments the serial ID in the disk media record of each imported disk in all the disk group configurations on those disks. A serial split brain (SSB) ID was introduced with disk group version 110 in Volume Manager 4 to assist with recovery of the disk group from this condition. The value stored in the configuration database is the serial ID that the disk group expects a disk to have (the expected value). The serial ID stored in a disk's private region is its actual value.
 
If some disks went missing from the disk group (due to physical disconnection or power failure) and those disks were imported by another host, the serial IDs for the disks in their copies of the configuration database, and also in each disk's private region, are updated separately on that host. When the disks are subsequently reimported into the original shared disk group, the actual serial IDs on the disks do not agree with the expected values from the configuration copies on other disks in the disk group.
 
The disk group cannot be reimported because the databases do not agree on the actual and expected serial IDs. You must choose which configuration database to use.  This is a true serial split brain condition, which Volume Manager cannot correct automatically. In this case, the disk group import fails, and the vxdg utility outputs error messages similar to the following before exiting:
 
VxVM vxconfigd NOTICE V-5-0-33 Split Brain. da id is 0.1, while dm id is 0.0 for DM <dg name>
VxVM vxdg ERROR V-5-1-587 Disk group <dg name>: import failed: Serial Split Brain detected. Run vxsplitlines
 
The import does not succeed even if you specify the -f flag to vxdg.
 
Although it is usually possible to resolve this conflict by choosing the version of the configuration database with the highest configuration ID (shown as config_tid in the output from vxprivutil dumpconfig <device>), this may not be the correct choice in all circumstances.
 
To resolve conflicting configuration information, you must decide which disk contains the correct version of the disk group configuration database. To assist you in doing this, you can run the vxsplitlines command to show the actual serial ID on each disk in the disk group and the serial ID that was expected from the configuration database. For each disk, the command also shows the vxdg command that you must run to select the configuration database copy on that disk as being the definitive copy to use for importing the disk group.
 
The following example shows the result of a JBOD losing access to one of the four disks in the disk group:
 

# vxdisk -o alldgs list

DEVICE       TYPE            DISK         GROUP        STATUS
c2t1d0s2     auto:cdsdisk    -           (dgD280silo1) online
c2t2d0s2     auto:cdsdisk    d2           dgD280silo1  online
c2t3d0s2     auto:cdsdisk    d3           dgD280silo1  online
c2t9d0s2     auto:cdsdisk    d4           dgD280silo1  online
-            -               d1           dgD280silo1  failed was:c2t1d0s2

# vxreattach -c c2t1d0s2
dgD280silo1 d1

# vxreattach -br c2t1d0s2
VxVM vxdg ERROR V-5-1-10127 associating disk-media d1 with c2t1d0s2:
Serial Split Brain detected. Run vxsplitlines

# vxsplitlines -g dgD280silo1

VxVM vxsplitlines NOTICE V-5-2-2708 There are 1 pools.
The Following are the disks in each pool. Each disk in the same pool
has config copies that are similar.
VxVM vxsplitlines INFO V-5-2-2707 Pool 0.
c2t1d0s2 d1

To see the configuration copy from this disk issue /etc/vx/diag.d/vxprivutil dumpconfig /dev/vx/dmp/c2t1d0s2
To import the diskgroup with config copy from this disk use the following command;

# /usr/sbin/vxdg -o selectcp=1092974296.21.gopal import dgD280silo1

The following are the disks whose serial split brain (SSB) IDs  don't match in this configuration copy:
d2

Before running the recommended command, gather the following outputs to confirm that it selects the correct configuration copy.

In this example, the disk group has split so that one disk (d1) appears to be on one side of the split. You can specify the -c option to vxsplitlines to print detailed information about each of the disk IDs from the configuration copy on a disk specified by its disk access name:
 

# vxsplitlines -g dgD280silo1 -c c2t3d0s2

 VxVM vxsplitlines INFO V-5-2-2701 DANAME(DMNAME)      || Actual SSB   || Expected SSB
 VxVM vxsplitlines INFO V-5-2-2700 c2t1d0s2( d1 )      || 0.0          || 0.0 ssb ids match
 VxVM vxsplitlines INFO V-5-2-2700 c2t2d0s2( d2 )      || 0.1          || 0.0 ssb ids don't match
 VxVM vxsplitlines INFO V-5-2-2700 c2t3d0s2( d3 )      || 0.1          || 0.0 ssb ids don't match
 VxVM vxsplitlines INFO V-5-2-2700 c2t9d0s2( d4 )      || 0.1          || 0.0 ssb ids don't match
 VxVM vxsplitlines INFO V-5-2-2706
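
When a disk group contains many disks, the mismatched rows in the vxsplitlines -c report can be picked out mechanically. The following is a minimal sketch, not part of Volume Manager: the function name ssb_mismatches is hypothetical, and the parsing assumes the report layout shown above (message ID V-5-2-2700, columns separated by ||). It compares the actual and expected SSB IDs as strings and prints one line for each disk where they differ.

```shell
#!/bin/sh
# Hypothetical helper (not a VxVM command): reads "vxsplitlines -g <dg> -c <daname>"
# output on stdin and prints "DANAME DMNAME ACTUAL EXPECTED" for each mismatch.
# Assumes the report layout shown in this article.
ssb_mismatches() {
    awk -F'[|][|]' '/V-5-2-2700/ {
        rec = $1
        sub(/^.*V-5-2-2700[ \t]*/, "", rec)   # drop the message prefix
        gsub(/[ \t]/, "", rec)                # e.g. "c2t2d0s2(d2)"
        split(rec, name, /[()]/)              # name[1] = DA name, name[2] = DM name
        split($2, act, " ")                   # act[1] = actual SSB ID
        split($3, expd, " ")                  # expd[1] = expected SSB ID
        if ((act[1] "") != (expd[1] ""))      # compare as strings, not numbers
            print name[1], name[2], act[1], expd[1]
    }'
}

# Usage (on a host with VxVM installed):
#   vxsplitlines -g dgD280silo1 -c c2t3d0s2 | ssb_mismatches
```

For the report above, this would list d2, d3 and d4, each with actual ID 0.1 and expected ID 0.0.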

This output can be verified by running vxdisk list on each disk. A summary is shown below:

# vxdisk list c2t1d0s2                                # vxdisk list c2t3d0s2
Device:    c2t1d0s2                                   Device:    c2t3d0s2
disk:      name= id=1092974296.21.gopal               disk:      name=d3 id=1092974311.23.gopal
group:     name=dgD280silo1 id=1095738111.20.gopal    group:     name=dgD280silo1 id=1095738111.20.gopal
ssb:       actual_seqno=0.0                           ssb:       actual_seqno=0.1

# vxdisk list c2t2d0s2                                # vxdisk list c2t9d0s2
Device:    c2t2d0s2                                   Device:    c2t9d0s2
disk:      name=d2 id=1092974302.22.gopal             disk:      name=d4 id=1092974318.24.gopal
group:     name=dgD280silo1 id=1095738111.20.gopal    group:     name=dgD280silo1 id=1095738111.20.gopal
ssb:       actual_seqno=0.1                           ssb:       actual_seqno=0.1
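
To collect the actual SSB ID from every disk without reading each report by hand, the vxdisk list output can be filtered. This is a minimal sketch, assuming the Device: and ssb: lines appear as in the summary above; ssb_actual is a hypothetical helper name, not a VxVM command.

```shell
#!/bin/sh
# Hypothetical helper: reads the output of "vxdisk list <device>" on stdin
# and prints "DEVICE ACTUAL_SEQNO".
ssb_actual() {
    awk '
        $1 == "Device:" { dev = $2 }          # remember which device this report is for
        $1 == "ssb:" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^actual_seqno=/) {
                    val = $i
                    sub(/^actual_seqno=/, "", val)
                    print dev, val
                }
        }'
}

# Usage (on a host with VxVM installed):
#   for d in c2t1d0s2 c2t2d0s2 c2t3d0s2 c2t9d0s2; do
#       vxdisk list "$d" | ssb_actual
#   done
```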
 


Note that even if the SSB IDs of some disks match, this does not necessarily mean that the configuration copies on those disks contain all the changes. Viewed from other configuration copies, the SSB IDs of the same disks might not match. To save the configuration copy from a disk (c2t3d0s2 in this example) to a file, run:
/etc/vx/diag.d/vxprivutil dumpconfig /dev/rdsk/c2t3d0s2 > dumpconfig_c2t3d0s2

If the other disks in the disk group were not imported on another host, Volume Manager resolves the conflicting values of the serial IDs by using the version of the configuration database from the disk with the greatest value for the update ID (shown as update_tid in the output from /etc/vx/diag.d/vxprivutil dumpconfig /dev/rdsk/<device>).

In this example, the dumpconfig output contains the following update_tid and ssbid values:

dumpconfig c2t3d0s2         dumpconfig c2t9d0s2
config:tid=0.1058           config:tid=0.1059
dm   d1                     dm   d1
update_tid=0.1038           update_tid=0.1059
ssbid=0.0                   ssbid=0.0
dm   d2                     dm   d2
update_tid=0.1038           update_tid=0.1038
ssbid=0.0                   ssbid=0.0
dm   d3                     dm   d3
update_tid=0.1053           update_tid=0.1053
ssbid=0.0                   ssbid=0.0
dm   d4                     dm   d4
update_tid=0.1053           update_tid=0.1059
ssbid=0.0                   ssbid=0.1
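
Comparing these IDs by eye is error-prone when several configuration copies are involved. The sketch below assumes that an ID such as 0.1058 is a pair of integers separated by a dot, and compares the two parts in turn. That interpretation and the function name tid_max are assumptions, not something this article states. Because the copy with the highest ID is not always the right one to import, treat the result as one input to the decision rather than the answer.

```shell
#!/bin/sh
# Hypothetical helper: print whichever of two IDs of the form MAJOR.MINOR is greater.
# Assumption: both parts are plain non-negative integers.
tid_max() {
    a_major=${1%%.*}; a_minor=${1#*.}
    b_major=${2%%.*}; b_minor=${2#*.}
    if [ "$a_major" -gt "$b_major" ] ||
       { [ "$a_major" -eq "$b_major" ] && [ "$a_minor" -ge "$b_minor" ]; }; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "$2"
    fi
}

# config tid values from the copies on c2t3d0s2 and c2t9d0s2
tid_max 0.1058 0.1059
```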
 

Use the dumpconfig output saved for each disk to determine which configuration copy to use. To display a saved copy in vxprint format, run the command:

# cat dumpconfig_c2t3d0s2 | vxprint -D - -ht  

Before deciding which configuration copy to use for the import, ensure that the disk group is currently in a valid deported state:

# vxdisk -o alldgs list
DEVICE       TYPE            DISK         GROUP        STATUS
c2t1d0s2     auto:cdsdisk    -           (dgD280silo1) online
c2t2d0s2     auto:cdsdisk    -           (dgD280silo1) online
c2t3d0s2     auto:cdsdisk    -           (dgD280silo1) online
c2t9d0s2     auto:cdsdisk    -           (dgD280silo1) online
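
The same check can be scripted. In the listing above, every disk shows the disk group name in parentheses, which is how vxdisk -o alldgs list reports a group that is not imported. The following sketch (dg_is_deported is a hypothetical helper name) succeeds only if at least one disk reports the group in parentheses and no disk reports it as imported or failed.

```shell
#!/bin/sh
# Hypothetical helper: reads "vxdisk -o alldgs list" output on stdin.
# Exit status 0 if disk group $1 appears only in parentheses (deported).
dg_is_deported() {
    awk -v dg="$1" '
        $4 == dg            { imported++ }    # bare name: still imported (or failed)
        $4 == ("(" dg ")")  { deported++ }    # parenthesised name: deported
        END { exit !(deported > 0 && imported == 0) }'
}

# Usage (on a host with VxVM installed):
#   vxdisk -o alldgs list | dg_is_deported dgD280silo1 && echo "deported"
```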

 
At this stage, it should be clearer how the serial split brain condition came about, and you should have chosen the configuration copy from one disk to use for importing the disk group. In this example, the following command imports the disk group using the configuration copy from d2:
 
# /usr/sbin/vxdg -o selectcp=1092974302.22.gopal import dgD280silo1

Once the disk group has been imported, Volume Manager resets the serial IDs to 0 for the imported disks. The actual and expected serial IDs for any disks in the disk group that are not imported at this time remain unchanged.
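
In this example, the selectcp value in the import command is the disk ID that vxdisk list reports on the disk: line for the chosen disk (1092974302.22.gopal for d2). The sketch below builds the command from that output and prints it for review instead of running it. The function name selectcp_command is hypothetical, and the parsing assumes the vxdisk list layout shown earlier in this article.

```shell
#!/bin/sh
# Hypothetical helper: $1 = disk group name; stdin = "vxdisk list <device>" output
# for the disk whose configuration copy was chosen.
# Prints the import command; it does not run it.
selectcp_command() {
    awk -v dg="$1" '
        $1 == "disk:" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^id=/) { id = $i; sub(/^id=/, "", id) }
        }
        END {
            if (id == "") exit 1              # no disk ID found in the input
            print "/usr/sbin/vxdg -o selectcp=" id " import " dg
        }'
}

# Usage (on a host with VxVM installed):
#   vxdisk list c2t2d0s2 | selectcp_command dgD280silo1
```

Printing the command keeps the final decision with the administrator, which matters here because importing with the wrong configuration copy discards the changes recorded in the others.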
 
# vxprint -htg dgD280silo1
 
dg dgD280silo1  default      default  26000    1095738111.20.gopal
 
dm d1           c2t1d0s2     auto     2048     35838448 -
dm d2           c2t2d0s2     auto     2048     35838448 -
dm d3           c2t3d0s2     auto     2048     35838448 -
dm d4           c2t9d0s2     auto     2048     35838448 -

v  SNAP-vol_db2silo1.1 -     DISABLED ACTIVE   1024000  SELECT    -        fsgen
pl SNAP-vol_db2silo1.1-01 SNAP-vol_db2silo1.1 DISABLED ACTIVE 1024000 STRIPE 2/1024 RW
sd d3-01        SNAP-vol_db2silo1.1-01 d3 0    512000   0/0       c2t3d0   ENA
sd d4-01        SNAP-vol_db2silo1.1-01 d4 0    512000   1/0       c2t9d0   ENA
dc SNAP-vol_db2silo1.1_dco SNAP-vol_db2silo1.1 SNAP-vol_db2silo1.1_dcl
v  SNAP-vol_db2silo1.1_dcl - DISABLED ACTIVE   544      SELECT    -        gen
pl SNAP-vol_db2silo1.1_dcl-01 SNAP-vol_db2silo1.1_dcl DISABLED ACTIVE 544 CONCAT - RW
sd d3-02        SNAP-vol_db2silo1.1_dcl-01 d3 512000 544 0        c2t3d0   ENA

v  orgvol       -            DISABLED ACTIVE   1024000  SELECT    -        fsgen
pl orgvol-01    orgvol       DISABLED ACTIVE   1024000  STRIPE    2/128    RW
sd d1-01        orgvol-01    d1       0        512000   0/0       c2t1d0   ENA
sd d2-01        orgvol-01    d2       0        512000   1/0       c2t2d0   ENA

# vxrecover -g dgD280silo1 -sb

# mount -F vxfs /dev/vx/dsk/dgD280silo1/orgvol /orgvol

UX:vxfs mount: ERROR: V-3-21268: /dev/vx/dsk/dgD280silo1/orgvol is corrupted. needs checking

# fsck -F vxfs /dev/vx/rdsk/dgD280silo1/orgvol
log replay in progress
replay complete - marking super-block as CLEAN

# mount -F vxfs /dev/vx/dsk/dgD280silo1/orgvol /orgvol


# df /orgvol

/orgvol            (/dev/vx/dsk/dgD280silo1/orgvol): 1019102 blocks   127386 files
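
The mount, fsck and remount steps above can be wrapped so that the log replay runs only when the first mount fails. This is a sketch under assumptions: mount_with_replay and the RUN variable are hypothetical names, and the function treats any mount failure as the "needs checking" case shown above, which may not hold for other errors. Setting RUN=echo gives a dry run that prints each command instead of executing it (the first printed mount then counts as a success, so only that command is shown).

```shell
#!/bin/sh
# Hypothetical wrapper: $1 = disk group, $2 = volume, $3 = mount point.
# Set RUN=echo for a dry run that prints commands instead of executing them.
mount_with_replay() {
    blk="/dev/vx/dsk/$1/$2"     # block device, used by mount
    raw="/dev/vx/rdsk/$1/$2"    # raw device, used by fsck
    # First attempt: succeeds if the file system is already clean.
    ${RUN:-} mount -F vxfs "$blk" "$3" && return 0
    # Mount failed: replay the intent log, then try once more.
    ${RUN:-} fsck -F vxfs "$raw" || return 1
    ${RUN:-} mount -F vxfs "$blk" "$3"
}

# Usage, after "vxrecover -g dgD280silo1 -sb" has started the volumes:
#   mount_with_replay dgD280silo1 orgvol /orgvol
```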

# vxdisk -o alldgs list

DEVICE       TYPE            DISK         GROUP        STATUS
c2t1d0s2     auto:cdsdisk    d1           dgD280silo1  online
c2t2d0s2     auto:cdsdisk    d2           dgD280silo1  online
c2t3d0s2     auto:cdsdisk    d3           dgD280silo1  online
c2t9d0s2     auto:cdsdisk    d4           dgD280silo1  online

# vxprint -htg dgD280silo1

dg dgD280silo1  default      default  26000    1095738111.20.gopal

dm d1           c2t1d0s2     auto     2048     35838448 -
dm d2           c2t2d0s2     auto     2048     35838448 -
dm d3           c2t3d0s2     auto     2048     35838448 -
dm d4           c2t9d0s2     auto     2048     35838448 -

v  SNAP-vol_db2silo1.1 -     ENABLED  ACTIVE   1024000  SELECT    SNAP-vol_db2silo1.1-01 fsgen
pl SNAP-vol_db2silo1.1-01 SNAP-vol_db2silo1.1 ENABLED ACTIVE 1024000 STRIPE 2/1024 RW
sd d3-01        SNAP-vol_db2silo1.1-01 d3 0    512000   0/0       c2t3d0   ENA
sd d4-01        SNAP-vol_db2silo1.1-01 d4 0    512000   1/0       c2t9d0   ENA
dc SNAP-vol_db2silo1.1_dco SNAP-vol_db2silo1.1 SNAP-vol_db2silo1.1_dcl
v  SNAP-vol_db2silo1.1_dcl - ENABLED  ACTIVE   544      SELECT    -        gen
pl SNAP-vol_db2silo1.1_dcl-01 SNAP-vol_db2silo1.1_dcl ENABLED ACTIVE 544 CONCAT - RW
sd d3-02        SNAP-vol_db2silo1.1_dcl-01 d3 512000 544 0        c2t3d0   ENA

v  orgvol       -            ENABLED  ACTIVE   1024000  SELECT    orgvol-01 fsgen
pl orgvol-01    orgvol       ENABLED  ACTIVE   1024000  STRIPE    2/128    RW
sd d1-01        orgvol-01    d1       0        512000   0/0       c2t1d0   ENA
sd d2-01        orgvol-01    d2       0        512000   1/0       c2t2d0   ENA



 

 




Legacy ID



269233

