
Passthru Mode - Verify Whether the SRL Disk Is Faulty

Created: 08 Jan 2013 • Updated: 29 Jan 2013
Author: Faisal Saleem

I have run into replication in passthru mode twice: once at one of our clients and once in our internal office.

First, I read up in detail on why replication enters passthru mode. In summary:
 

The Primary RVG is in passthru mode because the Primary SRL is detached, missing, or unavailable. For more information on RVG PASSTHRU mode, see the Veritas Volume Replicator Administrator's Guide.  

 

A common cause of passthru mode is an unstable SRL disk. I/O errors usually appear in the logs, but even when they do not, the SRL disk may still be the culprit. The following method, which a technical engineer recommended to me, verifies whether the disk is faulty.

Let's review the case and then verify whether the disk is faulty.

 

REPLICATION ENVIRONMENT:

OS Version:        Red Hat Enterprise Linux Server release 5.3 (Tikanga)

SF Version:         5.0

==========================================================================

Replication Status: Passthru Mode

 
# vradmin -g sourcesafe repstatus home-rvg
 
Replicated Data Set: home-rvg
Primary:
  Host name:                  192.168.1.x
  RVG name:                   home-rvg
  DG name:                    sourcesafe
  RVG state:                  enabled for I/O (passthru)
  Data volumes:               1
  VSets:                      0
  SRL name:                   home-srl
  SRL size:                   10.00 G
  Total secondaries:          1

Secondary:
  Host name:                  192.168.1.xx
  RVG name:                   home-rvg
  DG name:                    sourcesafe
  Data status:                consistent, up-to-date
  Replication status:         not replicating (primary needs recovery)
  Current mode:               asynchronous
  Logging to:                 N/A
  Timestamp Information:      N/A
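
The passthru condition can also be detected from captured repstatus output. The Python sketch below assumes the layout shown above: an unindented "Primary:" or "Secondary:" header followed by indented "Key: value" lines. The function names are hypothetical, and SAMPLE is an abridged copy of the output above. With several secondaries, this simplified version keeps only the last one.

```python
from typing import Dict, Optional


def parse_repstatus(text: str) -> Dict[str, Dict[str, str]]:
    """Group `vradmin repstatus` fields by section ("Primary", "Secondary")."""
    sections: Dict[str, Dict[str, str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if not line[0].isspace() and stripped.endswith(":"):
            # An unindented "Primary:" / "Secondary:" line starts a new section.
            current = stripped[:-1]
            sections[current] = {}
        elif current is not None and ":" in stripped:
            # Indented "Key:   value" line; split on the first colon only.
            key, _, value = stripped.partition(":")
            sections[current][key.strip()] = value.strip()
    return sections


def primary_in_passthru(sections: Dict[str, Dict[str, str]]) -> bool:
    """True if the Primary's RVG state mentions passthru."""
    return "passthru" in sections.get("Primary", {}).get("RVG state", "")


SAMPLE = """\
Replicated Data Set: home-rvg
Primary:
  Host name:                  192.168.1.x
  RVG name:                   home-rvg
  DG name:                    sourcesafe
  RVG state:                  enabled for I/O (passthru)
  SRL name:                   home-srl
  SRL size:                   10.00 G

Secondary:
  Host name:                  192.168.1.xx
  Replication status:         not replicating (primary needs recovery)
"""

if __name__ == "__main__":
    status = parse_repstatus(SAMPLE)
    print("passthru:", primary_in_passthru(status))
    print("secondary:", status["Secondary"]["Replication status"])
```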
 
 
==========================================================================

vxprint shows the RLINK in the RECOVER state:

#vxprint
 
Disk group: sourcesafe
 
TY NAME         ASSOC        KSTATE   LENGTH   PLOFFS   STATE    TUTIL0  PUTIL0
dg sourcesafe   sourcesafe   -        -        -        -        -       -
 
dm sourcesafe01 sdb          -        625071264 -       -        -       -
dm sourcesafe02 sdc          -        239945296 -       -        -       -
 
rv home-rvg     -            ENABLED  -        -        ACTIVE   -       -
rl rlk_192.168.1.222_home-rvg home-rvg RECOVER - -      STALE    -       -
v  home         home-rvg     ENABLED  624951296 -       ACTIVE   -       -
pl home-01      home         ENABLED  624951296 -       ACTIVE   -       -
sd sourcesafe01-01 home-01   ENABLED  624951296 0       -        -       -
pl home-02      home         ENABLED  LOGONLY  -        ACTIVE   -       -
sd sourcesafe01-02 home-02   ENABLED  512      LOG      -        -       -
pl home-04      home         ENABLED  LOGONLY  -        ACTIVE   -       -
sd sourcesafe02-01 home-04   ENABLED  512      LOG      -        -       -
v  home-srl     home-rvg     ENABLED  209715200 SRL     ACTIVE   -       -
pl home-srl-01  home-srl     ENABLED  209715200 -       ACTIVE   -       -
sd sourcesafe02-02 home-srl-01 ENABLED 209715200 0      -        -       -
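
The two symptoms in this listing can also be picked out programmatically. One is the RLINK whose KSTATE is RECOVER. The other is the SRL volume, which is still shown as associated with home-rvg and has SRL in the PLOFFS column. This is a minimal sketch that parses captured text rather than calling vxprint. It assumes that every record line has the nine whitespace-separated columns of the header shown above, with "-" for empty fields. The function names are hypothetical.

```python
from typing import List, NamedTuple

# Record types that appear in the listing above.
RECORD_TYPES = {"dg", "dm", "rv", "rl", "v", "pl", "sd"}


class Record(NamedTuple):
    ty: str
    name: str
    assoc: str
    kstate: str
    length: str
    ploffs: str
    state: str
    tutil0: str
    putil0: str


def parse_vxprint(text: str) -> List[Record]:
    """Parse the default nine-column vxprint table; other lines are skipped."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 9 and fields[0] in RECORD_TYPES:
            records.append(Record(*fields))
    return records


def recovering_rlinks(records: List[Record]) -> List[Record]:
    """RLINKs whose kernel state (KSTATE) is RECOVER."""
    return [r for r in records if r.ty == "rl" and r.kstate == "RECOVER"]


def srl_volumes(records: List[Record]) -> List[Record]:
    """Volumes marked SRL in the PLOFFS column, i.e. still attached as an RVG's SRL."""
    return [r for r in records if r.ty == "v" and r.ploffs == "SRL"]


SAMPLE = """\
Disk group: sourcesafe

TY NAME         ASSOC        KSTATE   LENGTH   PLOFFS   STATE    TUTIL0  PUTIL0
rv home-rvg     -            ENABLED  -        -        ACTIVE   -       -
rl rlk_192.168.1.222_home-rvg home-rvg RECOVER - -      STALE    -       -
v  home         home-rvg     ENABLED  624951296 -       ACTIVE   -       -
v  home-srl     home-rvg     ENABLED  209715200 SRL     ACTIVE   -       -
"""

if __name__ == "__main__":
    recs = parse_vxprint(SAMPLE)
    for r in recovering_rlinks(recs):
        print(f"RLINK {r.name} on {r.assoc}: KSTATE={r.kstate}, STATE={r.state}")
    for v in srl_volumes(recs):
        print(f"SRL {v.name} still associated with {v.assoc}")
```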
 
 
==========================================================================

If the status of your disk is FAILING, you first have to clear the FAILING flag on that disk before you can proceed. The output below shows what this looks like, followed by the command that clears the flag.
 

#vxprint

 

Disk group: sourcesafe
 
TY NAME         ASSOC        KSTATE   LENGTH   PLOFFS   STATE    TUTIL0  PUTIL0
dg sourcesafe   sourcesafe   -        -        -        -        -       -
 
dm sourcesafe01 sdb          -        625071264 -       -        -       -
dm sourcesafe02 sdc          -        156230304 -       FAILING  -       -
 
rv home-rvg     -            ENABLED  -        -        ACTIVE   -       -
rl rlk_192.168.1.x_home-rvg home-rvg RECOVER - -      ACTIVE   -       -
v  home         home-rvg     ENABLED  624951296 -       ACTIVE   -       -
pl home-01      home         ENABLED  624951296 -       ACTIVE   -       -
sd sourcesafe01-01 home-01   ENABLED  624951296 0       -        -       -
pl home-02      home         ENABLED  LOGONLY  -        ACTIVE   -       -
sd sourcesafe01-02 home-02   ENABLED  512      LOG      -        -       -
pl home-03      home         ENABLED  LOGONLY  -        ACTIVE   -       -
sd sourcesafe02-01 home-03   ENABLED  512      LOG      -        -       -
v  home-srl     home-rvg     ENABLED  20971520 SRL      ACTIVE   -       -
pl home-srl-01  home-srl     ENABLED  20971520 -        ACTIVE   -       -
sd sourcesafe02-02 home-srl-01 ENABLED 20971520 0       -        -       -
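
A small sketch of how the FAILING disks in a listing like this could be found, and how the matching vxedit command from the next step could be built for each one. It only builds the argument lists and does not run anything. Clearing the flag does not show that the disk is healthy; the dd tests that follow check that. It assumes the nine-column layout above, where STATE is the seventh column, and a "Disk group:" line at the top. The function names are hypothetical.

```python
from typing import List, Optional


def disk_group_name(vxprint_text: str) -> Optional[str]:
    """Name from the 'Disk group: <name>' line, if present."""
    for line in vxprint_text.splitlines():
        if line.startswith("Disk group:"):
            return line.split(":", 1)[1].strip()
    return None


def failing_disks(vxprint_text: str) -> List[str]:
    """Names of dm (disk media) records whose STATE column is FAILING."""
    names = []
    for line in vxprint_text.splitlines():
        fields = line.split()
        if len(fields) == 9 and fields[0] == "dm" and fields[6] == "FAILING":
            names.append(fields[1])
    return names


def clear_failing_commands(vxprint_text: str) -> List[List[str]]:
    """Build (but do not run) one `vxedit ... set failing=off` per FAILING disk."""
    dg = disk_group_name(vxprint_text)
    if dg is None:
        raise ValueError("no 'Disk group:' line in vxprint output")
    return [["vxedit", "-g", dg, "set", "failing=off", dm]
            for dm in failing_disks(vxprint_text)]


SAMPLE = """\
Disk group: sourcesafe

TY NAME         ASSOC        KSTATE   LENGTH   PLOFFS   STATE    TUTIL0  PUTIL0
dm sourcesafe01 sdb          -        625071264 -       -        -       -
dm sourcesafe02 sdc          -        156230304 -       FAILING  -       -
"""

if __name__ == "__main__":
    for cmd in clear_failing_commands(SAMPLE):
        print(" ".join(cmd))
```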
 
==============================================================================
Clear the FAILING flag on the disk:
 

#vxedit -g sourcesafe set failing=off sourcesafe02

==============================================================================

 

In passthru mode the SRL is effectively detached from the Replicated Volume Group (RVG), but vxprint does not reflect this: it still lists the SRL volume (home-srl) as associated with the RVG. We will therefore first dissociate the SRL volume from the RVG manually.

 

Dissociating the SRL volume from the RVG

My Diskgroup: sourcesafe

My SRL Volume: home-srl

#vxvol -g sourcesafe dis home-srl

==============================================================================

 

The SRL has to be dissociated before it can be tested. The next step reads from and writes to the volume directly with dd to check its I/O, and the volume should be out of the RVG while that happens.

The following checks were provided by the technical engineer. In my case, the disk turned out to be faulty.

First, check reads from the SRL volume:

# dd if=/dev/vx/rdsk/sourcesafe/home-srl of=/dev/null bs=65536
dd: reading `/dev/vx/rdsk/sourcesafe/home-srl': Input/output error
99615+0 records in
99615+0 records out
6528368640 bytes (6.5 GB) copied, 175.104 seconds, 37.3 MB/s

==============================================================================

The read failed with an Input/output error after about 6.5 GB had been read.
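
The read output also shows roughly where on the volume the failure occurred. Without conv=noerror, dd stops at the first block it cannot read. The failing 64 KiB request therefore starts at the byte count reported as copied: 99615 × 65536 = 6528368640 bytes. The sketch below does this arithmetic from the dd text. The function name is hypothetical. It also assumes that the "10.00 G" SRL size means 10 GiB, which matches the 20971520 sectors × 512 bytes shown for home-srl in the second vxprint listing.

```python
import re
from typing import Optional, Tuple

SECTOR = 512  # assumed sector size in bytes


def dd_failed_block(dd_output: str, block_size: int) -> Optional[Tuple[int, int]]:
    """Byte range [start, end) of the request dd was on when it hit an I/O error.

    Returns None if the output shows no Input/output error.
    """
    if "Input/output error" not in dd_output:
        return None
    # Summary line such as "6528368640 bytes (6.5 GB) copied, ..."
    m = re.search(r"^(\d+) bytes", dd_output, re.MULTILINE)
    if m is None:
        raise ValueError("no 'bytes ... copied' summary line in dd output")
    start = int(m.group(1))
    return start, start + block_size


READ_OUTPUT = """\
dd: reading `/dev/vx/rdsk/sourcesafe/home-srl': Input/output error
99615+0 records in
99615+0 records out
6528368640 bytes (6.5 GB) copied, 175.104 seconds, 37.3 MB/s
"""

if __name__ == "__main__":
    window = dd_failed_block(READ_OUTPUT, 65536)
    assert window is not None
    start, end = window
    srl_bytes = 10 * 1024**3  # "SRL size: 10.00 G", assumed to be GiB
    print("failing request, bytes:", start, "to", end)
    print("offset in 512-byte sectors:", start // SECTOR)
    print(f"position within the SRL: {start / srl_bytes:.1%}")
```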

Next, check writes to the SRL volume (note that this overwrites the contents of the SRL):

# dd if=/dev/zero of=/dev/vx/rdsk/sourcesafe/home-srl bs=65536
dd: writing `/dev/vx/rdsk/sourcesafe/home-srl': Input/output error
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.00049 seconds, 0.0 kB/s

==============================================================================

The write also failed with an Input/output error, this time on the very first block.
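
This article treats an Input/output error in either test as proof of a faulty disk. The sketch below implements that rule and also extracts dd's record counts. In the write output, "1+0 records in" with "0+0 records out" means dd read one block from /dev/zero and could not write any of it. The check looks for the Input/output error text rather than dd's exit status, because dd can also exit non-zero for unrelated reasons, such as reaching the end of the volume during the write test. The helper names are hypothetical.

```python
import re
from typing import Dict

# Matches dd's "full+partial records in/out" summary lines.
RECORDS = re.compile(r"^(\d+)\+(\d+) records (in|out)$", re.MULTILINE)


def dd_summary(dd_output: str) -> Dict[str, object]:
    """(full, partial) record counts and whether dd reported an I/O error."""
    summary: Dict[str, object] = {"io_error": "Input/output error" in dd_output}
    for full, partial, direction in RECORDS.findall(dd_output):
        summary["records_" + direction] = (int(full), int(partial))
    return summary


def srl_looks_faulty(read_output: str, write_output: str) -> bool:
    """An I/O error in either the read or the write test counts as a fault."""
    return bool(dd_summary(read_output)["io_error"]
                or dd_summary(write_output)["io_error"])


WRITE_OUTPUT = """\
dd: writing `/dev/vx/rdsk/sourcesafe/home-srl': Input/output error
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.00049 seconds, 0.0 kB/s
"""

if __name__ == "__main__":
    w = dd_summary(WRITE_OUTPUT)
    full_out, _partial = w["records_out"]
    print("full blocks written before the error:", full_out)
    print("faulty:", srl_looks_faulty("", WRITE_OUTPUT))
```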

Since my disk was confirmed to be faulty, it had to be replaced. Carry out the replacement with the vxdiskadm or vxdiskadd commands, using options such as remove and replace.

After the disk has been replaced and configured again, associate the SRL back with the RVG.

 

Associating the SRL volume back with the RVG

My RVG: home-rvg

#vxvol -g sourcesafe aslog home-rvg home-srl
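
For reference, the sequence of commands used in this article can be generated for any disk group, SRL, and RVG. The sketch below only prints the commands and deliberately does not run them. The write test destroys the SRL contents, and the disk replacement between the dd tests and the aslog step is done interactively with vxdiskadm or vxdiskadd. The function name and its parameters are hypothetical. The /dev/vx/rdsk/<dg>/<volume> path pattern is the one used in the dd commands above.

```python
import shlex
from typing import List


def srl_check_commands(dg: str, srl: str, rvg: str, bs: int = 65536) -> List[List[str]]:
    """Return this article's commands, in order, as argument lists.

    The interactive disk replacement (vxdiskadm / vxdiskadd) belongs between
    the dd tests and the final aslog step and is not included here.
    """
    raw = f"/dev/vx/rdsk/{dg}/{srl}"  # raw device path, as in the dd tests
    return [
        ["vxvol", "-g", dg, "dis", srl],                  # dissociate the SRL from the RVG
        ["dd", f"if={raw}", "of=/dev/null", f"bs={bs}"],  # read test
        ["dd", "if=/dev/zero", f"of={raw}", f"bs={bs}"],  # write test: overwrites the SRL
        ["vxvol", "-g", dg, "aslog", rvg, srl],           # re-associate the SRL after replacement
    ]


if __name__ == "__main__":
    # Print only; nothing is executed.
    for cmd in srl_check_commands("sourcesafe", "home-srl", "home-rvg"):
        print(shlex.join(cmd))
```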

==============================================================================