Passthru Mode: Verifying Whether the SRL Disk Is Faulty
I have run into replication in Passthru mode twice: once at a client site, and once in my own office.
The first thing I did was read up in detail on the root cause of replication entering Passthru mode. This is the summary of why it happens:
The Primary RVG is in passthru mode because the Primary SRL is detached, missing, or unavailable. For more information on RVG PASSTHRU mode, see the Veritas Volume Replicator Administrator's Guide.
The usual reason for Passthru mode is an unstable SRL disk. I/O errors are usually found in the logs in that case, but they are not always present even when the SRL disk is the actual culprit. What follows is the method a Technical Engineer advised me to use to verify whether the disk is faulty.
Let's review the case and then verify whether the disk is faulty.
OS Version: Red Hat Enterprise Linux Server release 5.3 (Tikanga)
SF Version: 5.0
Replication Status: Passthru Mode
# vradmin -g sourcesafe repstatus home-rvg
Replicated Data Set: home-rvg
Primary:
  Host name:                  192.168.1.x
  RVG name:                   home-rvg
  DG name:                    sourcesafe
  RVG state:                  enabled for I/O (passthru)
  Data volumes:               1
  VSets:                      0
  SRL name:                   home-srl
  SRL size:                   10.00 G
  Total secondaries:          1
Secondary:
  Host name:                  192.168.1.xx
  RVG name:                   home-rvg
  DG name:                    sourcesafe
  Data status:                consistent, up-to-date
  Replication status:         not replicating (primary needs recovery)
  Current mode:               asynchronous
  Logging to:                 N/A
  Timestamp Information:      N/A
==========================================================================
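If you want to check for this state from a script rather than by eye, a minimal sketch could look like the one below. The function name is_passthru is my own (hypothetical), and it assumes the "RVG state:" line is worded the way it is in the output above.

```shell
# Hypothetical helper (not part of VVR): reads `vradmin ... repstatus` output
# on stdin and exits 0 if the "RVG state:" line mentions passthru.
# Assumes the line is worded as in the output shown in this post.
is_passthru() {
    grep -q 'RVG state:.*passthru'
}

# Usage, with the disk group and RVG names from this post:
#   vradmin -g sourcesafe repstatus home-rvg | is_passthru \
#       && echo "RVG is in passthru mode"
```

The function only looks at the text it is given, so it can also be run against a saved copy of the repstatus output.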
RLink in RECOVER state in vxprint results
# vxprint
Disk group: sourcesafe
TY NAME                       ASSOC       KSTATE   LENGTH    PLOFFS STATE   TUTIL0 PUTIL0
dg sourcesafe                 sourcesafe  -        -         -      -       -      -
dm sourcesafe01               sdb         -        625071264 -      -       -      -
dm sourcesafe02               sdc         -        239945296 -      -       -      -
rv home-rvg                   -           ENABLED  -         -      ACTIVE  -      -
rl rlk_192.168.1.222_home-rvg home-rvg    RECOVER  -         -      STALE   -      -
v  home                       home-rvg    ENABLED  624951296 -      ACTIVE  -      -
pl home-01                    home        ENABLED  624951296 -      ACTIVE  -      -
sd sourcesafe01-01            home-01     ENABLED  624951296 0      -       -      -
pl home-02                    home        ENABLED  LOGONLY   -      ACTIVE  -      -
sd sourcesafe01-02            home-02     ENABLED  512       LOG    -       -      -
pl home-04                    home        ENABLED  LOGONLY   -      ACTIVE  -      -
sd sourcesafe02-01            home-04     ENABLED  512       LOG    -       -      -
v  home-srl                   home-rvg    ENABLED  209715200 SRL    ACTIVE  -      -
pl home-srl-01                home-srl    ENABLED  209715200 -      ACTIVE  -      -
sd sourcesafe02-02            home-srl-01 ENABLED  209715200 0      -       -      -
==========================================================================
If vxprint shows one of your disks in the FAILING state, you first have to clear the FAILING flag on that disk before going any further. The output below shows what the flag looks like, followed by the command that clears it.
Disk group: sourcesafe
TY NAME                       ASSOC       KSTATE   LENGTH    PLOFFS STATE   TUTIL0 PUTIL0
dg sourcesafe                 sourcesafe  -        -         -      -       -      -
dm sourcesafe01               sdb         -        625071264 -      -       -      -
dm sourcesafe02               sdc         -        156230304 -      FAILING -      -
rv home-rvg                   -           ENABLED  -         -      ACTIVE  -      -
rl rlk_192.168.1.x_home-rvg   home-rvg    RECOVER  -         -      ACTIVE  -      -
v  home                       home-rvg    ENABLED  624951296 -      ACTIVE  -      -
pl home-01                    home        ENABLED  624951296 -      ACTIVE  -      -
sd sourcesafe01-01            home-01     ENABLED  624951296 0      -       -      -
pl home-02                    home        ENABLED  LOGONLY   -      ACTIVE  -      -
sd sourcesafe01-02            home-02     ENABLED  512       LOG    -       -      -
pl home-03                    home        ENABLED  LOGONLY   -      ACTIVE  -      -
sd sourcesafe02-01            home-03     ENABLED  512       LOG    -       -      -
v  home-srl                   home-rvg    ENABLED  20971520  SRL    ACTIVE  -      -
pl home-srl-01                home-srl    ENABLED  20971520  -      ACTIVE  -      -
sd sourcesafe02-02            home-srl-01 ENABLED  20971520  0      -       -      -
==============================================================================
Remove the FAILING flag from the disk
# vxedit -g sourcesafe set failing=off sourcesafe02
==============================================================================
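To pick out flagged disks from a long vxprint listing, something like the following sketch may help. The function name failing_disks is hypothetical, and it assumes the nine-column layout shown above, where STATE is the seventh field on each dm line.

```shell
# Hypothetical helper (not part of VxVM): reads vxprint output on stdin and
# prints the name of every disk media (dm) record whose STATE column, taken
# here to be the 7th whitespace-separated field, reads FAILING.
failing_disks() {
    awk '$1 == "dm" && $7 == "FAILING" { print $2 }'
}

# Usage, with the disk group from this post:
#   vxprint -g sourcesafe | failing_disks
```

Each name it prints is a disk you would pass to the vxedit command shown above.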
In Passthru mode the SRL is effectively detached from the Replicated Volume Group (RVG), but the vxprint output does not reflect this: it still lists the SRL volume under the RVG. So we first dissociate the SRL volume from the RVG manually.
Dissociating the SRL volume from the RVG
My Diskgroup: sourcesafe
My SRL Volume: home-srl
# vxvol -g sourcesafe dis home-srl
The SRL has to be dissociated before we can check whether the SRL volume is faulty. In the next step we run read and write tests with the "dd" command, which show us whether I/O on the volume succeeds, and for that the volume must be out of the RVG.
These are the checks for a disk fault that the Technical Engineer gave me. In my case the disk was faulty.
First we check reads on the SRL volume
# dd if=/dev/vx/rdsk/sourcesafe/home-srl of=/dev/null bs=65536
dd: reading `/dev/vx/rdsk/sourcesafe/home-srl': Input/output error
99615+0 records in
99615+0 records out
6528368640 bytes (6.5 GB) copied, 175.104 seconds, 37.3 MB/s
The output above shows an Input/output error on read, after roughly 6.5 GB of the volume had been read successfully.
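If you would rather have a plain pass/fail answer than read through the dd output, the read test can be wrapped as in the sketch below. The function name check_read is hypothetical. It relies on dd exiting with a non-zero status when it hits a read error, and it uses the same 64 KB block size as the command above.

```shell
# Hypothetical helper: read a device or file end to end, discard the data,
# and report whether dd finished cleanly. dd exits non-zero on a read error.
check_read() {
    if dd if="$1" of=/dev/null bs=65536 2>/dev/null; then
        echo "read OK: $1"
    else
        echo "read FAILED: $1"
        return 1
    fi
}

# Usage, with the SRL volume path from this post:
#   check_read /dev/vx/rdsk/sourcesafe/home-srl
```

Because dd's own messages are sent to /dev/null here, rerun the plain dd command if you need to see how far the read got before it failed.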
Now we check writes on the SRL volume. Note that this overwrites the contents of the volume with zeros.
# dd if=/dev/zero of=/dev/vx/rdsk/sourcesafe/home-srl bs=65536
dd: writing `/dev/vx/rdsk/sourcesafe/home-srl': Input/output error
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.00049 seconds, 0.0 kB/s
The output above shows an Input/output error on write as well: not a single block was written.
Since my disk was confirmed to be faulty, it had to be replaced. The replacement should be handled with the vxdiskadm or vxdiskadd command, using the relevant options (remove, replace, etc.).
After replacing the disk and configuring it properly again, associate the SRL volume with the RVG once more.
Associating the log volume back with the RVG
My RVG: home-rvg
# vxvol -g sourcesafe aslog home-rvg home-srl