Replication stops when the replicator log overflows in Veritas Storage Foundation for Windows - Volume Replicator Option

Article:TECH52545  |  Created: 2007-01-06  |  Updated: 2009-01-19  |  Article URL http://www.symantec.com/docs/TECH52545
Article Type
Technical Solution


Issue



Replication stops when the replicator log overflows in Veritas Storage Foundation for Windows - Volume Replicator Option

Solution




Introduction

Note: This technote assumes that replication log protection is set to DCM or AutoDCM. These are the most common configurations.

If the replicator log (SRL) overflows, the RDS (Replicated Data Set) will begin to track writes using a DCM (Data Change Map). When the DCM is in use, replication stops and all new writes are simply tracked in the DCM on the primary.

To determine if a DCM is being used and how to restart replication, follow the steps that are found in the Solution section of this document (below).


Background

To understand why replication stops when the DCM is in use, the differences between the SRL and the DCM must be clarified.

The SRL is a large transaction log that tracks each individual write. Because of this, the order of the writes is maintained (write-order fidelity). Writes are applied to the secondary in the same order in which they occurred on the primary, so replication can continue even if the secondary is not up-to-date. Without write-order fidelity, the blocks on the secondary would become inconsistent.

A DCM is too small to track individual writes. Instead, it divides the volume into a number of regions. When a write occurs within a region, the entire region is marked as "dirty." Even if the region is very large, and the amount of data that is written is very small, the entire region is marked as dirty. The DCM does not maintain write-order fidelity. Because of this, every dirty region must finish replicating to the secondary site before the secondary volume is considered "consistent." Until all the dirty regions have been replicated, the secondary volume is considered "inconsistent" and is not usable.
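The region-marking behavior described above can be illustrated with a minimal sketch. This is a toy model only: the class name DataChangeMap, the 64 MiB region size, and the 1 GiB volume size are assumptions chosen for illustration, and the code does not reflect the actual VVR on-disk format.

```python
MIB = 1024 * 1024


class DataChangeMap:
    """Toy region bitmap (hypothetical; not VVR's real DCM layout)."""

    def __init__(self, volume_size, region_size):
        self.region_size = region_size
        # Round up so a partial last region still gets a bit.
        self.num_regions = -(-volume_size // region_size)
        self.dirty = [False] * self.num_regions

    def record_write(self, offset, length):
        """Mark every region touched by the write as dirty."""
        first = offset // self.region_size
        last = (offset + length - 1) // self.region_size
        for region in range(first, last + 1):
            self.dirty[region] = True

    def dirty_regions(self):
        """Return dirty region numbers; note that write order is not kept."""
        return [i for i, is_dirty in enumerate(self.dirty) if is_dirty]

    def bytes_to_resync(self):
        """Upper bound on data to resend: whole regions, not bytes written."""
        return len(self.dirty_regions()) * self.region_size


dcm = DataChangeMap(volume_size=1024 * MIB, region_size=64 * MIB)
dcm.record_write(offset=100, length=512)  # a 512-byte write
print(dcm.dirty_regions(), dcm.bytes_to_resync() // MIB, "MiB to resync")
```

In this sketch a single 512-byte write causes a full 64 MiB region to be queued for resynchronization, and the map only records which regions changed, not the order in which the writes arrived.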

In the event of a VVR (Veritas Volume Replicator) migration or take-over, the secondary volumes are promoted to primary volumes. If these operations are performed at a time when the secondary volumes are inconsistent, the volumes will be corrupt and will not be usable. To avoid this, replication automatically stops when an SRL overflows and the DCM is in use. The result is that the secondary volume is not up-to-date, but the data is "consistent" and usable.

Replication can be restarted manually by using the Resynchronize Secondaries command. Until all "dirty" regions have been replicated to the secondary, the volume will be inconsistent. During this time, a VVR migration or take-over will not be possible.

Note: If a "dirty" region finishes replicating to the secondary, but a new transaction is subsequently written to that same region, the entire region will again be marked as "dirty." This may result in situations where a replicated data set is unable to get out of DCM mode. If this occurs, two options may be used to resynchronize the secondary with the primary:

1. Temporarily stop all writes to the primary, allowing the DCM time to "drain."
2. Perform a block-level backup and restore of the replicated volume from the primary site to the secondary site. Further information on this can be found in the following technote: http://support.veritas.com/docs/289669
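The situation described in the note above, and the effect of option 1, can be sketched with a small simulation. Everything here is an assumption made for illustration: the function name ticks_to_drain, the idea of fixed time "ticks," the number of regions cleaned per tick, and the lowest-region-first cleaning order are hypothetical and do not describe how VVR schedules resynchronization.

```python
def ticks_to_drain(initial_dirty, writes_per_tick, cleaned_per_tick,
                   max_ticks=1000):
    """Return the tick at which no dirty regions remain, or None.

    initial_dirty    -- region numbers that are dirty at the start
    writes_per_tick  -- region numbers re-dirtied by new writes each tick
    cleaned_per_tick -- how many regions resynchronization clears per tick
    """
    dirty = set(initial_dirty)
    for tick in range(1, max_ticks + 1):
        # New application writes mark their regions dirty again.
        dirty.update(writes_per_tick)
        # Resynchronization clears some regions (lowest numbers first here).
        for region in sorted(dirty)[:cleaned_per_tick]:
            dirty.discard(region)
        if not dirty:
            return tick
    return None  # never drained within max_ticks


# Writes keep hitting regions 0 and 1: the map never empties.
print(ticks_to_drain(range(10), writes_per_tick=[0, 1], cleaned_per_tick=2))
# Writes stopped on the primary: the map drains.
print(ticks_to_drain(range(10), writes_per_tick=[], cleaned_per_tick=2))
```

In this toy model, sustained writes to the same regions keep re-marking them as dirty so the map never empties, while stopping writes lets the remaining regions drain.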

Solution

If an RVG is in DCM mode, replication will remain stopped until the following steps are performed:

1. Expand Replication Network.
2. Right-click on the primary RVG.
3. Select Resynchronize Secondaries.

If needed, use vxprint to determine if the RVGs are in DCM mode. This can be done by following the steps below.

1. Run the following command on both sites:

vxprint -VPl

Note: That is an upper-case V, an upper-case P and a lower-case L.

This will return results that are similar to the following (Figure 1):


Figure 1:  Sample Vxprint Output

Diskgroup = HSJIT01_DG

Rvg : HSJIT01_RVG
state : state=ACTIVE kernel=ENABLED
assoc : datavols=F:
 srl=\Device\HarddiskDmVolumes\CEPFILWTS01_DG\RepLog
 rlinks=rlk_cezcfs01_24432
att : rlinks=rlk_cezcfs01_24432
checkpoint :
flags : primary enabled attached dcm_logging clustered
Rlink : rlk_cezcfs01_24432
info : timeout=500 packet_size=8400
 latency_high_mark=10000 latency_low_mark=9950
 bandwidth_limit=none
state : state=ACTIVE
 synchronous=off latencyprot=off srlprot=autodcm
assoc : rvg=HSJIT01_RVG
 remote_host=172.65.4.42
 remote_dg=HSJIT01_DG
 remote_rlink=rlk_hsjit01_14382
 local_host=172.65.4.48
protocol : UDP/IP
flags : write attached consistent connected dcm_logging


2. Locate the flags attribute.

Note: There will be two instances of flags: one for the RVG and one for the Rlink.

3. If the RVG or Rlink is in DCM mode, dcm_logging will be listed among the values of the corresponding flags attribute.
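The check in steps 2 and 3 can also be scripted. The following is a minimal sketch that assumes the vxprint output has been saved as text and follows the layout shown in Figure 1 ("name : value" lines, with Rvg and Rlink lines preceding their flags lines). The function name find_dcm_logging is hypothetical, and the SAMPLE text is an excerpt of Figure 1.

```python
SAMPLE = r"""
Rvg : HSJIT01_RVG
state : state=ACTIVE kernel=ENABLED
assoc : datavols=F:
 srl=\Device\HarddiskDmVolumes\CEPFILWTS01_DG\RepLog
flags : primary enabled attached dcm_logging clustered
Rlink : rlk_cezcfs01_24432
state : state=ACTIVE
 synchronous=off latencyprot=off srlprot=autodcm
flags : write attached consistent connected dcm_logging
"""


def find_dcm_logging(vxprint_output):
    """Map each (record type, name) to True if its flags list dcm_logging.

    Assumes the Figure 1 layout; other vxprint layouts are not handled.
    """
    results = {}
    current = None
    for raw_line in vxprint_output.splitlines():
        key, sep, value = raw_line.strip().partition(":")
        if not sep:
            continue  # continuation line or line without "name : value"
        key = key.strip().lower()
        value = value.strip()
        if key in ("rvg", "rlink"):
            current = (key, value)  # start of a new record
        elif key == "flags" and current is not None:
            results[current] = "dcm_logging" in value.split()
    return results


for (kind, name), in_dcm in find_dcm_logging(SAMPLE).items():
    print(kind, name, "DCM mode" if in_dcm else "not in DCM mode")
```

If either record reports DCM mode, replication remains stopped until Resynchronize Secondaries is run as described in the steps above.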






Legacy ID



289830

