The Replicated Volume Groups (RVG) at both production and DR sites are acting as the Primary
Article: TECH199955 | Created: 2012-11-20 | Updated: 2013-01-14 | Article URL: http://www.symantec.com/docs/TECH199955
After failure and recovery, the Replicated Volume Group (RVG) at the production site (the original Primary) will still consider itself the Primary if the replication links (rlinks) between the sites are unable to connect.
On failure of the original Primary, the original Secondary at the DR site will take over and assume the Primary role. The flags for both sites will include:
and the rlink flags will include:
NOTE: If the rlinks can connect, regular VVR operation is that the original Primary, when coming online, detects that a takeover has occurred when it tries to communicate with the original Secondary, which now shows that it holds the Primary role. The original Primary will then show acting_secondary.
In both the case where the rlinks cannot connect and the case where they can, the original Secondary (on which the takeover is performed, converting it to the "new" Primary) will have the additional flags:
rlink: dcm_logging failback_logging (if enabled)
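The roles and flags described above can be inspected from the command line with the vxprint utility; a hedged sketch, in which the disk group name "myDG" is illustrative and the option syntax should be confirmed against the installed release:

```shell
# "myDG" is a placeholder disk group name -- substitute the real one.
# Show the RVG (-V) and rlink (-P) records in long form (-l). The RVG flags
# indicate which role (primary/secondary) each site believes it holds, and
# the rlink flags show dcm_logging / failback_logging on the new Primary.
vxprint -g myDG -lPV
```

Run this on a node at each site to compare what the two sides believe their roles to be.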
Two scenarios have been observed:
1. The administrator elects to continue running the application at the production site, either after a brief outage or because a second issue at the DR site stops client access
2. The application comes online at the DR site and clients connect; however, the VVR rlinks must connect before replication can resume
- Veritas Storage Foundation HA for Windows 5.1 SP2
- Microsoft Windows Server 2008 R2 Enterprise 64bit
- Global Cluster option (GCO) may be configured but is not specific to this issue
Primary RVG and Secondary rlinks are not able to connect
The following are two common scenarios, and their resolutions, in which the Replicated Volume Groups (RVGs) at both the Primary and Secondary sites are acting as the Primary.
For Scenario 1:
Here the application continues to run on the original Primary, while the VVR flags on the original Secondary show "dcm_logging" and possibly "failback_logging". It is important to note that, on rlink connection, VVR will perform its regular recovery operations and change the original Primary to a Secondary.
This may not be what is desired, so the VVR configuration must be removed manually and recreated. A full resync is required in this case regardless, so little time is lost to the resync operation. While the rlinks are disconnected:
- On the production site, delete the RDS
- On the DR site, delete the RDS
- If VVR is clustered and part of the GCO, leave the replication IP address online; then, on a node from each cluster, execute:
hastop -all -force
- Ensure the IP addresses are available on both production and DR sites, and run the wizard to create the RDS from the site where the production replication IP address is online. Ensure that the names of the VVR RVG and RDS are reused so the cluster configuration does not need to be changed.
- If the rlinks are unable to connect, or the wizard cannot connect to the other cluster, then the network will have to be checked
- Once the RDS is configured, restart the cluster; on a node from each cluster:
hastart
Note: Deleting the RDS is done on both sides. Because the rlinks are not connected, this will not be a single operation; remote commands to delete the remote RVG cannot be executed while the rlinks are disconnected.
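The cluster stop/start portion of the Scenario 1 steps can be sketched as the following command sequence. hastop and hastart are standard VCS commands; the RDS deletion and recreation themselves are performed through the VEA wizard as described above:

```shell
# Run on a node from each cluster AFTER deleting the RDS at both sites,
# leaving the replication IP address online:
hastop -all -force

# ...recreate the RDS with the wizard from the site where the production
# replication IP address is online, reusing the original RVG/RDS names
# so the cluster configuration does not need to change...

# Once the RDS is configured, start VCS again on a node from each cluster:
hastart
```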
For Scenario 2:
Here the application is to remain online at the DR site, and the original Primary needs to be recovered and the rlinks connected in order to recover via normal VVR operation. Once VVR reconnects the rlinks, it will take the remaining writes from the original Primary (if any), transfer them to the DCM map on the new Primary, and change the original Primary to a Secondary. Once this happens, the administrator is required to "resynchronise secondaries" (right-click on the Secondary from the replication network or the VEA GUI).
If the VVR rlinks are not connecting, then:
- confirm the replication IP addresses are available
- confirm connectivity between the nodes
- review VVR logs to see if it can be determined why the rlinks won't connect
- Pending VVR transactions or stuck states can possibly be cleared by deporting and importing the disk groups, or by rebooting both the production and DR nodes.
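The first two checks above can be sketched as follows; the IP addresses and disk group name are illustrative, and any firewall rules for the VVR ports should be verified against the product documentation for this release:

```shell
# Illustrative addresses -- replace with the actual replication IPs.
ping 10.10.1.5      # production replication IP
ping 10.20.1.5      # DR replication IP

# After connectivity is restored, re-check the rlink state and flags
# ("myDG" is a placeholder disk group name):
vxprint -g myDG -lPV
```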
Symantec Technical Support can assist in identifying either scenario and taking the appropriate action.