
How to handle dual primary data corruption

Created: 06 Mar 2011 • Updated: 17 Mar 2011 | 2 comments
This issue has been solved. See solution.

We use VVR + vgcm to build a DR solution for an Oracle DB. There are two sites, each with one server and its own storage (call them server A and server B). When A crashes, B takes over the service and promotes its RVG to primary. But if the data on A's volume was not fully synced to B (for example, because of low network bandwidth), B starts running the database on an incomplete copy and then changes the data on its own volume. The two copies no longer match, and data corruption occurs.

Does anyone have a solution to handle this data corruption?


Wally_Heim

Hi Mia,

 

This is not really a data corruption issue. When VVR did a takeover on server B, the data that server B had access to should match server A as of some earlier point in time, but it was not completely up to date. In other words, server B is missing the most recent data. This is the problem with doing a takeover when your data is not fully synced.

Typically in this situation you have 3 options.

1. Continue with the data that server B is now using and live with the loss of the data that was not synced from server A.

2. Get server A back online and discard the changes made on server B after the takeover.

3. Manually merge the differences between server A and server B onto the server that you want to continue using.
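
To make the three options concrete, here is a toy Python model of asynchronous replication followed by a takeover. It is not VVR itself; the function name and the sample writes are made up purely for illustration. It shows which writes exist only on A and which exist only on B.

```python
def split_after_takeover(primary_writes, replicated_count, post_takeover_writes):
    """Toy model: A holds every write; B holds only the first
    `replicated_count` of them, plus what it wrote after the takeover."""
    common = primary_writes[:replicated_count]
    only_on_a = primary_writes[replicated_count:]  # never reached B
    only_on_b = list(post_takeover_writes)         # made on B after takeover
    return common, only_on_a, only_on_b


common, only_on_a, only_on_b = split_after_takeover(
    primary_writes=["w1", "w2", "w3", "w4", "w5"],
    replicated_count=3,
    post_takeover_writes=["x1", "x2"],
)

print("shared history:", common)
print("option 1 loses:", only_on_a)
print("option 2 loses:", only_on_b)
print("option 3 must reconcile:", only_on_a, "with", only_on_b)
```

Whichever option you pick, the shared history is the only part both servers agree on; the choice is about which of the other two sets you give up or reconcile by hand.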

 

To avoid the secondary being out of sync you can do one of the following.

 

1. Implement synchronous replication so that the primary and secondary are never out of sync.

2. Increase bandwidth to a level that is higher than the average I/O load so that there is minimal chance that replication will be behind.

3. Implement a VVR Bunker site that keeps a copy of the Replicator Log at a third site that can be used to fully sync the secondary in case the primary is no longer available. In other words, you would finish syncing the secondary site from the bunker site when the primary is no longer available.
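
As a rough back-of-the-envelope check for the bandwidth option, the sketch below estimates how much unreplicated data builds up during a write burst and how long the secondary needs to catch up afterwards. The functions and the sample numbers are illustrative assumptions, not measurements and not part of any VVR tooling.

```python
def backlog_after_burst(write_mb_s, link_mb_s, burst_s):
    """Data (MB) still waiting to be replicated after a write burst."""
    return max(0.0, float(write_mb_s - link_mb_s) * burst_s)


def catch_up_seconds(backlog_mb, link_mb_s, steady_write_mb_s):
    """Time to drain the backlog once writes drop to a steady rate.
    Returns None if the link cannot outpace the steady write rate."""
    spare = link_mb_s - steady_write_mb_s
    if spare <= 0:
        return None
    return backlog_mb / spare


# Hypothetical numbers: a 10-minute burst at 40 MB/s over a 10 MB/s link.
backlog = backlog_after_burst(write_mb_s=40, link_mb_s=10, burst_s=600)
print(backlog)                                                       # 18000.0
print(catch_up_seconds(backlog, link_mb_s=10, steady_write_mb_s=4))  # 3000.0
```

Any crash of the primary during the burst or the catch-up window leaves the secondary behind by up to that backlog, which is the gap the synchronous and bunker options are meant to close.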

 

I hope this helps answer your question.

 

Thanks,

Wally

SOLUTION
Riaan.Badenhorst

Hi Mia,

 

You can use the In-Band Control (IBC) Messaging feature with the FastResync (FMR) feature of Veritas Volume Manager (VxVM) and its integration with VVR to take application-consistent snapshots at the replicated volume group (RVG) level. This lets you perform off-host processing on the Secondary host.

 

Basically this allows you to place your Oracle DB into backup mode at the primary site, insert a marker into the replication data stream, and have that marker trigger a snapshot creation/refresh in DR. This gives you a consistent copy that you can mount should you need to fail over to DR. In the UNIX environment this can be either a full instant snapshot or a space-optimized snapshot. You would need to create the script that places Oracle into backup mode yourself and then add the commands to send the IBC message to the secondary.
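
A minimal sketch of such a script's control flow, in Python. The `ALTER DATABASE BEGIN BACKUP` and `ALTER DATABASE END BACKUP` statements are standard Oracle SQL; everything else (the `run_sql` and `send_ibc_marker` callables and their names) is a placeholder that you would wire up to your own SQL client and to the IBC command described in the VVR documentation. The point is the ordering, and that the database is always taken out of backup mode even if sending the marker fails.

```python
def mark_consistent_point(run_sql, send_ibc_marker):
    """Put Oracle in backup mode, send the replication marker, then
    always take Oracle out of backup mode again."""
    run_sql("ALTER DATABASE BEGIN BACKUP")
    try:
        send_ibc_marker()  # placeholder for the real IBC send step
    finally:
        run_sql("ALTER DATABASE END BACKUP")


# Dry run with stand-ins that only record what would happen.
steps = []
mark_consistent_point(
    run_sql=steps.append,
    send_ibc_marker=lambda: steps.append("send IBC marker (placeholder)"),
)
print(steps)
```

Passing the two steps in as callables keeps the ordering logic testable without touching a real database or replication link.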

 

How often you run this script will depend on how often you wish to place your DB into backup mode, your RPO, and possibly the bandwidth available to you (how many minutes/hours your secondary is behind). At least this way you know that you've got a consistent copy in DR, even if it's a few minutes/hours older than the primary.
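
One rough way to reason about that trade-off: just before a new snapshot lands in DR, the newest usable snapshot is about one script interval plus the replication lag old. The helper below only illustrates that arithmetic, assuming a roughly constant lag; the names and numbers are made up.

```python
def worst_case_snapshot_age_min(script_interval_min, replication_lag_min):
    """Approximate maximum age of the newest consistent snapshot in DR."""
    return script_interval_min + replication_lag_min


def meets_rpo(script_interval_min, replication_lag_min, rpo_min):
    """True if the worst-case snapshot age stays within the RPO."""
    age = worst_case_snapshot_age_min(script_interval_min, replication_lag_min)
    return age <= rpo_min


print(meets_rpo(script_interval_min=30, replication_lag_min=20, rpo_min=60))  # True
print(meets_rpo(script_interval_min=60, replication_lag_min=20, rpo_min=60))  # False
```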

 

https://sort.symantec.com/public/documents/sfha/5....

Regards,

Riaan Badenhorst

You need an OpenVision to see the truth about Backups. Restores are a plus. But that's just Semantics ;)

ITs easy :)