
Considerations when using Veritas Volume Replicator

Created: 23 Jul 2012 • Updated: 23 Jul 2012
By Zahid.Haseeb

Environment

Operating system version = Red Hat Enterprise Linux 6.2

Storage Foundation version = 6.0 with RP1

LinuxBox1 with two Ethernet interfaces

LinuxBox2 with two Ethernet interfaces

On each box, one NIC (eth0) serves user access and the second NIC (eth1) is dedicated to replication.

Example scenario:

I recently set up Veritas Volume Replicator in the environment described above and ran into a lot of problems. I would like to share almost all of them here, in the hope that this helps anyone setting up a similar scenario.

Dedicate one NIC to replication only

For replication you can have two NICs, and two NICs are recommended. Use one NIC (eth0) for user requests, as the public NIC, and dedicate the second NIC (eth1) to replication: it carries only the data that is replicated to the other (secondary) host.

TIP: Is a Layer II or a Layer III network better for replication?

A Layer II network is better than Layer III because no router is involved: there are no hops in between, so both sites can ping each other without a gateway. See the snapshot below for reference:

How to configure both NICs

eth0 is configured with the default gateway: any request that eth0 cannot fulfil directly, or that is for a different network, is forwarded to the default gateway.

eth1 is configured with a static route, so traffic for its intended host (the replication peer) is forwarded through that route. We configure a static route on eth1 because a host normally uses only one default gateway; this matters when replication runs over Layer III, that is, when multiple routers are involved. A static route is also useful in a Layer II setup when you want to restrict replication packets to a specific NIC (eth1 in our case). See the snapshot below for reference:

Recommendation: which NIC uses the gateway and which uses the static route

My suggestion is to use the default gateway on eth0 and a static route on eth1: eth0 receives user requests, which can be destined for many different networks, while eth1 only has to communicate with the secondary site host.
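A minimal sketch of the runtime commands behind this recommendation, using the same net-tools route command as the rest of this article. The function name configure_nics and all addresses are hypothetical placeholders. By default it only prints the commands (a dry run); it executes them only when DRY_RUN=0.

```sh
# Print each command; execute it only when DRY_RUN=0 (default: dry run).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# configure_nics PUBLIC_GW REPL_GW PEER_IP   (hypothetical helper)
#   PUBLIC_GW: default gateway reached through eth0 (user traffic)
#   REPL_GW:   local router on the replication segment behind eth1
#   PEER_IP:   replication IP of the host at the other site
configure_nics() {
  # Everything without a more specific route leaves through eth0.
  run route add default gw "$1" dev eth0
  # Only traffic for the peer's replication IP leaves through eth1.
  run route add -host "$3" gw "$2" dev eth1
}
```

For example, configure_nics 192.168.1.1 172.16.1.1 10.0.0.2 prints both commands. If a default route already exists, route add default fails with "SIOCADDRT: File exists", so on a configured box usually only the eth1 host route is new.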

The static route should use a specific NIC when a host has multiple NICs

When you run the route add command and want the static route to use a particular NIC, append dev ethX to the end of the command. Packets for that route are then sent through that specific NIC, which for replication is eth1 in our case. Also make sure the route is persistent rather than temporary: a route added only on the command line is removed when you restart the network service or reboot the system.
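A minimal sketch for making the eth1 route persistent on RHEL 6, assuming the ip-command format that the RHEL 6 network scripts accept in /etc/sysconfig/network-scripts/route-<interface> files. The helper write_route_file and all addresses are hypothetical; the optional fourth argument only exists so the file can be written somewhere else for testing.

```sh
# write_route_file IFACE PEER_IP GATEWAY [DIR]   (hypothetical helper)
# Appends one host route in the "ADDRESS/32 via GATEWAY dev IFACE" format
# that the network scripts apply whenever IFACE is brought up.
write_route_file() {
  iface=$1 peer=$2 gw=$3
  dir=${4:-/etc/sysconfig/network-scripts}
  echo "$peer/32 via $gw dev $iface" >> "$dir/route-$iface"
}
```

After write_route_file eth1 10.0.0.2 172.16.1.1, running ifdown eth1 followed by ifup eth1 (or service network restart) applies the route, and it is re-added on every reboot.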

Static route: no ping reply when the static route is configured in only one direction (Site A to Site B)

Keep in mind that a one-way static route does not give you a ping (ICMP) reply, not even in the direction in which it is configured.

Please note: in the example below we talk only about the static routes configured on each Linux box: one on the primary pointing to the secondary host IP, and one on the secondary pointing to the primary host IP, each through the local gateway. For example, on one site: route add -host Secondary_Host_IP gw local_Host_Gateway dev eth1. We are not talking about static routes between the routers of Site A and Site B.

Why define a static route at all? The default gateway is assigned on one NIC, so that gateway handles all traffic that has to be routed. The second NIC, reserved for replication, has no default gateway, so we configure a static route on it; that way replication packets at least leave our system and reach the first router. From there, the routers are responsible for forwarding them.

Example:

Suppose you have configured a static route from Site A to Site B, but not from Site B to Site A. When you ping from Site A to Site B, you still get no reply. Why?

The incoming ICMP packets arrive at Site B correctly, but the reply packets cannot go back from Site B to Site A: Site B has no route defined toward Site A, so it does not know the path back to where the packets came from, even though Site A knows the path because its static route is configured.

Static route from Site A to Site B (configured):

SiteA_LinuxBox=>=>=RouterSiteA=>=>=>=>=Cloud=>=>=>=>=RouterSiteB=>=>=SiteB_LinuxBox

Return path from Site B to Site A (no static route configured, so the replies never arrive):

SiteA_LinuxBox<=x<=x=RouterSiteA<=x<=x=Cloud<=x<=x=RouterSiteB<=x<=x=SiteB_LinuxBox
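Because both directions need a route, it helps to check each site separately. A minimal sketch, assuming iproute2's ip route get is available (it is on RHEL 6): run check_return_path on both boxes, each time with the other site's replication IP. Both function names are hypothetical.

```sh
# uses_dev IFACE: reads `ip route get` output on stdin and succeeds when
# the kernel would send the packet out through exactly IFACE.
uses_dev() {
  grep -Eq " dev $1( |$)"
}

# check_return_path PEER_IP: run on BOTH sites, each pointing at the other.
check_return_path() {
  if ip route get "$1" | uses_dev eth1; then
    echo "route to $1 uses eth1"
  else
    echo "route to $1 does NOT use eth1; add a static route on this site" >&2
    return 1
  fi
}
```

On Site A you would run check_return_path with Site B's replication IP, and on Site B with Site A's; the one-way case described above shows up as a failure on Site B only.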

Verify connectivity between the two sites

Run the traceroute command against the destination IP address, as in the example below.

# traceroute 203.170.71.113
traceroute to 203.170.71.113 (203.170.71.113), 30 hops max, 60 byte packets
1  192.168.168.168 (192.168.168.168)  1.554 ms  1.545 ms  1.549 ms
2  10.138.80.49 (10.138.80.49)  8.641 ms  11.457 ms  26.560 ms
3  gb-lan-72-129.kar.netsolir.com (203.170.72.129)  28.692 ms  32.843 ms  36.043 ms
4  gb-lan-72-113.kar.netsolir.com (203.170.72.113)  34.748 ms  42.676 ms  42.945 ms
5  * * *
6  * * *
7  * * *
8  * * *
9  * * *
10  * * *
11  * * *
12  * * *
13  * * *
14  * * *
15  * * *
16  * * *
17  * * *
18  * * *
19  * * *
20  * * *
21  * * *
22  * * *
23  * * *
24  * * *
25  * * *
26  * * *
27  * * *
28  * * *
29  * * *
30  * * *
[root@LinuxBox]#

In traceroute output, the first IP address is where your packet arrived first, that is, the first router. Also check the last IP address: in a successful trace it should be the IP of your machine at the secondary site. If your packet gets stuck somewhere in the middle, the last address that answers may be a router or a switch (the packet may reach the secondary site but get stuck in a switch in your company network) that is not forwarding your packets on to your host. In the example above, for instance, the last hop that answered is 203.170.72.113 (hop 4), not the destination 203.170.71.113, and hops 5 to 30 all timed out.
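Reading the last answering hop by eye is error-prone when most lines are * * *. A minimal sketch that extracts it from traceroute output, with or without traceroute -n; the function names last_hop and reaches are hypothetical.

```sh
# last_hop: read traceroute output on stdin and print the address of the
# last hop that answered at least one probe.
last_hop() {
  awk '
    $1 ~ /^[0-9]+$/ {
      for (i = 2; i <= NF; i++) {
        if ($i == "*") continue
        ip = $i
        # Without -n, the address follows the host name in parentheses.
        if ($(i + 1) ~ /^\(.*\)$/) ip = $(i + 1)
        gsub(/[()]/, "", ip)
        last = ip
        break
      }
    }
    END { print last }
  '
}

# reaches TARGET_IP: succeeds when the last answering hop is the target.
reaches() {
  [ "$(last_hop)" = "$1" ]
}
```

Usage: traceroute -n 203.170.71.113 | reaches 203.170.71.113 && echo reached. Applied to the output above, last_hop prints 203.170.72.113, so reaches fails.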

NOW SOMETHING DIRECTLY RELATED TO VERITAS VOLUME REPLICATOR

  • ERROR V-5-52-803 and ERROR V-5-52-802

In this implementation, the connectivity was as described in the scenario above. I had a static route configured via eth1, I was able to copy files from the primary to the secondary site, and I could ping from the primary site to the DR site, but when I ran the command below I got the error shown below. The surprising part was that when I removed the static route from primary to secondary (which of course removes that path) and added a default gateway instead, the same command completed successfully.

#vradmin -g Diskgroup addsec RVGname Primary_Host Secondary_Host

VxVM VVR vradmin ERROR V-5-52-803 Lost connection to host x.x.x.x_SecondaryHost; terminating command execution
VxVM VVR vradmin ERROR V-5-52-802 cannot start command execution on Secondary.

This connectivity issue is certainly not a Veritas Volume Replicator problem, and resolving it is the network engineer's job. In our case, however, we did not get a useful response from the network team, so we looked at what we could do on our side and found an interesting article that describes how to use two default gateways on one system. See the article below for reference:

http://www.linuxhorizon.ro/iproute2.html

Instead of following that article, we set things up as described in the first snapshot and everything worked. As I said, though, in this scenario the network team was not able to find the root cause of ERROR V-5-52-803 and ERROR V-5-52-802.
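For readers who do want to try the two-gateway approach, the general technique such articles describe is source-based policy routing with iproute2: a second default route lives in its own routing table, and an ip rule sends traffic originating from the replication IP to that table. This is a minimal dry-run sketch assuming routing table id 100 and hypothetical addresses; it is not necessarily the exact recipe in the linked article, and these commands do not survive a reboot.

```sh
# Print each command; execute it only when DRY_RUN=0 (default: dry run).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# add_repl_gateway REPL_IP REPL_NET REPL_GW [TABLE]   (hypothetical helper)
#   REPL_IP:  this host's address on eth1
#   REPL_NET: the eth1 subnet, e.g. 172.16.1.0/24
#   REPL_GW:  router on the replication network
#   TABLE:    numeric routing table id (100 is an arbitrary choice)
add_repl_gateway() {
  ip_addr=$1 net=$2 gw=$3 table=${4:-100}
  # The local subnet must be reachable inside the extra table.
  run ip route add "$net" dev eth1 src "$ip_addr" table "$table"
  # Second default gateway, kept out of the main table (eth0's default).
  run ip route add default via "$gw" dev eth1 table "$table"
  # Packets sourced from the replication IP consult that table.
  run ip rule add from "$ip_addr" table "$table"
}
```

Usage: add_repl_gateway 172.16.1.10 172.16.1.0/24 172.16.1.1 prints the three commands; set DRY_RUN=0 to apply them.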

  • Regarding connectivity for replication between the two sites, we ran into the circumstances below, which I think are very common for newcomers. I have tried to discuss them in detail.

Sometimes replication between the primary and secondary nodes is first configured with both nodes at the primary site (the DR node can even be fully synchronized there, since the LAN is fast), and the secondary node is then shipped to the DR site. The DR site often uses a different subnet, so the node needs an IP address in that subnet, and once the IP has changed we can no longer run the changeip command to update the replication IP.

So we run the changeip command at the primary site before sending the secondary node to the secondary (DR) site. Below are three ways, depending on the circumstances, to change the replication IPs after replication has been configured.

1.) Under normal circumstances

  • Do not change the primary and secondary replication IPs before running the command below. First run the changeip command below at the primary site, then send the node to the DR site. When the secondary node has been given its new IP (the one specified in the changeip command at the primary site), resume replication.

Example:

#vradmin -g DG changeip RVGname newpri=10.0.0.1 newsec=10.0.0.2

Please note: the RLINK name does not change (it still reflects the old IP) when you run the vxprint command, but that is just a name; the replication IPs behind the RLINK have actually been changed.
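A dry-run sketch of this normal-circumstances procedure, run at the primary site. The function name is hypothetical. The vxprint -Pl call is our addition for checking the note above: its long RLINK listing should show the new address in the remote_host field even though the RLINK name still contains the old IP (we assume your vxprint release prints that field).

```sh
# Print each command; execute it only when DRY_RUN=0 (default: dry run).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# change_repl_ips DG RVG NEW_PRI_IP NEW_SEC_IP   (hypothetical helper)
# Run on the primary BEFORE the secondary node is shipped to the DR site.
change_repl_ips() {
  run vradmin -g "$1" changeip "$2" newpri="$3" newsec="$4"
  # Long listing of RLINK records, to inspect local_host/remote_host.
  run vxprint -g "$1" -Pl
}
```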

2.) Here we made one mistake

(This procedure meant replicating the data again, but we did not need to destroy the DG.)

Before sending the node to the DR site, we forgot to run the changeip command, and the IPs had already changed because the node was now at the DR site, where the subnet is different.

We used the procedure below to fix the problem.

Step #1 (run Step #1 on the nodes at both sites)

#vxrlink -g DiskGroup det Rlink_name

#vxrlink -g DiskGroup dis Rlink_name

vxprint now shows the following:

# vxprint

rl rlk_10.0.0.2_DG-RVG - DETACHED - - UNASSOC -

Step #2

Run the delsec command first, then the delpri command.

Then run the createpri and addsec commands to set up replication again.

TIP:

Sometimes the addsec command fails with an error saying that the RLINK already exists, for example when you have RLINKs in the DETACHED and UNASSOC state that you do not want to reuse. In that case, append prlink=NEW_LINK_NAME_FOR_PRI srlink=NEW_LINK_NAME_FOR_SEC to the end of the addsec command.
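A dry-run sketch of procedure 2 as described above: Step #1 on the node at each site, then Step #2 on the primary, using the prlink/srlink attributes from the TIP. The function names and the new RLINK names are hypothetical, and the final startrep -a (automatic synchronization) step is our assumption, since the article does not say how the data was resynchronized.

```sh
# Print each command; execute it only when DRY_RUN=0 (default: dry run).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# Step #1, on the node at EACH site: detach and dissociate the old RLINK.
drop_rlink() {  # DG RLINK
  run vxrlink -g "$1" det "$2"
  run vxrlink -g "$1" dis "$2"
}

# Step #2, on the primary: remove the RDS, then recreate it with new
# RLINK names so addsec does not collide with the UNASSOC ones.
rebuild_rds() {  # DG RVG DATAVOL SRL PRI_HOST SEC_HOST
  run vradmin -g "$1" delsec "$2" "$6"
  run vradmin -g "$1" delpri "$2"
  run vradmin -g "$1" createpri "$2" "$3" "$4"
  run vradmin -g "$1" addsec "$2" "$5" "$6" prlink="rlk_new_$6" srlink="rlk_new_$5"
  # Assumed resync step: start replication with automatic synchronization.
  run vradmin -g "$1" -a startrep "$2" "$6"
}
```

For example, drop_rlink DG rlk_10.0.0.2_DG-RVG on each site, then rebuild_rds DG RVG datavol srlvol pri-host sec-host on the primary.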

3.) Here we made two mistakes

(This procedure meant replicating the data again, and we also had to destroy the DG at the secondary site.)

Mistake #1

Before sending the node to the DR site, we forgot to run the changeip command, and the IPs had already changed because the node was now at the DR site, where the subnet is different.

Mistake #2

We accidentally ran the delpri command first, and afterwards we could not run the delsec command. Because we could not delete the RVG on the secondary node, we had to remove the secondary node's DG. Destroying the DG also deletes the secondary site RVG, but the DG and volumes then have to be created again.

Steps:

Run the delpri command on the primary site node (this removes the primary RVG).

Destroy the DG at the secondary site (this removes the secondary RVG).

Create the DG and volumes at the secondary site.

Then run the createpri and addsec commands to set up replication again.
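A dry-run sketch of these steps. The function names, disk device, volume names and sizes are hypothetical placeholders. As far as we know, VVR expects the secondary data volumes to have the same names and sizes as on the primary, and the secondary needs its own SRL. The sketch assumes nothing in the DG is open or mounted when vxdg destroy runs. Afterwards, run createpri and addsec on the primary as in procedure 2.

```sh
# Print each command; execute it only when DRY_RUN=0 (default: dry run).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# On the primary: remove the primary RVG.
remove_primary() {  # DG RVG
  run vradmin -g "$1" delpri "$2"
}

# On the secondary: destroy the DG (and with it the RVG), then recreate
# the DG, the data volume and the SRL.
recreate_secondary_dg() {  # DG DISK_DEVICE DATAVOL DATA_SIZE SRL SRL_SIZE
  run vxdg destroy "$1"
  run vxdg init "$1" "${1}01=$2"
  run vxassist -g "$1" make "$3" "$4"
  run vxassist -g "$1" make "$5" "$6"
}
```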