
Why am I having connection problems?

Created: 07 Nov 2012 • Updated: 08 Nov 2012 | 10 comments
This issue has been solved. See solution.

I am trying to create a secondary and I am having all sorts of connection problems. I have 2 routers between my servers, set to a 512k connection. I have a continuous ping running in a Command Prompt and am getting replies at 2ms with no timeouts. But when I try to connect from server one to server two I get error v-39-53247-7. After a bunch of attempts I can sometimes get connected and am able to configure the RDS. But once the RDS for the secondary is set up and I click start, the status changes from Connected to Disconnected every few seconds and replication never happens. Where do I need to begin my troubleshooting?

Additional note: This is Windows Server 2008 R2 running SFW HA 5.1 SP2 CP13


mikebounds's picture

You could try pinging with a packet size of 8192 to check that the network can handle large packets. For example, on Linux run the following on the primary:

ping -s 8192 sec_host

 

If there are problems with this, you can change the packet size used by VVR if you are using UDP. See the "Choosing the packet size" extract from the VVR Planning and Tuning Guide:
 
Choosing the packet size
If you have selected the UDP transport protocol for replication, the UDP packet
size used by VVR to communicate between hosts could be an important factor in
the replication performance. By default, VVR uses a UDP packet size of 8400 bytes.
In certain network environments, such as those that do not support fragmented
IP packets, it may be necessary to decrease the packet size.
If the network you are using loses many packets, the effective bandwidth available
for replication is reduced. You can tell that this is happening if you run vxrlink
stats on the RLINK, and see many timeout errors.
In this case, network performance may be improved by reducing the packet size.
If the network is losing many packets, it may simply be that each time a large
packet is lost, a large retransmission has to take place. In this case, try reducing
the packet size until the problem is ameliorated.
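
To put rough numbers on why a large UDP packet hurts on a lossy link, here is a minimal Python sketch. It assumes a standard 1500-byte Ethernet MTU, IPv4 with a 20-byte header, and that the VVR packet size is the UDP payload; none of those figures come from the guide, and the function names are made up for illustration. Losing any one fragment loses the whole datagram, so the chance of a resend grows with the fragment count.

```python
import math

def ip_fragments(udp_payload, mtu=1500, ip_header=20, udp_header=8):
    """Estimate how many IPv4 fragments one UDP datagram needs (assumed MTU)."""
    datagram = udp_payload + udp_header
    # Every fragment except the last carries a multiple of 8 data bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    return math.ceil(datagram / per_fragment)

def datagram_loss(fragment_loss, fragments):
    """Chance the whole datagram is lost if fragments are lost independently."""
    return 1 - (1 - fragment_loss) ** fragments

# 8400 is the default quoted above; 1400 and 1100 are the sizes tried in this thread.
for size in (8400, 1400, 1100):
    n = ip_fragments(size)
    print(size, "bytes:", n, "fragment(s), loss", round(datagram_loss(0.01, n), 4))
```

Under these assumptions the 8400-byte default splits into 6 fragments, so a 1% fragment loss becomes roughly a 5.9% packet loss, while 1400 and 1100 bytes each fit in a single frame.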
 

You could also try using a different protocol: use "vxprint -Pl" to see whether TCP or UDP is being used, and then try the other using:

 

vxedit -g diskgroup set protocol=UDP|TCP rlink_name
 
Mike

UK Symantec Consultant in VCS, GCO, SF, VVR, VxAT on Solaris, AIX, HP-ux, Linux & Windows

If this post has helped you, please vote or mark as solution

mhab11's picture

I removed the routers and ran a crossover cable from one server to the other and set the speed to 10mb. Once I changed the primary and secondary to the new IPs everything started working right away.

Is there a minimum bandwidth needed for replication?

mikebounds's picture

I don't think there is any minimum bandwidth, but if there are lots of timeouts then VVR will disconnect, which is what you seem to be getting. If the packet size used by VVR is too big for the network, so that it gets broken up by switches, I have seen links much quicker than 10Mbits slow to a crawl because of the number of timeouts. So if you only have 512Kbits and you are getting lots of timeouts, this could be why the rlink is disconnecting. I would run:

vxrlink -g diskgroup stats rlink_name

to see if you are getting a lot of timeouts (use vxprint -P to get the name of the rlink)

Mike


mhab11's picture

Funny you say that, I had tried that command yesterday and received an error.

C:\Users\administrator.AVMAIL>vxprint -lPV

Diskgroup = BasicGroup

Diskgroup = AvMailDiskGroup

Rvg        : nycv
state      : state=ACTIVE kernel=ENABLED
assoc      : datavols=D:
             srl=\Device\HarddiskDmVolumes\AvMailDiskGroup\rep
             rlinks=rlk_144_6740
att        : rlinks=rlk_144_6740
checkpoint :
flags      : primary enabled attached clustered

Rlink      : rlk_144_6740
info       : timeout=98 packet_size=1100
             latency_high_mark=10000 latency_low_mark=9950
             bandwidth_limit=1000
state      : state=ACTIVE
             synchronous=override latencyprot=off srlprot=autodcm
assoc      : rvg=nycv
             remote_host=10.10.1.2
             remote_dg=AvMailDiskGroup
             remote_rlink=rlk_mhsws001anp-1_18231
             local_host=10.10.1.1
protocol   : UDP/IP
flags      : write attached consistent connected

Diskgroup = MHSWS001ANP-1-Dg0

C:\Users\administrator.AVMAIL>vxrlink -g avmaildiskgroup startstats rlk_mhsws001anp-1_18231
Failed to perform the operation.
Error V-107-58644-914: RLINK name is not valid.
 

 

mikebounds's picture

 

You have used the name of the remote rlink, not the local rlink, so you need to use:
 
vxrlink -g AvMailDiskGroup stats rlk_144_6740 
 
when running the command from 10.10.1.1 (and use rlink rlk_mhsws001anp-1_18231 when running the command on 10.10.1.2)
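
If it helps to tell the local and remote names apart, here is a rough Python sketch that reads vxprint -lPV text and lists each local rlink with its remote_rlink. The sample is a trimmed copy of the output posted in this thread; the parser and its function name are only an illustration and assume the output keeps that "Rlink : name" and key=value layout.

```python
import re

# Trimmed copy of the vxprint -lPV output posted in this thread.
SAMPLE = """\
Rvg        : nycv
assoc      : datavols=D:
             rlinks=rlk_144_6740

Rlink      : rlk_144_6740
info       : timeout=98 packet_size=1100
assoc      : rvg=nycv
             remote_host=10.10.1.2
             remote_rlink=rlk_mhsws001anp-1_18231
             local_host=10.10.1.1
protocol   : UDP/IP
"""

def parse_rlinks(text):
    """Map each local RLINK name to its key=value attributes (hypothetical helper)."""
    rlinks = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"(Rlink|Rvg|Diskgroup)\s*[:=]\s*(\S+)", line)
        if header:
            # Only an "Rlink" header opens a record; other headers close it.
            current = header.group(2) if header.group(1) == "Rlink" else None
            if current:
                rlinks[current] = {}
            continue
        if current is None:
            continue
        proto = re.match(r"protocol\s*:\s*(\S+)", line)
        if proto:
            rlinks[current]["protocol"] = proto.group(1)
        for key, value in re.findall(r"(\w+)=(\S+)", line):
            rlinks[current][key] = value
    return rlinks

for name, attrs in parse_rlinks(SAMPLE).items():
    print("local:", name, "| remote:", attrs["remote_rlink"], "| via", attrs["protocol"])
```

The name printed as "local" is the one to pass to vxrlink on that host; the "remote" one is only valid on the other side.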
 
A few other points:
  1. Your packet size is 1100 - is this the default size in Windows or have you already tried to change this?
  2. The bandwidth_limit is 1000 - have you tried with bandwidth = none?
  3. Your rlink names are not very meaningful and look like default names. I would advise naming these yourself. If you are using "vxrds addsec" to create the RDS, then use prlink= and srlink=; otherwise I am sure you can put in your own names in the GUI. I would advise names like "to_london" and "to_newyork".

Also, on a previous point, where you asked whether there is a minimum bandwidth needed for replication: I have found out that in UNIX the minimum you can set the bandwidth_limit to is 56kbps (https://sort.symantec.com/public/documents/sfha/6.0/solaris/manualpages/html/man/volume_manager/html/man1m/vxedit.1m.html), but for Windows this is 1Mbps (see Windows VVR admin guide page 327). So this implies 512kbps would definitely work on UNIX, but I hadn't realised until your last post that you were using Windows. This does not necessarily mean that anything less than 1Mbps will not work on Windows; it could be that you just can't set the bandwidth_limit lower than this. I don't know why there is so much difference between UNIX and Windows in what you can set the bandwidth_limit to, as the difference between 56kbps and 1Mbps is huge.
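
For a feel of what those limits mean in practice, here is a small Python sketch of the arithmetic. It assumes 1kbps = 1000 bits per second, ignores all protocol overhead, and uses a made-up figure of 100 MB of changed data; it says nothing about what VVR itself will accept.

```python
def transfer_seconds(data_bytes, link_kbps):
    """Seconds to move data_bytes over a link, ignoring protocol overhead."""
    return data_bytes * 8 / (link_kbps * 1000)

CHANGE_BYTES = 100 * 1000 * 1000  # hypothetical 100 MB of changed data

for label, kbps in (("56kbps", 56), ("512kbps", 512), ("1Mbps", 1000)):
    minutes = transfer_seconds(CHANGE_BYTES, kbps) / 60
    print(f"{label}: {minutes:.1f} minutes")
```

On those assumptions a 1Mbps limit moves the same data about 18 times faster than 56kbps and about twice as fast as 512kbps.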

Have you tried using protocol=TCP yet? I often find this helps when UDP has issues. You can set this retrospectively using vxedit (using vxrds probably won't work if it can't connect), or you can specify protocol=TCP when you create the secondary using "vxrds addsec" (or I guess you probably have this option in the GUI).

Mike

 

 


mhab11's picture

Ok, I didn't realize I couldn't check the remote; yes, the local was fine. When I run the command from the secondary server I receive an error that the command can only be run from the primary.

1. The default size for UDP was 1400; I changed it to 1100 to see if it got better.

2. I set the bandwidth limit to 1000 after having it at max for a few days.

3. As I am just trying to get the initial steps for setup figured out, I wasn't worried about the rlink name. Once I have a set plan I will rebuild everything to make sure the process is correct.

 

Again with Chapter 8 :) that was the only chapter I didn't print out. I am going to do that this morning.

I did switch to TCP; once I stopped and started the replication process everything moved over. But again I am back to where, when I add something new or remove something, my network monitor shows activity but no changes are made to the secondary. Is there a setting I missed? I did have an error at one point that pointed to the SwiftSync issue. Do you think this may have something to do with my data not replicating?

mikebounds's picture

If by "no changes are made to the secondary" you mean you tried to change to TCP and the protocol changed on the primary but not on the secondary, this will be because vxrds (and maybe the GUI) will fail to change the secondary if you are having connection problems, so you will need to use vxedit to change the protocol on the secondary.

If by "no changes are made to the secondary" you mean you add a file on primary while volume is mounted readonly on secondary and you don't see the change, then this is normal, as you shouldn't really mount secondary, but if replication is working and you remount secondary, then you should see the change. 

Mike


SOLUTION
mhab11's picture

This:

"If by "no changes are made to the secondary" you mean you add a file on primary while volume is mounted readonly on secondary and you don't see the change, then this is normal, as you shouldn't really mount secondary, but if replication is working and you remount secondary, then you should see the change."

Yep, I removed the drive letter then re-added it and the data was there. So when I am doing my testing I should leave the drive with no drive letter unless I need to see something?

mikebounds's picture

The proper way to check that your secondary data is valid is to take a snapshot and mount the snapshot, or of course you can do a migration of roles to make your secondary writable so that you can mount it. Mounting (giving a drive letter to) your secondary read-only is not supported, so you should not normally have a drive letter assigned at the secondary, but it is ok for reassurance during testing. After a while you will trust VVR: if VVR says it is up-to-date, I have never known this not to be the case, and VVR should tell you if it does not have the data by saying it has "X bytes in the SRL or DCM".

So to summarise, are you saying that you have issues of rlink disconnecting when using UDP mode, but it works ok using TCP?

Mike


mhab11's picture

"So to summarise, are you saying that you have issues of rlink disconnecting when using UDP mode, but it works ok using TCP?"

A couple of things appeared to be the problem in the beginning.

1. My connection was set to 512; we now know 1mb is the min.

2. I had the drive mounted, so I only saw the initial change when I first set up the replication.

Now that I have a 1mb connection, UDP appears to be fine so long as the drive is not mounted.

Thanks for all the help!