Replication stopped after power failure in secondary
Hi,
I need help as replication has stopped after power failure on the secondary.
Primary
vxprint -htrg DATADG1
rv data_rvg 1 ENABLED ACTIVE primary 10 srl
rl mum_data_hyd data_rvg CONNECT FAIL 97.253.60.67 DATADG1 hyd_data_mum
root@mumdes01 # vxprint -Pl
Disk group: DATADG1
Rlink: mum_data_hyd
info: timeout=500 packet_size=8400 rid=0.1273
latency_high_mark=10000 latency_low_mark=9950
bandwidth_limit=none
state: state=FAIL
synchronous=off latencyprot=off srlprot=autodcm
assoc: rvg=data_rvg
remote_host=97.253.60.67 IP_addr=97.253.60.67 port=4145
remote_dg=DATADG1
remote_dg_dgid=1192705878.24.hyddes01
remote_rvg_version=21
remote_rlink=hyd_data_mum
remote_rlink_rid=0.1260
local_host=97.253.60.65 IP_addr=97.253.60.65 port=4145
protocol: UDP/IP
flags: write enabled attached inconsistent cant_sync fail connected asynchronous dcm_logging resync_paused
Secondary
rv data_rvg 1 ENABLED ACTIVE secondary 10 srl
rl hyd_data_mum data_rvg CONNECT PAUSE 97.253.60.65 DATADG1 mum_data_hyd
root@hyddes01 # vxprint -Pl
Disk group: DATADG1
Rlink: hyd_data_mum
info: timeout=500 packet_size=8400 rid=0.1260
latency_high_mark=10000 latency_low_mark=9950
bandwidth_limit=none
state: state=PAUSE
synchronous=off latencyprot=off srlprot=autodcm
assoc: rvg=data_rvg
remote_host=97.253.60.65 IP_addr=97.253.60.65 port=4145
remote_dg=DATADG1
remote_dg_dgid=1193300303.15.mumdes01
remote_rvg_version=21
remote_rlink=mum_data_hyd
remote_rlink_rid=0.1273
local_host=97.253.60.67 IP_addr=97.253.60.67 port=4145
protocol: UDP/IP
flags: write enabled attached secondary_paused inconsistent cant_sync fail connected
Comments
Hello,
The two rlinks are in different states. A couple of observations:
a) The SRL has already overflowed and replication is now in DCM logging mode.
b) The primary rlink is in the FAIL state. I'm not sure whether you are OK with a full sync; if not, we may need to recover it with some commands.
c) You are using the UDP protocol.
The first thing I would recommend is the following:
On Secondary:
a) # vxrlink -g DATADG1 resume hyd_data_mum
Once done, check whether the rlink comes out of the CONNECT PAUSE state. We are aiming for the CONNECT ACTIVE state.
b) If that doesn't make any difference, try restarting the VVR daemons:
# /usr/sbin/vxstart_vvr stop
# /usr/sbin/vxstart_vvr start
On Primary:
Since the rlink is in the CONNECT FAIL state, I suspect it has stale connections. If you are sure the underlying issue is fixed now (the secondary is healthy and can replicate), try the following on the primary:
a) Try restarting the daemons:
# /usr/sbin/vxstart_vvr stop
# /usr/sbin/vxstart_vvr start
Check whether the rlink comes out of the CONNECT FAIL state. If not, try the workaround below.
b) Toggle the protocol to clear stale connections:
# vradmin -g DATADG1 set data_rvg 97.253.60.65 protocol=TCP
# vradmin -g DATADG1 set data_rvg 97.253.60.65 protocol=UDP
Once done, check the status again.
If the rlink reaches the CONNECT ACTIVE state, you will need to flush the DCM data manually from the primary:
# vradmin -g DATADG1 resync data_rvg
Then check from the primary that the DCM is draining:
# vxrlink -g DATADG1 -i5 status mum_data_hyd
If the above steps don't help, paste the following outputs after running the commands above:
From Primary:
# vxprint -Pl
# vxprint -qhtg DATADG1 | egrep '^rv|^rl'
# vradmin -g DATADG1 -l printrvg data_rvg
# vradmin -g DATADG1 -l repstatus data_rvg
From Secondary:
# vxprint -Pl
# vxprint -qhtg DATADG1 | egrep '^rv|^rl'
# vradmin -g DATADG1 -l printrvg data_rvg
# vradmin -g DATADG1 -l repstatus data_rvg
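The state check used throughout these steps can be scripted. This is only a minimal sketch, assuming the `rl` record layout shown in this thread (record type, rlink name, RVG, kernel state, state, then the remote fields). `rlink_state` is a hypothetical helper name, not a VVR command.

```shell
#!/bin/sh
# rlink_state: read `vxprint -qhtg <dg>` output on stdin and print
# "<rlink> <kstate> <state>" for every rl record.
# Assumed column order (taken from the output in this thread):
#   rl <name> <rvg> <kstate> <state> <remote_host> <remote_dg> <remote_rlink>
rlink_state() {
    awk '$1 == "rl" { print $2, $4, $5 }'
}

# Example using the two records captured on the primary in this thread.
printf '%s\n' \
    'rv data_rvg 1 ENABLED ACTIVE primary 10 srl' \
    'rl mum_data_hyd data_rvg CONNECT FAIL 97.253.60.67 DATADG1 hyd_data_mum' \
    | rlink_state
```

On a live host the same function would be fed from the real command, for example `vxprint -qhtg DATADG1 | rlink_state`, so the state can be compared before and after each step.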
Gaurav
Hi Gaurav,
Thanks for the quick reply.
I have tried the steps, but it is still in the same state.
Here is what I did.
On Secondary:
When I tried to resume the paused rlink, it gave a kernel error,
so I restarted the VVR daemons, but the state was still CONNECT PAUSE.
After the restart I tried the resume again, but the command hung and did not return to the prompt.
I opened another session to check the status, which showed:
rv data_rvg 1 ENABLED ACTIVE secondary 10 srl
rl hyd_data_mum data_rvg CONNECT RESUMING 97.253.60.65 DATADG1 mum_data_hyd
I waited 15 minutes, but it was still stuck, so I restarted the VVR daemons again. That killed the command and, I don't know how, the rlink became ACTIVE when I checked the status:
rv data_rvg 1 ENABLED ACTIVE secondary 10 srl
rl hyd_data_mum data_rvg CONNECT ACTIVE 97.253.60.65 DATADG1 mum_data_hyd
On Primary
I restarted the daemons and toggled the protocol, but the state is still CONNECT FAIL.
Here is the output you asked for.
Primary
root@mumdes01 # vxprint -Pl
Disk group: DATADG1
Rlink: mum_data_hyd
info: timeout=500 packet_size=8400 rid=0.1273
latency_high_mark=10000 latency_low_mark=9950
bandwidth_limit=none
state: state=FAIL
synchronous=off latencyprot=off srlprot=autodcm
assoc: rvg=data_rvg
remote_host=97.253.60.67 IP_addr=97.253.60.67 port=4145
remote_dg=DATADG1
remote_dg_dgid=1192705878.24.hyddes01
remote_rvg_version=21
remote_rlink=hyd_data_mum
remote_rlink_rid=0.1260
local_host=97.253.60.65 IP_addr=97.253.60.65 port=4145
protocol: UDP/IP
flags: write enabled attached inconsistent cant_sync fail connected asynchronous dcm_logging resync_paused
root@mumdes01 # vxprint -qhtg DATADG1 | egrep '^rv|^rl'
rv data_rvg 1 ENABLED ACTIVE primary 10 srl
rl mum_data_hyd data_rvg CONNECT FAIL 97.253.60.67 DATADG1 hyd_data_mum
root@mumdes01 # vradmin -g DATADG1 -l printrvg data_rvg
Replicated Data Set: data_rvg
Primary:
HostName: 97.253.60.65 <localhost>
RvgName: data_rvg
DgName: DATADG1
datavol_cnt: 10
srl: srl
RLinks:
name=mum_data_hyd, detached=off, synchronous=off
Secondary:
HostName: 97.253.60.67
RvgName: data_rvg
DgName: DATADG1
datavol_cnt: 10
srl: srl
RLinks:
name=hyd_data_mum, detached=off, synchronous=off
root@mumdes01 # vradmin -g DATADG1 -l repstatus data_rvg
Replicated Data Set: data_rvg
Primary:
Host name: 97.253.60.65
RVG name: data_rvg
DG name: DATADG1
RVG state: enabled for I/O
Data volumes: 10
SRL name: srl
SRL size: 36.00 G
Total secondaries: 1
Secondary:
Host name: 97.253.60.67
RVG name: data_rvg
DG name: DATADG1
Rlink from Primary: mum_data_hyd
Rlink to Primary: hyd_data_mum
Configured mode: asynchronous
Latency protection: off
SRL protection: autodcm
Data status: inconsistent
Replication status: resync in progress (dcm resynchronization)
Current mode: asynchronous
Logging to: DCM (contains 0 Kbytes) (SRL protection logging)
Timestamp Information: N/A
Bandwidth Limit: N/A
Secondary
root@hyddes01 # vxprint -Pl
Disk group: DATADG1
Rlink: hyd_data_mum
info: timeout=500 packet_size=8400 rid=0.1260
latency_high_mark=10000 latency_low_mark=9950
bandwidth_limit=none
state: state=ACTIVE
synchronous=off latencyprot=off srlprot=autodcm
assoc: rvg=data_rvg
remote_host=97.253.60.65 IP_addr=97.253.60.65 port=4145
remote_dg=DATADG1
remote_dg_dgid=1193300303.15.mumdes01
remote_rvg_version=21
remote_rlink=mum_data_hyd
remote_rlink_rid=0.1273
local_host=97.253.60.67 IP_addr=97.253.60.67 port=4145
protocol: UDP/IP
flags: write enabled attached inconsistent cant_sync fail connected
utils: t0=RESUME
root@hyddes01 # vxprint -qhtg DATADG1 | egrep '^rv|^rl'
rv data_rvg 1 ENABLED ACTIVE secondary 10 srl
rl hyd_data_mum data_rvg CONNECT ACTIVE 97.253.60.65 DATADG1 mum_data_hyd
root@hyddes01 # vradmin -g DATADG1 -l printrvg data_rvg
Replicated Data Set: data_rvg
Primary:
HostName: 97.253.60.65
RvgName: data_rvg
DgName: DATADG1
datavol_cnt: 10
srl: srl
RLinks:
name=mum_data_hyd, detached=off, synchronous=off
Secondary:
HostName: 97.253.60.67 <localhost>
RvgName: data_rvg
DgName: DATADG1
datavol_cnt: 10
srl: srl
RLinks:
name=hyd_data_mum, detached=off, synchronous=off
root@hyddes01 # vradmin -g DATADG1 -l repstatus data_rvg
Replicated Data Set: data_rvg
Primary:
Host name: 97.253.60.65
RVG name: data_rvg
DG name: DATADG1
RVG state: enabled for I/O
Data volumes: 10
SRL name: srl
SRL size: 36.00 G
Total secondaries: 1
Secondary:
Host name: 97.253.60.67
RVG name: data_rvg
DG name: DATADG1
Rlink from Primary: mum_data_hyd
Rlink to Primary: hyd_data_mum
Configured mode: asynchronous
Latency protection: off
SRL protection: autodcm
Data status: inconsistent
Replication status: resync in progress (dcm resynchronization)
Current mode: asynchronous
Logging to: DCM (contains 0 Kbytes) (SRL protection logging)
Timestamp Information: N/A
Bandwidth Limit: N/A
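When comparing long `repstatus` dumps like the ones above, it can help to pull out only the fields that describe replication health. The sketch below assumes the `Label: value` layout shown in this thread; `repstatus_summary` is a hypothetical helper name, and the short keys it prints (`data`, `replication`, `logging`) are made up for illustration.

```shell
#!/bin/sh
# repstatus_summary: read `vradmin ... repstatus` output on stdin and print
# the three fields that describe replication health, one per line.
# Assumes "Label: value" lines as shown in this thread; a value that itself
# contains ": " would be cut at that point.
repstatus_summary() {
    awk -F': *' '
        /^[[:space:]]*Data status:/        { print "data=" $2 }
        /^[[:space:]]*Replication status:/ { print "replication=" $2 }
        /^[[:space:]]*Logging to:/         { print "logging=" $2 }
    '
}

# Example using lines copied from the output above.
printf '%s\n' \
    'Data status: inconsistent' \
    'Replication status: resync in progress (dcm resynchronization)' \
    'Logging to: DCM (contains 0 Kbytes) (SRL protection logging)' \
    | repstatus_summary
```

On a live host this would be used as `vradmin -g DATADG1 -l repstatus data_rvg | repstatus_summary`.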
Hello,
Can you try the following on the primary (the primary's rlink is mum_data_hyd):
1. Try to recover the rlink:
# vxrlink -g DATADG1 recover mum_data_hyd
Then check the status.
2. If that does not work, try detaching and reattaching the rlink on the primary:
# vxrlink -g DATADG1 -f det mum_data_hyd
# vxrlink -g DATADG1 -a att mum_data_hyd
Check the status again. Hope this helps.
Regards,
Dev
Hi Dev,
Thanks for the reply.
The problem was resolved after detaching and reattaching the rlink on the primary:
vxrlink -g DATADG1 -f det mum_data_hyd
vxrlink -g DATADG1 -a att mum_data_hyd
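For anyone repeating this fix, the two commands can be wrapped so they are previewed before being run. This is a sketch only: the `vxrlink` invocations and flags are copied verbatim from this thread, while `run`, `reattach_rlink`, and the `DRY_RUN` variable are hypothetical names introduced here.

```shell
#!/bin/sh
# Sketch: detach and reattach an rlink on the primary, as in the fix above.
# DRY_RUN defaults to 1, so the commands are only printed, not executed.
DRY_RUN=${DRY_RUN:-1}

# run: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# reattach_rlink <diskgroup> <rlink>: stop if the detach fails.
reattach_rlink() {
    dg=$1
    rlink=$2
    run vxrlink -g "$dg" -f det "$rlink" || return 1
    run vxrlink -g "$dg" -a att "$rlink"
}

reattach_rlink DATADG1 mum_data_hyd
```

Running the script with `DRY_RUN=0` set in the environment would execute the commands for real, so review the printed commands first and check the rlink state afterwards.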
Rgds
Vicky