Replication stopped after power failure in secondary

Updated: 21 May 2010 | 4 comments
negivky

Hi,

I need help: replication stopped after a power failure on the secondary.

Primary

vxprint -htrg DATADG1

rv data_rvg     1            ENABLED  ACTIVE   primary  10        srl
rl mum_data_hyd data_rvg     CONNECT  FAIL     97.253.60.67 DATADG1 hyd_data_mum
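How to read the `rv`/`rl` lines above: in `vxprint -ht` format an rlink record lists NAME, RVG, KSTATE, STATE, REM_HOST, REM_DG and REM_RLNK, so `CONNECT FAIL` is the kernel state followed by the rlink state. The following is a minimal sketch that assumes bash and that column order. `rlink_states` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: reads `vxprint -qhtg <dg> | egrep '^rv|^rl'` output on
# stdin and prints "<rlink> <KSTATE> <STATE>" for every rl record.
# Assumes the vxprint -ht rlink column order:
#   rl NAME RVG KSTATE STATE REM_HOST REM_DG REM_RLNK
rlink_states() {
    awk '$1 == "rl" { print $2, $4, $5 }'
}
```

If you pipe the primary's `vxprint -qhtg DATADG1 | egrep '^rv|^rl'` output shown here into it, it prints `mum_data_hyd CONNECT FAIL`.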

root@mumdes01 # vxprint -Pl
Disk group: DATADG1

Rlink:    mum_data_hyd
info:     timeout=500 packet_size=8400 rid=0.1273
          latency_high_mark=10000 latency_low_mark=9950
          bandwidth_limit=none
state:    state=FAIL
          synchronous=off latencyprot=off srlprot=autodcm
assoc:    rvg=data_rvg
          remote_host=97.253.60.67 IP_addr=97.253.60.67 port=4145
          remote_dg=DATADG1
          remote_dg_dgid=1192705878.24.hyddes01
          remote_rvg_version=21
          remote_rlink=hyd_data_mum
          remote_rlink_rid=0.1260
          local_host=97.253.60.65 IP_addr=97.253.60.65 port=4145
protocol: UDP/IP
flags:    write enabled attached inconsistent cant_sync fail connected asynchronous dcm_logging resync_paused
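Much of the diagnosis is in the `flags:` line. With `srlprot=autodcm`, `dcm_logging` indicates that replication has fallen back to DCM logging, and `resync_paused` also appears on the primary. The following is a small sketch, assuming bash, that tests whether a given flag is present in `vxprint -Pl` output. `rlink_has_flag` is hypothetical, and if the output contains several rlinks it matches a flag on any of them.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit 0) if a "flags:" line in `vxprint -Pl`
# output read from stdin contains the given flag word, fails otherwise.
# Flags seen in this thread include dcm_logging, resync_paused, cant_sync.
rlink_has_flag() {
    local want=$1
    awk -v want="$want" '
        $1 == "flags:" {
            for (i = 2; i <= NF; i++) if ($i == want) found = 1
        }
        END { exit (found ? 0 : 1) }'
}
```

Usage: `vxprint -Pl | rlink_has_flag dcm_logging && echo "rlink is DCM logging"`.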

Secondary

rv data_rvg     1            ENABLED  ACTIVE   secondary 10       srl
rl hyd_data_mum data_rvg     CONNECT  PAUSE    97.253.60.65 DATADG1 mum_data_hyd

root@hyddes01 # vxprint -Pl
Disk group: DATADG1

Rlink:    hyd_data_mum
info:     timeout=500 packet_size=8400 rid=0.1260
          latency_high_mark=10000 latency_low_mark=9950
          bandwidth_limit=none
state:    state=PAUSE
          synchronous=off latencyprot=off srlprot=autodcm
assoc:    rvg=data_rvg
          remote_host=97.253.60.65 IP_addr=97.253.60.65 port=4145
          remote_dg=DATADG1
          remote_dg_dgid=1193300303.15.mumdes01
          remote_rvg_version=21
          remote_rlink=mum_data_hyd
          remote_rlink_rid=0.1273
          local_host=97.253.60.67 IP_addr=97.253.60.67 port=4145
protocol: UDP/IP
flags:    write enabled attached secondary_paused inconsistent cant_sync fail connected

Comments

Gaurav Sangamnerkar
26 Nov 2009

Hello,

The two sides are in different states. A couple of observations:

a) The SRL has already overflowed, and replication is now in DCM logging mode.
b) The primary rlink is in the FAIL state. I'm not sure whether a full sync is acceptable to you; if not, we may need to recover it with some commands.
c) You are using the UDP protocol.

I would recommend the following first:

On Secondary:

a) # vxrlink -g DATADG1 resume hyd_data_mum

Once done, check whether the rlink has come out of the CONNECT PAUSE state. We are aiming for CONNECT ACTIVE.

b) If that makes no difference, try restarting the VVR daemons:

# /usr/sbin/vxstart_vvr stop
# /usr/sbin/vxstart_vvr start
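The secondary-side steps above can be combined as follows. This is only a sketch and assumes bash. The VVR commands come from this post. `run`, `current_state`, `secondary_steps` and the `DRY_RUN` switch are hypothetical helpers. With `DRY_RUN=1`, the state-changing commands are printed instead of executed, but the read-only `vxprint` state query still runs.

```shell
#!/bin/bash
# Secondary-side steps from the post above (sketch).
DG=DATADG1
SEC_RLINK=hyd_data_mum

# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: STATE column (PAUSE, ACTIVE, FAIL, ...) of one rlink,
# assuming the vxprint -ht layout "rl NAME RVG KSTATE STATE ...".
current_state() {
    vxprint -qhtg "$DG" | awk -v r="$1" '$1 == "rl" && $2 == r { print $5 }'
}

secondary_steps() {
    # a) resume the paused rlink
    run vxrlink -g "$DG" resume "$SEC_RLINK"
    # b) still not ACTIVE (e.g. CONNECT PAUSE)? restart the VVR daemons
    if [ "$(current_state "$SEC_RLINK")" != ACTIVE ]; then
        run /usr/sbin/vxstart_vvr stop
        run /usr/sbin/vxstart_vvr start
    fi
}
```

Run `DRY_RUN=1 secondary_steps` first to review what would be executed. Then run `secondary_steps` on the secondary.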

On Primary:

Since the rlink is in the CONNECT FAIL state, I suspect it has stale connections. If you are sure the underlying issue is now fixed (the secondary is healthy and can replicate), try the following on the primary:

a) Try restarting the daemons:

# /usr/sbin/vxstart_vvr stop
# /usr/sbin/vxstart_vvr start

Check whether the rlink comes out of the CONNECT FAIL state. If not, try the workaround below.

b) Toggle the protocol to clear stale connections:

# vradmin -g DATADG1 set data_rvg 97.253.60.67 protocol=TCP
# vradmin -g DATADG1 set data_rvg 97.253.60.67 protocol=UDP

Once done, check the status again...
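The primary-side steps above can be sketched in the same way, assuming bash. This sketch assumes the usual `vradmin set <rvg> <sec_hostname> attribute=value` syntax, in which the host argument names the Secondary, so it passes 97.253.60.67 (the secondary in this thread). `run`, `current_state`, `primary_steps` and the `DRY_RUN` switch are hypothetical helpers. With `DRY_RUN=1`, commands are printed instead of executed, but the read-only `vxprint` query still runs.

```shell
#!/bin/bash
# Primary-side steps from the post above (sketch).
DG=DATADG1
RVG=data_rvg
PRI_RLINK=mum_data_hyd
SEC_HOST=97.253.60.67   # assumption: vradmin set expects the Secondary host

# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: STATE column of one rlink from vxprint -ht output.
current_state() {
    vxprint -qhtg "$DG" | awk -v r="$1" '$1 == "rl" && $2 == r { print $5 }'
}

primary_steps() {
    # a) restart the VVR daemons to clear stale connections
    run /usr/sbin/vxstart_vvr stop
    run /usr/sbin/vxstart_vvr start
    # b) still CONNECT FAIL? toggle the protocol to TCP and back to UDP
    if [ "$(current_state "$PRI_RLINK")" = FAIL ]; then
        run vradmin -g "$DG" set "$RVG" "$SEC_HOST" protocol=TCP
        run vradmin -g "$DG" set "$RVG" "$SEC_HOST" protocol=UDP
    fi
}
```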

If the rlink reaches the CONNECT ACTIVE state, you will then need to start the DCM resynchronization manually:

# vradmin -g DATADG1 resync data_rvg     (from primary)

Check whether the DCM is draining (from the primary):

# vxrlink -g DATADG1 -i5 status mum_data_hyd
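The resync-and-monitor step could be scripted roughly as follows. This sketch assumes bash. The loop stops once `vradmin -l repstatus` no longer reports `resync in progress`, which is the wording in the repstatus output in this thread. `INTERVAL`, `MAX_POLLS`, `replication_status` and `resync_and_wait` are hypothetical names. This is a convenience loop and does not replace watching `vxrlink -i5 status`.

```shell
#!/bin/bash
# Sketch: start the DCM resync from the primary, then poll repstatus until it
# no longer reports "resync in progress" or MAX_POLLS polls have been made.
DG=DATADG1
RVG=data_rvg
INTERVAL=${INTERVAL:-5}       # seconds between polls (hypothetical default)
MAX_POLLS=${MAX_POLLS:-720}   # give up after this many polls (hypothetical)

# Value of the "Replication status:" line of vradmin -l repstatus.
replication_status() {
    vradmin -g "$DG" -l repstatus "$RVG" |
        awk -F': *' '/^ *Replication status:/ { print $2; exit }'
}

resync_and_wait() {
    vradmin -g "$DG" resync "$RVG" || return 1
    local i=0 status
    while [ "$i" -lt "$MAX_POLLS" ]; do
        status=$(replication_status)
        echo "$(date '+%H:%M:%S') $status"
        case $status in
            "resync in progress"*) ;;   # DCM still being replayed
            "") echo "could not read repstatus" >&2; return 1 ;;
            *) return 0 ;;
        esac
        i=$((i + 1))
        sleep "$INTERVAL"
    done
    return 1   # still resyncing: check vxrlink -i5 status on the rlink
}
```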

If the above steps don't help, paste the following outputs once the commands have completed:

From Primary:

# vxprint -Pl
# vxprint -qhtg DATADG1 | egrep '^rv|^rl'
# vradmin -g DATADG1 -l printrvg data_rvg
# vradmin -g DATADG1 -l repstatus data_rvg

From Secondary:

# vxprint -Pl
# vxprint -qhtg DATADG1 | egrep '^rv|^rl'
# vradmin -g DATADG1 -l printrvg data_rvg
# vradmin -g DATADG1 -l repstatus data_rvg
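To collect all of these outputs on a host at once, here is a sketch that assumes bash. `collect_vvr_diag` and its default output path under /var/tmp are hypothetical. The rv/rl filter uses awk with the same pattern as `egrep '^rv|^rl'`.

```shell
#!/bin/bash
# Sketch: run the requested diagnostics on this host and save them to one file.
DG=DATADG1
RVG=data_rvg

collect_vvr_diag() {
    local out=${1:-/var/tmp/vvr_diag.$(hostname).txt}   # hypothetical path
    {
        echo "### vxprint -Pl"
        vxprint -Pl
        echo "### vxprint -qhtg $DG (rv/rl records only)"
        vxprint -qhtg "$DG" | awk '/^rv|^rl/'   # same filter as egrep '^rv|^rl'
        echo "### vradmin -l printrvg $RVG"
        vradmin -g "$DG" -l printrvg "$RVG"
        echo "### vradmin -l repstatus $RVG"
        vradmin -g "$DG" -l repstatus "$RVG"
    } > "$out" 2>&1
    echo "$out"   # report where the output was written
}
```

Run `collect_vvr_diag` on both the primary and the secondary, then attach the two files.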

Gaurav

negivky
27 Nov 2009

Hi Gaurav,

Thanks for the quick reply.

I tried, but it is still in the same state.

Here is what I did.

On Secondary:

When I tried to resume from the PAUSE state, it gave a kernel error, so I restarted the VVR daemons. The status stayed CONNECT PAUSE.

After the restart I tried the resume again, but the command hung and never returned to the prompt.
I opened another session to check the status, and it showed:

rv data_rvg     1            ENABLED  ACTIVE   secondary 10       srl
rl hyd_data_mum data_rvg     CONNECT  RESUMING 97.253.60.65 DATADG1 mum_data_hyd

I waited 15 minutes, but it was still stuck, so I restarted the VVR daemons again. That killed the command, and when I checked the status the rlink had become ACTIVE. I don't know why:

rv data_rvg     1            ENABLED  ACTIVE   secondary 10       srl
rl hyd_data_mum data_rvg     CONNECT  ACTIVE   97.253.60.65 DATADG1 mum_data_hyd

On Primary

I restarted the daemons and toggled the protocol, but the status is still CONNECT FAIL.

Here is the output you wanted.

Primary

root@mumdes01 # vxprint -Pl
Disk group: DATADG1

Rlink:    mum_data_hyd
info:     timeout=500 packet_size=8400 rid=0.1273
          latency_high_mark=10000 latency_low_mark=9950
          bandwidth_limit=none
state:    state=FAIL
          synchronous=off latencyprot=off srlprot=autodcm
assoc:    rvg=data_rvg
          remote_host=97.253.60.67 IP_addr=97.253.60.67 port=4145
          remote_dg=DATADG1
          remote_dg_dgid=1192705878.24.hyddes01
          remote_rvg_version=21
          remote_rlink=hyd_data_mum
          remote_rlink_rid=0.1260
          local_host=97.253.60.65 IP_addr=97.253.60.65 port=4145
protocol: UDP/IP
flags:    write enabled attached inconsistent cant_sync fail connected asynchronous dcm_logging resync_paused

root@mumdes01 # vxprint -qhtg DATADG1 | egrep '^rv|^rl'
rv data_rvg     1            ENABLED  ACTIVE   primary  10        srl
rl mum_data_hyd data_rvg     CONNECT  FAIL     97.253.60.67 DATADG1 hyd_data_mum

root@mumdes01 # vradmin -g DATADG1 -l printrvg data_rvg
Replicated Data Set: data_rvg
Primary:
        HostName: 97.253.60.65  <localhost>
        RvgName: data_rvg
        DgName: DATADG1
        datavol_cnt: 10
        srl: srl
        RLinks:
            name=mum_data_hyd, detached=off, synchronous=off
Secondary:
        HostName: 97.253.60.67
        RvgName: data_rvg
        DgName: DATADG1
        datavol_cnt: 10
        srl: srl
        RLinks:
            name=hyd_data_mum, detached=off, synchronous=off

root@mumdes01 # vradmin -g DATADG1 -l repstatus data_rvg
Replicated Data Set: data_rvg
Primary:
  Host name:                  97.253.60.65
  RVG name:                   data_rvg
  DG name:                    DATADG1
  RVG state:                  enabled for I/O
  Data volumes:               10
  SRL name:                   srl
  SRL size:                   36.00 G
  Total secondaries:          1

Secondary:
  Host name:                  97.253.60.67
  RVG name:                   data_rvg
  DG name:                    DATADG1
  Rlink from Primary:         mum_data_hyd
  Rlink to Primary:           hyd_data_mum
  Configured mode:            asynchronous
  Latency protection:         off
  SRL protection:             autodcm
  Data status:                inconsistent
  Replication status:         resync in progress (dcm resynchronization)
  Current mode:               asynchronous
  Logging to:                 DCM (contains 0 Kbytes) (SRL protection logging)
  Timestamp Information:      N/A
  Bandwidth Limit:            N/A
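In this `repstatus` output, the most important lines are `Data status`, `Replication status` and `Logging to`. The following is a minimal sketch, assuming bash, for extracting one field from that output. `repstatus_field` is a hypothetical helper. For labels that appear under both Primary and Secondary (such as `Host name`), it returns only the first match.

```shell
#!/bin/bash
# Hypothetical helper: prints the value of one "Label:  value" line from
# `vradmin -l repstatus` output read on stdin (first match only).
repstatus_field() {
    awk -v label="$1" '
        {
            line = $0
            sub(/^ +/, "", line)                 # drop leading indentation
            if (index(line, label ":") == 1) {
                val = substr(line, length(label) + 2)
                sub(/^ +/, "", val)              # drop alignment padding
                print val
                exit
            }
        }'
}
```

Usage: `vradmin -g DATADG1 -l repstatus data_rvg | repstatus_field 'Data status'` prints `inconsistent` for the output above.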

Secondary

root@hyddes01 # vxprint -Pl
Disk group: DATADG1

Rlink:    hyd_data_mum
info:     timeout=500 packet_size=8400 rid=0.1260
          latency_high_mark=10000 latency_low_mark=9950
          bandwidth_limit=none
state:    state=ACTIVE
          synchronous=off latencyprot=off srlprot=autodcm
assoc:    rvg=data_rvg
          remote_host=97.253.60.65 IP_addr=97.253.60.65 port=4145
          remote_dg=DATADG1
          remote_dg_dgid=1193300303.15.mumdes01
          remote_rvg_version=21
          remote_rlink=mum_data_hyd
          remote_rlink_rid=0.1273
          local_host=97.253.60.67 IP_addr=97.253.60.67 port=4145
protocol: UDP/IP
flags:    write enabled attached inconsistent cant_sync fail connected
utils:    t0=RESUME

root@hyddes01 # vxprint -qhtg DATADG1 | egrep '^rv|^rl'
rv data_rvg     1            ENABLED  ACTIVE   secondary 10       srl
rl hyd_data_mum data_rvg     CONNECT  ACTIVE   97.253.60.65 DATADG1 mum_data_hyd

root@hyddes01 # vradmin -g DATADG1 -l printrvg data_rvg
Replicated Data Set: data_rvg
Primary:
        HostName: 97.253.60.65
        RvgName: data_rvg
        DgName: DATADG1
        datavol_cnt: 10
        srl: srl
        RLinks:
            name=mum_data_hyd, detached=off, synchronous=off
Secondary:
        HostName: 97.253.60.67  <localhost>
        RvgName: data_rvg
        DgName: DATADG1
        datavol_cnt: 10
        srl: srl
        RLinks:
            name=hyd_data_mum, detached=off, synchronous=off

root@hyddes01 # vradmin -g DATADG1 -l repstatus data_rvg
Replicated Data Set: data_rvg
Primary:
  Host name:                  97.253.60.65
  RVG name:                   data_rvg
  DG name:                    DATADG1
  RVG state:                  enabled for I/O
  Data volumes:               10
  SRL name:                   srl
  SRL size:                   36.00 G
  Total secondaries:          1

Secondary:
  Host name:                  97.253.60.67
  RVG name:                   data_rvg
  DG name:                    DATADG1
  Rlink from Primary:         mum_data_hyd
  Rlink to Primary:           hyd_data_mum
  Configured mode:            asynchronous
  Latency protection:         off
  SRL protection:             autodcm
  Data status:                inconsistent
  Replication status:         resync in progress (dcm resynchronization)
  Current mode:               asynchronous
  Logging to:                 DCM (contains 0 Kbytes) (SRL protection logging)
  Timestamp Information:      N/A
  Bandwidth Limit:            N/A
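At this point the two ends disagree. The secondary rlink `hyd_data_mum` shows `CONNECT ACTIVE`, but the primary rlink `mum_data_hyd` is still `CONNECT FAIL`. The following sketch reports any side that is not `CONNECT ACTIVE`. It assumes bash and the `vxprint -ht` rlink column order (rl NAME RVG KSTATE STATE ...). `both_active` is hypothetical.

```shell
#!/bin/bash
# Hypothetical check: $1 is the primary's "rl" line and $2 the secondary's
# (from vxprint -qhtg). Prints each side not in CONNECT ACTIVE; returns 1 if any.
both_active() {
    local rc=0 side line tag name rvg kstate state rest
    for side in primary secondary; do
        if [ "$side" = primary ]; then line=$1; else line=$2; fi
        # split on whitespace: rl NAME RVG KSTATE STATE ...
        read -r tag name rvg kstate state rest <<< "$line"
        if [ "$kstate $state" != "CONNECT ACTIVE" ]; then
            echo "$side rlink $name is $kstate $state"
            rc=1
        fi
    done
    return $rc
}
```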

 

Dev Roy
04 Dec 2009

Hello,

Can you try the following on the primary:

1.> Try to recover the rlink:
# vxrlink -g DATADG1 recover mum_data_hyd

Then check the status.

2.> If that does not work, try detaching and reattaching the rlink on the primary:
# vxrlink -g DATADG1 -f det mum_data_hyd
# vxrlink -g DATADG1 -a att mum_data_hyd

Check the status again. Hope this helps.
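The two suggestions above can be combined into a sketch, with one assumption to note. On the primary, the rlink is named `mum_data_hyd`, as shown in the primary's `vxprint -Pl` output; `hyd_data_mum` is the secondary's rlink. The same name is used in the fix reported below. The sketch assumes bash. `run`, `current_state`, `recover_primary_rlink` and the `DRY_RUN` switch are hypothetical helpers. With `DRY_RUN=1`, commands are printed instead of executed, but the read-only `vxprint` query still runs.

```shell
#!/bin/bash
# Primary-side rlink recovery as suggested above (sketch).
DG=DATADG1
PRI_RLINK=mum_data_hyd   # the primary's rlink (hyd_data_mum is the secondary's)

# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: STATE column of one rlink from vxprint -ht output.
current_state() {
    vxprint -qhtg "$DG" | awk -v r="$1" '$1 == "rl" && $2 == r { print $5 }'
}

recover_primary_rlink() {
    # 1. try to recover the rlink
    run vxrlink -g "$DG" recover "$PRI_RLINK"
    if [ "$(current_state "$PRI_RLINK")" = ACTIVE ]; then
        return 0
    fi
    # 2. otherwise detach (-f) and attach (-a) it again, flags as in the post
    run vxrlink -g "$DG" -f det "$PRI_RLINK"
    run vxrlink -g "$DG" -a att "$PRI_RLINK"
}
```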

Regards,
Dev

Consulting Storage Foundation, VCS, VVR and CVM/CFS on Unix.

negivky
03 Dec 2009

Hi Dev,

Thanks for the reply.

The problem was resolved after detaching and reattaching the rlink on the primary.

vxrlink -g DATADG1 -f det mum_data_hyd
vxrlink -g DATADG1 -a att mum_data_hyd

Rgds
Vicky