
Error: 110 and Error: 42 on Replication

Created: 26 Feb 2010
Liliana Windver

Hello,

We run PureDisk 6.6 with two SPAs. Every day we replicate the data of all agents from SPA1 to SPA2.
Replication succeeded several times, but now one job has failed, complaining that the spool directory is out of space, even though "df -h" shows there is enough space.
The "Job Log" output for one of the replicated agents and the df output are below:

*** Start: Replication Prepare ***
The remote dataselection mirror for source dataselection 8 is: 6
*** Stop: Replication Prepare ***
Agent Jobstep analysis: exitcode 0, status 2, progress 100.

*** Supportability Summary ***
jobid = 894
jobstepid = 3998
agentid = 1106000000
hostname = shpd01
starttimejobstep = February 25, 2010, 4:00 am
endtimejobstep = February 25, 2010, 4:00 am
workflowstepname = Prepare Replication
status = SUCCESS

[2010-Feb-25 04:09:35 IST]Starting Replication.
[2010-Feb-25 04:09:35 IST]Start to create the Replication Task
[2010-Feb-25 04:09:35 IST]Replication Task created.
Source Application : PUREDISK Remote Office
Policy Id : 123
Source DSID : 8
Destination DSID : 6
Source AgentID : 6
Type of Replication : INCREMENTAL
Remote ContentRouter port : 10082
Delayed DO Max Queue Size : 4194304 bytes
Encryption enabled : YES
MBFind Statement : <?xml version="1.0" encoding="UTF-8"?><MBFindCollection/>
Bandwidth limit : <not defined>
JobId : 894
JobStep : 4011
URL Remote SPA : 1.12.61.86
Destination StoragepoolID : 884
Local StoragepoolID : 1106
[2010-Feb-25 04:09:35 IST]Init Replication Engine.
[2010-Feb-25 04:09:36 IST]ReplicationEngine initialized.
[2010-Feb-25 04:09:36 IST]Start of Replication init step.
[2010-Feb-25 04:09:36 IST]Stop of Replication init step.
[2010-Feb-25 04:09:36 IST]Updating Agent Mirror Data Lock Password if needed.
[2010-Feb-25 04:09:36 IST]Updating Agent Mirror Data Lock Password finished.
[2010-Feb-25 04:09:36 IST]Start forwarding actual content.
[2010-Feb-25 04:09:37 IST]Forwarding batchNumbers (Incremental):707-731
[2010-Feb-25 04:09:37 IST]Destination current routingtables 0000 ffff shpd02 0 are written to file /Storage/var/rt/884_894.current
[2010-Feb-25 04:09:37 IST]Destination recommended routingtables 0000 ffff shpd02 0 are written to file /Storage/var/rt/884_894.recommended
[2010-Feb-25 04:09:37 IST]Executing MBFind to batchnumber : 731
[2010-Feb-25 04:09:37 IST]Using MBFind <?xml version="1.0" encoding="UTF-8"?><MBFindCollection/>
[2010-Feb-25 04:09:37 IST]Using DSFind -i 8
[2010-Feb-25 04:10:01 IST]Starting multi-stream replication.
[2010-Feb-25 04:10:02 IST]Starting multi-stream replication with 4 stream(s)
[2010-Feb-25 04:10:02 IST]Successfully started stream 0
[2010-Feb-25 04:10:02 IST]Successfully started stream 1
[2010-Feb-25 04:10:02 IST]Successfully started stream 2
[2010-Feb-25 04:10:02 IST]Successfully started stream 3
[2010-Feb-25 4:10:07 IST][stream2] Forwarding data (NUMBER OF FINGERPRINTS in this batch:24)
[2010-Feb-25 4:10:07 IST][stream2] Info: Server is Version 6.6.0.29164, Protocol Version 6.6
[2010-Feb-25 4:10:07 IST][stream2] Error: 110 : Received an abort message: spool directory out of space: Could not store reference operation
[2010-Feb-25 4:10:07 IST][stream2] Error: 42 : __replicate_DO_refop_batch_process: could not receive reference reply message(s): aborted
[2010-Feb-25 4:10:07 IST][stream2] Error: 42 : __replicate_DO_refop_batch: Could not process reference operation batch for replication batch entry 0-23, cache: aborted
[2010-Feb-25 4:10:07 IST][stream2] Error: 42 : Could not send reference add operations for source DOs to destination storage pool: aborted
[2010-Feb-25 4:10:07 IST][stream2]
[2010-Feb-25 4:10:07 IST][stream2] Fatal error: zif_cr_replicate: could not process the replication batch: aborted in /opt/pdmbe/mgmtclass/ReplicationStream.php on line 210
[2010-Feb-25 04:10:07 IST]Stream 2 completed with exit value 255
[2010-Feb-25 04:10:08 IST]Replication will retry sending data for attempt number: 1 after sleeping 10 second(s).

.........................
The same errors are shown for stream 0 and stream 1, for a total of 10 retries.
Stream 3 never started.
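
To see at a glance which streams fail and with which error codes across all retries, the per-stream error lines can be tallied. The following is a minimal sketch in Python, not a PureDisk tool: the function name tally_stream_errors and the regular expression are assumptions derived only from the line format visible in the log excerpt above.

```python
import re
from collections import Counter

# Matches lines shaped like the ones in the job log above, e.g.
# [2010-Feb-25 4:10:07 IST][stream2] Error: 110 : Received an abort message: ...
# The pattern is inferred from this excerpt only (assumption).
ERROR_LINE = re.compile(r"^\[[^\]]+\]\[stream(\d+)\]\s+Error:\s*(\d+)\s*:\s*(.*)$")


def tally_stream_errors(log_text):
    """Count "Error: <code>" lines per (stream number, error code)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line.strip())
        if match:
            stream = int(match.group(1))
            code = int(match.group(2))
            counts[(stream, code)] += 1
    return counts


# Lines copied from the job log in this post.
SAMPLE = """\
[2010-Feb-25 4:10:07 IST][stream2] Error: 110 : Received an abort message: spool directory out of space: Could not store reference operation
[2010-Feb-25 4:10:07 IST][stream2] Error: 42 : __replicate_DO_refop_batch_process: could not receive reference reply message(s): aborted
[2010-Feb-25 4:10:07 IST][stream2] Error: 42 : __replicate_DO_refop_batch: Could not process reference operation batch for replication batch entry 0-23, cache: aborted
[2010-Feb-25 4:10:07 IST][stream2] Error: 42 : Could not send reference add operations for source DOs to destination storage pool: aborted
[2010-Feb-25 04:10:07 IST]Stream 2 completed with exit value 255
"""

if __name__ == "__main__":
    for (stream, code), n in sorted(tally_stream_errors(SAMPLE).items()):
        print(f"stream {stream}: error {code} seen {n} time(s)")
```

Running the same function over the full job log text would show whether every retry fails with the same 110/42 pair on every stream.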

......

Any ideas?

Regards,

Liliana

[2010-Feb-25 04:18:34 IST]Replication has tried 10 time(s) to replicate data, but was not successful.
[2010-Feb-25 04:18:34 IST]Checking the execution status of each remote MBImport Job.
[2010-Feb-25 04:18:34 IST]The batchnumber could not be increased, Failing the replication Job.
[2010-Feb-25 04:18:35 IST]Statistics on SOURCE connection:
uptime = 0
bytes_transferred = 0
bytes_received = 0
messages_transferred = 0
messages_received = 0
seconds_in_transfer = 0
seconds_in_receive = 0
data_bytes_transferred = 0
data_bytes_received = 0
data_seconds_in_transfer = 0
data_seconds_in_receive = 0
message_bytes_transferred = 0
message_bytes_received = 0
message_seconds_in_transfer = 0
message_seconds_in_receive = 0
[2010-Feb-25 04:18:35 IST]Statistics on DESTINATION connection:
uptime = 0
bytes_transferred = 0
bytes_received = 0
messages_transferred = 0
messages_received = 0
seconds_in_transfer = 0
seconds_in_receive = 0
data_bytes_transferred = 0
data_bytes_received = 0
data_seconds_in_transfer = 0
data_seconds_in_receive = 0
message_bytes_transferred = 0
message_bytes_received = 0
message_seconds_in_transfer = 0
message_seconds_in_receive = 0
[2010-Feb-25 04:18:35 IST]Statistics on from Meta Data (PO-objects):
po_replicated_success = 0
po_new_source = 0
po_deleted_source = 0
po_modified_source = 0
po_bytes_replicated_success = 0
po_bytes_new_source = 0
po_bytes_deleted_source = 0
po_bytes_modified_source = 0
[2010-Feb-25 04:19:00 IST]Stop forwarding actual content.
[2010-Feb-25 04:19:00 IST]Start finalizing Replication.
[2010-Feb-25 04:19:00 IST]Stop finalizing Replication.
[2010-Feb-25 04:19:00 IST]Stopping Replication.

Agent Jobstep analysis: exitcode 1, status 3, progress 0.
*** Supportability Summary ***
jobid = 894
jobstepid = 4011
agentid = 1106000000
hostname = shpd01
starttimejobstep = February 25, 2010, 4:09 am
endtimejobstep = February 25, 2010, 4:19 am
workflowstepname = Forward Data
status = ERROR
Execute WFAction: Mark Error

 
*** Supportability Summary ***
jobid = 894
jobstepid = 4026
agentid = 1106000000
hostname = shpd01
starttimejobstep = February 25, 2010, 4:19 am
endtimejobstep = February 25, 2010, 4:19 am
workflowstepname = Error
status = SUCCESS
Execute WFAction: Exit
Job exited with 1 errors, 0 warnings, 3 successes
*** Supportability Summary ***
jobid = 894
jobstepid = 4027
agentid = 1106000000
hostname = shpd01
starttimejobstep = February 25, 2010, 4:19 am
endtimejobstep = February 25, 2010, 4:19 am
workflowstepname = Exit
status = SUCCESS

shpd01

Last login: Mon Feb 22 12:34:46 2010 from eliaf-7.clalit.org.il
shpd01:~ # df  -h
Filesystem            Size  Used Avail Use% Mounted on
rootfs                123G  2.3G  115G   2% /
udev                  2.0G  144K  2.0G   1% /dev
/dev/disk/by-id/cciss-3600508b1001030364537413446300a00-part3
                      123G  2.3G  115G   2% /
tmpfs                 4.0K     0  4.0K   0% /dev/vx
/dev/disk/by-id/cciss-3600508b1001030364537413446300a00-part1
                       99M   14M   80M  15% /boot
/dev/disk/by-id/cciss-3600508b1001030364537413446300b00-part1
                      932G  319G  613G  35% /Storage
shpd01:~ #


shpd02

shpd02:~ # df -h
Filesystem            Size  Used Avail Use% Mounted on
rootfs                 56G  2.2G   51G   5% /
udev                  2.0G  144K  2.0G   1% /dev
/dev/disk/by-id/cciss-3600508b1001037373020202020200002-part3
                       56G  2.2G   51G   5% /
tmpfs                 4.0K     0  4.0K   0% /dev/vx
/dev/disk/by-id/cciss-3600508b1001037373020202020200002-part1
                       99M   14M   80M  15% /boot
/dev/disk/by-id/cciss-3600508b1001037373020202020200003-part1
                      932G  291G  642G  32% /Storage
shpd02:~ #
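
The df -h listings above report block usage only. A filesystem also has a separate limit on free inodes, which df -h does not show. Whether that is relevant to this error is not known. The following is a minimal Python sketch that reports both values for a path using the standard os.statvfs call. The names space_report and format_report are hypothetical, and the post does not state where the spool directory lives, so the path to check (for example the /Storage mount point from the listings) is an assumption.

```python
import os
import sys


def space_report(path):
    """Return available/total bytes and inodes for the filesystem holding path."""
    st = os.statvfs(path)
    return {
        "bytes_total": st.f_blocks * st.f_frsize,
        "bytes_avail": st.f_bavail * st.f_frsize,  # space usable by non-root users
        "inodes_total": st.f_files,
        "inodes_avail": st.f_favail,
    }


def format_report(path, report):
    """Render the report as a single human-readable line."""
    gib = 1024 ** 3
    return (
        f"{path}: {report['bytes_avail'] / gib:.1f} GiB of "
        f"{report['bytes_total'] / gib:.1f} GiB available, "
        f"{report['inodes_avail']} of {report['inodes_total']} inodes available"
    )


if __name__ == "__main__":
    # Pass the directory to check as the first argument, e.g. /Storage on
    # shpd02 (assumption: the actual spool location is not given in the post).
    target = sys.argv[1] if len(sys.argv) > 1 else "/"
    print(format_report(target, space_report(target)))
```

Since the abort message is received from the remote side, the destination SPA (shpd02) would be the more informative host to run this on.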