
NetBackup with Deduplication Option

Created: 16 May 2013 | 6 comments

Hi all,

I am getting a media write error on my NBU MSDP running 7.5.0.4. It has happened many times already, and Symantec support has not been able to tell me what the issue is. According to storage.log the problem is a "disk space issue", yet MSDP usage is less than 30% of the total space.

I was given a command to rebuild the database ("crchk.pl --rebuild-crdb"), but they still cannot find the root cause. Below is the log I found:

--------------------

May 17 03:14:27 ERR [2942]: 4: pqObjects2ChildCopyIndex: Copy command failed.
Reason: ERROR: could not extend relation 1663/16387/33696: No space left on device
HINT: Check free disk space.
CONTEXT: COPY objects2_0, line 3698945
May 17 03:14:27 ERR [2942]: 4: database query failed.
Query : CREATE UNIQUE INDEX idx_objects2_0_1368731667211169_764315522 ON objects2_0(key);
Called by : pqObjects2ChildCopyIndex
Reason : ERROR: current transaction is aborted, commands ignored until end of transaction block
May 17 03:20:35 ERR [3199]: 4: pqObjects2ChildCopyIndex: Copy command failed.
Reason: ERROR: could not extend relation 1663/16387/33702: No space left on device
HINT: Check free disk space.
CONTEXT: COPY objects2_1, line 3700840
May 17 03:20:35 ERR [3199]: 4: database query failed.
Query : CREATE UNIQUE INDEX idx_objects2_1_1368732035941078_3761745002 ON objects2_1(key);
Called by : pqObjects2ChildCopyIndex
Reason : ERROR: current transaction is aborted, commands ignored until end of transaction block
May 17 03:26:15 ERR [3456]: 4: pqObjects2ChildCopyIndex: Copy command failed.
Reason: ERROR: could not extend relation 1663/16387/33708: No space left on device
HINT: Check free disk space.
CONTEXT: COPY objects2_2, line 3700674
May 17 03:26:15 ERR [3456]: 4: database query failed.
Query : CREATE UNIQUE INDEX idx_objects2_2_1368732375305858_937651469 ON objects2_2(key);
Called by : pqObjects2ChildCopyIndex
Reason : ERROR: current transaction is aborted, commands ignored until end of transaction block
May 17 03:28:03 ERR [3713]: 4: pqObjects2ChildCopyIndex: Copy command failed.
Reason: ERROR: could not extend relation 1663/16387/33714: No space left on device
HINT: Check free disk space.
CONTEXT: COPY objects2_3, line 3700176
May 17 03:28:03 ERR [3713]: 4: database query failed.
Query : CREATE UNIQUE INDEX idx_objects2_3_1368732483560508_1665227463 ON objects2_3(key);
Called by : pqObjects2ChildCopyIndex
Reason : ERROR: current transaction is aborted, commands ignored until end of transaction block
May 17 03:28:03 ERR [4113]: 25004: TlogProcessLog: Commit to storage database failed: unknown error
May 17 03:28:10 WARNING [4113]: 25000: Transaction logs from /dedupe/AYMSDP/queue/partsorted-19363-19522-0.tlog to /dedupe/AYMSDP/queue/partsorted-19363-19522-5.tlog failed: TlogProcessLog: Commit to storage database failed: unknown error
Transaction will be retried.
May 17 03:28:10 INFO [4113]: WSRequestExt: submitting &request=4&login=agent_3_24432&passwd=********************************&action=newevent&data=EVENT%7Bversion%3A1%3Btype%3A1%3Bid%3A0%3Bdate%3A1368732490%3B%7BLEGACYEVENT%7Bpayload%3Asev%3D4%3Btype%3D1037%3Bmsg%3DTransaction%20logs%20from%20%2Fdedupe%2FAYMSDP%2Fqueue%2Fpartsorted-19363-19522-0.tlog%20to%20%2Fdedupe%2FAYMSDP%2Fqueue%2Fpartsorted-19363-19522-5.tlog%20failed%3A%20TlogProcessLog%3A%20Commit%20to%20storage%20database%20failed%3A%20unknown%20error%0A%20Transaction%20will%20be%20retried.%0A%3B%7D%7D
May 17 03:28:11 ERR [4113]: 25004: Queue processing failed five times in a row. Queue processing will be disabled and the CR will no longer accept new backup data. Please contact support immediately!
May 17 03:28:11 INFO [4113]: WSRequestExt: submitting &request=4&login=agent_3_24432&passwd=********************************&action=process&agentId=3&severity=6&errornumber=2000&application=spoold&source=Storage%20Manager&date=1368732491&description=Queue%20processing%20failed%20five%20times%20in%20a%20row.%20Queue%20processing%20will%20be%20disabled%20and%20the%20CR%20will%20no%20longer%20accept%20new%20backup%20data.%20Please%20contact%20support%20immediately%21%20%0A
May 17 03:28:11 INFO [4113]: 25004: CR is changing mode to: PUT=No DEREF=No SYSTEM=Yes STORAGED=No REROUTE=No COMPACTD=No RECOVERCRDB=No
May 17 03:28:11 INFO [4113]: Entered mode 'STORAGED=no'.
May 17 03:28:11 INFO [4113]: Task Manager : Initiate mode switch Store -> stopped
May 17 03:28:11 INFO [4113]: Task Manager : Initiate mode switch System -> normal
May 17 03:28:11 INFO [4113]: Task Manager : Initiate mode switch Dereference -> stopped
May 17 03:28:11 INFO [4113]: 25004: CR mode changed. After CRQP failure issue is fixed, CR mode could be changed back to normal manually by crcontrol or CR restart.
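The PostgreSQL errors above ("could not extend relation ... No space left on device") refer to the filesystem holding the MSDP database and queue, which can fill up even while the data store itself shows low use. A minimal sketch for checking free space on the relevant mounts; the queue path is taken from the log above, while the other two paths are guesses to adjust for your layout:

```python
import os

def fs_usage(path):
    """Return (total_bytes, free_bytes) for the filesystem holding path."""
    st = os.statvfs(path)
    return st.f_frsize * st.f_blocks, st.f_frsize * st.f_bavail

def pct_free(total, free):
    """Percentage of the filesystem still free."""
    return 100.0 * free / total if total else 0.0

# Hypothetical layout: the queue path appears in the log above; the
# databases and data paths are assumptions -- adjust to your environment.
for path in ("/dedupe/AYMSDP/queue", "/dedupe/AYMSDP/databases", "/dedupe/AYMSDP/data"):
    if os.path.exists(path):
        total, free = fs_usage(path)
        print(f"{path}: {pct_free(total, free):.1f}% free")
```

If the mount holding the database or queue reports far less free space than the data store does, that would explain the mismatch between the log and the MSDP usage figure.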


Comments (6)

user022013 wrote:

We had an issue with our MSDP storage filling up even though plenty of space appeared to be available from within NetBackup.

NetBackup does not reflect the "actual" disk space available. If you browse the volume in Explorer, you can check for log files that may be filling it up.

However, run this command from the PDDE install path if you haven't already:

\\installpath\veritas\pdde> crcontrol --dsstat

Look for the space that requires compacting ("Space needs compaction"). This can grow very large and fill up all available space without showing up anywhere else. The output should look something like the example below. Ours blew out to 10 TB.

D:\Program Files\Veritas\pdde>crcontrol --dsstat

************ Data Store statistics ************
Data storage      Raw    Size   Used   Avail  Use%
                  23.0T  22.1T   9.3T  12.7T  43%

Number of containers             : 94206
Average container size           : 108911664 bytes (103.87MB)
Space allocated for containers   : 10260132285650 bytes (9.33TB)
Space used within containers     : 9337391012298 bytes (8.49TB)
Space available within containers: 922741273352 bytes (859.37GB)
Space needs compaction           : 311206942925 bytes (289.83GB)
Reserved space                   : 1016116240384 bytes (946.33GB)
Reserved space percentage        : 4.0%
Records marked for compaction    : 10152429
Active records                   : 162195158
Total records                    : 172347587

Use "--dsstat 1" to get more accurate statistics
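As a side note, the byte counts in that output can be pulled out programmatically for monitoring. A small parsing sketch, assuming the `--dsstat` line format shown above (the sample text is copied from that output):

```python
import re

# Sample lines copied from the crcontrol --dsstat output above.
SAMPLE = """\
Number of containers             : 94206
Average container size           : 108911664 bytes (103.87MB)
Space allocated for containers   : 10260132285650 bytes (9.33TB)
Space used within containers     : 9337391012298 bytes (8.49TB)
Space available within containers: 922741273352 bytes (859.37GB)
Space needs compaction           : 311206942925 bytes (289.83GB)
"""

def parse_dsstat(text):
    """Map each 'Label : N bytes' line to its integer byte count."""
    stats = {}
    for label, value in re.findall(r"^(.+?)\s*:\s*(\d+) bytes", text, re.M):
        stats[label.strip()] = int(value)
    return stats

stats = parse_dsstat(SAMPLE)
compaction = stats["Space needs compaction"]
print(f"Awaiting compaction: {compaction / 1024**3:.2f} GiB")
# -> Awaiting compaction: 289.83 GiB
```

Feeding this a fresh `--dsstat` capture would let you alert when the compaction backlog starts eating the free space.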

I hope this helps.

user022013 wrote:

Also, we experienced media write errors while compaction was running with the 100 switch:

x:\program files\veritas\pdde> crcontrol --compactstart 100 0 1

The MSDP was too busy to write to the disk.
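One way to avoid kicking off compaction while it would compete with backups is to gate it on the backlog size. A hypothetical wrapper; the threshold and the decision logic are assumptions, while `crcontrol --compactstart 100 0 1` is the command quoted above:

```python
import subprocess

# Assumed threshold: only bother compacting once the backlog passes 100 GiB.
THRESHOLD_BYTES = 100 * 1024**3

def should_compact(needs_compaction_bytes, threshold=THRESHOLD_BYTES):
    """Decide whether the 'Space needs compaction' backlog justifies a run."""
    return needs_compaction_bytes >= threshold

def start_compaction():
    """Run the compaction command quoted in this thread (not invoked here)."""
    subprocess.run(["crcontrol", "--compactstart", "100", "0", "1"], check=True)

# ~290 GB backlog, the figure from the dsstat output posted earlier in the thread
if should_compact(311206942925):
    print("Backlog above threshold; schedule compaction outside the backup window")
```

Running the actual `start_compaction()` call from a scheduler during a quiet window would sidestep the "too busy to write" contention described above.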

ericmagallanes wrote:

Hi guys,

I'm still stuck with this issue; below is the dsstat info. Can anybody tell me why it is asking me to free up space when used space is only 14%?

*********** Data Store statistics ************
Data storage      Raw    Size   Used   Avail  Use%
                   3.5T   3.3T 458.2G   2.9T  14%

Number of containers             : 4999
Average container size           : 99598809 bytes (94.98MB)
Space allocated for containers   : 497894450231 bytes (463.70GB)
Space used within containers     : 444114041837 bytes (413.61GB)
Space available within containers: 53780408394 bytes (50.09GB)
Space needs compaction           : 481168640 bytes (458.88MB)
Reserved space                   : 158329901056 bytes (147.46GB)
Reserved space percentage        : 4.1%
Records marked for compaction    : 22756
Active records                   : 14875321
Total records                    : 14898077
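For what it's worth, the numbers above do add up to roughly 14%, so the data store itself does not appear to be the partition that is full. The arithmetic below, with byte values copied from that output, suggests the "No space left" errors are coming from whatever filesystem holds the MSDP database and queue rather than the data store:

```python
# Byte values copied from the dsstat output above; "Size" column is ~3.3 TiB.
size_bytes = int(3.3 * 1024**4)
allocated  = 497894450231      # space allocated for containers
reserved   = 158329901056      # reserved space

use_pct = 100 * allocated / size_bytes
print(f"Use: {use_pct:.0f}%")  # consistent with the 14% in the Use% column

headroom = size_bytes - allocated - reserved
print(f"Headroom after reserve: {headroom / 1024**4:.2f} TiB")
```

With well over 2 TiB of headroom even after the reserve, the database partition (or a log/queue directory on a smaller mount) is the more likely culprit for the out-of-space errors.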

huanglao2002 wrote:

MSDP is a more complex component than the others; if you encounter this issue, you should escalate it to backline support.

watsons wrote:

This error message indicates that your queue processing has failed repeatedly:

"Queue processing failed five times in a row. Queue processing will be disabled"

If you have run crchk before, what did support say about the crchk result? Push them for an answer if you don't get one.

There could be fingerprint files that are missing or corrupted, which would need a crchk to fix. So the crchk result is essential to telling us something.

user022013 wrote:

We had issues with queue processing too. Backline support searched the log file, removed the offending corrupted file, and queue processing resumed.

Definitely go to backline. They were very good at helping us out with this.

Sachin Chavan was the support engineer who worked with me. If you can't get to him directly, ask the engineer you are working with to check with him. He seems to be the MSDP expert.