Optimized duplications between NetBackup 7.1 Media Server Deduplication (MSDP) disk pools report media write error status 84 after hanging for 12 hours

Article:TECH160202  |  Created: 2011-05-17  |  Updated: 2011-11-10  |  Article URL http://www.symantec.com/docs/TECH160202
Article Type
Technical Solution


Environment

Issue



Optimized duplications (optdup) between NetBackup 7.1 Media Server Deduplication (MSDP) disk pools fail with status 84 after 12 hours despite pd.conf OPTDUP_TIMEOUT value of 18 hours.

This issue also affect the NetBackup 5200/5220 appliance, version 2.0.


Error



Messages similar to the following may be found in the NetBackup administration console Activity Monitor - Detailed Status:

 3/8/2011 10:01:19 PM - Critical bpdm(pid=4992) sts_copy_extent failed: error 2060005 object is busy, cannot be closed
 3/8/2011 10:01:20 PM - Critical bpdm(pid=4992) sts_copy_extent failed: error 2060001 one or more invalid arguments
 3/8/2011 10:01:53 PM - Critical bpdm(pid=4992) sts_close_handle failed: 2060022 software error
 3/8/2011 10:09:11 PM - Error bpduplicate(pid=5460) host mediaserver1.examplez.com backup id zserver21.examplez.com_1299265258 optimized duplication failed, media write error (84).
 3/8/2011 10:09:11 PM - Error bpduplicate(pid=5460) Duplicate of backupid zserver21.examplez.com_1299265258 failed, media write error (84).
 3/8/2011 10:09:11 PM - Error bpduplicate(pid=5460) Status = no images were successfully processed.
 3/8/2011 10:09:11 PM - end Duplicate; elapsed time: 12:19:29
 3/8/2011 10:09:11 PM - Info mediaserver1.examplez.com(pid=4992) StorageServer=PureDisk:mediaserver1.examplez.com; Report=PDDO Stats for (mediaserver1.examplez.com): scanned: 2 KB, stream rate: 0.00 MB/sec, CR sent: 0 KB, dedup: 100.0%, cache hits: 0 (0.0%) media write error(84)

The timeout value was confirmed (OPTDUP_TIMEOUT = 2160 [minutes, equal to 36 hours]) as configured and being honored by the pdplugin, as evidenced by this entry in the bpdm log:

18:49:23.829 [5484.9840] <2> 11022:bptm:5484:server1.examplez.com: tag_nbu_dsid: pdvfs_conf_optdup_timeout 2160


bpdm shows looping PdvfsGetJobBytesProcessed and PDVFS_IOCTL_GET_JOB_STATE
requests with 30 second sleeps, which go on and on:

18:53:35.030 [5484.9840] <2> 11022:bptm:5484:server1.examplez.com: check_pdvfs_job id=436 state=QUEUED
18:53:35.030 [5484.9840] <2> 11022:bptm:5484:server1.examplez.com: PDVFS: [4] PdvfsGetJobBytesProcessed: Calling  PdvfsGetJobBytesProcessed to retrieve number


...until bpdm reports the odd failure:

06:53:27.475 [5484.9840] <2> 11022:bptm:5484:server1.examplez.com: impl_async_wait sleeping for 30 seconds
06:53:57.491 [5484.9840] <2> 11022:bptm:5484:server1.examplez.com: async_get_job_status Enter
06:53:57.491 [5484.9840] <2> 11022:bptm:5484:server1.examplez.com: PDVFS: [4] PdvfsVIoctl: PDVFS_IOCTL_GET_JOB_STATE, ptr=0000000000122240, size=144
06:53:57.491 [5484.9840] <2> 11022:bptm:5484:server1.examplez.com: PDVFS: [4] pdvfs_get_job_state: enter jobId=436
...
06:53:57.491 [5484.9840] <2> 11022:bptm:5484:server1.examplez.com: PDVFS: [6] pdvfs_lib_log: Received binary message from server1.examplez.com:10102: REPLY 4195726551 2892 1: 0
...
06:53:57.491 [5484.9840] <2> 11022:bptm:5484:server1.examplez.com: PDVFS: [4] PdvfsGetJobBytesProcessed: Calling of PdvfsGetJobBytesProcessed was successful.
06:53:57.491 [5484.9840] <4> 11022:bptm:5484:server1.examplez.com: async_get_job_status number of bytes replicated, returned from SPAD 436: (0)
06:53:57.491 [5484.9840] <2> 11022:bptm:5484:server1.examplez.com: async_get_job_status Status for targetmediasvr.examplez.net is RUNNING
06:53:57.491 [5484.9840] <2> 11022:bptm:5484:server1.examplez.com: async_get_job_status Exit with job status RUNNING
06:53:57.491 [5484.9840] <2> 11022:bptm:5484:server1.examplez.com: impl_async_wait Exit, status 1
06:53:57.491 [5484.9840] <16> 11022:bptm:5484:server1.examplez.com: impl_copy_extent_direct impl_async_wait returned replication was non-successful: 2060005 (object is busy, cannot be closed)
 

Note:  This issue does not cause complete failure of every single optimized duplication:
1. Most backup ids (bids) successfully duplicate.
2. Even in a failed Vault/SLP driven optimized duplication, there will likely be multiple bids that successfully duplicate with only a few failures. In these jobs, the NetBackup Activity Monitor Detailed Status for the failed duplication will report successfully duplicated x of n images where n is less than the value of x.
3. Even the particular bids that had failed repeatedly eventually succeeded (although new ones may take their place).


Environment



NetBackup 7.1 Media Server Deduplication


Solution



Symantec has acknowledged that the above-mentioned issue (ETrack 2326388) is present in the current version(s) of the product(s) mentioned at the end of this article.

The formal resolution to this issue is included in the NetBackup 7.1.0.2 Maintenance Update, which is available via the link provided in the Related Articles section below.

This issue is also addressed in the 2.0.1 patch for NetBackup 5200/5220 appliances, available via the link provided in the Related Articles section below.


Supplemental Materials

SourceETrack
Value2326388
Description

NBU 7.1 msdp optdup hangs 99% till 12 hours pass, then fails status 84. bpdm indicates problems setting the _OPT_DUP_END.info: 'impl_async_wait Exit, status 1' '(object is busy, cannot be closed'




Article URL http://www.symantec.com/docs/TECH160202


Terms of use for this information are found in Legal Notices