BUG REPORT: The bptm write process may attempt to access a nonexistent shared memory file if job database cleanup runs longer than 5 minutes, resulting in a status 12

Article:TECH66236  |  Created: 2008-01-22  |  Updated: 2009-01-15  |  Article URL http://www.symantec.com/docs/TECH66236
Article Type
Technical Solution


Environment

Issue



BUG REPORT: The bptm write process may attempt to access a nonexistent shared memory file if job database cleanup runs longer than 5 minutes, resulting in a status 12

Solution



Bug: ET1272844 / ET1752343
Enhancement: ET1466476

Details:  If bpduplicate is running while bpdbjobs -clean is also running, and the duplication job attempts to process a new image, the read and write processes may get out of sync. This results in the write process attempting to read the shm file that has not been created yet by the read process.
The final result is a status 12 on the write process. The read process remains active and creates the shared memory file. When the read process is terminated, a stale shm file may be left on the system, which causes status 89 errors on subsequent attempts to run bpduplicate on the affected image.

Log file example using a disk duplication to tape; hence bptm is the write process and bpdm is the read process:

The admin log start on the master server:
10:11:01.353 [14878] <2> build_write_str: START BACKUP -jobid 2000392 -c clientA -b clientA_1202256093 -cl POLICY -shm -blksize 1048576 -bt 1202256093 -st 2 -rl 3 -date 1204934493 -ct 4 -cn 2 -use_vnetd media

bptm start:
10:11:01.353 [20603] <2> write_backup: got from bpdup (START BACKUP -jobid 2000392 -c clientA -b clientA_1202256093 -cl POLICY -shm -blksize 1048576 -bt 1202256093 -st 2 -rl 3 -date 1204934493 -ct 4 -cn 2 -use_vnetd -mtd), len = 189, err = 0

The bpdm log shows the start:
10:11:03.566 [8007] <2> bpdm: INITIATING (VERBOSE = 0): -copy -cmd -nosig -everything -cn 1 -c clientA -b clientA_1202256093 -port -1 -1 media_server -jobid 2000392 -shm -v -p /nbu/stunit388/04 -mediasvr media_server

Then the failure to read shm in bptm:
10:16:05.024 [20603] <16> setup_dup_shm: Could not open file /usr/openv/netbackup/db/config/shm/clientA_1202256093 to get shared memory information. Errno = 2: No such file or directory
10:16:05.038 [20603] <2> bptm: EXITING with status 12 <----------


The bpdm log shows that the shared memory file is created shortly after bptm exits with status 12:
10:17:04.637 [8007] <2> setup_bpbkar_info: /usr/openv/netbackup/db/config/shm/clientA_1202256093 file successfully created
10:17:04.637 [8007] <2> read_backup: copy 1, fragment 1 is the last fragment for duplicate
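
The log comparison above can be sketched as a small script. This is only an illustration, not a Symantec tool: the function names (parse_line, find_late_shm_files) and the regular expressions are assumptions derived from the layout of the excerpts shown here, and the script assumes both logs cover the same day, since the lines carry no date.

```python
import re
from datetime import datetime

# Line layout inferred from the excerpts in this article (an assumption):
#   HH:MM:SS.mmm [pid] <severity> function: message
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[(\d+)\] <(\d+)> (\w+): (.*)$")
SHM_FAIL_RE = re.compile(r"Could not open file (\S+) to get shared memory information")
SHM_OK_RE = re.compile(r"^(\S+) file successfully created")

# Sample lines copied from the log excerpts in this article.
BPTM_SAMPLE = [
    "10:16:05.024 [20603] <16> setup_dup_shm: Could not open file /usr/openv/netbackup/db/config/shm/clientA_1202256093 to get shared memory information. Errno = 2: No such file or directory",
    "10:16:05.038 [20603] <2> bptm: EXITING with status 12 <----------",
]
BPDM_SAMPLE = [
    "10:17:04.637 [8007] <2> setup_bpbkar_info: /usr/openv/netbackup/db/config/shm/clientA_1202256093 file successfully created",
    "10:17:04.637 [8007] <2> read_backup: copy 1, fragment 1 is the last fragment for duplicate",
]


def parse_line(line):
    """Split one log line into its fields; return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    stamp, pid, severity, function, message = match.groups()
    return {
        "time": datetime.strptime(stamp, "%H:%M:%S.%f"),
        "pid": int(pid),
        "severity": int(severity),
        "function": function,
        "message": message,
    }


def find_late_shm_files(bptm_lines, bpdm_lines):
    """Return (shm_path, seconds_late) for files bptm failed to open
    that bpdm created afterwards."""
    failed_at = {}
    for line in bptm_lines:
        record = parse_line(line)
        if record and record["function"] == "setup_dup_shm":
            hit = SHM_FAIL_RE.search(record["message"])
            if hit:
                failed_at[hit.group(1)] = record["time"]

    late = []
    for line in bpdm_lines:
        record = parse_line(line)
        if record and record["function"] == "setup_bpbkar_info":
            hit = SHM_OK_RE.match(record["message"])
            if hit and hit.group(1) in failed_at:
                delta = (record["time"] - failed_at[hit.group(1)]).total_seconds()
                if delta > 0:
                    late.append((hit.group(1), delta))
    return late
```

Calling find_late_shm_files(BPTM_SAMPLE, BPDM_SAMPLE) on the sample lines reports the clientA_1202256093 file as created roughly one minute after bptm gave up on it, which is the pattern described in this article.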


Once a duplication job exits in this manner, administrators need to manually clean up the shared memory file in order to be able to successfully duplicate this image. In the above example the shared memory file is named:
/usr/openv/netbackup/db/config/shm/clientA_1202256093
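
The manual cleanup could be scripted along the following lines. This is a minimal sketch under stated assumptions: the helper names (shm_path_for, remove_stale_shm) are hypothetical, the directory is the one shown in the log excerpt, and the script assumes the read process for the image has already been terminated. It does not check for a running bpdm process.

```python
import os

# Directory taken from the log excerpt in this article.
SHM_DIR = "/usr/openv/netbackup/db/config/shm"


def shm_path_for(backup_id, shm_dir=SHM_DIR):
    """Build the shm file path for a backup ID such as clientA_1202256093."""
    # Refuse anything that is not a plain file name, to avoid deleting
    # files outside the shm directory.
    if backup_id in ("", ".", "..") or os.path.basename(backup_id) != backup_id:
        raise ValueError("backup_id must be a plain file name: %r" % backup_id)
    return os.path.join(shm_dir, backup_id)


def remove_stale_shm(backup_id, shm_dir=SHM_DIR, dry_run=True):
    """Return True if a shm file exists for backup_id.

    The file is deleted only when dry_run is False.
    """
    path = shm_path_for(backup_id, shm_dir)
    if not os.path.isfile(path):
        return False
    if not dry_run:
        os.remove(path)
    return True
```

The dry_run default of True is a deliberate choice for this sketch: a first call only reports whether a file is present, and the deletion requires an explicit remove_stale_shm("clientA_1202256093", dry_run=False).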

Workarounds:
The following workarounds may be used to alleviate the problem:
1. Schedule duplication jobs to run at times that do not span midnight.
2. Script bpdbjobs -clean to run via cron so that cleanup occurs more frequently. This should keep the overall time that cleanup takes to complete under five minutes in most circumstances.
3. Contact Symantec Technical Support to request a binary for ET1752343 which has additional code added to attempt to open the shared memory file multiple times if the file does not exist.
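
The retry behavior described in workaround 3 can be illustrated with a short sketch. This is not the code in the ET1752343 binary; the function name open_with_retry, the attempt count, and the delay are all hypothetical values chosen for illustration.

```python
import time


def open_with_retry(path, attempts=5, delay_seconds=1.0, sleep=time.sleep):
    """Try to open path for reading, retrying while the file does not exist.

    attempts and delay_seconds are illustrative values, not NetBackup settings.
    The sleep parameter exists so the wait can be replaced in tests.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return open(path, "rb")
        except FileNotFoundError:
            # Give up after the final attempt; otherwise wait for the
            # reader to create the file and try again.
            if attempt == attempts:
                raise
            sleep(delay_seconds)
```

Only the missing-file case (errno 2, as in the bptm log line) is retried in this sketch; any other error, such as a permissions failure, is raised immediately because waiting would not resolve it.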



ETA of Fix:
A product enhancement to ensure order integrity of the read and write process (ET1466476) has been logged and may be included in a future release of NetBackup.  
Symantec Corporation has acknowledged that the above mentioned issue (ET1752343) is present in the current version(s) of the product(s) mentioned at the end of this article. Symantec Corporation is committed to product quality and satisfied customers.  
This issue was scheduled to be addressed in the following release:
  • NetBackup 6.5 Release Update 6 (6.5.6)

When NetBackup 6.5.6 is released, please visit the following link for download and readme information:  http://www.symantec.com/enterprise/support/overview.jsp?pid=15143
Please note that Symantec Corporation reserves the right to remove any fix from the targeted release if it does not pass quality assurance tests or introduces new risks to overall code stability. Symantec's plans are subject to change and any action taken by you based on the above information or your reliance upon the above information is made at your own risk.



Supplemental Materials

Source: ETrack
Value: 1466476
Description: Enhancement: bpdbjobs -cleanup running in conjunction with bpduplicate may result in a status 12

Source: ETrack
Value: 1272844
Description: bpdbjobs -cleanup running in conjunction with bpduplicate may result in a status 12 on disk images

Source: ETrack
Value: 1633239
Description: duplication read and write processes starting out of sync causes hung dups and read side processes that don't die automatically

Source: ETrack
Value: 1752343
Description: Binary with fixes for multiple hung duplication issues

Legacy ID: 316802

