A rare data loss condition exists for HP-UX, AIX and SGI media servers running NetBackup 4.5, 5.0, 5.1, and 6.0 where the last valid backup image on tape is overwritten by an empty backup header due to a failed backup job.

Article:TECH50693  |  Created: 2007-01-08  |  Updated: 2013-10-23  |  Article URL http://www.symantec.com/docs/TECH50693
Article Type
Technical Solution

Introduction:
A rare data loss condition affects HP-UX, AIX, and SGI media servers running NetBackup 4.5, 5.0, 5.1, and 6.0: after a backup job fails, the last valid backup image on the tape can be overwritten by an empty backup header. In rare cases, the bptm process for a failed or canceled job rewinds the tape too far and overwrites the last valid backup with an empty backup header. As a result, the next backup starts writing at the same position as the last successful backup written before the failed or canceled job. Not every failed or canceled backup triggers this behavior.

Because this issue is related to the fast positioning and position checking of the tape drives, it is possible (but unconfirmed) that it also existed in earlier versions of NetBackup. For more information about the tape formats used by NetBackup, refer to the following manuals:

- Veritas NetBackup 6.0 Media Manager System Administrator's Guide for UNIX, page 375
- Veritas NetBackup 5.1 Media Manager System Administrator's Guide for UNIX, page 366
- Veritas NetBackup 5.0 Media Manager System Administrator's Guide for UNIX, page 323
- Veritas NetBackup DataCenter 4.5 System Administrator's Guide for UNIX, page 795

Links to the Media Manager System Administrator's Guides can be found below, in the Related Documents section.

Note: This issue is formally resolved in NetBackup 5.0 MP7, 5.1 MP6, and 6.0 MP4. Refer to the Related Documents section below for links to these maintenance packs.

What is Affected:
This issue is known to affect only media servers running the following NetBackup versions and platforms:
- NetBackup 6.0 GA and all patch levels through Maintenance Pack 3 (MP3) running on supported HP-UX or AIX platforms.
- NetBackup 5.1 GA and all patch levels through Maintenance Pack 5 (MP5) running on supported HP-UX, AIX, or SGI platforms.
- NetBackup 5.0 GA and all patch levels through Maintenance Pack 6 (MP6) running on supported HP-UX, AIX, or SGI platforms.
- NetBackup 4.5 GA and all Maintenance Pack / Feature Pack patch levels running on supported HP-UX, AIX, or SGI platforms.
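The version and platform ranges above can be checked with a small POSIX shell function. This is a minimal sketch, not a Symantec tool: the function name `nbu_version_affected` is hypothetical, the release (for example `5.1`) and the maintenance pack number (0 for GA) must be supplied by hand, and the platform defaults to the output of `uname -s`, assuming SGI servers report `IRIX` or `IRIX64`. It exits 0 for a listed combination, 1 otherwise, and 2 for a release that is not in the list, since earlier releases may also be affected but are unconfirmed.

```shell
#!/bin/sh
# Hypothetical helper, not a Symantec tool: classify a media server against
# the "What is Affected" list.
#   $1 = release (6.0, 5.1, 5.0, 4.5)
#   $2 = maintenance pack number (0 = GA)
#   $3 = platform as printed by `uname -s` (defaults to this host)
# Exit status: 0 = listed as affected, 1 = not affected,
#              2 = release not in the list (earlier releases are unconfirmed).
nbu_version_affected() {
  release=$1
  mp=${2:-0}
  platform=${3:-$(uname -s)}

  # Listed platforms; SGI IRIX is assumed to report IRIX or IRIX64.
  case "$platform" in
    HP-UX|AIX) ;;
    IRIX|IRIX64)
      # 6.0 is listed for HP-UX and AIX only.
      if [ "$release" = "6.0" ]; then return 1; fi ;;
    *) return 1 ;;
  esac

  # Highest affected maintenance pack for each release.
  case "$release" in
    6.0) last_affected=3 ;;  # fixed in MP4
    5.1) last_affected=5 ;;  # fixed in MP6
    5.0) last_affected=6 ;;  # fixed in MP7
    4.5) return 0 ;;         # every 4.5 level; no fix planned
    *)   return 2 ;;
  esac

  [ "$mp" -le "$last_affected" ]
}
```

For example, `nbu_version_affected 5.1 5 AIX && echo affected` prints `affected`, while `nbu_version_affected 5.1 6 AIX` (MP6, which contains the fix) exits 1.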

How to Determine if Affected:
Any backup type can be affected by this issue, including True Image Restore (TIR) and multiplexed backups. Duplicating or verifying a backup reads the actual contents of the media, so either operation would immediately expose affected media. If your environment duplicates or verifies its backups and those jobs complete without errors, the media they read are not affected and there is no risk of data loss on them. Applying the patch listed under Formal Resolution, or the workaround described under Workaround, prevents future occurrences.

If duplicating or verifying the media fails with NetBackup Status Code 94 (cannot position to correct image) or Status Code 173 (cannot read backup header, media may be corrupted), the media may have been affected by this issue.

The only way to confirm with complete certainty that the environment is unaffected is to run a Media Contents report, which reads the actual contents of every piece of media. The drawback of this method is that it can be too cumbersome and time-consuming to be practical.

The NetBackup Image Checker (NBIC) utility, attached to this TechNote (see Attachments below), can determine with a high degree of certainty whether the environment is affected. Download the utility to the master server and extract the archive into the /usr/openv/netbackup/bin/goodies directory on a UNIX/Linux master server, or into the <install_path>\veritas\netbackup\bin\goodies directory on a Windows master server. The files do not have to reside in this directory, but Support recommends placing them there. Then review the NBIC_readme file for instructions on running the utility.

The command output looks similar to the following:

NBIC -if bpimagelist-l

2.0  Reading NetBackup bpimagelist output
      Start time = 2007-02-01 12:39:15
        Reviewed 135714 images in Image DB
        Omitted 1112 disk based images from further analysis
        Selected 133974 media based images for further analysis
        15997 pieces of tape media were detected for analysis
      End time = 2007-02-01 12:45:22

3.0  Analyzing 15997 NetBackup tapes for possible image overwrites
      Start time = 2007-02-01 12:45:22
3.1  Analyzing tape image positions
      Detected image(s) on 3 tapes that need further analysis
3.2  Reading image information
      Start time = 2007-02-01 12:45:31
      Loading image information for tape # 1 - 608471
        client01_1164427629
      Loading image information for tape # 2 - 612610
        client011_1147653510
      Loading image information for tape # 3 - 612682
        client012b_1123380490

      End time = 2007-02-01 12:46:40

     ***NBIC DETECTED 3 images that were OVERWRITTEN!***

4.0  Printing image information
    End time = 2007-02-01 12:46:40


       Please review the NBIC results in the
       ./output/nbic-info.txt file.

If the utility does not report any overwritten images, and either the patch or the workaround is implemented, no additional steps are necessary.  


If the utility reports overwritten images, as in the example above, the environment is affected and database cleanup is required. Review the nbic-info.txt file to determine whether more than one copy of each overwritten image exists. Each overwritten image has its own section in the nbic-info.txt file, similar to the following:


Backup ID = client01_1164427629
 Client          = client01
 Client type     = Standard
 Policy          = full_unix_backup
 Schedule        = Full
 Schedule type   = Full Backup
 Backup time     = Thu Dec 28 03:46:26 2006
 Expiration time = Sat Dec 30 03:46:26 2006
   Key Copy# MediaID
   -->  1    608471

For each overwritten image, use the bpexpdate command (located in the /usr/openv/netbackup/bin/admincmd directory on UNIX/Linux servers and in the <install_path>\veritas\netbackup\bin\admincmd directory on Windows servers) to expire the overwritten copy. The general syntax comes first; the second line is the command for the "client01_1164427629" image from the example above:

bpexpdate -backupid <backup_id> -d <date> -copy <copy_number>

bpexpdate -backupid client01_1164427629 -d 0 -copy 1

Specifying 0 as the -d date expires the copy immediately.
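When NBIC reports many overwritten images, typing each bpexpdate command by hand is error-prone. The following POSIX shell sketch reads nbic-info.txt and prints, without running them, one command per flagged copy in the form shown above. It is not part of NBIC: the function name `nbic_expire_commands` is hypothetical, and it assumes that each image section begins with a `Backup ID = ...` line and that the copy to expire is the one on a line starting with `-->`, as in the sample section. Confirm that assumption against the NBIC_readme, and check whether other copies exist, before running any printed command.

```shell
#!/bin/sh
# Hypothetical helper, not part of NBIC. It PRINTS one bpexpdate command per
# copy flagged in the NBIC report; nothing is executed.
# Assumed report layout (from the sample section above):
#   Backup ID = client01_1164427629
#      Key Copy# MediaID
#      -->  1    608471
nbic_expire_commands() {
  report=${1:-./output/nbic-info.txt}
  awk '
    # A "Backup ID = <id>" line starts a new image section.
    /^[ \t]*Backup ID[ \t]*=/ { id = $NF; next }
    # A line starting with "-->" is assumed to mark the overwritten copy;
    # field 2 is its copy number.
    $1 == "-->" && id != "" {
      printf "bpexpdate -backupid %s -d 0 -copy %s\n", id, $2
    }
  ' "$report"
}
```

For example, `nbic_expire_commands ./output/nbic-info.txt > expire_list.txt` writes the commands to a file for review; for the sample section above it produces the same command as the manual example. If NBIC flagged no images, nothing is printed.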


If you need assistance with the database cleanup, contact Symantec Support.


Formal Resolution:
This issue is resolved in the following patches for NetBackup:

- For NetBackup 6.0, download and apply Maintenance Pack 4 (MP4), released on November 20, 2006.
- For NetBackup 5.1, download and apply Maintenance Pack 6 (MP6), released on December 13, 2006.
- For NetBackup 5.0, download and apply Maintenance Pack 7 (MP7), released on December 13, 2006.

The download links can be found below, in the "Related Documents" section of this TechNote.  The downloads are also available at the following link:    
 http://www.symantec.com/enterprise/support/downloads.jsp?pid=15143

Because NetBackup 4.5 and earlier versions have reached End of Life, Symantec Corporation has no plans to release fixes for them. If the formal resolution cannot be implemented in your environment immediately, or if you run an End-of-Life version and cannot upgrade to an unaffected version of NetBackup, Symantec strongly recommends implementing the workaround described in the next section of this TechNote.

Workaround:
Until the appropriate NetBackup patch can be installed, Symantec recommends creating the following empty file on all potentially affected media servers (as defined in the "What is Affected" section above):

# touch /usr/openv/volmgr/database/NO_LOCATEBLOCK

This bypasses the problem code, but it also disables fast positioning, which can reduce backup and restore performance.

To successfully restore non-overwritten images on affected media, it may be necessary to remove the NO_LOCATEBLOCK touch file.  
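The workaround can be wrapped in two small shell functions, one to create the touch file and one to remove it later. This is a hedged sketch: the function names are hypothetical, the platform check relies on `uname -s` (assuming `IRIX` or `IRIX64` on SGI), and the NetBackup version is not checked, so confirm that the server is listed under "What is Affected" first. The optional directory argument exists only so the functions can be tried against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helpers for the NO_LOCATEBLOCK workaround; run as root on
# each media server. Default path is the one given in this TechNote.
NBU_VOLMGR_DB=/usr/openv/volmgr/database

# Create the empty touch file, but only on the platforms named in
# "What is Affected" (HP-UX, AIX, SGI IRIX). The NetBackup version is NOT checked.
nbu_workaround_on() {
  dir=${1:-$NBU_VOLMGR_DB}
  os=${2:-$(uname -s)}
  case "$os" in
    HP-UX|AIX|IRIX|IRIX64)
      touch "$dir/NO_LOCATEBLOCK" && echo "created $dir/NO_LOCATEBLOCK" ;;
    *)
      echo "$os is not a listed platform; nothing created" ;;
  esac
}

# Remove the touch file, e.g. after installing the maintenance pack that fixes
# the issue (the file disables fast positioning), or when restoring
# non-overwritten images from affected media requires it.
nbu_workaround_off() {
  dir=${1:-$NBU_VOLMGR_DB}
  rm -f "$dir/NO_LOCATEBLOCK"
}
```

On an affected media server, run `nbu_workaround_on` as root; once the fixing maintenance pack is installed, or if a restore of non-overwritten images from affected media requires it, run `nbu_workaround_off`.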

Note that if a backup image has already been overwritten by the next backup, the data in that image cannot be recovered. To protect that data, back it up again.

Best Practices
Symantec strongly recommends the following best practices:
1. Always perform a Full backup before and after any change to your environment.
2. Always make sure that your environment is running the latest version and patch level.





Attachments

1.3_nbic_287287.tar (7.4 MBytes)

Supplemental Materials

Source: Etrack
Values: 594040, 594039, 594041, 594036, 704572
Description: A valid backup image is overwritten by the next job (an empty header is written at the beginning position of the last valid backup).

Source: Error Code
Value: 94
Description: cannot position to correct image

Source: Error Code
Value: 173
Description: cannot read backup header, media may be corrupted


Legacy ID: 287287

