
frozen tapes, what to do

Created: 30 Aug 2005 • Updated: 21 May 2010 | 11 comments

I am accumulating a lot of frozen tapes. I'm not sure what to do about them.
I am using NetBackup 4.5 FP6 on Solaris 9. I have an L700 library. StorageTek will be coming out to upgrade the firmware on the drives,
but even if that reduces the number of newly frozen tapes, that doesn't
tell me what to do about the current frozen ones.

I have been unfreezing them. However, a significant number get frozen
again.

The StorageTek engineer once told me to set an EOF on such tapes, but
I'm wondering about the proper procedure. Any suggestions?

thanks


Stumpr2's picture

There are differing reasons for frozen tapes. This document was written to help in this situation.
DOCUMENTATION: How to troubleshoot frozen media on UNIX and Windows
http://seer.support.veritas.com/docs/249632.htm

Manual:
NetBackup (tm) 4.x, 5.x Troubleshooting Guide for UNIX and Windows
NetBackup (tm) 4.x, 5.x Media Manager System Administrator's Guide for UNIX and Windows

Page: N/A

Modification Type: Addition

Modification:


When troubleshooting frozen media issues, it is important to understand the following facts relative to this problem:


The actual FROZEN status is stored in the Media Database (MediaDB) of the media server that froze the media. Every media server (including the master) has its own unique Media Database (MediaDB).
MediaDB information, including the media status (Frozen, Full, Active), can be obtained using the bpmedialist command.
Media can only be unfrozen by using the bpmedia command. When media is unfrozen, the media server containing that frozen record should be specified in the command syntax. Media must be unfrozen one at a time.
A media being frozen does not necessarily mean that the media in question is defective. Freezing media is a safety measure taken by the NetBackup application to help prevent further errors, drive damage or possible data loss.
Investigate if there is any pattern to the media ID(s), tape drive(s) or media server(s) involved when media are frozen

The following logs are useful when troubleshooting frozen media:

UNIX:

The bptm log from the media server(s) that froze the media: /usr/openv/netbackup/logs/bptm
The Admin messages or syslog from the OS

Windows:

The bptm log from the media server(s) that froze the media: \VERITAS\NetBackup\logs\bptm
The Windows Event Viewer System Log
The Windows Event Viewer Application Log
Note: It is preferable to have bptm enabled at a verbosity of 5 to troubleshoot any media and drive related issues. The bptm process log does not tend to take up excessive drive space or resources, even at an elevated verbosity. When a media is frozen, the bptm logs may contain more detailed information on why the media was frozen that the Activity Monitor or Problems Report does not state. Verbosity on bptm must be increased for every media server individually by changing its logging levels under Host Properties in the Administration console.

The following Status Codes can cause, or be a result of frozen media:
Status Code: Reason
84 - Media Write Error: If the tape unit cannot read or write to the tape correctly, this status code can occur when media are frozen.
86 - Media Position Error: If the tape unit cannot read or write to the tape correctly, this status code can occur when media are frozen.
96 - Unable to allocate new media: If media continue to become frozen, the backup job may end in a Status 96, because no more media are available to mount.



The following are 5 common situations in which media become frozen:

1. The same media has excessive errors during backup

FREEZING media id E00109, it has had at least 3 errors in the last 12 hour(s)

Common causes and resolutions for this include:

Dirty Drives. Clean the drive(s) that are freezing media. Frozen media are often one of the first symptoms of a dirty drive. Drive cleaning should be done according to the manufacturer's suggestions.
There may be an issue with the drive itself. Check the OS System logs mentioned above for any errors regarding tape devices, or errors reported by the driver for the tape device. If any are found, follow the hardware manufacturer's recommendations for this type of error.
There may be an issue with communication at the SCSI or Host Bus Adapter (HBA) level. Check the OS System logs mentioned above for any errors regarding SCSI or HBA devices, or errors reported by their driver. If any are found, follow the hardware manufacturer's recommendations for this type of error.
Ensure that the tape drive(s) being used appear on the hardware compatibility list as supported for NetBackup. This list is located on the VERITAS Software Technical Support Web site.
Ensure that the media being used is supported for use with the tape drive by the tape drive vendor
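
The rule quoted in the log message above ("at least 3 errors in the last 12 hour(s)") can be approximated offline to see which media are close to being frozen. The following is a minimal sketch in Python, not NetBackup's actual implementation. It assumes error entries shaped like the one bptm reports writing to its error file (`MM/DD/YY HH:MM:SS media_id drive_index ERROR_TYPE`); the function name, the parameters, and the first three sample entries are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def media_over_threshold(lines, max_errors=3, window_hours=12):
    """Return media IDs that had at least max_errors errors within window_hours."""
    by_media = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed entries
        stamp = datetime.strptime(parts[0] + " " + parts[1], "%m/%d/%y %H:%M:%S")
        by_media[parts[2]].append(stamp)

    window = timedelta(hours=window_hours)
    flagged = set()
    for media_id, stamps in by_media.items():
        stamps.sort()
        # Slide over every run of max_errors consecutive errors.
        for i in range(len(stamps) - max_errors + 1):
            if stamps[i + max_errors - 1] - stamps[i] <= window:
                flagged.add(media_id)
                break
    return sorted(flagged)


# Assumed sample data: only the last entry mirrors a line seen in a real bptm log.
sample = [
    "10/02/05 06:00:00 E00109 2 WRITE_ERROR",
    "10/02/05 09:30:00 E00109 2 WRITE_ERROR",
    "10/02/05 17:45:00 E00109 4 WRITE_ERROR",
    "10/02/05 18:17:16 001353 9 WRITE_ERROR",
]
print(media_over_threshold(sample))
```

Lowering `max_errors` to 2 in such a check would flag media one error before the freeze threshold, which may help spot a pattern early.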

2. An unexpected media is found in the drive

Incorrect media found in drive index 2, expected 300349, found 200244, FREEZING 300349

This can occur under the following circumstances:

If NetBackup requests a media ID to be mounted in a drive and the media ID physically recorded on the tape is different than that NetBackup media ID, media will freeze. This can happen if the robot needs to be inventoried, if barcodes have been physically changed on the media, or if the media was previously written to by another NetBackup installation with different barcode rules.
The drives in the robot are not configured in order within NetBackup, or are configured with the wrong tape paths. Configuration of drives using the correct Robot Drive Number is important to the proper mounting and utilization of media. The Robot Drive Number, commonly set by correlating the drive serial number with the drive serial number information from the robotic library, should be determined and validated before the device configuration is considered complete.

3. The Media contains a non-NetBackup format

FREEZING media id 000438, it contains MTF1-format data and cannot be used for backups
FREEZING media id 000414, it contains tar-format data and cannot be used for backups
FREEZING media id 000199, it contains ANSI-format data and cannot be used for backups

These are usually tapes written outside of NetBackup that have found their way into the library. By default, NetBackup will only write to a blank media, or other NetBackup media. Other media types will be frozen as a safety measure. This behavior can be changed with the following procedure:

UNIX
To allow NetBackup to overwrite foreign media, add the following to /usr/openv/netbackup/bp.conf for the media server in question:

ALLOW_MEDIA_OVERWRITE = DBR
ALLOW_MEDIA_OVERWRITE = TAR
ALLOW_MEDIA_OVERWRITE = CPIO
ALLOW_MEDIA_OVERWRITE = ANSI
ALLOW_MEDIA_OVERWRITE = MTF1

Stop and restart the NetBackup daemons for the changes to take effect.
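
Before restarting the daemons, it may be worth confirming which formats a media server's bp.conf actually allows to be overwritten. This is a minimal Python sketch that only reads text already loaded from the file; the function name is illustrative, and it assumes the `KEY = VALUE` line layout shown above.

```python
def allowed_overwrite_formats(bp_conf_text):
    """List the values of every ALLOW_MEDIA_OVERWRITE entry, in file order."""
    formats = []
    for raw in bp_conf_text.splitlines():
        key, sep, value = raw.partition("=")
        if sep and key.strip() == "ALLOW_MEDIA_OVERWRITE":
            formats.append(value.strip())
    return formats


# Assumed sample content; the host name is taken from the logs in this thread.
sample_conf = """SERVER = master.EDU
ALLOW_MEDIA_OVERWRITE = TAR
ALLOW_MEDIA_OVERWRITE = ANSI
"""
print(allowed_overwrite_formats(sample_conf))
```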

Windows
1. From the Administration Console, proceed to Host Properties | Media Server
2. Open the properties for the media server in question
3. Select the Media tab

The Allow Media Overwrite property overrides the NetBackup overwrite protection for specific media types. To disable overwrite protection, select one or more of the listed media formats.

Stop and restart the NetBackup services for the changes to take effect.

Caution: Do not select a foreign media type for overwriting unless it is certain that this media type should be overwritten. For more details on what each media type is, see the NetBackup System Administrator's Guide.


4. The media was formerly a tape used for the NetBackup Catalog Backup.

FREEZING media id 000067: it contains VERITAS NetBackup (tm) database backup data and cannot be used for backups.

This media was frozen because it is an old catalog backup tape, which NetBackup will not overwrite by default. In this case, the media must be labeled with the bplabel command to reset the media header. See the Related Document below on how to do this.

5. Media was intentionally frozen

It is possible to manually freeze media with the bpmedia command for a variety of administrative reasons. If frozen media are encountered and there is no record of a specific job freezing the media, media may have manually been frozen.


Unfreezing Frozen Media:

To unfreeze frozen media, use the bpmedia command with the following syntax:

For UNIX
/usr/openv/netbackup/bin/admincmd/bpmedia -unfreeze -m <media_id> -h <media_server>

For Windows
\VERITAS\Netbackup\bin\admincmd\bpmedia -unfreeze -m <media_id> -h <media_server>

If it is not known which media server froze the media, run the bpmedialist command and note the "Server Host:" listed in the output:

For UNIX
/usr/openv/netbackup/bin/admincmd/bpmedialist -m <media_id>

For Windows
\VERITAS\Netbackup\bin\admincmd\bpmedialist -m <media_id>

See the text illustration below for a sample output. In this example, bpmedialist is run for the frozen media div008. It is found in this example that the media server "denton" froze this media.

C:\Program Files\VERITAS\NetBackup\bin\admincmd>bpmedialist -m div008

Server Host = denton

ID rl images allocated last updated density kbytes restores
vimages expiration last read <------- STATUS ------->
----------------------------------------------------------------------------------------------------------

DIV008 1 1 04/22/2005 10:12 04/22/2005 10:12 hcart 35 5
1 05/06/2005 10:12 04/22/2005 10:25 FROZEN
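
Because media must be unfrozen one at a time and against the server that froze them, it can help to derive the bpmedia arguments from saved bpmedialist output. The Python sketch below assumes the two-line record layout shown in the sample above. It only builds argument lists and does not run anything; the function name and the `bpmedia` path default are illustrative.

```python
import re

DATE = re.compile(r"\d\d/\d\d/\d{4}")


def unfreeze_commands(bpmedialist_output, bpmedia="bpmedia"):
    """Build one 'bpmedia -unfreeze' argument list per FROZEN record."""
    commands = []
    server = None
    media_id = None
    for line in bpmedialist_output.splitlines():
        host = re.match(r"\s*Server Host\s*=\s*(\S+)", line)
        if host:
            server = host.group(1)
            continue
        fields = line.split()
        # First line of a record: media ID, retention level, image count, date...
        if (len(fields) >= 4 and fields[1].isdigit() and fields[2].isdigit()
                and DATE.fullmatch(fields[3])):
            media_id = fields[0]
        # Second line of a record carries the status column.
        elif "FROZEN" in fields and media_id and server:
            commands.append([bpmedia, "-unfreeze", "-m", media_id, "-h", server])
            media_id = None
    return commands


sample_output = """Server Host = denton

ID rl images allocated last updated density kbytes restores
vimages expiration last read <------- STATUS ------->
----------------------------------------------------------

DIV008 1 1 04/22/2005 10:12 04/22/2005 10:12 hcart 35 5
1 05/06/2005 10:12 04/22/2005 10:25 FROZEN
"""
for cmd in unfreeze_commands(sample_output):
    print(" ".join(cmd))
```

Each returned list could be reviewed by hand and then run on a host where bpmedia is installed.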


Related Documents:

237779: How to reuse or recycle unnecessary VERITAS NetBackup database catalog tapes for normal backups
http://support.veritas.com/docs/237779


273908: In-depth Troubleshooting Guide for Exit Status Code 84 in NetBackup (tm) Server / Enterprise Server 5.0 / 5.1
http://support.veritas.com/docs/273908





Supplemental Material:

System: Ref.# Description
Error Code: 84 Media Write Error
Error Code: 86 Media Position Error
Error Code: 96 Unable to Allocate Media

VERITAS ain't it the truth?

Terry Furey's picture

D:\VERITAS\NetBackup\bin\admincmd>bpmedialist -m U00037
Server Host = prdljbu1

id rl images allocated last updated density kbytes restores
vimages expiration last read <------- STATUS ------->
--------------------------------------------------------------------------------
U00037 3 0 01/20/2005 19:08 N/A hcart2 0 0
0 N/A N/A FROZEN


D:\VERITAS\NetBackup\bin\admincmd>bpmedia -unfreeze -m U00037 -h prdljbu1

Stumpr2's picture

Dennis,

How is the frozen tape situation? Are you still getting frozen tapes? Have you been able to determine the cause? Do you have any further questions?

Message was edited by: Bob Stump


dennis sexton's picture

Frozen tape situation continues unabated. I have a case open with
Sun and have had storagetek guys out a couple of times. After updating
firmware on the tape drives, upgrading a couple of sbus cards
to the latest firmware, I am still freezing tapes at about the rate of one a day.

It was thought that I could remove the tapes in question. But at the
rate I'm freezing tapes, I can't duplicate the images fast enough :-(.
I also hit a glitch whereby my promoting the second copy to primary
doesn't seem to take. My current plan is to try to collect historical
data to see if I have any patterns.

So the answer is that I haven't been able to determine a cause.

And I guess my question is what do I do now?

I thank you Bob for your follow up. I hadn't noticed it and I apologize
for taking so long to respond.

Mark Kimball's picture

Do you mind posting relevant info from your bptm log and your status codes from the backup?

Dumb question... Are you sure you're using the same types of LTO tapes?

dennis sexton's picture

For the first question, all tapes are the same. No mixing of any kind. Always ordered from the same vendor, same tape designation.

For the second part of the question, there is a lot of stuff. I hope I'm
not overwhelming this forum.

The job completes successfully after hiccuping on the first tape, according to bperror -jobid:

##> bperror -jobid 1675410
1128301747 1 4 4 master.EDU 1675410 1675407 0 client.edu bpsched added backup job (jobid=1675410) for client client.edu, policy local_nonstandard, schedule Differential-Inc part 4 to NetBackup scheduler work queue
1128301800 1 4 4 master.EDU 1675410 1675407 0 client.edu bpsched started backup job for client client.edu, policy local_nonstandard, schedule Differential-Inc on storage unit L700
1128302236 1 132 16 master.EDU 1675410 1675407 0 client.edu bptm ioctl (MTWEOF) failed on media id 001353, drive index 9, I/O error (bptm.c.17692)
1128302236 1 132 16 master.EDU 1675410 1675407 0 client.edu bptm FROZE media id 001353, could not write tape mark to begin new image
1128302238 1 4 4 master.EDU 1675410 1675407 0 client.edu bptm begin writing backup id client.edu_1128301800, copy 1, fragment 1, to media id 000910 on drive index 4
1128302247 1 4 4 master.EDU 1675410 1675407 0 client.edu bptm successfully wrote backup id client.edu_1128301800, copy 1, fragment 1, 192 Kbytes at 21333.333 Kbytes/sec
1128302255 1 68 4 master.EDU 1675410 1675588 0 client.edu bpsched CLIENT client.edu POLICY local_nonstandard SCHED Differential-Inc EXIT STATUS 0 (the requested operation was successfully completed)
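
The leading number on each bperror line appears to be a Unix epoch timestamp (1128302236 lines up with the 18:17:16 bptm entries if the server runs seven hours behind UTC). A minimal Python sketch for pulling freeze events out of such output with readable times follows. The field interpretation is an assumption based on this sample, not documented behaviour, and the function name is illustrative.

```python
import re
from datetime import datetime, timezone

FROZE = re.compile(r"FROZE media id (\w+)")


def freeze_events(bperror_lines):
    """Return (UTC ISO timestamp, media ID) for each 'FROZE media id' line."""
    events = []
    for line in bperror_lines:
        fields = line.split(None, 1)
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # not a line that starts with an epoch timestamp
        hit = FROZE.search(fields[1])
        if hit:
            when = datetime.fromtimestamp(int(fields[0]), tz=timezone.utc)
            events.append((when.isoformat(), hit.group(1)))
    return events


sample = [
    "1128302236 1 132 16 master.EDU 1675410 1675407 0 client.edu bptm "
    "ioctl (MTWEOF) failed on media id 001353, drive index 9, I/O error (bptm.c.17692)",
    "1128302236 1 132 16 master.EDU 1675410 1675407 0 client.edu bptm "
    "FROZE media id 001353, could not write tape mark to begin new image",
]
print(freeze_events(sample))
```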

The bptm log for this job is 162 lines. I'll try to excerpt what looks like
the most important lines:

18:10:03.831 <2> bptm: INITIATING (VERBOSE = 5): -w -c client.edu -den 21 -rt 8 -rn 0 -stunit L700 -cl local_nonstandard -bt 1128301800 -b client.edu_1128301800 -st 1 -cj 12 -p NetBackup -hostname client.edu -ru root -rclnt client.edu -rclnthostname client.edu -rl 7 -rp 24105600 -sl Differential-Inc -ct 0 -v -mediasvr master.EDU -jobid 1675410 -jobgrpid 1675407 -masterversion 451000
18:10:05.377 <2> mount_open_media: Waiting for mount of media id 001353 (copy 1) on server master.EDU.
18:10:05.387 <2> fill_buffer: socket is closed, waited for empty buffer 0 times, delayed 0 times, read 196608 bytes
18:11:23.044 <2> io_open: SCSI RESERVE
18:11:32.031 <2> io_open: file /usr/openv/netbackup/db/media/tpreq/001353 successfully opened
18:11:32.031 <2> write_backup: media id 001353 mounted on drive index 9, drivepath /dev/rmt/7cbn, drivename Drive9, copy 1
18:11:32.033 <2> io_read_media_header: drive index 9, reading media header, buflen = 65536, buff = 0x262708, copy 1
18:11:32.033 <2> io_ioctl: command (5)MTREW 1 from (bptm.c.6471) on drive index 9
18:11:38.141 <2> io_ioctl: command (1)MTFSF 1 from (bptm.c.6662) on drive index 9
18:11:38.143 <2> io_position_for_write: position media id 001353, copy 1, current number images = 316
18:11:38.143 <2> io_position_for_write: locating to absolute block number 5586725, copy 1
18:15:59.152 <2> io_position_for_write: locate block is done
18:15:59.157 <2> io_position_for_write: processing empty header, filenum = 317, bid = (empty_file), copy 1
18:15:59.157 <2> io_position_for_write: empty header found on 001353, OK, copy 1
18:15:59.157 <2> io_close: closing /usr/openv/netbackup/db/media/tpreq/001353, from bptm.c.17646
18:15:59.158 <2> io_open: SCSI RESERVE
18:15:59.163 <2> io_open: file /usr/openv/netbackup/db/media/tpreq/001353 successfully opened
18:15:59.163 <2> io_ioctl: command (2)MTBSF 1 from (bptm.c.17666) on drive index 9
18:15:59.265 <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.17692) on drive index 9
18:17:16.409 <2> set_job_details: Sending Tfile jobid (1675410)
18:17:16.410 <2> set_job_details: LOG 1128302236 16 bptm 20558 ioctl (MTWEOF) failed on media id 001353, drive index 9, I/O error (bptm.c.17692)
18:17:16.410 <2> set_job_details: Done
18:17:16.466 <16> io_ioctl: ioctl (MTWEOF) failed on media id 001353, drive index 9, I/O error (bptm.c.17692)
18:17:16.518 <2> log_media_error: successfully wrote to error file - 10/02/05 18:17:16 001353 9 WRITE_ERROR
18:17:16.518 <2> set_job_details: Sending Tfile jobid (1675410)
18:17:16.518 <2> set_job_details: LOG 1128302236 16 bptm 20558 FROZE media id 001353, could not write tape mark to begin new image
18:17:16.519 <2> set_job_details: Done
18:17:16.587 <16> write_backup: FROZE media id 001353, could not write tape mark to begin new image
18:17:16.638 <2> io_close: closing /usr/openv/netbackup/db/media/tpreq/001353, from bptm.c.13617
18:17:16.676 <2> tpunmount: tpunmount'ing /usr/openv/netbackup/db/media/tpreq/001353
18:17:16.732 <2> TpUnmountWrapper: SCSI RELEASE
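
For longer bptm logs, the failing operation, media ID and drive index can be extracted mechanically. The Python sketch below assumes the line layout seen in this excerpt (time, optional `[pid]`, a severity in angle brackets, function name, message) and treats severity 16 as the error level because that is how the failure lines here are tagged. The regular expressions and function name are illustrative.

```python
import re

LINE = re.compile(
    r"^(\d\d:\d\d:\d\d\.\d{3})\s+(?:\[(\d+)\]\s+)?<(\d+)>\s+(\w+):\s+(.*)$"
)
IOCTL = re.compile(r"ioctl \((\w+)\) failed on media id (\w+), drive index (\d+)")


def ioctl_failures(log_lines, min_severity=16):
    """Collect ioctl failures logged at or above min_severity."""
    failures = []
    for line in log_lines:
        m = LINE.match(line.strip())
        if not m or int(m.group(3)) < min_severity:
            continue  # skip debug-level echoes of the same message
        hit = IOCTL.search(m.group(5))
        if hit:
            failures.append({
                "time": m.group(1),
                "op": hit.group(1),
                "media_id": hit.group(2),
                "drive_index": int(hit.group(3)),
            })
    return failures


sample = [
    "18:15:59.265 <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.17692) on drive index 9",
    "18:17:16.410 <2> set_job_details: LOG 1128302236 16 bptm 20558 ioctl (MTWEOF) "
    "failed on media id 001353, drive index 9, I/O error (bptm.c.17692)",
    "18:17:16.466 <16> io_ioctl: ioctl (MTWEOF) failed on media id 001353, "
    "drive index 9, I/O error (bptm.c.17692)",
]
print(ioctl_failures(sample))
```

Filtering on severity keeps the `<2>` set_job_details echo from being counted as a second failure.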

The rest of the log for this job indicates the selection of another tape
and successful completion of the job. The job completed on drive 4.

I have run a histogram of errors on my 14 drives since mid

dennis sexton's picture

Sorry didn't finish my thought. A histogram of tape errors since Aug 23
indicates 146 errors total. Four drives have fewer than 5 errors, 6 drives
have greater than 10. I haven't confirmed that every one of these
resulted in a frozen tape, but I'm guessing a lot did.
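
A histogram like this is easier to interpret if it also counts how many distinct tapes failed on each drive: many different tapes failing on one drive points at the drive, while one tape failing across several drives points at the tape. Here is a minimal Python sketch, assuming error entries in the `MM/DD/YY HH:MM:SS media_id drive_index ERROR_TYPE` form that bptm reports writing to its error file. Apart from the 001353 entry, the sample lines are invented, and the function names are illustrative.

```python
from collections import Counter, defaultdict


def errors_by_drive(error_lines):
    """Map drive index -> (error count, number of distinct media involved)."""
    counts = Counter()
    media_seen = defaultdict(set)
    for line in error_lines:
        parts = line.split()
        if len(parts) < 5 or not parts[3].isdigit():
            continue  # skip malformed entries
        drive = int(parts[3])
        counts[drive] += 1
        media_seen[drive].add(parts[2])
    return {d: (counts[d], len(media_seen[d])) for d in sorted(counts)}


def render(summary):
    """Format the summary as simple text bars, one line per drive."""
    return [
        f"drive {d:>2}: {'#' * n} ({n} errors, {m} media)"
        for d, (n, m) in summary.items()
    ]


# Assumed sample data for illustration.
sample = [
    "10/02/05 18:17:16 001353 9 WRITE_ERROR",
    "10/03/05 02:10:40 000910 9 WRITE_ERROR",
    "10/03/05 23:05:12 001353 9 WRITE_ERROR",
    "10/04/05 01:44:03 000777 4 WRITE_ERROR",
]
print("\n".join(render(errors_by_drive(sample))))
```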

Stumpr2's picture

So you have 10 drives in your L700 and all of them have had errors AND successful backups/restores?

I once had a box of bad tapes. I think the box may have been dropped. Tapes are not very forgiving. I also had SAN configuration/firmware problems that also complicated the situation. It took me a couple of months to eventually identify the bad tapes. I did this by creating a pool I called "cess_pool". I placed any frozen tapes into the cess_pool for identification/tracking. I then degaussed the tapes (they were DLT, don't try to degauss LTO's) and relabeled them with bplabel. I then placed the tapes into a "test_pool" and used them for a couple of weeks. I returned any tape that then had an error to the tape manufacturer for a refund.

The SAN and hardware/firmware issues also took a couple of weeks to correct. But that was with the older technology with SCSI routers and a screwy zoning.

You are doing well to check the logfiles and keeping historic failures. The logfile you posted spells out drive index #9 but in a SAN environment not all media servers use the same index number. Track all index numbers to the physical drive number.
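
One way to do that tracking is to keep a small lookup from (media server, drive index) to a physical drive label and total the errors per physical drive. The Python sketch below is only an illustration: the mapping, the second server name and the drive labels are hypothetical, and in practice the mapping would come from matching drive serial numbers against the library.

```python
from collections import Counter

# Hypothetical mapping: (media server, drive index) -> physical drive label.
DRIVE_MAP = {
    ("master.EDU", 9): "L700 drive A",
    ("master.EDU", 4): "L700 drive B",
    ("media2.EDU", 0): "L700 drive A",
}


def errors_by_physical_drive(events, drive_map):
    """Total error events per physical drive; unknown pairs are kept visible."""
    totals = Counter()
    for server, index in events:
        label = drive_map.get((server, index), f"unmapped {server}:{index}")
        totals[label] += 1
    return dict(totals)


# Assumed sample events: (media server, drive index) taken from each error.
events = [("master.EDU", 9), ("media2.EDU", 0), ("master.EDU", 4), ("media2.EDU", 3)]
print(errors_by_physical_drive(events, DRIVE_MAP))
```

Keeping unmapped pairs in the output, rather than dropping them, shows where the index-to-drive mapping is still incomplete.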

I have a wonderful Storagetek engineer that helps me. I always go to him before I go to Sun for driver/configuration issues.


dennis sexton's picture

I actually have 14 drives. I left out the numbers for the middle ones, but
not a big deal.

I now have a new wrinkle. I have 3 tapes marked as suspended. How
does netbackup decide between freezing and suspending a tape?
It looks like the only difference is that a suspended tape is not "kept indefinitely", as a frozen tape is.

Shrikumar's picture

Hi ALL,

On one of my media servers I have 7 drives controlled by an ACS server. On 2 of these drives, whenever a media is mounted, either the media gets frozen or the drive goes down with the following errors:

1) In case of media
Error bptm (pid=2751) ioctl (MTWEOF) failed on media id KX0221,drive 0,I/O error (bptm.c.20101)
Error bptm (pid=2751) FROZE media id KX0221,could not write tape mark to begin new

2)In case of drive
Error bptm (pid=27851) ioctl (MTWEOF) failed on media id KY0933,drive 0,I/O error (bptm.c.8736)
Warning bptm (pid=27851) DOWN'ing drive index 0 it has atleast 5 error in last 12 hours

==================================================
Please see the log messages I got from bptm:

03:01:59.396 [8407] <16> io_ioctl: ioctl (MTWEOF) failed on media id KZ3090, drive index 0, I/O error (bptm.c.20101)
03:01:59.416 [8407] <2> log_media_error: successfully wrote to error file - 07/30/09 03:01:59 KZ3090 0 WRITE_ERROR
03:01:59.416 [8407] <2> set_job_details: Sending Tfile jobid (4594642)
03:01:59.416 [8407] <2> set_job_details: LOG 1248940919 16 bptm 8407 FROZE media id KZ3090, could not write tape mark to begin new image

03:01:59.416 [8407] <2> set_job_details: Done
03:01:59.418 [8407] <2> nb_getsockconnected: host=ksmtas service=bpdbm address=135.38.56.139 protocol=tcp non-reserved port=13721
03:01:59.418 [8407] <2> nb_bind_on_port_addr: bound to port 49326
03:01:59.419 [8407] <2> logconnections: BPDBM CONNECT FROM 135.38.26.20.49326 TO 135.38.56.139.13721
03:01:59.422 [8407] <2> vauth_authentication_required: vauth_comm.c.743: no methods for address: no authentication required
03:01:59.427 [8407] <2> vauth_connector: vauth_comm.c.177: no methods for address: no authentication required
03:01:59.427 [8407] <2> check_authentication: no authentication required
03:01:59.427 [8407] <2> vnet_check_vxss_client_magic_with_info: vnet_vxss_helper.c.774: Ignoring VxSS authentication: 2 0x00000002
03:01:59.427 [8407] <2> vnet_check_vxss_client_magic_with_info: vnet_vxss_helper.c.930: Not using VxSS authentication: 2 0x00000002
03:01:59.676 [8407] <16> write_backup: FROZE media id KZ3090, could not write tape mark to begin new image
03:01:59.689 [8407] <2> io_close: closing /usr/openv/netbackup/db/media/tpreq/KZ3090, from bptm.c.15839
03:01:59.689 [8407] <2> tpunmount: Check_for_waiting = 0, No_tpunmount_after_restore = 0, Media_Unmount_Delay = 0, MediaOffset = 1119
03:01:59.690 [8407] <2> tpunmount: tpunmount'ing /usr/openv/netbackup/db/media/tpreq/KZ3090
03:01:59.692 [8407] <2> TpUnmountWrapper: SCSI RELEASE
03:01:59.746 [8407] <2> nb_getsockconnected: host=ksmtas service=bpdbm address=135.38.56.139 protocol=tcp non-reserved port=13721 

Regards,
Shrikumar

Andy Welburn's picture

...this post is over 3 years old (nearly 4)!!

PS: dodgy drive, tried cleaning? ;)