HW or NBU issues??

Created: 02 Jul 2010 | 6 comments
quebek

Hello
I recently replaced our LTO2 tape drives with brand new LTO4 drives and new LTO4 media. Now almost half of the new tapes are being frozen by NBU. In the BPTM logs I see entries like these:
01:01:09.058 [3473466] <2> send_MDS_msg: DEVICE_STATUS 1 438127 taqa1d01 L00029 4000900 Drive5 2000056 TAPE_ALERT 872579072 0
01:01:09.078 [3473466] <2> log_media_error: successfully wrote to error file - 07/01/10 01:01:09 L00029 2 TAPE_ALERT Drive5 0x34028000 0x00000000
01:01:09.079 [3473466] <2> db_error_add_to_file: dberrorq.c:midnite = 1277960400
01:01:09.088 [3473466] <8> process_tapealert: TapeAlert Code: 0x03, Type: Warning, Flag: HARD ERROR, from drive Drive5 (index 2), Media Id L00029
01:01:09.088 [3473466] <2> db_error_add_to_file: dberrorq.c:midnite = 1277960400
01:01:09.099 [3473466] <16> process_tapealert: TapeAlert Code: 0x04, Type: Critical, Flag: MEDIA, from drive Drive5 (index 2), Media Id L00029
01:01:09.099 [3473466] <2> db_error_add_to_file: dberrorq.c:midnite = 1277960400
01:01:09.109 [3473466] <16> process_tapealert: TapeAlert Code: 0x06, Type: Critical, Flag: WRITE FAILURE, from drive Drive5 (index 2), Media Id L00029
01:01:09.109 [3473466] <2> db_error_add_to_file: dberrorq.c:midnite = 1277960400
01:01:09.130 [3473466] <8> process_tapealert: TapeAlert Code: 0x0f, Type: Warning, Flag: MIC FAILURE, from drive Drive5 (index 2), Media Id L00029
01:01:09.130 [3473466] <2> db_error_add_to_file: dberrorq.c:midnite = 1277960400
01:01:09.142 [3473466] <8> process_tapealert: TapeAlert Code: 0x11, Type: Warning, Flag: READ ONLY, from drive Drive5 (index 2), Media Id L00029
01:01:09.175 [3473466] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2043: VN_REQUEST_SERVICE_SOCKET: 6 0x00000006
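The hex value `0x34028000 0x00000000` in the log_media_error line (872579072 in decimal, as in the send_MDS_msg line) looks like a bitmask of the TapeAlert flags that follow. A minimal sketch of decoding it, assuming flag 1 is the most significant bit of the first 32-bit word; the function name is hypothetical and this is not NetBackup code:

```python
def decode_tapealert_flags(word_hi: int, word_lo: int = 0) -> list[int]:
    """Return the TapeAlert flag numbers (1-64) set in two 32-bit words.

    Assumption: flag 1 is the most significant bit of the first word,
    flag 64 the least significant bit of the second word.
    """
    combined = (word_hi << 32) | word_lo
    return [n for n in range(1, 65) if combined & (1 << (64 - n))]


# Values taken from the log_media_error line above.
flags = decode_tapealert_flags(0x34028000, 0x00000000)
print(flags)                      # [3, 4, 6, 15, 17]
print([hex(n) for n in flags])    # ['0x3', '0x4', '0x6', '0xf', '0x11']
```

Under that assumption the result matches the five process_tapealert entries (0x03, 0x04, 0x06, 0x0f, 0x11), so all five flags were reported by the drive in a single status word.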

L00029 1 6 07/01/2010 00:14 07/01/2010 00:14 hcart 154415443 0
MPX 07/11/2010 00:14 N/A FROZEN

NBU 6.5.1 is running on AIX 5.3. The tape drives are
Manufacturer................IBM
Machine Type and Model......ULTRIUM-TD4
Serial Number...............xxxxxxxx
Device Specific.(FW)........94D7
installed in L700.

What do you think? Are the issues related to:
a) the tapes themselves
b) the tape drives
c) NBU
Any clues are welcome.
Of course, some backup jobs to some tapes run fine. The errors above were seen on all drives, for various media.
Do you think it would be a good idea to clean the brand new drives, even though there is no tape alert saying cleaning is needed?
I checked the TapeAlert PDF and here are the descriptions for these errors:
3 Hard Error W
The operation has stopped because an error has occurred while reading or writing data which the drive cannot correct.
The drive had a hard read or write error
4 Media C
Your data is at risk:
1. Copy any data you require from this tape.
2. Do not use this tape again.
3. Restart the operation with a different tape.
Media can no longer be written/read, or performance is severely degraded
6 Write Failure C
The tape is from a faulty batch or the tape drive is faulty:
1. Use a good tape to test the drive.
2. If the problem persists, call the tape drive supplier helpline.
The drive can no longer write data to the tape
15 Memory Chip in Cartridge Failure W
The memory in the tape cartridge has failed, which reduces performance. Do not use the cartridge for further backup operations.
Memory chip failed in cartridge
17 Read Only Format W
You have loaded a cartridge of a type that is read-only in this drive.
The cartridge will appear as write-protected
Media loaded that is read-only format

Everything points to bad media, IMHO, but so many bad tapes? And brand new ones!
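One way to tell whether the errors follow particular drives or particular tapes is to tally the critical TapeAlerts in the bptm log per drive and per media ID. A rough sketch, assuming the process_tapealert line format shown above; the function name and regular expression are illustrative, not part of NetBackup:

```python
import re
from collections import Counter

# Pattern based only on the process_tapealert lines quoted in this post.
TAPEALERT_RE = re.compile(
    r"process_tapealert: TapeAlert Code: (0x[0-9a-fA-F]+), Type: (\w+), "
    r"Flag: (.+?), from drive (\S+) \(index \d+\), Media Id (\S+)"
)


def tally_tapealerts(lines):
    """Count Critical TapeAlerts per drive and per media ID."""
    by_drive, by_media = Counter(), Counter()
    for line in lines:
        m = TAPEALERT_RE.search(line)
        if m and m.group(2) == "Critical":
            by_drive[m.group(4)] += 1
            by_media[m.group(5)] += 1
    return by_drive, by_media


sample = [
    "01:01:09.088 [3473466] <8> process_tapealert: TapeAlert Code: 0x03, "
    "Type: Warning, Flag: HARD ERROR, from drive Drive5 (index 2), Media Id L00029",
    "01:01:09.099 [3473466] <16> process_tapealert: TapeAlert Code: 0x04, "
    "Type: Critical, Flag: MEDIA, from drive Drive5 (index 2), Media Id L00029",
    "01:01:09.109 [3473466] <16> process_tapealert: TapeAlert Code: 0x06, "
    "Type: Critical, Flag: WRITE FAILURE, from drive Drive5 (index 2), Media Id L00029",
]
by_drive, by_media = tally_tapealerts(sample)
print(by_drive)
print(by_media)
```

Run over the full log (for example, the lines of each daily bptm log file), a count spread evenly across all drives and many tapes would suggest something other than a single bad drive or a few bad cartridges.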

Comments

David McMullin | 02 Jul 2010

check firmware?

I know there are firmware versions of LTO2 that format and write once, but when trying the second backup it fails to read the header of the tape properly and freezes the tape.

Try a quick erase of one of these frozen tapes (if it does not have data on it), then retry the backup. If that works, it might be this issue.

NBU 7.0.1 on Solaris 10
writing to EMC 4206 VTL
duplicating to LTO2 in SL8500
(Soon to be LTO5)
using ACSLS 7.3.1

quebek | 02 Jul 2010

Drives causing issues are LTO4

And the vendor told me that the FW version I have, 94D7, is the most recent one!

Andy Welburn | 02 Jul 2010

Can you try excluding NB totally?

i.e. write directly to one of the 'dodgy' tapes/drives from the OS?

Regards Andy

"It's not too late to panic ..."

quebek | 06 Jul 2010

tar is writing to tapes

without any issues. BTW, how do I check for tape alerts when using tar? In errpt there are no errors relating to the tape drives.

I would like to mention that once I unfreeze the previously frozen tapes, all backups run fine with no tape alerts.
I am really lost.
Maybe Marianne's suggestion is right and the newest FW is not the best choice in all cases. I will try to download a supported version and test with it.
BTW, do you have any experience with downgrading drive firmware? I have always gone forward, never backward. Is it possible?

Marianne van den Berg | 02 Jul 2010

The most recent firmware is not always guaranteed to be problem-free. Try one of the tested firmware versions listed in the hardware compatibility guide: ftp://exftpp.symantec.com/pub/support/products/Net...

TapeAlerts are generated by the tape drive. Extract from NBU Admin Guide II:

Using TapeAlert
TapeAlert is a tape drive status monitor and message utility. The TapeAlert utility can detect tape quality problems, defects in tape drive hardware, and the need to clean drives. For the tape drives that support TapeAlert, the TapeAlert firmware monitors the drive hardware and the media. Error, warning, and informational states are logged on a TapeAlert log page. NetBackup writes TapeAlert conditions into:
■ The bptm log
■ The error log
■ The job details log
■ The system log on UNIX and Event Viewer on Windows
For more information, also see “Reactive cleaning (TapeAlert)” on page 194.

TapeAlert log codes
TapeAlert codes are derived from the T10 SCSI-3 Stream Commands standard. Refer to the device’s documentation for the list of codes that are supported by the device. TapeAlert checks for errors of the following types:
■ Recoverable read and write drive problems
■ Unrecoverable read and write drive problems
■ Hardware defects
■ Wrong or worn-out media
■ Expired cleaning tapes
■ Abnormal errors

A set of TapeAlert conditions are defined that can cause the media in use to be frozen. An additional set of conditions are defined that can cause a drive to be downed. Table 3-13 on page 192 describes the TapeAlert codes.

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows.

Rick Brown | 06 Jul 2010

Label the tape first

Can you label a tape first?