Backup Exec 2012 Tape drive Issues

Created: 07 Feb 2013 | 18 comments

Hi folks, we're using BUE2012 in our production and test environments. In our test environment we keep running into the same issue again, and again, and again. We never had this problem previously and cannot determine what the cause is!

We have BUE2012 running with an IBM LTO4 tape drive, an Ultrium-TD4. We are using the Symantec drivers and the drive is up to date on firmware.

We CANNOT, for love nor money, perform a successful catalog operation on a tape. For example:

Following a TN and other advice on this website, we removed the Catalogs folder and, using 2 of our full backup tapes, tried to inventory and catalog the tapes to perform a test restore. The tapes wouldn't catalog. At some point during the operation the drive would be marked as offline. If you onlined it, it'd offline itself again. Rinse, repeat.

This machine is used for nothing else; BUE is the only program that would be commanding the drive. Initially it happened when you tried ANYTHING tape related. We then discovered the Symantec drivers were not being used, so we installed those, removed the old catalogs in case they were corrupt, and got 2 fresh full backup tapes to catalog and try a restore. No luck.

One tape finished that contained ~500GB of data but the full tape stopped after 1300GB/1600GB - a frustrating waste of 3 hours!

 

 

CraigV's picture

Hi,

 

IBM should have a tape utility you can download and run diagnostics against the drive. Do so and make sure the device does a successful self-test and doesn't show any issues.

Does BE 2012 have SP1a installed along with any subsequent patches? Have you tried to repair the BEDB using BEutility.exe and/or installation of BE?

Thanks!

Alternative ways to access Backup Exec Technical Support:

https://www-secure.symantec.com/connect/blogs/alte...

jfkana's picture

The drive itself works just fine; I've been able to make restores using it previously. I will of course run the diags to be sure.

BUE2012 has SP1a and all hot fixes LU would detect and install. I've currently reinstalled BUE2012 to start over totally fresh, like a true bare metal scenario where it would just be installed: no database info, no catalogs, no prior information!

Hopefully it runs the cat job okay.

CraigV's picture

...and the tapes don't have excessive Hard Write Errors? No Removable Storage service involved from a Windows-perspective?


jfkana's picture

Zero hard errors of any description! No removable storage to the best of my knowledge. What's the best method of double-checking this?

Kiran Bandi's picture

Which OS is running on media server?

In case of Windows 2K3, check under Services for the Removable Storage Manager service; stop and disable it if it is running.

If 2K8, forget about it. The RSM service doesn't exist. And the diagnostic tool for IBM devices is ITDT. You can get it from here: http://www-01.ibm.com/support/docview.wss?uid=ssg1S4000662

jfkana's picture

OS is 2k8R2. I reinstalled BUE from fresh there; it got through one tape, and then as soon as I tried to inventory and cat the 2nd tape it immediately offlined the drive.

I removed the drive and uninstalled the drivers through Device Manager, rebooted the server, and installed the Symantec tape drivers through the tape install wizard on 2012. Rebooted again to make sure everything was OK.

Currently catting the other tape in the set. This tape contains a full 1.6TB of information so it's expected to take a short while. Last run, this got to approx 1.3TB before the drive offlined itself and stopped.

Kiran Bandi's picture

Do you see any events logged in Event Viewer regarding the tape drive?

jfkana's picture

Only what I can see in BUE anyway: the drive is offline, check all the cables, etc.

 

It just happened again - another 3 and a half hours gone! The job is still running [queued] so it didn't fail outright, but the drive just went offline without warning after about 3 and a bit hours, after processing 1296GB/1600GB (approx).

 

Really at my wits' end about this. It finished one tape of about 800 gig no problem, and neither tape has any hard or soft errors. Nothing strange or abnormal seemed to happen. It was a fresh install of BUE so there was no DB, no cats, nothing. It was a fresh, bare metal restore scenario.

I'm not exactly asking for something off-menu, just to catalog and restore something, anything, from the tape...

Larry Fine's picture

When a device is taken offline, information is placed in the adamm.log file about what was happening with the device at that time. Can you check the tail of your adamm.log file for any SCSI events? Feel free to post your adamm.log file here and we will try to help decipher it.
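For anyone wanting to script that check, here is a minimal Python sketch that reads the tail of the log and pulls out the "Device error" lines. It is only an illustration: the line format is assumed from the adamm.log excerpt quoted elsewhere in this thread, and the names `tail`, `find_device_errors` and the `encoding` default are hypothetical, not anything shipped with Backup Exec.

```python
import re
from collections import deque

# Pattern modelled on the "DeviceIo" lines quoted in this thread (assumed format).
DEVICE_ERROR = re.compile(
    r'^\[(?P<thread>\d+)\]\s+(?P<date>\d{2}/\d{2}/\d{2})\s+(?P<time>[\d:.]+)\s+'
    r'DeviceIo:\s+(?P<address>[0-9A-Fa-f:]+)\s+-\s+'
    r'Device error (?P<code>\d+) on "(?P<device>[^"]+)",\s+'
    r'SCSI cmd (?P<cmd>[0-9A-Fa-f]{2}),\s+(?P<total>\d+) total errors'
)


def tail(path, count=500, encoding="utf-8"):
    """Return the last `count` lines of a text file.

    The encoding is an assumption; adjust it if the log reads as garbage.
    """
    with open(path, encoding=encoding, errors="replace") as handle:
        return list(deque(handle, maxlen=count))


def find_device_errors(lines):
    """Return one dict per line that looks like a DeviceIo device error."""
    hits = []
    for line in lines:
        match = DEVICE_ERROR.match(line.strip())
        if match:
            hits.append(match.groupdict())
    return hits


if __name__ == "__main__":
    # Sample line copied from the log excerpt posted in this thread.
    sample = [
        r'[3428] 02/13/13 10:24:32.828 DeviceIo: 04:00:17:00 - Device error '
        r'1117 on "\\.\Tape0", SCSI cmd b5, 4 total errors',
    ]
    for hit in find_device_errors(sample):
        print(hit["date"], hit["time"], "error", hit["code"], "cmd", hit["cmd"])
```

To run it against a real log, pass the result of `tail(<path to adamm.log>)` to `find_device_errors`; the SCSI opcode in the `cmd` field is usually the most useful clue.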

If you find this is a solution for the thread, please mark it as such.

jfkana's picture

 

------------------------ -------------------------------------------------------------------------------
 
[3428] 02/13/13 10:24:32.828 DeviceIo: 04:00:17:00 - Device error 1117 on "\\.\Tape0", SCSI cmd b5, 4 total errors
[1508] 02/13/13 13:32:41.375 DeviceIo: 04:00:17:00 - Device error 1117 on "\\.\Tape0", SCSI cmd b5, 6 total errors
[1508] 02/13/13 13:32:46.532 PvlDrive::DisableAccess() - ReserveDevice failed, offline device
       Drive = 1002 "Tape drive 0001"
       ERROR = 0x0000001F (ERROR_GEN_FAILURE)
 
[1508] 02/13/13 13:32:46.597 PvlDrive::UpdateOnlineState()
       Drive = 1002 "Tape drive 0001"
       ERROR = The device is offline!
 
 
-----------------------------------------------
 
This is what appears in the adamm.log for the job; it then has a huge dump of the SCSI history.
 
http://pastebin.com/fyNP5M6j is the link to the full dump from when the job fails.
Attachment: adamm snip.txt (47.43 KB)
Gurvinder Rait's picture

 

This has the error description for ERROR_GEN_FAILURE: http://www.symantec.com/business/support/index?page=content&id=TECH89750

Please try a slow catalog and see if you run into the same issue. You can uncheck the "Use storage media-based catalogs" option from the BE icon in the top left-hand corner -> Configuration and settings -> Backup Exec settings -> Catalog
jfkana's picture

Is there a specific option for 'slow catalog'? We have unchecked storage media cats in BE, and every new cat operation is also preceded by deleting old cats so there are no conflicts.

A hardware failure would be unfortunate as this drive is new/unused before now; it has been on a shelf. It's had multiple cleaning cycles run through it and it will cat/inventory some tapes.

 

It is most bizarre. I'm going to try a test with BUE2010R3 and 2012SP1a with the same drive and see if there are any other problems.

CraigV's picture

...there is no such thing as a "slow catalog"...not sure where the previous poster pulled that from!

Thanks


jfkana's picture

Hi Craig,

Looks like 'fast' catting is using storage-media based cats and 'slow' catting is rebuilding the cats. I would assume in a real-world disaster, we wouldn't have the option of fast-catting anyway.

I've been deleting the cat folder every operation. I'm currently trying a new full backup that only spans a single tape for simplicity.

The previous tape I was using was a monthly tape for January, so it was written in Jan 2012 and then again in Jan 2013. It could be possible that there have been changes to the backup structure or something in those 12 months.

Currently it's catting and showing a byte count and speed. Previously, on the iffy tape, it would occasionally make a lot of sound as it moved through the tape and scanned dirs/files but would not show any byte values, so fingers crossed!

Larry Fine's picture

[1508] 02/13/13 13:32:41.375 DeviceIo: 04:00:17:00 - Device error 1117 on "\\.\Tape0", SCSI cmd b5, 6 total errors

SCSI: Raw CDB
SCSI: B5 20 00 10 00 00 00 00 00 34 00 00             . .......4..
SCSI:
SCSI: CDB Operation                           SECURITY_PROTOCOL_OUT

Your issue seems to be triggered by the B5 SCSI command, which is related to encryption.
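As a rough illustration of how that raw CDB breaks down, here is a small Python sketch. The field layout is my reading of the SCSI Primary Commands (SPC-4) definition of SECURITY PROTOCOL IN/OUT, and the note that protocol 0x20 is the tape data encryption protocol comes from my understanding of SSC-3; treat both as assumptions to check against the standards. The function name `decode_security_protocol_cdb` is made up for this example.

```python
def decode_security_protocol_cdb(hex_bytes):
    """Split a 12-byte SECURITY PROTOCOL IN/OUT CDB into its fields.

    Layout assumed from SPC-4: byte 0 opcode, byte 1 security protocol,
    bytes 2-3 protocol-specific field, byte 4 bit 7 INC_512,
    bytes 6-9 transfer length (big-endian), byte 11 control.
    """
    cdb = bytes.fromhex(hex_bytes)  # whitespace between byte pairs is allowed
    if len(cdb) != 12:
        raise ValueError("expected a 12-byte CDB, got %d bytes" % len(cdb))

    opcodes = {0xA2: "SECURITY PROTOCOL IN", 0xB5: "SECURITY PROTOCOL OUT"}
    if cdb[0] not in opcodes:
        raise ValueError("opcode 0x%02X is not a security protocol command" % cdb[0])

    return {
        "operation": opcodes[cdb[0]],
        "security_protocol": cdb[1],
        "protocol_specific": int.from_bytes(cdb[2:4], "big"),
        "inc_512": bool(cdb[4] & 0x80),
        "transfer_length": int.from_bytes(cdb[6:10], "big"),
        "control": cdb[11],
    }


if __name__ == "__main__":
    # Raw CDB copied from the SCSI trace quoted above.
    fields = decode_security_protocol_cdb("B5 20 00 10 00 00 00 00 00 34 00 00")
    print(fields["operation"])
    print("protocol 0x%02X, page 0x%04X, %d bytes" % (
        fields["security_protocol"],
        fields["protocol_specific"],
        fields["transfer_length"],
    ))
```

For the CDB in the trace this gives security protocol 0x20, protocol-specific field 0x0010 and a transfer length of 52 bytes, which fits the reading that the failing command is an encryption-related parameter write rather than ordinary tape I/O.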

Are you using encryption?  If so, is it hardware or software encryption?

Is your IBM LTO4 drive stand-alone or is it in a robotic library/changer?

Are you using LTO3 or LTO4 media?

You stated that you used to be able to catalog and/or restore previously. Do you know what might have changed since then?

jfkana's picture

Hi Larry,

 

Very strange, as we do not use encryption on the tapes to the best of my knowledge. It's also strange as it'll cat some tapes but not others! I have another full backup set to try on a fresh tape that isn't 100% full to see if it plays ball then.

It is a standalone IBM LTO4, rebranded as a Dell PowerVault. We're using LTO4 media.

 

We can cat/restore in our live environment, which is a RoboLib but again uses an IBM LTO4 tape drive at the end of the day. I would like to think if the building burned to the ground, we wouldn't need an exact match on hardware to be able to simply catalog a tape.

jfkana's picture

OK, new developments here!

Managed to successfully catalog a tape containing a full backup. After that job completed, I tried a restore and it failed with the drive going offline, same errors as before.

 

It would appear the SAS card in use, a PERC 6/E, is exclusively RAID-only and does not support plain SCSI mode. This seems to be tripping up the hardware/software even though there's nothing else attached.

 

Procured a plain-jane SAS HBA and now waiting on the new cable (long, male SAS on each end), and hopefully this'll sort things out!

Oddly, I got no warnings or compatibility flags pertaining to the controller on installation, in use, etc.

Gurvinder Rait's picture

Good catch. Refer to http://www.symantec.com/docs/TECH70907

Support policy for Host-Bus-Adapters that feature RAID