BackupExec 2010 dropping tape drive offline

Created: 10 Aug 2010 • Updated: 24 Oct 2010 | 39 comments
This issue has been solved. See solution.

Hi,
This is a new topic to re-open the issue discussed here:

https://www-secure.symantec.com/connect/forums/tap...

Just to recap on the issue itself:

Running BE2010 Trial on Windows Server 2008 R2 x64.  Hardware is Dell R710 rack server, connected to a HP Ultrium 1760 LTO4 SAS drive via a Dell PERC 6/E SAS Adaptor card.

This is what’s happening:

  1. Set up and run a backup job with full verify.  I’ve tested up to a 250GB job and it completes fine the first time around.
  2. After completion of first job, rerun the same backup job (after tweaking the media overwrite/append etc settings)
  3. The job immediately fails, reporting the drive as offline – cannot bring drive back online using BackupExec administrator
  4. Close BackupExec administrator, restart BackupExec Device/Media service (which also restarts associated service dependencies).
  5. Re-open BackupExec administrator - drive appears to be online again, ready for another job to be kicked off.

 
I’ve completely replaced:

  • Numerous LTO tapes
  • The Dell SAS adaptor that connects the Dell server to the HP rack chassis
  • The HP Rack chassis that houses the 1U internal HP LTO4 Ultrium 1760 SAS drive, SAS adaptor board and all
  • The internal HP LTO Ultrium 1760 drive itself

 
We’ve verified that all above component drivers and firmware are fully up to date.  This problem occurs with both Symantec Tape Drive drivers and HP's official LTO4 tape drivers.
 
To me, the following facts suggest that the problem lies with BackupExec itself:

  1. After the drive goes offline, a simple restart of the BackupExec services seems to bring it back online again
  2. If the drive goes offline, and I stop the BackupExec Device/Media service, I can still access the drive using HP LTT utility, and HP Data Protector Express – only BackupExec detects the drive as offline.
  3. We’ve replaced all possible hardware components and the problem still remains
  4. This problem only ever occurs with BE. I'm testing with HP Data Protector Express and MS's Data Protection Manager - all work fine.

Luckily, I haven't bought a license for BackupExec yet, however this means I doubt Symantec Support will be willing to spend too much time on it, so I might be forced to go with MS's Data Protection Manager instead if I can't get a resolution soon.

Has anyone seen/resolved this issue before?

Alistair

CraigV's picture

Mmm...OK, so you say the drive goes offline after making certain tweaks? What settings work, and what settings make it go offline?
Also, are the DP services stopped when you're doing your backups?

Alternative ways to access Backup Exec Technical Support:

https://www-secure.symantec.com/connect/blogs/alte...

Alistairc's picture

The only tweaks I'm making are changing the Device/Media tab from Overwrite mode to Append to tape or overwrite if no appendable media is present.  Even if I keep it on overwrite both times and kick off the job in its identical state, the second job always fails.  Yep, all other non-Symantec backup services are stopped; this problem was occurring well before HP DPE and Microsoft DPM were installed.

Thanks,
Alistair

Larry Fine's picture

WHY is BE taking the drive offline?  Do you get any errors or alerts?  Anything in the Windows event logs?  Anything in the adamm.log file (in the BE folder)?

If you find this is a solution for the thread, please mark it as such.

CraigV's picture

Mmm...OK, then look at getting hold of firmware updates through LTT and verifying that it sees no errors on the tape drive.

AndyBNZ's picture

I'm having this exact problem.  Was fine for two weeks, now after every successful backup the tape drive goes offline.  It's not a new backup job that kicks it, as if you run an inventory it's not available.  Simply putting it back online is enough to get it working.  Running an inventory does not make it go offline, only a backup.

LTO2 connected via iSCSI using starwind.  iSCSI connection is all good, server never ever loses tape drive from hardware, it's just BE 2010 that loses it.  The only error is that the tape drive is unavailable and the job fails.

Alistairc's picture

Hi Craig,
I've run LTT - I'm running latest firmware and no problems are reported on any tests I run.  The tape drive runs fine on all other software suites, the problem only appears with BackupExec.

Andy - yep that sounds identical to my issue alright, are you running Windows Server 2008 R2?

Larry - good question indeed! No idea why BackupExec is detecting it as offline.  There are never any errors in the Windows event logs, I'm running another testrun now so will check out the adamm.log file afterwards and update the thread.

Alistair

AndyBNZ's picture

SBS2008 R2.

OK, update - I applied all the hotfixes last night.  Backup ran without issue and tape ejected.

I can then run an inventory which is successful - it reports no media.

As soon as I put a new tape in it becomes unusable.

By this I mean if you go to 'devices' tape drive appears online, check hardware manager and it's all good.  Right click and it is 'enabled and online'. Run an inventory and it fails with 'drive hardware is offline' 

I then right click it, put it back online and inventory and hardware will run.

SO it appears to be a problem with it reading the new tape perhaps...

CraigV's picture

You guys are using HP tapes correct? The reason why I ask is that we shipped IBM tapes to 1 of our sites in Asia for use in an HP StorageWorks 1/8 G1 autoloader. Each time we tried to use those tapes, it took the drive offline. Put an HP tape in, and it worked well!
Also make sure you don't have bar code rules enabled...if you don't have bar code labels, it can cause issues.

Alistairc's picture

Craig, yep I'm using official HP LTO4 media, and don't have any bar code rules enabled anywhere.  I've upgraded to 2010 R2 in the hope that magically resolves the issue - no luck I'm afraid.

Andy - I don't think it's solely a new tape thing, try running a second backup job using the same tape without ejecting after the initial run - the drive will probably go offline?

I've run some further testing - I can run as many drive inventories (immediately after one another) as I like without failure.  However, after any backup, restore, or even catalog job has completed, the next job which utilises the drive in any way (be it inventory, restore, or backup, etc) will attempt to start, but immediately detect the drive as offline and fail.

Andy, can you confirm you can definitely get BE to operate again after forcing the drive online in BackupExec Admin Console?  Even if I force the drive online here after failure, it doesn't actually become usable again until I restart the BackupExec services.

The fact that this only occurs after job types that require 'long' media transport operations (rewind/fast-forward/long read/long write etc.) might suggest that something is detecting some sort of timeout on the SAS interface...?

Larry Fine's picture

Please check the adamm.log file, it should give valuable clues as to WHY the drive is going offline.

AndyBNZ's picture

I can confirm that for the last four backups just putting the drive online has worked.

I need to go onsite to check if they are HP tapes..

CraigV's picture

OK...next step would be to try backing up using NTbackup to see if that puts the drive offline. If it does, it is a hardware issue.
What does running a cleaning tape do...any help?

Alistairc's picture

Unfortunately Microsoft, in their infinite wisdom, removed native tape drive support as well as NTBackup from Windows Server 2008, so this is no longer an option.  I've installed HP's Data Protector Express and Microsoft Data Protection Manager for troubleshooting purposes - neither of these applications suffer from the tape dropout problem, all jobs complete successfully.

I've cleared the adamm.log and am running another test run now, I'll review and post the output.

Thanks,
Alistair

AndyBNZ's picture

sorry, how can I clear the adamm.log file (tried renaming it but it was in use)?

Also restarting the device and media service once a new tape has been put in means the next job runs without issue.

So my current workaround is to schedule a restart of that service on a nightly basis - not ideal but means I don't have to dial in every night to put the bloomin' tape drive back online.
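
For anyone wanting to script the nightly restart workaround described above, here is a minimal Python sketch. It assumes the service's internal name is BackupExecDeviceMediaService (check with `sc query` on your own server first) and that it runs from an elevated scheduled task; the function names are made up for illustration. By default it only prints the commands rather than running them.

```python
import subprocess
import sys

# Assumption: internal Windows service name of the Backup Exec
# Device & Media Service. Verify it with `sc query` before relying on it.
SERVICE_NAME = "BackupExecDeviceMediaService"


def restart_commands(service=SERVICE_NAME):
    """Return the stop/start commands as argument lists.

    `net stop <service> /y` answers "yes" to the prompt about stopping
    dependent services, so they are stopped as well.
    """
    return [
        ["net", "stop", service, "/y"],
        ["net", "start", service],
    ]


def restart_service(service=SERVICE_NAME, dry_run=True):
    """Print the restart commands and, unless dry_run is set, run them."""
    for cmd in restart_commands(service):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if the command fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Pass --run to really restart; without it the commands are only printed.
    restart_service(dry_run="--run" not in sys.argv)
```

Scheduled with `--run` after the nightly tape change, this does roughly what the manual restart does. Note that `net start` does not bring back the dependent services that `net stop ... /y` took down, so those would need their own start commands.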

CraigV's picture

Andy: I have hit the support flag on this...give it a day or 2, and post whether or not someone helped you.

The ONLY other thing I can suggest is looking into an upgrade to BE 2010 R2 to see if this resolves the issue...

Alistairc's picture

Andy,
If you stop the BackupExec Windows services you should be able to clear out the file, restart and kick off a tape job and the adamm.log should start being written to from fresh.

I've discovered the following alert in my logs:

[5920] 08/17/10 12:33:51.946 DeviceIo: 02:00:01:00 - Device error 1117 on "\\.\Tape0", SCSI cmd b5, 5 total errors
[5920] 08/17/10 12:33:56.980 PvlDrive::DisableAccess() - ReserveDevice failed, offline device

       Drive = 1004 "HP 2"
       ERROR = 0x0000001F (ERROR_GEN_FAILURE)

Craig, could you please get Support to get in touch with me about this also, as the original author of this topic?

Thanks,
Alistair

Larry Fine's picture

re: SCSI cmd b5

The B5 is an encryption command.  Are you attempting to use hardware encryption?  BE does some of these just to probe the drive capabilities, but it shouldn't cause a crash.

Larry Fine's picture

re: Running BE2010 Trial on Windows Server 2008 R2 x64.  Hardware is Dell R710 rack server, connected to a HP Ultrium 1760 LTO4 SAS drive via a Dell PERC 6/E SAS Adaptor card.

I believe the problem is the RAID HBA card and is unsupported for tape.  Sorry I didn't spot that sooner.
http://seer.entsupport.symantec.com/docs/325573.htm
http://support.dell.com/support/edocs/stor-sys/mat...

Do you have another SAS HBA to test with?

SOLUTION
AndyBNZ's picture

OK, cleared out the adamm log and ready for tonight's backup.

Setup is:

Core Windows 2008 running Hyper-V and SBS 2008 as Virtual Server

Hosted on HP ML350 G6

Connecting using MS iSCSI initiator to:

ML150 G5 with LSI Logic PCI-X Ultra 320 SCSI controller with HP Ultrium LTO2 added as a target on Starwind running with MS iSCSI initiator on Windows 2003.

Have to have it this way as MS don't believe in tapes so there's no passthrough on Hyper-V (thanks) and this is their approved workaround.

I've had no issues on BE 2010 with it until recently.

pkh's picture

@andy - perhaps you should start another discussion for your problem.  It will be less confusing as to who is answering who.

Matthew Green's picture

Hi there.
We also have a client with the same problem. First backup works correctly then tape unit goes off line.  System spec is as follows

Dell R710 - Windows 2008 R2 Standard x64
PERC 6/E
IBM Ultrium-HH4 (Dell Powervault LTO4-120HH) (Firmware 97F1)
BE 2010

All software and hardware is patched to the latest level.  I have only taken over this issue from a colleague recently so have not done all the tests, but noticed this thread and thought it worth adding to it.

I am seeing an error in the adamm.log

08/17/10 19:32:11.979 DeviceIo: 03:00:00:00 - Device error 1117 on "\\.\Tape0", SCSI cmd a2, 1 total errors

which according to Wikipedia is SECURITY PROTOCOL IN
and also

08/13/10 23:00:15.594 DeviceIo: 03:00:00:00 - Device error 1117 on "\\.\Tape0", SCSI cmd b5, 8 total errors
which according to Wikipedia is SECURITY PROTOCOL OUT
In BE it reports the tape unit doesn't support encryption

Will keep investigating here.
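
To make lines like these easier to read, a small Python sketch can decode them. The regular expression is inferred only from the excerpts quoted in this thread, so it is an assumption about the adamm.log format rather than a documented one. The two opcode names are the ones given above, and 1117 and 31 are the standard Win32 codes ERROR_IO_DEVICE and ERROR_GEN_FAILURE. The function and variable names are made up for illustration.

```python
import re

# SCSI opcodes mentioned in this thread.
SCSI_OPCODES = {
    0xA2: "SECURITY PROTOCOL IN",
    0xB5: "SECURITY PROTOCOL OUT",
}

# Win32 error codes seen in the quoted excerpts.
WIN32_ERRORS = {
    31: "ERROR_GEN_FAILURE",
    1117: "ERROR_IO_DEVICE",
}

# Assumption: pattern inferred from the lines quoted in this thread only.
DEVICE_ERROR = re.compile(
    r'Device error (?P<code>\d+) on "(?P<device>[^"]+)", '
    r"SCSI cmd (?P<opcode>[0-9a-fA-F]{2}), (?P<count>\d+) total errors"
)


def parse_device_errors(lines):
    """Yield a dict for every 'Device error' line found in `lines`."""
    for line in lines:
        match = DEVICE_ERROR.search(line)
        if match is None:
            continue  # not a device error line
        code = int(match.group("code"))
        opcode = int(match.group("opcode"), 16)
        yield {
            "device": match.group("device"),
            "error": WIN32_ERRORS.get(code, str(code)),
            "command": SCSI_OPCODES.get(opcode, f"opcode 0x{opcode:02X}"),
            "total_errors": int(match.group("count")),
        }


# The two log lines quoted in this post.
SAMPLE = [
    r'08/17/10 19:32:11.979 DeviceIo: 03:00:00:00 - Device error 1117 on "\\.\Tape0", SCSI cmd a2, 1 total errors',
    r'08/13/10 23:00:15.594 DeviceIo: 03:00:00:00 - Device error 1117 on "\\.\Tape0", SCSI cmd b5, 8 total errors',
]

for event in parse_device_errors(SAMPLE):
    print(event)
```

Feeding it the whole adamm.log (for example `parse_device_errors(open(path))`) should show at a glance whether the failures are all encryption probes or whether other commands are erroring as well.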

CraigV's picture

Mmm...a RAID card would definitely not be supported, so Larry is spot on there. That would actually be found in the HP documentation around backups, and also in the QuickSpecs of the drive which would indicate which cards to use. That would be for an HP server.
Interesting that Matthew also has a problem with that model of Dell server!

Alistairc's picture

Interesting indeed!  Unfortunately I don't have another spare HBA to test with - can anyone recommend a cheap one?  I can't spend another £200 only for it not to work either.

I'm confused about a RAID card "definitely" not being supported though - it's acting as a SAS HBA and is dedicated to the tape drive alone - no disk volumes are connected at all and RAID is therefore not enabled; also I reiterate that all other backup software suites are operating the drive fine, so what's it to Symantec that the card also happens to be a RAID controller, especially when RAID is not enabled?  We've used similar tape drive setups with external SCSI RAID/HBA adaptors in the past and they've always been rock solid.

CraigV's picture

Might be time to open a support call with Symantec...if you do, and they help solve it, please post the solution and close off the topic.

Matthew Green's picture

The tape drive and server were bought as a bundle from Dell so I "assumed" that everything would work.  I have a call in with Dell to confirm this. My gut does say that it has something to do with the controller, but the fact that a restart of BE services resolves the problem and not a restart of the server implies that it is some issue that BE has with this controller and not the controller and the tape unit. I am now going to make some (careful) changes to the PERC controller to see if this resolves anything.

Matthew Green's picture

I have spoken to Dell and their response is as follows: Swap the Perc 6/E controller for a Perc 5/E controller and this should resolve the issue.  This will take a few days to organise so will update here if it resolves the issue.

Matthew Green's picture

We fitted the Perc 5/E and we have resolved the issue.

AndyBNZ's picture

Reinstalling starwind (or the reboots) seems to have fixed this temporarily.  If it happens again I'll post a new thread.

cheers

Colin Weaver's picture

General Advice is

1) Don't use a RAID card
2) If possible don't use an onboard chipset for the SCSI/SAS controller (we have seen some onboard LSI chipsets give problems - although not yet, to my knowledge, in Dell servers) - so try a stand-alone, possibly Adaptec, card instead

As lots of valid advice and troubleshooting has been provided in this thread already - would suggest that if you continue to see problems then log a formal support call so that the logs and environment can be fully analyzed.

Larry Fine's picture

Why don't I get the solution mark?  I said it was unsupported almost a week ago.

CraigV's picture

Hi Alistair,

Larry did give the answer a couple of days ago...if the issue was with using a RAID card, he would get it.
Can you please reassign if so?

Thanks!

Alistairc's picture

Hi,
I didn't intentionally mark any of the responses as solutions, so I've cleared the flag.  I'm awaiting a SAS 6/E delivery from Dell, once this is in and tested to prove the problem is resolved (this should be in the next couple days) I'll mark the solution as appropriate.

Thanks,
Alistair

Larry Fine's picture

Does the supported HBA resolve this thread?

Alistairc's picture

Larry,

Apologies for the delay in my reply, I have been fully allocated to project work, and have literally just found time to install a Dell 6Gbps SAS HBA board into the R710.

Ran a few tests, and pleased to say everything is now working as expected.  As an added bonus, I've seen the job rates rise up to 8,400MB/min (according to BE anyway).

Thanks for your assistance.

Regards,

Alistair

pkh's picture

Alistair,

You should mark one of Larry's replies as the solution and close out this discussion.

Alistairc's picture

pkh,

As far as I can tell, Larry's comment has already been marked as a solution.

pkh's picture

I think I posted my comment just as you were marking it.  Never mind.

BrentN's picture

I just experienced the same issue, drive was offline, had to restart device and media service to get it back online. Here's my config:

Dell R710 Server, latest firmware, drivers
HP 1760 SAS Tape Drive (external)
HP SC44Ge HBA (part of the smart array family but not a RAID controller - this is the card bundled with the smart buy for this drive)
HP Tapes
BE 2010 with all applicable updates
No AV client installed
Windows Server 2003 R2 SP2 x86, all current updates installed

** edit **
I also wanted to note that many of the HP Smart Array RAID controllers support the connection of a single tape drive, so you may want to check the facts on your Dell card. Granted, it's not exactly the best way to do it, but in some cases (at least with HP) it is supported.

pkh's picture

@BrentN - You should start a new discussion for your problem so that it can get the attention that it deserves.  You may refer to this discussion if you want to.