
Tape Drive Going Offline

Created: 30 Oct 2013 | 36 comments

Hi Everybody

I have an HP ProLiant ML350 G6 running Windows Server 2003 R2 (64-bit), with an HP DAT160 USB tape drive using the HP drivers, and Backup Exec 2010 R3 with all the updates applied.

After restarting the Backup Exec services I can run one backup, but that night the next backup fails with the tape drive going offline. I have checked the event logs and there are no entries for events 5, 7, 9, 11 or 15. I have also disabled the Removable Storage service.

I have also run the HP tape drive check utility (HP L&TT), which updated the firmware but did not indicate any issues with the hardware.

Any ideas please?

Thanks


CraigV's picture

Hi,

 

Do the following:

1. Delete the tape drive from Backup Exec Devices tab, and from Windows Device Manager.

2. Disconnect the drive, and restart the server. Make sure the tape drive is not listed in WDM.

3. Shut down the server, and reconnect the drive before turning it on. Start up the server.

4. Run tapeinst.exe to install the latest tape drivers.

Also make sure that the RSM service is stopped and disabled, along with the HP Insight agents. The ProLiant ML350 G6 would have them installed with the SmartStart CD.
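If you would rather script the service changes than click through Services.msc, something like the sketch below could work. It assumes the Removable Storage (RSM) service is registered under its usual short name `NtmsSvc`; the second service name is a made-up placeholder, so check Services.msc for the real HP agent names on your server. By default it only prints the `sc` commands and changes nothing.

```python
import subprocess


def build_disable_commands(service_names):
    """Return the `sc` command lines that stop and then disable each service."""
    commands = []
    for name in service_names:
        commands.append(["sc", "stop", name])
        # `sc config` takes "start=" and its value as separate tokens.
        commands.append(["sc", "config", name, "start=", "disabled"])
    return commands


def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=False: stopping an already-stopped service is not fatal.
            subprocess.run(cmd, check=False)


# "NtmsSvc" is the Removable Storage service. "ExampleHpAgentService" is a
# hypothetical placeholder, not a real HP service name.
commands = build_disable_commands(["NtmsSvc", "ExampleHpAgentService"])
run_commands(commands, dry_run=True)
```

Run it from an elevated prompt with `dry_run=False` only once the printed commands look right.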

Thanks!

 

EDIT: You might also want to consider disabling tape drive polling as per the HP TN below:

http://h20566.www2.hp.com/portal/site/hpsc/template.PAGE/public/kb/docDisplay/?spf_p.tpst=kbDocDisplay&spf_p.prp_kbDocDisplay=wsrp-navigationalState%3DdocId%253Demr_na-c00718488-12%257CdocLocale%253D%257CcalledBy%253D&javax.portlet.begCacheTok=com.vignette.cachetoken&javax.portlet.endCacheTok=com.vignette.cachetoken

Alternative ways to access Backup Exec Technical Support:

https://www-secure.symantec.com/connect/blogs/alte...

revenge of the cream's picture

Hi

Do you know how to stop the HP Insight Agents?

The HP article you sent is for LTO, not DAT.

Thanks

CraigV's picture

Yes, and it is also for fibre channel. However, I've had to do something similar with a SCSI tape autoloader as part of troubleshooting.

Go into Services.msc, and anything starting with HP (like HP Storage Agents) can be stopped.

Thanks!


revenge of the cream's picture

Thanks, I will give your suggestions a go.

The weird thing, though, is that I seem to be able to manually run job after job with no problem. It's just the scheduled ones at night that send it offline.

CraigV's picture

OK, that's a bit more information to work with. This COULD be corruption of sorts, either in the job or in the BEDB (the Backup Exec database) itself. Try one, or both, of these:

 

1. Recreate the job and selection list and schedule the job to run. See what happens and report back...

2. Open up BEutility.exe and repair the BEDB. Try the normal scheduled job again and report back.

Thanks!


Larry Fine's picture

Check the tail end of the adamm.log for more info about why the drive went offline. Post the adamm.log file here and we can help you interpret it.
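If the log is large, a short script can pull out the interesting lines from the tail before you post it. This is only a sketch: the keyword list is my own guess at what matters, and the first sample line is a placeholder rather than real adamm.log output.

```python
from collections import deque

# Keywords chosen for illustration; adjust to taste.
KEYWORDS = ("Device error", "offline", "ERROR =")


def tail_matching(lines, keywords=KEYWORDS, last_n=200):
    """Return the lines among the final `last_n` that contain any keyword."""
    tail = deque(lines, maxlen=last_n)  # keeps only the last `last_n` lines
    lowered = [k.lower() for k in keywords]
    return [ln.rstrip("\n") for ln in tail
            if any(k in ln.lower() for k in lowered)]


sample = [
    "placeholder for a routine entry (not real log output)",
    r'[13532] 11/01/13 01:00:02.599 DeviceIo: 99:00:00:00 - Device error 55 on "\\?\usbstor#sequential&", SCSI cmd 16, 1 total errors',
    "[13532] 11/01/13 01:00:07.599 PvlDrive::DisableAccess() - ReserveDevice failed, offline device",
    "       ERROR = 0x000005B4 (ERROR_TIMEOUT)",
]

hits = tail_matching(sample)
for line in hits:
    print(line)
```

On a real server you would pass an open file instead of the sample list, for example `open(path, errors="replace")`, where `path` is wherever your install keeps adamm.log (I am assuming the Backup Exec Logs folder, so verify that first).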

If you find this is a solution for the thread, please mark it as such.

revenge of the cream's picture

Hi

Adamm.log attached (server names changed to protect the innocent).

Thanks

Attachment: adamm.doc (130.66 KB)

sidd3009's picture

I did review the adamm.log file and found the following: 

[13532] 11/01/13 01:00:02.599 DeviceIo: 99:00:00:00 - Device error 55 on "\\?\usbstor#sequential&", SCSI cmd 16, 1 total errors

[13532] 11/01/13 01:00:07.599 PvlDrive::DisableAccess() - ReserveDevice failed, offline device

       Drive = 1005 "HP 0001"

       ERROR = 0x000005B4 (ERROR_TIMEOUT)

 

[13532] 11/01/13 01:00:07.646 PvlDrive::UpdateOnlineState()

       Drive = 1005 "HP 0001"

       ERROR = The device is offline!

 

The ERROR_TIMEOUT part indicates that the operation returned because the time-out period expired (a strange scenario for a scheduled job, considering that manual jobs complete after a service restart).

Please refer to the following article and look for the Event IDs mentioned in it (not only 5, 7, 9, 11 and 15): http://www.symantec.com/business/support/index?page=content&id=TECH128041
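For anyone who wants to scan a whole adamm.log for these entries, here is a rough parser for the DeviceIo line quoted above. The pattern is modelled only on that one line, so other line types are ignored. Two assumptions are baked in: the date is month/day/year, and the "SCSI cmd" value is hexadecimal.

```python
import re
from datetime import datetime

# Pattern modelled on the single DeviceIo line quoted in this post.
DEVICE_ERROR = re.compile(
    r'^\[(?P<thread>\d+)\]\s+'
    r'(?P<stamp>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+'
    r'DeviceIo: \S+ - Device error (?P<error>\d+) on "(?P<device>[^"]*)", '
    r'SCSI cmd (?P<cmd>[0-9A-Fa-f]+)'
)


def parse_device_error(line):
    """Return a dict for a DeviceIo error line, or None if the line differs."""
    m = DEVICE_ERROR.match(line)
    if m is None:
        return None
    return {
        # Assumption: timestamps are month/day/two-digit-year.
        "time": datetime.strptime(m["stamp"], "%m/%d/%y %H:%M:%S.%f"),
        "error": int(m["error"]),
        "device": m["device"],
        # Assumption: the command number is printed in hex (16 means 0x16).
        "scsi_cmd": int(m["cmd"], 16),
    }


sample_line = (
    r'[13532] 11/01/13 01:00:02.599 DeviceIo: 99:00:00:00 - Device error 55 '
    r'on "\\?\usbstor#sequential&", SCSI cmd 16, 1 total errors'
)
record = parse_device_error(sample_line)
print(record)
```

Collecting the `time` field across several nights would show quickly whether the failure always lands at the scheduled start time.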

 

Regards,

Siddhant Saini
Advanced Technical Support Engineer, Symantec Corporation 
www.symantec.com

revenge of the cream's picture

Hi

I don't have any of those Event IDs in either System or Application logs.

Thanks

 

CraigV's picture

...I don't think that HP Library and Tape Tools will work with a USB drive, but you can try it and see if it gives you errors. You need to stop the BE services first.

Otherwise, try running a backup using NTbackup. Stop the BE services, and see if you get a similar error/s.

Thanks!


revenge of the cream's picture

HP L&TT worked: it updated the firmware.

I can try NTBackup, but I think it will work fine, as Backup Exec works fine if I restart the services.

Will let you know after I have tested.

Thanks

 

revenge of the cream's picture

OK, three manual jobs were fine on NTBackup. 

I have set up three more jobs to run at 30-minute intervals this afternoon.

 

revenge of the cream's picture

Hi Everyone

I have been in touch with HP, who have run full diagnostics and think the drive is fine with the latest drivers and firmware.

I have shown them the log below and want to know what an error 55 is, and why it references SCSI when the drive is USB.

Thanks

 

[15416] 11/06/13 02:00:00.467 DeviceIo: 99:00:00:00 - Device error 55 on "\\?\usbstor#sequential&", SCSI cmd 16, 1 total errors

[15416] 11/06/13 02:00:05.467 PvlDrive::DisableAccess() - ReserveDevice failed, offline device

       Drive = 1005 "HP 0001"

       ERROR = 0x000005B4 (ERROR_TIMEOUT)

 

[15416] 11/06/13 02:00:05.499 PvlDrive::UpdateOnlineState()

       Drive = 1005 "HP 0001"

       ERROR = The device is offline!

 

[15416] 11/06/13 02:00:05.499 Begin dump of device's SCSI history

 

[15416] 11/06/13 02:00:05.936 End dump of device SCSI history

 

 

revenge of the cream's picture

Hi

I have gone through lots of work with HP, who say the drive is fine, have updated the firmware, and have confirmed I have the latest drivers.

What is a device error 55?

And why is the error indicating SCSI when it is USB?

Any more ideas?

Thanks

Larry Fine's picture

What is a device error 55?

Error Code 55

System error code 55 means "The specified network resource or device is no longer available." This error code may also display as "ERROR_DEV_NOT_EXIST" or as the value 0x37.

And why is the error indicating SCSI when it is USB?

SCSI cmd 16

16h RESERVE UNIT Reserves the unit.

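That is a very common command, so for it to fail generally indicates a general communication failure. (A USB tape drive still receives SCSI commands, carried over the USB transport, which is why the log reports a SCSI command.)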

http://www.symantec.com/docs/TECH49432
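To make those two lookups concrete, here is a small decoder sketch. The tables cover only codes that show up in this thread's log excerpts and are nowhere near complete; treating the "SCSI cmd" number as hex follows the 16h reading above.

```python
# Win32 system error codes seen in this thread's adamm.log excerpts.
WIN32_ERRORS = {
    31: "ERROR_GEN_FAILURE",
    55: "ERROR_DEV_NOT_EXIST",
    1117: "ERROR_IO_DEVICE",
    1460: "ERROR_TIMEOUT",
}

# Only the opcode discussed in this post; everything else is left out.
SCSI_OPCODES = {0x16: "RESERVE UNIT"}


def describe(error_code, scsi_cmd_hex):
    """Render an error code and a hex SCSI opcode string as readable text."""
    opcode = int(scsi_cmd_hex, 16)
    name = WIN32_ERRORS.get(error_code, "unknown")
    cmd = SCSI_OPCODES.get(opcode, "unknown")
    return (f"error {error_code} (0x{error_code:X}, {name}) "
            f"during {cmd} ({opcode:02X}h)")


print(describe(55, "16"))     # the DeviceIo line
print(describe(0x5B4, "16"))  # the ERROR = 0x000005B4 line
```

Note that 0x5B4 is just 1460 written in hex, and 0x37 is 55, so the log and the error tables are talking about the same numbers.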

I suspect that you have some sort of hardware issue, or some sort of issue with the HP software installed on that server that is interfering with device communication. There have been multiple weird issues seen over the years, and many were solved by removing the HP software. I have no idea why it isn't consistent. YMMV.


revenge of the cream's picture

Hi

I have tried a couple of test jobs with tracer.exe running and they have all gone through fine.

It always seems to fail when it fires off in the night.

Would it be possible that the drive is going into some sort of "sleep" or power save?

I have also stopped all the HP services.

HP have also come back saying the ML350 G6, Windows 2003 (64-bit) and Backup Exec 2010 are NOT on their compatibility matrix and I should upgrade to 2012. I find this somewhat surprising.

Thanks

CraigV's picture

...weird, because Symantec doesn't have a server compatibility list...and I have used Backup Exec 2010 on an HP ProLiant DL385 G2, ProLiant DL165 G5, ProLiant DL385 G5 and ProLiant 585 G7 before with no hassles.

HP has the Enterprise Backup Solutions matrix, but I haven't seen servers on it. Take that information with a pinch of salt.

Symantec is concerned with a supported operating system, tape library/drive/disk target, and sometimes an HBA...not the physical server it runs on.

 

Thanks!


revenge of the cream's picture

Hi

Attached is what HP sent me.

Any ideas where I go next?

I am thinking of leaving tracer.exe running overnight to try to capture the failure. I know the log file will get big, but will it just be a pain to go through? It won't eat up masses of space and make my server fall over, will it?

Thanks

Attachment: HP SPOCK.jpg

Larry Fine's picture

I would assume that is a clerical error or an oversight on HP's part.  It makes no sense for them to skip a version since they support the version before and after BE 2010.

Tracer is probably your next step, but I fear it will just show you what the adamm.log shows. The implication is that this is not a BE issue.


revenge of the cream's picture

Is it possible a lack of available memory could cause this?

CraigV's picture

How much memory do you have installed, and how are the resources being used during the backup?

Thanks!


revenge of the cream's picture

4GB memory installed

Task Manager>Performance shows:

Physical Memory (k):

Total: 4183654

Available: 743208

System Cache: 1079484

 

Page File set to 6127MB and 4.11GB of Page File in use according to Task Manager>Performance.

So about 750MB of memory available.

 

 

CraigV's picture

...that should be good enough. Run a backup job with perfmon running at the same time, and see what happens when the remote agent service stops...

Thanks!


revenge of the cream's picture

Nothing interesting in process manager. I had to restart the Backup Exec service to get it to run. It always runs fine if I restart the services and run it, but if I leave it to run overnight it fails.

I tried changing the start time from 1am to 4:30am in case something else was running, but no difference.

I am getting exactly the same issue on another server now: a 6-week-old ML350 G8 with Windows 2008 R2, an LTO tape drive and Backup Exec 2012. It started a couple of weeks ago, but I thought I had fixed it by updating the firmware with HP.

It had worked for over a week but it has now gone again.

So a different server generation, different type of tape drive, different OS and different version of Backup Exec, but EXACTLY the same failure.

The only similarity is that they both run Oracle. I can't believe this would make a difference though.

 

Larry Fine's picture

Getting exactly the same issue on another server now. Only 6 weeks old ML350 G8, Windows 2008 R2, an LTO tape drive and Backup Exec 2012.

What interface does the LTO tape drive use?

What HBA? Make sure it is a supported or non-RAID HBA. http://www.symantec.com/docs/TECH70907


revenge of the cream's picture

Looks like they have an HP H222 Host Bus Adapter in Slot 1 with an HP ULTRIUM920 DRV plugged into it.

The RAID is running off a Smart Array P420i Controller in Slot 0.

Thanks

 

CraigV's picture

...that card is a supported card for external tape drives...

http://h18004.www1.hp.com/products/quickspecs/14337_na/14337_na.pdf

Thanks!


revenge of the cream's picture

So any ideas where I go next?

HP are bringing out a new USB drive for the first server today and fitting it, even though the current one is not showing any errors.

CraigV's picture

Hi,

 

I'd suggest doing the following:

1. Upgrade the ProLiant Support Pack on that ML to the latest firmware...across the board.

2. Log a call with Symantec to check this out.

Thanks!


Netwest's picture

Any progress on this?

We have the same problem.

Setup: DL380 G8, H222, BE 2010, Windows 2008 R2

H222: latest firmware, latest Windows driver

HP Ultrium 3000 LTO5: latest firmware

Tried both the HP and Symantec Windows drivers for the tape drive.

[20180] 01/03/14 11:20:18.328 DeviceIo: 03:00:03:00 - Device error 1117 on "\\.\Tape0", SCSI cmd 34, 1 total errors

[20180] 01/03/14 11:20:23.368 PvlDrive::DisableAccess() - ReserveDevice failed, offline device

       Drive = 1003 "HP 0001"

       ERROR = 0x0000001F (ERROR_GEN_FAILURE)

 

[20180] 01/03/14 11:20:23.407 PvlDrive::UpdateOnlineState()

       Drive = 1003 "HP 0001"

       ERROR = The device is offline!

 

The Windows Event Log shows LSI-SAS2 ID 11: "The driver detected a controller error".

The CD-ROM device also shows the same error within seconds in the Event Log.

It looks like the tape drive can be accessed as a CD-ROM from within Windows, but none is visible.

This looks like a timing error on the SAS, but there is no resolution so far.

 

Any ideas???

 

 

 

 

Moe Howard's picture

@Netwest: You are having a different problem, based on this comment: "Windows Event Log shows LSI-SAS2 ID 11 The driver detected a controller error". The problem should go away after resolving the problem with the SAS2 HBA.

Netwest's picture

After reviewing the adamm.log for some time I realized the tape device was randomly being detected as tape0 or tape1. Whenever tape1 was detected BE would complain and take the device Offline.

I'm not sure how often BE runs the Device Discovery process but the device would randomly go Offline.
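For anyone wanting to check their own adamm.log for the same flip-flopping, a rough sketch follows. It just counts the `\\.\TapeN` device names it finds; more than one distinct name on a single-drive server would be the warning sign. The first sample string is the error line from my log; the second is a bare device name added by hand for illustration and is not real log output.

```python
import re
from collections import Counter

# Matches Windows tape device names such as \\.\Tape0 (case-insensitive).
TAPE_NAME = re.compile(r'\\\\\.\\(Tape\d+)', re.IGNORECASE)


def tape_names_seen(lines):
    """Count how often each \\\\.\\TapeN device name appears in the lines."""
    counts = Counter()
    for line in lines:
        for name in TAPE_NAME.findall(line):
            counts[name.capitalize()] += 1  # normalise tape1 -> Tape1
    return counts


sample = [
    r'[20180] 01/03/14 11:20:18.328 DeviceIo: 03:00:03:00 - Device error 1117 on "\\.\Tape0", SCSI cmd 34, 1 total errors',
    r'\\.\Tape1',  # hand-written illustration, not a real log line
]

counts = tape_names_seen(sample)
print(counts)
if len(counts) > 1:
    print("More than one tape device name seen:", sorted(counts))
```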

I have re-run the device configuration wizard and selected the option to remove unused devices. It was previously showing 2 other unused devices during this process and now only shows the current device.

This machine has had a number of tape drive changes due to problems with a new LTO tape causing the drive's loading leader to dislodge, requiring a drive replacement. It took 3 replacement drives before we discovered the problem. As at least one of these previous drives was on a different SCSI ID, there were registry entries for unused devices that may have been detected by BE.

Now waiting to see if the problem is resolved.

 

CraigV's picture

...running that particular option should be done whenever you replace a drive.

Thanks!


Netwest's picture

This is not clearly documented, particularly deleting old drivers, and it is clearly causing problems with the BE device discovery process. This can be seen in Revenge of the Cream's adamm.doc, where tape0 is sometimes detected as tape1 and goes offline. I have seen quite a number of "device offline" posts in the forums without resolution. These are likely to be victims of this problem.

Moe Howard's picture

It's concerning that the tape drive appears as Tape0 or Tape1 at times in the OS. If there is only one tape drive attached to the server, then it should be enumerated as Tape0.

When the tape drive is reporting as Tape1, look at the following key to see what may be listed as Tape1:

\\HKEY_LOCAL_MACHINE\HARDWARE\DEVICEMAP\Scsi

Click on the Scsi hive, then press the asterisk key to expand all the Scsi ports and start looking for anything at Tape0. Maybe this will help expose the root cause of the problem.
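If expanding the tree by hand is tedious, the same search can be scripted. The sketch below is untested against a real server. `find_device_names` walks a nested dict and reports any value that looks like `TapeN`; `load_registry_tree` fills such a dict from the key above using Python's standard `winreg` module (Windows only). The example tree, including its key names and the `DeviceName` value, is hypothetical and only there to show the shape of the output.

```python
import re

TAPE_VALUE = re.compile(r"tape\d+", re.IGNORECASE)


def find_device_names(tree, path=()):
    """Yield (key_path, value_name, data) for values that look like TapeN."""
    for name, item in tree.items():
        if isinstance(item, dict):  # a sub-key: recurse into it
            yield from find_device_names(item, path + (name,))
        elif isinstance(item, str) and TAPE_VALUE.fullmatch(item):
            yield ("\\".join(path), name, item)


def load_registry_tree(root_path=r"HARDWARE\DEVICEMAP\Scsi"):
    """Read the key into nested dicts (Windows only, not called below)."""
    import winreg

    def read(key):
        node = {}
        n_subkeys, n_values, _ = winreg.QueryInfoKey(key)
        for i in range(n_values):
            value_name, data, _type = winreg.EnumValue(key, i)
            node[value_name] = data
        for i in range(n_subkeys):
            sub = winreg.EnumKey(key, i)
            with winreg.OpenKey(key, sub) as child:
                node[sub] = read(child)
        return node

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, root_path) as root:
        return read(root)


# Hypothetical tree for illustration; real key and value names may differ.
example = {
    "Scsi Port 2": {"Scsi Bus 0": {"Target Id 3": {"Logical Unit Id 0": {
        "Identifier": "EXAMPLE TAPE DRIVE",
        "DeviceName": "Tape1",
    }}}},
}

for hit in find_device_names(example):
    print(hit)
```

On the affected server you would call `find_device_names(load_registry_tree())` instead of using the example dict.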