
Backup Exec 2012 Offlining HP Tape Drive Randomly

Created: 03 Aug 2012 • Updated: 11 Apr 2013 | 31 comments
This issue has been solved. See solution.

Hi All. First post here and I am really, really hoping for some help.

I am working on an environment that appears to have the fairly common problem of Backup Exec offlining an HP Autoloader at seemingly random times. Three weeks ago another tech fixed this issue by disabling the 4 HP services that Symantec recommends disabling, and it worked fine. The other night, the issue reappeared despite nothing having changed - weird.

Here is the setup:

Server 2K8R2 BE 2012 SP1
HP Ultrium1760 DRV tape device
HP 1/8 G2 Autoloader
HP P212 SAS controller

Here is the (comprehensive) list of things I / we have tried to no avail:

1 - Replaced all hardware at a very early stage of the problem (has been ongoing for months now)
2 - Ensured Backup Exec's own drivers are being used for the device
3 - Set DB Maintenance to run way outside of the backup routine
4 - Un-installed and re-installed tape drive + autoloader
5 - Disabled HP services as recommended by Symantec
6 - Attempted to back up to disk - this works
7 - Opened 3 separate cases with Symantec, none of which fully resolved the issue, as it recurred
8 - Checked the ADAMM.log file and found this error just before the device offlined: 

[4608] 08/02/12 01:15:04.389 DeviceIo: 04:07:00:00 - Device error 1167 on "\\.\Tape0", SCSI cmd 4d, 1 total errors
[4608] 08/02/12 01:15:13.063 PvlDrive::DisableAccess() - ReserveDevice failed, offline device
Drive = 1033 "Tape drive 0001"
ERROR = 0x0000001F (ERROR_GEN_FAILURE)

9 - Tried running SGMON with verbose device and media logging enabled - this didn't go well. The device didn't error, but none of the backup jobs completed.

The only thing I think we haven't tried is running tracer.exe after seeing the device go offline.

I read in Symantec's documentation that they don't support SAS controllers with RAID enabled, and the P212 is one of those controllers. However, the same document has a list of tested controllers and the P212 is on that list, so I'd be a little cheesed if Symantec said it was a compatibility problem despite having a document that says it should work.

I've also checked all the usual places (event log, BE job / device logs etc.) for more info and there really isn't much to go on. HP's testing tools always come back with passes when run. I'm 99% sure that this isn't a hardware issue, but I have nowhere left to go with it.

Could someone please help me out? 

P.S. Just as an FYI - this Backup Exec instance came from an upgrade of 2010; it wasn't a brand new install. Don't know if that matters.

Thanks in advance,

Matt


Backup_Exec's picture

Hi
Please ensure Backup Exec 2012 is fully patched with SP1a and the latest DDI. If you have not installed the latest DDI yet, please install it from the link below:
http://www.symantec.com/docs/TECH189571
Once you have done that, uninstall and reinstall the tape drive using tapeinst, then do a power cycle: power off the library and then the media server, power on the library, wait for it to initialize, and then power on the media server.
http://www.symantec.com/docs/TECH17931

Thanks

Sameer

Don't forget to give a "Thumbs Up" or Mark as "Solution" if someones advice has helped you.

Larry Fine's picture

Backup Exec tried to reserve the device and it was rejected.  Therefore BE took the device offline.

[4608] 08/02/12 01:15:04.389 DeviceIo: 04:07:00:00 - Device error 1167 on "\\.\Tape0", SCSI cmd 4d, 1 total errors
[4608] 08/02/12 01:15:13.063 PvlDrive::DisableAccess() - ReserveDevice failed, offline device

This issue is much more likely in a shared SAN, with multiple servers trying to access & share devices.  On a single server SAS environment, this should not happen.  I know your P212 HBA is on that supported list, but that is where I would focus, as something is interfering with communication.  Is your HBA firmware and driver up to date?  I have heard of issues with HP software & services also.

Might you have another HBA to try?

If you find this is a solution for the thread, please mark it as such.

CraigV's picture

...Larry is thinking of the HP Storage Agents. If you have a ProLiant server that was installed via SmartStart, stop and disable this service.

Might also be worth your while to get hold of HP's Library and Tape Tools. Stop the BE services, and run the diagnostics against the drive to rule out hardware errors on it.

Thanks!

Alternative ways to access Backup Exec Technical Support:

https://www-secure.symantec.com/connect/blogs/alte...

Matt_Freestyle's picture

Thanks for the replies guys, sorry for the delay in coming back.

Backup Exec is fully patched yes.

I was headed toward the HBA as well. I have just checked the server over again, and there is a second P212 controller controlling a separate set of disks in a RAID - could this have some sort of impact, do you think?

I have tried the tape tools, Craig, and they all came back fine. The drive was replaced ages ago, back in Jan, because Symantec put the issue down to a hardware fault. Luckily HP didn't quibble and just replaced it, but the issue has since reared its ugly head again.

I'm unsure of the server vendor, or whether it was installed via SmartStart if it is HP - will check that this afternoon and report back. Thanks for the replies so far, please keep them coming! Any ideas appreciated!!!

CraigV's picture

No, it shouldn't have any connection to your issue if HDDs are connected to another RAID controller.

You don't perhaps have access to a dedicated SAS HBA for the drive? If so, you can always connect the drive to this and check again.

 

PS: Would the "clever" person who -1'd me please take the time to PM me and explain why...

Matt_Freestyle's picture

I must admit, I couldn't see any reason for the -1 either, Craig!

Unfortunately, Craig, a dedicated HBA isn't an option - we don't have one in the lab anywhere here and neither does the site I am working on. Also, I would have to down the server in production hours and this, too, isn't an option. (Tricky, I know!)

I'll double check the firmware this afternoon as well whilst I'm at it. Annoyingly, the tape drive didn't offline Thurs or at all over the weekend when full backups were running. The only things that I changed were the DB maintenance times and the server's NIC power management settings. Can't see how the latter would change anything at all if the device isn't reserving correctly, but ho hum.

Thanks for the continued assistance.

Matt_Freestyle's picture

Chaps - we have a breakthrough! I checked the firmware version of the card and sure enough, it's out of date, massively out of date! Then I found this document from HP... http://tinyurl.com/ceft8fk

It basically explains how any firmware before 3.66 can have sporadic connectivity issues with tape drives. RESULT! Well, sort of. I need to get the firmware installed on site and then I will update this post with my findings. Fingers crossed this sorts it.
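For anyone wanting to check several servers against that 3.66 threshold, here is a minimal Python sketch of the comparison. It assumes the firmware version is reported as a plain "major.minor" string with a two-digit minor part (as in "3.66"); the function names are made up for illustration, and how you read the installed version off the controller is not covered here.

```python
def parse_firmware_version(text):
    """Turn a 'major.minor' string such as '3.66' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def predates_fix(installed, fixed_in="3.66"):
    """Return True when the installed firmware is older than the fixed release.

    The default of 3.66 is the threshold quoted in the post above; the
    comparison is numeric per field, so '3.9' would count as older than '3.66'.
    """
    return parse_firmware_version(installed) < parse_firmware_version(fixed_in)
```

Comparing per field rather than as a float or a string avoids surprises such as "10.02" sorting before "3.66" in a plain string comparison.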

Symantec support - if you read this.. Please please please KB it and use it for future reference, I bet a lot of people that are having these sorts of connectivity issues are suffering exactly the same problem!

CraigV's picture

Good stuff...with regards to Symantec creating a KB of this, why not head to the Ideas section and add it in as an idea?

https://www-secure.symantec.com/connect/backup-and-recovery/ideas

Larry Fine's picture

You are welcome.  (I suggested you update your HBA firmware and driver).

Matt_Freestyle's picture

Sorry Larry, I should have thanked you in my previous post. I wasn't expecting the thread to be marked as solved, as the fix hasn't been tested yet. But I will post back if I have any issues.

Larry Fine's picture

Sorry, I thought you marked it solved.  I was under the impression that only the OP could mark cases as solved?

CraigV's picture

Nope, TAs, OPs and Admins can, and I did (beats a -1 hey?) as that is kind of what the OP was saying (ie. he found the solution!)...I've made the change to reflect the correct post.

Matt_Freestyle's picture

I think the Symantec Admin must have marked it as solved :(

Matt_Freestyle's picture

Hi Guys, I'm back!

Okay, so the issue isn't resolved. The customer is still getting fairly consistent problems with their backup; however, the drive is no longer continually offlining since the firmware upgrade to the P212. Things are a little more steady, but still not great.

I cannot find any commonality in the problem; the drive appears to fail either at the beginning of a job or halfway through. When it fails halfway through, the error BE throws says the device isn't connected. ADAMM log extract below:

 

[7144] 09/03/12 16:13:00.731 PvlDrive::OpenHandle() from device number
Drive = 1033 "Tape drive 0001"
DeviceName = \\.\Tape0
PrimaryName = \\.\Tape0
SecondaryName = \\?\scsi#sequential&ven_hp&prod_ultrium_4-scsi#5&1a9eeae2&0&070000#{53f5630b-b6bf-11d0-94f2-00a0c91efb8b}
ERROR = 0x00000006 (ERROR_INVALID_HANDLE)

 

When the failure occurs at the beginning of a job the failure in the log looks like this:

 

[22812] 08/31/12 13:00:34.438 B: Not caching B2D entity in Storage Manager: 'Test_B2d', key ID 1023, features 0x00000000, disk flags 0x0000001800000001
[24084] 08/31/12 18:00:16.633 DeviceIo: 04:07:00:00 - Device error 6 on "\\.\Tape0", SCSI cmd 16, 45 total errors
[24084] 08/31/12 18:00:21.663 PvlDrive::DisableAccess() - ReserveDevice failed, offline device
Drive = 1033 "Tape drive 0001"
ERROR = 0x0000001F (ERROR_GEN_FAILURE)

[24084] 08/31/12 18:00:21.692 PvlDrive::UpdateOnlineState()
Drive = 1033 "Tape drive 0001"
ERROR = The device is offline!
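For anyone sifting through a large ADAMM.log for lines like these, below is a rough Python sketch that pulls out the "DeviceIo ... Device error" entries and puts names to the numbers. It assumes the number after "Device error" is a decimal Win32 error code and the value after "SCSI cmd" is a hexadecimal SCSI opcode, which fits the excerpts in this thread (1167 is ERROR_DEVICE_NOT_CONNECTED, 6 is ERROR_INVALID_HANDLE, opcode 16 is RESERVE(6) and 4d is LOG SENSE). The regular expression is modelled only on the lines quoted here, not on any official description of the log format, and the function name is made up.

```python
import re

# Win32 error codes that appear in this thread (values from winerror.h).
WIN32_ERRORS = {
    6: "ERROR_INVALID_HANDLE",
    31: "ERROR_GEN_FAILURE",
    1167: "ERROR_DEVICE_NOT_CONNECTED",
}

# SCSI operation codes that appear in this thread.
SCSI_OPCODES = {
    0x16: "RESERVE(6)",
    0x4D: "LOG SENSE",
}

# Pattern modelled on the DeviceIo lines quoted above (an assumption,
# not a documented format).
DEVICE_IO = re.compile(
    r'^\[(?P<thread>\d+)\]\s+(?P<stamp>\S+ \S+)\s+DeviceIo:\s+\S+\s+-\s+'
    r'Device error (?P<error>\d+) on "(?P<device>[^"]+)", '
    r'SCSI cmd (?P<cmd>[0-9a-fA-F]{2}), (?P<total>\d+) total errors'
)


def parse_device_errors(log_text):
    """Return one dict per DeviceIo error line found in the given log text."""
    events = []
    for line in log_text.splitlines():
        match = DEVICE_IO.match(line.strip())
        if match is None:
            continue  # not a DeviceIo error line
        error = int(match["error"])
        opcode = int(match["cmd"], 16)
        events.append({
            "timestamp": match["stamp"],
            "device": match["device"],
            "error_code": error,
            "error_name": WIN32_ERRORS.get(error, "unknown"),
            "scsi_opcode": opcode,
            "scsi_name": SCSI_OPCODES.get(opcode, "unknown"),
            "total_errors": int(match["total"]),
        })
    return events
```

Feeding it the whole log and grouping the results by `scsi_name` would show whether the failures cluster on the reserve command or on other commands as well.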

If the drive does go offline it now takes a full restart of the BE services to get it up and running again.

I have submitted a support case to HP in case of hardware fault, but HP Library and Tape Tools indicate the autoloader is healthy. 

I logged onto the autoloader interface and noticed it hasn't been powered down in a good while, and also that the system time was half an hour out of sync. Seems trivial, but could this be the cause of my woes?

Thanks in advance,

Matt

CraigV's picture

...is the firmware on your backup device current?

Also, are your cables all undamaged?

Bruce_Thomson's picture

Hi Guys,

This is my first post here. We are suffering the exact problem mentioned in this thread. I am about to log another call with Symantec Support (has been logged previously by my colleague).

The hardware is as follows:

HP X1600G2 24TB StorageWorks Server

HP MSL4048 with 2 x HP LTO5 SAS drives (all on current firmware)

Dedicated P414 512MB SAS Controller (current firmware)

HP Insight agent services completely disabled

We have installed all available hotfixes, and SP1a for BES2012. The StorageWorks server is running Windows Storage Server 2008 R2 Std.

Larry Fine's picture

re: Dedicated P414 512MB SAS Controller (current firmware)

Is the P414 a RAID controller?

I couldn't find it in Google.  If it is a RAID controller, it is not listed on http://www.symantec.com/docs/TECH70907, so it wouldn't be supported.

If reservation errors are the issue, I would suspect a hardware or configuration root cause.  BE cannot do anything about a reservation failure.

Bruce_Thomson's picture

Hi Larry

Sorry that was a typo... It's a P411 which is supported.

Matt_Freestyle's picture

Hi Guys

HP have replaced the tape drive within the autoloader along with the cables. Let's see how it goes. See you all again in a few weeks, I suspect!

 

Cheers,

 

Matt

Matt_Freestyle's picture

Hi All

 

Okay - so the replacement drive didn't work! HP are continuing to look at the case, however, and have asked me to make the following changes:

"Random backup failures when HP StorageWorks Ultrium Tape Drive is connected to a LSI based Host Bus Adapter and Storport driver version installed on the system is later than 5.2.3790.3959, due to Insight Manager Storage Agent timeout if the driver returns SCSI status BUSY and Storport driver retries the command unlimited times. In most of the cases, the tape drive will be discovered properly by the Operative System and will work fine when tested with HP Library And Tape Tools. Even if all possible polling to the tape drive is already prevented, the drive will fail backups randomly. System Event Log will not show any data that can be related to a drive or HBA failure (Event IDs 7, 9, 11 or 15).

Issue could appear on both Microsoft Windows Server 2003/2008, both 32bit and 64bit editions and with HP Insight Management Agents installed.

 

Solution

1. Click on Run, type regedit.

2. Open the path HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\SCSI\[device identifier name for the tape LTO-3 tape device]\[numeric device instance id for the LTO-3 tape device]\Device Parameters\

3. Right Click Device Parameters, click New> Key and rename it as Storport.

4. Right Click Storport key> New> DWORD and rename it as BusyRetryCount and the value should be set to 250 decimal.

5. Exit regedit.

6. Reboot the server.

 

This registry settings are documented on Microsoft public article http://support.microsoft.com/kb/932755 under "Update to modify the behavior of the BUSY status and the Task Set Full status in the Storport driver", in our case, the default value of 20 has been changed to 250."
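The manual steps quoted above can also be written out as a .reg file, which is easier to review and repeat than clicking through regedit. The Python sketch below only builds the file text; it does not touch the registry. The `device_id` and `instance_id` arguments are placeholders for the device identifier and instance ID that HP's instructions leave unspecified, so they have to be looked up under Enum\SCSI on the server in question, and the function name is made up for illustration.

```python
def build_busy_retry_reg(device_id, instance_id, retry_count=250):
    """Build .reg file text that sets Storport\\BusyRetryCount for one device.

    device_id and instance_id are placeholders: they must match the key names
    found under HKLM\\SYSTEM\\CurrentControlSet\\Enum\\SCSI for the tape drive.
    retry_count defaults to 250, the value HP asked for in the quote above.
    """
    if not 0 <= retry_count <= 0xFFFFFFFF:
        raise ValueError("retry_count must fit in a 32-bit DWORD")
    key = (
        "HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Enum\\SCSI\\"
        f"{device_id}\\{instance_id}\\Device Parameters\\Storport"
    )
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        # .reg files store DWORD values as 8 hexadecimal digits (250 -> 000000fa).
        f'"BusyRetryCount"=dword:{retry_count:08x}',
        "",
    ]
    return "\n".join(lines)
```

The generated text can be saved with a .reg extension and checked by eye before importing. As in the manual steps, a reboot is still needed afterwards.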

 

I will let everyone know if this fixes the issue or not. 

AmadeoD's picture

Has this ticket finally been resolved with these Win/Reg Storport hacks?

Matt_Freestyle's picture

Hi all,

Nope - still not fixed. HP haven't got back to me for around 4 days now. I think they are losing interest in the case, despite it clearly being some kind of hardware fault.

My money is now on the P212 SAS card being the cause of the issue. I am going to push for a replacement card if possible and then go from there, I think.

As always I will keep this post updated because I am determined to get this effin problem fixed!

Matt

Daniel_Pazout's picture

Hi all,

I'm facing the same problem.

The hardware is

HP X1600G2 24TB StorageWorks Server

HP MSL4048 with 1 x HP LTO5 SAS drive (all on current firmware)

Dedicated P212 SAS Controller (current firmware)

I summoned HP support on this and sent them my logs....

They responded: The controller is not showing HW issues, but it is not a good choice for backup. It should be a controller without RAID function. We will search for an effective solution...

Well, the P212 is recommended for this tape library but not for backup... interesting

Matt_Freestyle's picture

Hi Daniel,

Interesting! HP are suggesting to me that it could be an issue with the motherboard on the StorageWorks X1600 and that they may have to escalate to the ProLiant team to resolve.

Unfortunately, my client hasn't got back to me on if they wish for this to happen or not, and typically the backup has been working okay for the last few days.

Please do let us know how your issue progresses.

Thanks

Matt

Katsuki Okamura's picture

I recently resolved an issue with almost the same configuration and the same error, and then updated TECH61192.

"An unknown error occurred" may occur when backing up to tape on a server with the HP Server Management Agents Software installed.
http://www.symantec.com/business/support/index?&pa...

The issue was not resolved by steps 1 to 6 and step 8 in TECH61192.
Finally, the HP Server Support team suggested uninstalling "HP StorageWorks VDS Hardware Providers for MSA Disk Arrays", and then the issue was resolved. I added this as step 7 in TECH61192.
Sorry, but I don't know how to uninstall the VDS Hardware Providers. Please contact HP server support for help uninstalling them.

Please NOTE that Symantec does not support connecting a tape library to a RAID controller without a recommendation from the hardware vendor. If HP does not recommend the RAID controller for this use, you may purchase a standalone SAS controller.
 

Daniel_Pazout's picture

Hi Matt,

Yesterday I did the following steps:



1) I uninstalled "HP StorageWorks VDS Hardware Providers for MSA Disk Arrays" as suggested by Katsuki and waited for the end of business to restart ...



2) Later I was contacted by HP with a replacement controller, just to be sure that the controller is not defective. I picked it up and installed it in the server.



3) firmware was pretty outdated so I updated with:

http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=US&swItem=MTX-9f5575741ed04b15885896772d&mode=5&prodTypeId=329290&prodSeriesId=3885791

Now I will wait 14 days and if the error does not reappear, I'll try to install HP StorageWorks VDS Hardware Providers for MSA Disk Arrays back to see if it was the controller....

 

 

Daniel_Pazout's picture

Hi all,

From my tests it truly is "HP StorageWorks VDS Hardware Providers for MSA Disk Arrays": 3 days after installation the error was back...

Daniel

Wes Miller's picture

Run "tapeinst.exe" in the root of your Backup Exec directory.

Check “Use Symantec tape drivers for all support devices”
Check “Delete entries for tape devices that are unavailable, removed, or turned off”
Check “Use Plug and Play Drivers”

FYI -> USB DEVICES ARE NOT SUPPORTED

Click and next your way through the rest then finish. This worked for me.

This is for the known issue where a tape drive still appears in BE after it has been physically removed or changed. It also covers the case where the drive cannot be deleted because it is reported as in use.

It is not necessary to edit the DB unless this doesn’t work. NOT LIKELY.

CraigV's picture

Hi Wes,

 

This is a very old post...no need to drag it back up again. The OP never bothered to respond at all.

Thanks!

Matt_Freestyle's picture

Hi All,

Sorry to drag this up from the grave. Also, sorry for not responding. The notification of replies started going into my junk for some reason!

Okay so I never did get the issue resolved on the existing hardware, in the end we ended up with HP diagnosing a "low level hardware issue". We shipped the customer an old server, whacked a new P212 RAID controller into it and connected the library up to it. So far, so good. It's been working for over a month now.

Seems the most likely cause of these SCSI reservation errors is as HP say, a low level hardware issue.

Thanks,

Matt.

SOLUTION