
NB 7.1 drive drops when drive cleaning is performed

Created: 17 Nov 2013 • Updated: 17 Nov 2013 | 15 comments

Hi All,

I've been having an issue with drives dropping out of the O/S. Normally I have to disable the port on our QLogic card in Device Manager and then re-enable it, after which the drive works fine again for a while. After months of going back and forth with Dell, I found that the issue is related to a cleaning request from the library unit: about 6 minutes after the request is made, the drive drops. When I check the library, the drive is empty and the cleaning appears to have been done (I would have had an alert on the drive saying it needed cleaning). Has anyone seen this before? Is it an issue with the QLogic card or the library? I recently replaced the cables (direct attached) and it didn't help.
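The "about 6 minutes after the cleaning request" observation can be checked more systematically by lining up the library's cleaning-request times against the times Event Viewer logs the drive disappearing. Below is a minimal Python sketch, assuming both sets of timestamps have already been copied out of the library log and Event Viewer by hand. The function name `drops_after_cleaning`, the 10-minute window, and the sample times are all hypothetical and are not taken from the thread.

```python
from datetime import datetime, timedelta


def drops_after_cleaning(cleanings, drops, window=timedelta(minutes=10)):
    """Pair each drive drop with the latest cleaning request that preceded it
    by no more than `window`. Drops with no such cleaning are skipped.

    Hypothetical helper: timestamps are assumed to be copied by hand from the
    library's log (cleanings) and Windows Event Viewer (drops).
    """
    pairs = []
    for drop in sorted(drops):
        # Cleaning requests at or before this drop, inside the window
        earlier = [c for c in cleanings if timedelta(0) <= drop - c <= window]
        if earlier:
            latest = max(earlier)
            pairs.append((latest, drop, drop - latest))
    return pairs


if __name__ == "__main__":
    # Hypothetical sample data, not from the thread
    cleanings = [datetime(2013, 11, 14, 2, 0), datetime(2013, 11, 16, 3, 30)]
    drops = [
        datetime(2013, 11, 14, 2, 6),
        datetime(2013, 11, 15, 12, 0),
        datetime(2013, 11, 16, 3, 36),
    ]
    for clean, drop, delay in drops_after_cleaning(cleanings, drops):
        print(f"drop at {drop} came {delay} after cleaning request at {clean}")
```

If most drops pair up with a consistent delay, that supports the cleaning link; drops with no preceding cleaning request would point at some other cause.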

Thanks for your time.

mph999

Never seen this.

I can't see why the QLogic card would care what type of tape is in the drive, so I would suspect the drive / drive firmware / driver first.

Regards, Martin
 
Setting Logs in NetBackup:
http://www.symantec.com/docs/TECH75805
 
Marianne

"... the drive drops ..."

Can you explain what this means?

Do you have drive cleaning enabled on the library (cleaning media in special library slots) or in NBU (cleaning media in normal slots and configured in NBU as cleaning tapes)?

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links

npolite

All of the drivers and updates have been applied to the Dell PowerVault TL4000.

To Marianne's question: the drive drops out of the O/S in Device Manager. It is not seen again until I either reboot or reset the QLogic port that the tape drive is running on. Cleaning is set up to happen automatically, initiated by the library (not NetBackup) whenever it decides the drive needs cleaning.

We have two drives in the library and both are doing this.

mph999

Hmm, if both are doing it ... I wonder if in this instance it is more likely to be the card, even though this wasn't my first 'suspect'.
Are you able to swap the card out?

npolite

Yep. I have a new (used) card being delivered this week. I can't believe how much more a new card costs ($1,200) than a used one ($40).

I also ran the loopback test on the QLogic card and it passed. Also, the only thing SANsurfer (the QLogic utility) shows is the drive as "disconnected". I hope it is the card, because Dell has given up and is blaming everything except the library.

Marianne

Is anything logged in Event Viewer System log when the drives disappear?

I have seen something similar a couple of years ago with an old Overland library that was due for replacement. Library power cycle brought the devices back.
Because new library was already ordered, nobody bothered to troubleshoot or try and fix it.
The problem disappeared when the new library was installed, with the same server and the same HBAs.

Maybe move drive cleaning to NBU in the meantime to see if that makes a difference?


npolite

Event Viewer does show an error:

The device 'IBM ULT3580-HH5 SCSI Sequential Device' (SCSI\Sequential&Ven_IBM&Prod_ULT3580-HH5&Rev_D2AD\5&346ea004&0&000000) disappeared from the system without first being prepared for removal.
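The device instance ID in that Event Viewer message records exactly which device vanished. The `Ven_`, `Prod_` and `Rev_` fields come from the drive's SCSI inquiry data, and for tape drives the revision string generally reflects the firmware level (`D2AD` here), which is worth noting when raising a firmware question with the vendor. Below is a minimal Python sketch of a hypothetical helper, `parse_instance_id`, that splits such an ID into its parts. It assumes the three-part `<enumerator>\<device ID>\<instance>` layout seen in this message and is not a Windows API.

```python
def parse_instance_id(instance_id):
    """Split a Windows SCSI device instance ID such as
    SCSI\\Sequential&Ven_IBM&Prod_ULT3580-HH5&Rev_D2AD\\5&346ea004&0&000000
    into its parts. Hypothetical helper; assumes the three-part
    <enumerator>\\<device ID>\\<instance> layout shown in the event message.
    """
    enumerator, device_id, instance = instance_id.split("\\", 2)
    fields = device_id.split("&")
    info = {"enumerator": enumerator, "type": fields[0], "instance": instance}
    # Remaining fields look like Ven_<x>, Prod_<x>, Rev_<x>
    for field in fields[1:]:
        key, _, value = field.partition("_")
        info[key.lower()] = value
    return info


if __name__ == "__main__":
    event_id = r"SCSI\Sequential&Ven_IBM&Prod_ULT3580-HH5&Rev_D2AD\5&346ea004&0&000000"
    info = parse_instance_id(event_id)
    print(info["ven"], info["prod"], "revision", info["rev"], "instance", info["instance"])
```

With two identical drives at the same firmware level, the `instance` segment is what tells them apart, so recording it alongside each drop shows which drive went missing.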

I'm going to try the HBA first and see what happens. If that doesn't help, I am going to ask for a library swap-out. This is going to be an uphill battle with them, but if Dell wants any more business from us, they will help us fix this.

Mark_Solutions

I would suggest using NBU to do the drive cleaning.

There are a couple of possibilities here:

1. The library takes the drive offline when it cleans it, so the O/S loses it.

2. (and I suspect this one) When the drive needs cleaning, it has had a read error just before the clean request, and it is this error that takes it offline, not the cleaning. Some switches default to handling data errors by resetting the port, and this can be changed in the switch settings. I suspect that in Event Viewer you will see a QLogic event saying that the port was reset.

Check the switch settings (I can't tell you exactly where to look), as I have seen this before: any read/write error can cause the port to be reset. Such an error would coincide with the cleaning, but I feel that it is probably not actually the cleaning that causes it.

Hope this helps

Authorised Symantec Consultant

Don't forget to "Mark as Solution" if someones advice has solved your issue - and please bring back the Thumbs Up!!.

npolite

Mark,

The only error I consistently see in Event Viewer with this issue is the one I pasted earlier:

The device 'IBM ULT3580-HH5 SCSI Sequential Device' (SCSI\Sequential&Ven_IBM&Prod_ULT3580-HH5&Rev_D2AD\5&346ea004&0&000000) disappeared from the system without first being prepared for removal.

I only noticed yesterday that the cleaning happens shortly before the drive disconnects. I will monitor this more closely to see if it keeps happening. I have had this tape library for just over 2 years now and have always had the library do the cleaning. This issue started back in the middle of August.

The only issue I had before this one was that the drives wear out from the Maxell media and start causing tapes to go bad. Dell sold us the media, even though they say Maxell has not been tested with the library. Once I tell them that, they seem to back off and replace the drive. One drive was replaced back in August because of the bad tape errors, and this issue started just after it was replaced (or at least very close to the replacement).

I appreciate everyone's help so far. I still can't pinpoint this issue, and Dell has been useless.

mph999

Ahh the old Maxell media issue ...

I have seen, and know from a very good source, that Maxell media seems to be more 'abrasive' than other brands in some makes of drive ...

Personally, when I was part of the team running backup environments, only the other two available brands of media were allowed ...

Mark_Solutions

OK, if no changes have been made to the switch (firmware upgrade, etc.) and everything else has stayed the same, then I would say it is down to the drive itself.

HP has tape tools available. I'm not sure if they will work on an IBM drive, but they give the drive a good health check.

npolite

Mark,

Thanks for the update. We have the drives directly connected from the HBA. I ran the IBM utility on the Dell library and everything passes when the drive is up. When it isn't, well, I can't do much with the utility :)

Dell has washed their hands of this, saying that nothing is wrong with the library. We'll see what the replacement QLogic card does, though I don't think it will help.

The other thing is that both drives have this issue. It happens on the second drive about 80% of the time.

Mark_Solutions

OK, I don't remember anything in the HBA firmware that could be changed.

It may be worth having NBU do the cleaning, just in case the library firmware has a bug that takes the drives offline during cleaning.

You could then test it: ask NBU to clean the drive and see if it goes offline. If it does, it is the drive; if it doesn't, it is the library.

Worth a try?

npolite

Here is the update. The second drive was requesting cleaning almost every day, so I called Dell to ask about it, since that didn't seem right.

The person I spoke with agreed to replace the drive. We were also able to reproduce the issue from the tape library by initiating a cleaning of the drive. He also noticed that the drive's status was "Logged out". I was told that the drive goes offline while it is being cleaned and should return to "Logged in" once the cleaning is done.

After I got off the phone with that tech, I decided to run a cleaning on the first drive. After the cleaning, its status went to "Logged out". I called Dell back and explained the situation. This tech told me it is a known drive firmware bug, with a fix scheduled for release in December.

So the bottom line is that I had yet another bad drive from the Maxell media, combined with this firmware bug. At least I am happy that the last person I spoke with knew about the bug. I will have to get someone at Dell to address why the other two people I was dealing with didn't know about it. I will have to live with this issue until the firmware is released. If I only have to worry about it every two weeks, I'm OK with that.

I also need to figure out whether to get rid of the Maxell media or get Dell to swap it for us, since we bought it from them.