This issue has been solved.

Tape drives going down frequently

Created: 17 Sep 2013 • Updated: 23 Sep 2013
Ramkumar S

Hi All,

I have one master/media server and three media servers in my backup domain, and 12 drives are shared among these four servers.

We frequently have tape drives going down while backups are running. Even when we manually up a drive again, it goes down again after some time.

The backup jobs are failing with "media write error (84)" and "media position error (86)". There is no issue with the tape drives or media.

Comments

Marianne
Trusted Advisor, Accredited, Certified
17 Sep 2013

Is one drive going down, or more than one?

Is the drive going down on one media server or all media servers?

The first step is to determine whether one media server or all media servers are having the problem, and whether the problem is seen on specific drive(s).

Have a look at the <install-path>\veritas\netbackup\db\media\errors file on all media servers to narrow down the problem. You can also copy the files to .txt files named after the media servers (media1.txt, media2.txt) and upload them here as file attachments.
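The triage described here boils down to counting where the errors land: on one media server, on one drive, or everywhere. A minimal Python sketch, assuming the entries from each media server's errors file have already been parsed into hypothetical (media_server, drive_name, error_type) tuples; the real file layout is not shown in this thread and may differ:

```python
from collections import Counter


def localize_errors(records):
    """Count I/O errors per media server and per drive.

    Each record is a hypothetical (media_server, drive_name, error_type) tuple;
    parse the real errors files into this shape first.
    """
    by_server = Counter(server for server, _drive, _error in records)
    by_drive = Counter(drive for _server, drive, _error in records)
    return by_server, by_drive


def classify(by_server, by_drive):
    """Rough verdict: is the problem tied to one server, one drive, or widespread?"""
    if not by_server:
        return "no errors recorded"
    if len(by_server) == 1:
        return "single media server: " + next(iter(by_server))
    if len(by_drive) == 1:
        return "single drive: " + next(iter(by_drive))
    # Errors spread over several servers and drives point at something shared.
    return "widespread: check the shared data path (HBA, switch, cabling)"
```

A "widespread" result is the case the thread ends up in: every server and drive affected, which points away from any single tape drive.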

The next step is to enable the bptm log folder on all media servers to troubleshoot the I/O errors from the NBU point of view.

To enable logging for Media Manager (hardware), add a VERBOSE entry to <install-path>\veritas\volmgr\vm.conf on ALL media servers, followed by a restart of the Device Management service.
The exact reason for a drive being DOWN'ed will be logged in the Event Viewer Application log. Hardware errors will be logged in the System log.

There is no issue with tape drive or Media.

NBU is merely reporting these errors. An I/O error means that something is indeed wrong, either with the media or with something in the data path, which includes the HBA, HBA driver, cable, switch port, tape drive, tape driver, etc.

The above steps should help pinpoint the problem.
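The VERBOSE change to vm.conf has to be made on every media server, and it is easy to add the entry twice or glue it onto the end of an existing line. A minimal Python sketch that appends it only when it is missing; the vm.conf path is passed in because <install-path> differs per server, and restarting the Device Management service is deliberately left as a manual step:

```python
from pathlib import Path


def enable_vm_verbose(vm_conf: Path) -> bool:
    """Append a VERBOSE line to vm.conf unless one is already present.

    Returns True if the file was changed. The Device Management service still
    has to be restarted afterwards (not done here).
    """
    text = vm_conf.read_text() if vm_conf.exists() else ""
    if "VERBOSE" in (line.strip() for line in text.splitlines()):
        return False
    if text and not text.endswith("\n"):
        text += "\n"  # keep the new entry on its own line
    vm_conf.write_text(text + "VERBOSE\n")
    return True
```

On each media server, call it with the full path to vm.conf under that server's <install-path>\veritas\volmgr, then restart the Device Management service.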

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links

mph999
Symantec Employee, Accredited
17 Sep 2013

"media write error(84)" and " Media position error 86 ".  There is no issue with tape drive or Media.

I would suggest that there is.

Adding to the outstanding post from Marianne: NBU does not write to or read from the tape drives; these I/O operations are carried out by the OS. It is very, very rare for NBU to be involved in a drive issue such as this.

If I had £1 for every time someone said "there is no issue with the drive", or even when the vendor said the same, I would be very rich ...

http://www.symantec.com/docs/TECH169477

See the Read/Write errors section.

There is one known issue if you are using MSEO, as there is an incompatibility between MSEO and asynchronous tape mark writes.

If only one drive is going down, that is a big clue.

Perhaps you could post the file:

/usr/openv/netbackup/db/media/errors from each media server.

Martin

Regards, Martin
Setting Logs in NetBackup:
http://www.symantec.com/docs/TECH75805
 
Ramkumar S
Certified
17 Sep 2013

Hi Marianne,

There is no specific one. We are facing the issue with all the drives and all media servers. Please find the attachment.

In Event Viewer we are getting Event 129 from ql2300: "Reset to device, \Device\RaidPort2, was issued."

Also, one more thing: on our master and media servers we can see 12 tape drives in Device Manager, but when configuring at the NetBackup level only 11 drives are detected.
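With 12 drives visible to the OS but only 11 in NetBackup, comparing serial numbers is a quick way to find the missing drive. A minimal Python sketch; how the two serial lists are collected (Device Manager on one side, the NetBackup device configuration on the other) is left open, and the serial numbers below are hypothetical:

```python
def missing_from_netbackup(os_serials, netbackup_serials):
    """Return serial numbers the OS can see but NetBackup has not configured."""
    return sorted(set(os_serials) - set(netbackup_serials))


if __name__ == "__main__":
    # Hypothetical serial numbers, for illustration only.
    seen_by_os = ["SN01", "SN02", "SN03"]
    configured_in_nbu = ["SN01", "SN03"]
    print(missing_from_netbackup(seen_by_os, configured_in_nbu))  # ['SN02']
```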

Please help me to isolate the exact issue.

Attachment: Tape drive staus.txt (5.15 KB)
Ramkumar S
Certified
17 Sep 2013

Also, I checked in Event Viewer and I am getting the errors below.

"Event 4236 NetBackup AVR Daemon"

serial number check failed on IBM.ULT3580-TD3.003 (device 9, SCSI coordinates {4,0,8,0}, \\.\Tape4), drive serial number is 0007826769, database serial number is 0007820721, DOWN'ing it

TLD(0) [5440] Serial number check on drive 7 (device 9, SCSI coordinates {4,0,8,0}, \\.\Tape4) failed, drive serial number is 0007826769, database serial number is 0007820721
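Both messages carry the same facts: the NetBackup device number, the SCSI coordinates, the OS device path, and two different serial numbers (the drive answering on that path versus the one recorded in the database). A minimal Python sketch that pulls them out with a regular expression written against the exact text quoted above; other NetBackup versions may word the message differently:

```python
import re

# Written against the two message variants quoted in this thread.
SERIAL_CHECK = re.compile(
    r"\(device (?P<device>\d+), SCSI coordinates \{(?P<scsi>[\d,]+)\}, (?P<path>[^)]+)\)"
    r".*?drive serial number is (?P<drive_sn>\w+), database serial number is (?P<db_sn>\w+)"
)


def parse_serial_check(message):
    """Extract device details and both serial numbers, or return None if no match."""
    m = SERIAL_CHECK.search(message)
    if m is None:
        return None
    return {
        "device": int(m["device"]),
        "scsi": tuple(int(part) for part in m["scsi"].split(",")),
        "path": m["path"],
        "drive_sn": m["drive_sn"],
        "db_sn": m["db_sn"],
        # True means the OS path now leads to a different physical drive.
        "mismatch": m["drive_sn"] != m["db_sn"],
    }
```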

mph999
Symantec Employee, Accredited
18 Sep 2013

Perhaps speak with QLogic ....

In event viewer we are getting " Event 129, ql2300 " Reset to device, \Device\RaidPort2, was issued.

Is that some sort of SCSI reset, I wonder? If so, that could cause serious issues.

Did this ever work? If so, what has changed?

At the moment, I'm going to suspect a SAN issue of some sort (based on the ql2300 error); NBU is simply a casualty of whatever is causing this.

Have you got the bptm log from an example where it cannot position (preferably at VERBOSE 5 / General 2)?
 
18 Sep 2013

Maybe the tape drive needs cleaning on the NetBackup side.

Please try tpclean and power-cycle the drives.

Also upgrade the tape library drivers and firmware.

All these steps fixed my Tape Drive issues.

Marianne
Trusted Advisor, Accredited, Certified
18 Sep 2013

Event ID 129: check the Microsoft Support site for the latest Storport drivers and/or hotfixes.

See https://support.qlogic.com/app/answers/detail/a_id/1376/~/event-id%3A-129

About the mismatched serial numbers: this looks like a device mapping issue due to a lack of persistent binding.

Use the QLogic SANsurfer software to set up persistent binding; this will ensure that device paths do not change when the server is rebooted.
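Without persistent binding, the telltale sign is that an OS device path such as \\.\Tape4 answers with a different serial number after a reboot. A minimal Python sketch that compares two path-to-serial snapshots, assuming they were recorded before and after a reboot by whatever inventory method is at hand; the path/serial pairings in the test are hypothetical:

```python
def path_drift(before, after):
    r"""Return {path: (old_serial, new_serial)} for device paths whose drive changed.

    before/after map an OS device path (for example \\.\Tape4) to the serial
    number read from the drive at that path when the snapshot was taken.
    Paths present in only one snapshot are ignored.
    """
    return {
        path: (before[path], after[path])
        for path in before.keys() & after.keys()
        if before[path] != after[path]
    }
```

With persistent binding in place, the result should stay empty across reboots.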

Mark_Solutions
Trusted Advisor, Accredited, Certified
18 Sep 2013

I am with Marianne on this one: if the defined path now relates to a different drive, then persistent bindings are not being used, so after a reboot the paths to the drives change.

Make sure that automatic mapping is turned off, set up persistent bindings, make sure that the Removable Storage Manager service is stopped and disabled, and add the AutoRun key with a value of zero on all media servers, as per this tech note:

http://support.microsoft.com/kb/842411

This does apply to all Windows operating systems, not just 2003.

With all of these in place you should be OK. Once it is all done and the servers have been rebooted, re-run the Device Configuration Wizard to set all drives to the correct persistent paths.

Authorised Symantec Consultant

Don't forget to "Mark as Solution" if someone's advice has solved your issue, and please bring back the Thumbs Up!!

Ramkumar S
Certified
18 Sep 2013

Hi,

Now I am getting the errors below in Event Viewer.

Operator/EMM server has DOWN'ed drive IBM.ULT3580-TD3.000 (device 7)

Fatal open error on IBM.ULT3580-TD3.000 (device 7, \\.\Tape0): The system cannot find the file specified.  DOWN'ing it

Mark_Solutions
Trusted Advisor, Accredited, Certified
18 Sep 2013

What else do you see in the System and Application event logs on the media server at around the time this happens?

Sounds like a SCSI bus reset.

Ramkumar S
Certified
18 Sep 2013

I can see the errors below in Event Viewer.

TLD(0) [3652] Could not get serial number for robot \\.\Changer0 (SCSI coordinates {4,0,13,1})

Fatal open error on IBM.ULT3580-TD3.008 (device 0, \\.\Tape5): The device is not connected.  DOWN'ing it

TLD(0) [3000] Could not get DOS path from PnP path \\?\scsi#changer&ven_ibm&prod_03584l32#5&8cf2dc0&0&000d01#{53f56310-b6bf-11d0-94f2-00a0c91efb8b}, using existing path

TLD(0) [3000] cannot open \\.\Changer0, SCSI coordinates {4,0,13,1}: The system cannot find the file specified.
 
Mark_Solutions
Trusted Advisor, Accredited, Certified
18 Sep 2013

OK, so the robot has gone too; it looks like a bus reset. What does the System event log say? There should be HBA warnings or similar in there; I see you mentioned those earlier (the ql2300 errors).

Does this connection pass through a switch?

Many switches have a setting along the lines of: on detection of an error, reset the bus.

So as soon as you get a simple read error on the tape, the bus gets reset and you lose the tape drive (and if the robot is mapped through the drive, that goes too).
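One way to test the bus-reset theory is to check whether drive DOWN events follow reset events (such as Event 129) within a short window. A minimal Python sketch, assuming the timestamps have already been exported from the System and Application event logs; the 60-second default window is an arbitrary assumption, not a NetBackup or Windows setting:

```python
from datetime import datetime, timedelta


def downs_after_reset(resets, downs, window=timedelta(seconds=60)):
    """Pair each drive DOWN event with the latest reset preceding it within `window`.

    resets: datetimes of bus-reset events (e.g. Event 129 "Reset to device").
    downs:  (datetime, drive_name) DOWN events from the Application log.
    Returns (drive_name, reset_time) pairs; DOWNs with no nearby reset are omitted.
    """
    resets = sorted(resets)
    pairs = []
    for down_time, drive in sorted(downs):
        preceding = [r for r in resets if timedelta(0) <= down_time - r <= window]
        if preceding:
            pairs.append((drive, preceding[-1]))
    return pairs


if __name__ == "__main__":
    t0 = datetime(2013, 9, 18, 10, 0, 0)  # hypothetical timestamps
    resets = [t0]
    downs = [
        (t0 + timedelta(seconds=5), "IBM.ULT3580-TD3.000"),
        (t0 + timedelta(minutes=30), "IBM.ULT3580-TD3.008"),
    ]
    print(downs_after_reset(resets, downs))
```

If most DOWNs pair up with a reset, the drives are casualties of the bus reset rather than the cause.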

Check out the switch, and make sure you have the latest HBA drivers and firmware as well as the latest firmware and drivers for the tape drives themselves.

Hope this helps

Ramkumar S
Certified
18 Sep 2013

Yes, the drives are connected through the pass-through option. The server is hosted in a VM; we recently migrated this server from physical to VM.

Also, we are still getting "Reset to device, \Device\RaidPort2, was issued" in the System event log.

We checked: the tape drive and tape library drivers are up to date (driver date: 4/30/2013, driver version: 6.2.3.6).

mph999
Symantec Employee, Accredited
18 Sep 2013

Ahh, that could be the issue.

Tape drives are not supported on VM media servers.

Martin
 
Mark_Solutions
Trusted Advisor, Accredited, Certified
18 Sep 2013

Totally agree with Martin: there is no way that tape through VMware is either supported or any good, and robotics are even worse!

Time to go back to physical!

Marianne
Trusted Advisor, Accredited, Certified
18 Sep 2013

Pity you never asked BEFORE migrating; we would have told you that it will not work.

It is not an NBU issue; it is simply not supported by VMware.

See the Statement of Support for NetBackup 7.x in a Virtual Environment: http://www.symantec.com/docs/TECH127089
