
Robot is missing and Drive is down

Created: 04 Mar 2013 • Updated: 10 Mar 2013 | 11 comments
This issue has been solved. See solution.

Hi,

Good day to all,

My company is a NetBackup vendor, and I provide support for one of our clients.

Whenever they report that NetBackup is not running, I troubleshoot it and am able to bring NetBackup back up with these commands:

# /usr/openv/volmgr/bin/scan

# /usr/openv/volmgr/bin/scan

# /usr/openv/volmgr/bin/robtest

I always find that TLD(0) is not enabled, and my corrective action is to physically press the reset button on the tape library. After that, TLD(0) is enabled and the robot is accessible again.

The next step is bringing the drive to the UP state with these commands:

# /usr/openv/volmgr/bin/tpconfig -d

# /usr/openv/volmgr/bin/vmoprcmd -up n

where n is the drive number

I would like to know the root cause of this. Is it possible for the robot to go missing suddenly and the drive to go down without anyone doing anything? What factors cause the robot and drive to behave this way?

Thank you for reading; I hope you can help me with my problem. Thanks and regards, e_win


Marianne's picture

You need to enable Media Manager logging by adding a VERBOSE entry to vm.conf on the robot control host and restarting NBU.
You will then see NBU and OS device-related errors logged in the OS syslog.
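
A minimal shell sketch of the vm.conf change described above, assuming the usual UNIX location /usr/openv/volmgr/vm.conf (pass a different path if yours differs). The function name enable_vm_verbose is made up for illustration. It only appends the VERBOSE line if it is not already there, and it does not restart anything.

```shell
#!/bin/sh
# Hypothetical helper: add a VERBOSE entry to vm.conf exactly once.
# The default path is an assumption (typical UNIX install location).
enable_vm_verbose() {
    conf="${1:-/usr/openv/volmgr/vm.conf}"

    # Create the file if it does not exist yet.
    [ -f "$conf" ] || touch "$conf" || return 1

    # Only append when no line consisting solely of VERBOSE is present.
    if grep '^VERBOSE$' "$conf" >/dev/null 2>&1; then
        echo "VERBOSE already present in $conf"
    else
        echo 'VERBOSE' >> "$conf"
        echo "VERBOSE added to $conf"
    fi
}
```

Run `enable_vm_verbose` as root on the robot control host, then restart NBU so the entry takes effect. On many UNIX installs that is done with bp.kill_all and bp.start_all under /usr/openv/netbackup/bin, but check what your site uses.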

Resetting hardware tells us there is a problem with the hardware. Syslog errors will be your proof.

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links

mph999's picture

I would go one stage further than Marianne's excellent post and state that if ALL you do is reset the library, then this is a library issue and not a NetBackup issue, even without seeing the logs.

As a matter of interest, what is the make/model of the library?

Martin

Regards,  Martin
 
Setting Logs in NetBackup:
http://www.symantec.com/docs/TECH75805
 
SOLUTION
Marianne's picture

I have seen an old, failing library show the same symptoms in the past.

We could see in /var/adm/messages that the OS had lost connectivity to the robot.
The entire library was replaced shortly afterwards.


Mark_Solutions's picture

Also, assuming that this is a fibre-attached library, it is worth checking the SAN switch in use.

Some have a setting that says something to the effect of "if I detect an error I will reset the port" (I don't know exactly which setting it is, and it varies between switches).

Many libraries path the robotics through a tape drive, so when the drive gets a read error the port automatically resets and you lose the drive and the robot.

A simple change to the SAN switch port settings can prevent this.

Worth a look

Hope this helps

Authorised Symantec Consultant

Don't forget to "Mark as Solution" if someones advice has solved your issue - and please bring back the Thumbs Up!!.

mph999's picture

Good point, Mark. Resetting the library will cause it to log out of the SAN and back in.

M

 
e_win's picture

Hi, thanks,

@Marianne: I enabled logging by putting "VERBOSE" in vm.conf and am waiting for a scheduled NBU restart.

I have also encountered this message: "OS has lost connectivity to the robot."

Are there specific entries in /var/adm/messages that would be my proof of a library issue?

@Martin: The tape library is an SL500.

@Mark: Yes, it is connected to a SAN switch. I somewhat understand, but I know nothing about SAN switch port settings.

Marianne's picture

Hardware errors such as SAN-attached devices disappearing will be logged in /var/adm/messages, regardless of whether there is a VERBOSE entry in vm.conf.

If you have a copy of messages (or messages.#) that coincides with the time that you last saw the problem, please copy it to messages.txt and post it as a file attachment. We will help you to find the hardware error.

Action to be taken depends on the type of error seen.
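
As a rough sketch of how one might pull candidate hardware errors out of a messages file before posting it: the function below greps for a few patterns. Only the two "LOGICAL UNIT NOT READY" / "MANUAL INTERVENTION REQUIRED" strings come from this thread; the other patterns ("transport failed", "offline") are assumptions about what device-loss entries may look like, so adjust them to what your OS actually logs. The name find_hw_errors is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list lines in a syslog file that look like
# device or connectivity errors. The pattern list is illustrative only.
find_hw_errors() {
    log="${1:-/var/adm/messages}"

    # -i: ignore case, -n: show line numbers, -E: extended regex.
    grep -i -n -E \
        'LOGICAL UNIT NOT READY|MANUAL INTERVENTION REQUIRED|transport failed|offline' \
        "$log"
}
```

For example, `find_hw_errors /var/adm/messages.0` would scan a rotated copy. An empty result does not prove the hardware is healthy; it only means none of these patterns matched.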


mph999's picture

Rather than restarting the library when the issue happens again, just disable and re-enable the switch port it is connected to.

If it is a SAN issue, this should reset it and will help narrow down the cause.

Martin

 
e_win's picture

Hi Marianne,

I have attached the messages file below.

I noticed this recurring message:

LOGICAL UNIT NOT READY, MANUAL INTERVENTION REQUIRED

I ran tpconfig -d and the robotic path is recognized.

Then I ran robtest, and the results are:

Configured robots with local control supporting test utilities:
  TLD(0)     robotic path = /dev/sg/c0tw500104f0007f106fl0
  TLD(1)     robotic path = /dev/sg/c0tw2103001b32653b82l0
  TLD(2)     robotic path = /dev/sg/c0tw10000000c9e7d367l5

Robot Selection
---------------
  1)  TLD 0
  2)  TLD 1
  3)  TLD 2
  4)  none/quit
Enter choice: 1

Robot selected: TLD(0)   robotic path = /dev/sg/c0tw500104f0007f106fl0

Invoking robotic test utility:
/usr/openv/volmgr/bin/tldtest -rn 0 -r /dev/sg/c0tw500104f0007f106fl0

Opening /dev/sg/c0tw500104f0007f106fl0
MODE_SENSE complete
Enter tld commands (? returns help information)
inquiry
Inquiry_data: STK     SL500           1432
mode
First transport addr = 0, Number transport elements = 1
First storage addr = 1000, Number storage elements = 154
First media access port addr = 10, Number media access port elements = 5
First drive addr = 500, Number drive elements = 6
Library does have a barcode reader
MODE_SENSE complete
init
Initiating INITIALIZE_ELEMENT_STATUS
initialize_element_status failed
sense key = 0x2, asc = 0x4, ascq = 0x3, LOGICAL UNIT NOT READY, MANUAL INTERVENTION REQUIRED
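
For reference, the sense data in the last line can be decoded by hand. The sketch below assumes the line format printed by tldtest above and includes only a small subset of the standard SCSI sense key and ASC/ASCQ codes (the ones around 0x04); decode_sense is a made-up name, not a NetBackup utility.

```shell
#!/bin/sh
# Hypothetical helper: decode "sense key = 0x.., asc = 0x.., ascq = 0x.."
# into text. The tables are a deliberately small subset of the SCSI codes.
decode_sense() {
    line="$1"

    # Extract the three hex values (without the 0x prefix).
    key=$(printf '%s\n' "$line" | sed -n 's/.*sense key = 0x\([0-9a-fA-F]*\).*/\1/p')
    asc=$(printf '%s\n' "$line" | sed -n 's/.*asc = 0x\([0-9a-fA-F]*\).*/\1/p')
    ascq=$(printf '%s\n' "$line" | sed -n 's/.*ascq = 0x\([0-9a-fA-F]*\).*/\1/p')

    case "$key" in
        0) key_text="NO SENSE" ;;
        1) key_text="RECOVERED ERROR" ;;
        2) key_text="NOT READY" ;;
        3) key_text="MEDIUM ERROR" ;;
        4) key_text="HARDWARE ERROR" ;;
        5) key_text="ILLEGAL REQUEST" ;;
        6) key_text="UNIT ATTENTION" ;;
        *) key_text="unknown sense key" ;;
    esac

    case "$asc/$ascq" in
        4/0) asc_text="LOGICAL UNIT NOT READY, CAUSE NOT REPORTABLE" ;;
        4/1) asc_text="LOGICAL UNIT IS IN PROCESS OF BECOMING READY" ;;
        4/2) asc_text="LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED" ;;
        4/3) asc_text="LOGICAL UNIT NOT READY, MANUAL INTERVENTION REQUIRED" ;;
        *)   asc_text="ASC/ASCQ not in this table" ;;
    esac

    echo "$key_text: $asc_text"
}
```

Passing the tldtest line above to decode_sense gives NOT READY with "manual intervention required", which means the library itself is reporting that it cannot proceed until someone attends to it.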

Regards

e_win

Attachment: messages.txt (1.73 MB)

Will Restore's picture

Sure sounds to me like you have a hardware failure. I would call the library in for service.

Will Restore -- where there is a Will there is a way

e_win's picture

Thanks,

It was a hardware issue. We replaced the robot and backups ran again. But after a day, backups went down due to a database system error, and I think that is now a NetBackup issue.