Drives going in MIXED mode again and again

Created: 01 Apr 2013 • Updated: 07 May 2013 | 29 comments
This issue has been solved. See solution.

Hello Team.

 

The drives keep going into MIXED mode again and again. Command output follows.

[80092345@usgaub5500 bin]$ ./tpconfig -l
Device Robot Drive       Robot                    Drive                Device     Second
Type     Num Index  Type DrNum Status  Comment    Name                 Path       Device Path
robot      0    -    TLD    -       -  -          -                    /dev/sg2
  drive    -    0  hcart    6    DOWN  -          IBM.ULTRIUM-TD4.261  /dev/nst7
  drive    -    2  hcart    8      UP  -          IBM.ULTRIUM-TD4.263  /dev/nst6
  drive    -    3  hcart    2      UP  -          IBM.ULTRIUM-TD4.257  /dev/nst4
  drive    -    4  hcart    7      UP  -          IBM.ULTRIUM-TD4.262  /dev/nst3
  drive    -    5  hcart    1    DOWN  -          IBM.ULTRIUM-TD4.256  /dev/nst0
  drive    -    6  hcart    5    DOWN  -          IBM.ULTRIUM-TD4.260  /dev/nst2
  drive    -    7  hcart    3      UP  -          IBM.ULTRIUM-TD4.258  /dev/nst1
 

[80092345@usgaub5500 bin]$ ./vmoprcmd  -d

                                PENDING REQUESTS

                                     <NONE>

                                  DRIVE STATUS

Drv Type   Control  User      Label  RecMID  ExtMID  Ready   Wr.Enbl.  ReqId
  0 hcart  DOWN-TLD             -                     No       -         0
  2 hcart    TLD               Yes   HS3352  HS3352   Yes     Yes        0
  3 hcart    TLD               Yes   HS3233  HS3233   Yes     Yes        0
  4 hcart    TLD               Yes   HS3167  HS3167   Yes     Yes        0
  5 hcart  DOWN-TLD             -                     No       -         0
  6 hcart  DOWN-TLD             -                     No       -         0
  7 hcart    TLD               Yes   HS3248  HS3248   Yes     Yes        0

                             ADDITIONAL DRIVE STATUS

Drv DriveName            Shared    Assigned        Comment
  0 IBM.ULTRIUM-TD4.261   Yes      -
  2 IBM.ULTRIUM-TD4.263   Yes      usgaub5500
  3 IBM.ULTRIUM-TD4.257   Yes      usgaub5500
  4 IBM.ULTRIUM-TD4.262   Yes      usgaub5500
  5 IBM.ULTRIUM-TD4.256   Yes      -
  6 IBM.ULTRIUM-TD4.260   Yes      -
  7 IBM.ULTRIUM-TD4.258   Yes      usgaub5500
 

[80092345@usgaub5500 bin]$ ./tpautoconf -report_disc
======================= Missing Device (Drive) =======================
 Drive Name = IBM.ULTRIUM-TD4.261
 Drive Path = /dev/nst7
 Inquiry = "IBM     ULTRIUM-TD4     97F9"
 Serial Number = 1022003168
 TLD(0) definition Drive = 6
 Hosts configured for this device:
  Host = usgaub5500
======================= Missing Device (Drive) =======================
 Drive Name = IBM.ULTRIUM-TD4.263
 Drive Path = /dev/nst6
 Inquiry = "IBM     ULTRIUM-TD4     BBH4"
 Serial Number = 1024003168
 TLD(0) definition Drive = 8
 Hosts configured for this device:
  Host = usgaub5500
======================= Missing Device (Drive) =======================
 Drive Name = IBM.ULTRIUM-TD4.257
 Drive Path = /dev/nst4
 Inquiry = "IBM     ULTRIUM-TD4     97F9"
 Serial Number = 1012003168
 TLD(0) definition Drive = 2
 Hosts configured for this device:
  Host = usgaub5500
======================= Missing Device (Drive) =======================
 Drive Name = IBM.ULTRIUM-TD4.262
 Drive Path = /dev/nst3
 Inquiry = "IBM     ULTRIUM-TD4     97F9"
 Serial Number = 1023003168
 TLD(0) definition Drive = 7
 Hosts configured for this device:
  Host = usgaub5500
======================= Missing Device (Drive) =======================
 Drive Name = IBM.ULTRIUM-TD4.256
 Drive Path = /dev/nst0
 Inquiry = "IBM     ULTRIUM-TD4     97F9"
 Serial Number = 1011003168
 TLD(0) definition Drive = 1
 Hosts configured for this device:
  Host = usgaub5500
======================= Missing Device (Drive) =======================
 Drive Name = IBM.ULTRIUM-TD4.260
 Drive Path = /dev/nst2
 Inquiry = "IBM     ULTRIUM-TD4     BBH4"
 Serial Number = 1021003168
 TLD(0) definition Drive = 5
 Hosts configured for this device:
  Host = usgaub5500
======================= Missing Device (Drive) =======================
 Drive Name = IBM.ULTRIUM-TD4.258
 Drive Path = /dev/nst1
 Inquiry = "IBM     ULTRIUM-TD4     97F9"
 Serial Number = 1013003168
 TLD(0) definition Drive = 3
 Hosts configured for this device:
  Host = usgaub5500
=========== Missing Device or no local control path (Robot) ===========
 Defined as robotic TLD(0)
 Inquiry = "SPECTRA PYTHON          2000"
 Serial Number = 9110003168
 Robot Path = /dev/sg2
 Drive = 6, Drive Name = IBM.ULTRIUM-TD4.261, Serial Number = 1022003168
 Hosts configured for this device:
  Host = usgaub5500

 

Please Help


Dan@NB's picture

wannawin,

It seems some drive paths are missing. You would need to delete those drives and reconfigure them in NBU. That should work. Let us know.

Best Regards,
Dan

Marianne's picture

You need to find out what is wrong with server usgaub5500 at OS level.

NBU relies on the OS for device access - nothing can be done in NBU to fix device access problems.
Check physical connections between HBA and switch.

Check /var/log/messages for errors.
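
A minimal sketch of that syslog check, assuming the standard syslog line layout seen in the excerpts posted in this thread. The daemon names (ltid, tldd, tldcd, bptm) and the kernel "st" tape driver prefix are taken from those excerpts; the function name `device_events` is made up for illustration. It keeps the full message rather than grepping for one word, so the lines leading up to a DOWN event stay visible.

```python
import re

# Daemon names taken from the syslog excerpts in this thread.
NBU_TAGS = {"ltid", "tldd", "tldcd", "bptm"}

# Matches e.g. "Apr  2 10:28:33 usgaub5500 kernel: st 7:0:3:0: reservation conflict"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s+(?P<host>\S+)\s+"
    r"(?P<tag>[\w.-]+)(?:\[\d+\])?:\s(?P<msg>.*)$"
)


def device_events(lines):
    """Return (timestamp, tag, message) for NetBackup daemon lines and for
    kernel lines from the SCSI tape driver (messages starting with 'st ')."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue
        tag, msg = m.group("tag"), m.group("msg")
        if tag in NBU_TAGS or (tag == "kernel" and msg.startswith("st ")):
            events.append((m.group("stamp"), tag, msg))
    return events


# Lines copied from the excerpts posted in this thread.
SAMPLE = [
    "Apr  2 06:37:13 usgaub5500 tldd[10292]: TLD(0) going to DOWN state, "
    "status: Robotic arm has no addressable holder",
    "Apr  2 10:28:33 usgaub5500 kernel: st 7:0:3:0: reservation conflict",
    "Apr  2 10:16:23 usgaub5500 ltid[24568]: Operator/EMM server has DOWN'ed "
    "drive IBM.ULTRIUM-TD4.261 (device 0)",
]

for stamp, tag, msg in device_events(SAMPLE):
    print(stamp, tag, msg)
```

To use it on the real log, pass an open file handle for /var/log/messages in place of SAMPLE.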

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links

Vickie's picture

 

The cause of this issue might be one of the following:
 
1) Reservation conflicts.
2) A restart of a Media Server's daemons or services.
3) Some Media Servers sharing tape drives (i.e. SSO) being restarted while others are still up and running (sometimes even doing backups).
 
Proposed resolution:
 
A) Restart the affected Media Server and the Master Server.
 
OR
 
B) The best way to fix this is to get a window in which the NBU daemons/services can be stopped and started on ALL the Media Servers and the Master Server.
I would suggest these steps to 'clean up' the drive control modes.
 
1. Cancel All Jobs                                         (Master Server)

    "bpdbjobs -cancel_all"

2. Suspend jobs and reset allocations and close GUI.            (Master Server)

    "nbpemreq -suspend_scheduling"

    "nbtbutil -resetAll"

    Close all NBU GUIs everywhere.                        (yes, everywhere)

3. Shut down NBU services/daemons                 (Master and ALL Media Servers)

    "bp.kill_all"    or  "netbackup stop"

4. Terminate all NBU Processes if any are found lingering around.(Master and ALL Media Servers)

     "bpps -x"   or "bpps"

     "kill -9 <PID>" 

5.  Start NBU daemons/services on Master

    "bp.start_all"  or "netbackup start"

6. Start NBU daemons/services on Media Servers which have Robotic Control

    "bp.start_all"  or "netbackup start"

7. Start NBU daemons/services on remaining Media Servers

    "bp.start_all"  or "netbackup start"

8. Open the GUIs NOW and check the drive status.
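
The ordering in steps 3 and 5 to 7 can be sketched as below. This only builds and prints a plan; it does not run anything on any server. The host names and the `Host`, `startup_order` and `restart_plan` names are hypothetical; the two command names are the ones quoted in the steps above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str                    # hypothetical host name
    role: str                    # "master" or "media"
    robot_control: bool = False  # True if this host has robotic control


def startup_order(hosts):
    """Order hosts as in steps 5-7: Master first, then Media Servers with
    robotic control, then the remaining Media Servers."""
    def rank(host):
        if host.role == "master":
            return 0
        return 1 if host.robot_control else 2
    # sorted() is stable, so hosts with the same rank keep their input order.
    return sorted(hosts, key=rank)


def restart_plan(hosts):
    """Return (host, command) pairs: stop everywhere, then start in order."""
    plan = [(h.name, "bp.kill_all") for h in hosts]                   # step 3
    plan += [(h.name, "bp.start_all") for h in startup_order(hosts)]  # steps 5-7
    return plan


HOSTS = [
    Host("media-b", "media"),
    Host("media-a", "media", robot_control=True),
    Host("master-1", "master"),
]

for host, command in restart_plan(HOSTS):
    print(f"{host}: {command}")
```

Step 4 (checking for lingering processes with bpps) sits between the stop and start phases and is left out of the sketch.
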
CRZ's picture

Hey wannawin, does your company have a support contract with us? 

You may need to start opening cases instead of Connect threads.

wannawin's picture

Hello CRZ.

I am very sad to say that my company does not have a direct support contract with Symantec, which is why I am facing so many problems. I joined this company 20 days ago, and my project manager told me to resolve each and every issue ASAP, but I am the only person here who works on troubleshooting and I face a new issue every day...   :(  I want to clean all of that up...

 

Hello Marianne..

How do I proceed and check at OS level? There are 7 drives in total and 3 are down, and backups are failing daily.

 

Hello Netbackup_user.

Today I will do the steps you mentioned above and will report back after completion.

Marianne's picture

I have already told you where to start:

 

Check physical connections between HBA and switch.

Check /var/log/messages for errors.

Further troubleshooting/actions depend on errors seen in this OS syslog file.

 


wr's picture

20 problems in 20 days, hope you ask for 20% bonus! :)

Will Restore -- where there is a Will there is a way

wannawin's picture

Hello Marianne/ALL

I have done all of the steps (Master Server, Media Servers and reset all), but the drives are going into MIXED mode again.

Below are the entries from /var/log/messages:

[80092345@usgaub5500 log]$ tail -500 messages | grep DOW
Apr  2 06:37:13 usgaub5500 ltid[9591]: Request for media ID HS3985 is being rejected because mount requests are disabled (reason = robotic daemon going to DOWN state)
Apr  2 10:16:23 usgaub5500 ltid[24568]: Operator/EMM server has DOWN'ed drive IBM.ULTRIUM-TD4.261 (device 0)
Apr  2 10:28:11 usgaub5500 ltid[24568]: Operator/EMM server has DOWN'ed drive IBM.ULTRIUM-TD4.262 (device 4)
Apr  2 10:35:11 usgaub5500 ltid[24568]: Operator/EMM server has DOWN'ed drive IBM.ULTRIUM-TD4.260 (device 6)
Apr  2 10:38:48 usgaub5500 ltid[24568]: Operator/EMM server has DOWN'ed drive IBM.ULTRIUM-TD4.262 (device 4)
Apr  2 10:40:05 usgaub5500 ltid[24568]: Operator/EMM server has DOWN'ed drive IBM.ULTRIUM-TD4.256 (device 5)
 

Please update

 

 

Mark_Solutions's picture

I see your tape drives are all IBM LTO4 but have different firmware releases (97F9 and BBH4).

Have you had any work done on the library recently?

Firmware upgrades could cause the drives' SCSI inquiry string to change, in which case the drives would need to be re-added to the system.

Start at the O/S to make sure it is seeing all drives and then reconfigure within NetBackup

If possible get all your drives on the same firmware release (preferably the latest)
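
A minimal sketch for spotting mixed firmware, assuming the `tpautoconf -report_disc` layout posted at the top of this thread, where each drive block has a "Drive Name" line followed by an "Inquiry" line whose last field is the firmware revision. The function names are made up for illustration, and the sample is three drive blocks copied from that output.

```python
import re
from collections import defaultdict

NAME_RE = re.compile(r'^\s*Drive Name = (\S+)\s*$')
INQUIRY_RE = re.compile(r'^\s*Inquiry = "(.*)"\s*$')


def firmware_by_drive(report_text):
    """Map drive name -> firmware revision (last field of the Inquiry string)."""
    result = {}
    current = None
    for line in report_text.splitlines():
        m = NAME_RE.match(line)
        if m:
            current = m.group(1)
            continue
        m = INQUIRY_RE.match(line)
        if m and current is not None:
            result[current] = m.group(1).split()[-1]
            current = None  # ignore Inquiry lines outside a drive block (e.g. the robot)
    return result


def group_by_firmware(firmware):
    """Map firmware revision -> list of drive names."""
    groups = defaultdict(list)
    for drive, revision in firmware.items():
        groups[revision].append(drive)
    return dict(groups)


# Three drive blocks copied from the output posted in this thread.
SAMPLE_REPORT = '''\
======================= Missing Device (Drive) =======================
 Drive Name = IBM.ULTRIUM-TD4.261
 Drive Path = /dev/nst7
 Inquiry = "IBM     ULTRIUM-TD4     97F9"
======================= Missing Device (Drive) =======================
 Drive Name = IBM.ULTRIUM-TD4.263
 Drive Path = /dev/nst6
 Inquiry = "IBM     ULTRIUM-TD4     BBH4"
======================= Missing Device (Drive) =======================
 Drive Name = IBM.ULTRIUM-TD4.260
 Drive Path = /dev/nst2
 Inquiry = "IBM     ULTRIUM-TD4     BBH4"
'''

for revision, drives in group_by_firmware(firmware_by_drive(SAMPLE_REPORT)).items():
    print(revision, drives)
```

More than one key in the grouped result means the drives are not all on the same firmware level.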

Authorised Symantec Consultant

Don't forget to "Mark as Solution" if someone's advice has solved your issue - and please bring back the Thumbs Up!

Dan@NB's picture

Wannawin,

As per Marianne's post, check the physical connections between the HBA and the switch.
This is possibly a zoning issue for the drives at switch level. Please get your storage team to look at the zoned drives and their status. Something is probably wrong there.

+ Dan

wannawin's picture

Hello Mark.

All drives are visible at OS level, and I have reconfigured the drives via the drive configuration wizard. Yes, the drive firmware was upgraded recently...

 

Found something more

mv d3 s72

Initiating MOVE_MEDIUM from address 258 to 4167

move_medium failed

sense key = 0x5, asc = 0x3b, ascq = 0x11, MEDIUM MAGAZINE NOT ACCESSIBLE

mv d3 s73

Initiating MOVE_MEDIUM from address 258 to 4168

move_medium failed

sense key = 0x5, asc = 0x3b, ascq = 0x11, MEDIUM MAGAZINE NOT ACCESSIBLE

mv d3 s73

Initiating MOVE_MEDIUM from address 258 to 4168

move_medium failed

sense key = 0x5, asc = 0x3b, ascq = 0x11, MEDIUM MAGAZINE NOT ACCESSIBLE

mv d3 s74

Initiating MOVE_MEDIUM from address 258 to 4169

move_medium failed

sense key = 0x5, asc = 0x3b, ascq = 0x11, MEDIUM MAGAZINE NOT ACCESSIBLE

mv d4 s78

Initiating MOVE_MEDIUM from address 259 to 4173

move_medium failed

sense key = 0x5, asc = 0x3b, ascq = 0x11, MEDIUM MAGAZINE NOT ACCESSIBLE

mv d4 s78

Initiating MOVE_MEDIUM from address 259 to 4173

move_medium failed

sense key = 0x5, asc = 0x3b, ascq = 0x11, MEDIUM MAGAZINE NOT ACCESSIBLE

 

mv d6 s77

Initiating MOVE_MEDIUM from address 261 to 4172

move_medium failed

sense key = 0x5, asc = 0x3b, ascq = 0x11, MEDIUM MAGAZINE NOT ACCESSIBLE

mv d6 s76

Initiating MOVE_MEDIUM from address 261 to 4171

move_medium failed

sense key = 0x5, asc = 0x3b, ascq = 0x11, MEDIUM MAGAZINE NOT ACCESSIBLE

mv d7 s75

Initiating MOVE_MEDIUM from address 262 to 4170

move_medium failed

sense key = 0x5, asc = 0x3b, ascq = 0x11, MEDIUM MAGAZINE NOT ACCESSIBLE

And in /var/log/messages I found this:

Apr  2 06:37:13 usgaub5500 tldd[10292]: TLD(0) going to DOWN state, status: Robotic arm has no addressable holder

Apr  2 06:37:13 usgaub5500 ltid[9591]: Request for media ID HS3985 is being rejected because mount requests are disabled (reason = robotic daemon going to DOWN state)

What does "robotic daemon going to DOWN state" mean?

 

Apr  2 10:16:23 usgaub5500 ltid[24568]: Operator/EMM server has DOWN'ed drive IBM.ULTRIUM-TD4.261 (device 0)

The robotic control host is the Master Server.

Marianne's picture

Please copy the entire messages file to messages.txt and post as File attachment.

To just grep for DOWN does not help and does not tell us ANYTHING about what went wrong to cause the DOWN state.

About robtest - did you 'unload' drives before trying to move tapes back to slot?

e.g. 

unload d3      (wait for drive to unload tape - you will receive a message...  then move to slot)

mv d3 s72

 

As per Mark's excellent post - please ensure that all tape drives are on the same firmware level. At the moment they are not.

The following says to me that you need to log a call with your hardware vendor:

 Robotic arm has no addressable holder


Mark_Solutions's picture

If the O/S sees everything but you have had firmware upgrades you may need to delete everything from NetBackup (drives and then robotics) and then re-add them back in using the device wizard - but I would get all drive and robotic firmware up to date first and also check the library interface itself to make sure everything is OK

Perhaps the firmware upgrade has affected the partition of the library and it needs re-setting via the web GUI for the tape library, as it sounds like the robotics are disjoined from the tape drives

Sort out the library and its firmware and then delete and re-add everything in NetBackup


wannawin's picture

Hello Mark.

I will log a case with the vendor and upgrade the firmware to the latest release. Does re-setting the library mean a hard reboot of the library, or something different?

Mark_Solutions's picture

By re-setting I meant that you may need to re-create the partition within the library (if it uses partitions etc.)

Just check that web GUI is showing the robot, drives, magazines and load ports all in the same partition and not leaving any parts orphaned - also that the robotic path is correct as many use robotic pass through via a drive

If the library vendor is coming out they can probably assist with all of this


wannawin's picture

Hello Marianne/All.

Please find /var/log/messages attached.

Attachment: messages.txt (332.2 KB)
mph999's picture

Hardware/media errors:

 

Apr  1 12:28:53 usgaub5500 bptm[22336]: TapeAlert Code: 0x03, Type: Warning, Flag: HARD ERROR, from drive IBM.ULTRIUM-TD4.257 (index 3), Media Id HS3954
Apr  1 12:28:53 usgaub5500 bptm[22336]: TapeAlert Code: 0x06, Type: Critical, Flag: WRITE FAILURE, from drive IBM.ULTRIUM-TD4.257 (index 3), Media Id HS3954
Apr  1 12:28:53 usgaub5500 bptm[22336]: TapeAlert Code: 0x27, Type: Warning, Flag: DIAGNOSTICS REQ., from drive IBM.ULTRIUM-TD4.257 (index 3), Media Id HS3954

Nothing can be done in NBU to fix this, you have to talk with the hardware vendor.
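
A minimal sketch for pulling the TapeAlert entries out of the log so they can be handed to the hardware vendor, assuming the bptm message format shown in the three lines above. The function names are made up for illustration.

```python
import re

# Matches the bptm TapeAlert message format quoted in this post.
ALERT_RE = re.compile(
    r"TapeAlert Code: (?P<code>0x[0-9a-fA-F]+), Type: (?P<type>\w+), "
    r"Flag: (?P<flag>.+?), from drive (?P<drive>\S+) \(index (?P<index>\d+)\), "
    r"Media Id (?P<media>\S+)"
)


def tape_alerts(lines):
    """Return one dict per TapeAlert line (code, type, flag, drive, index, media)."""
    alerts = []
    for line in lines:
        m = ALERT_RE.search(line)
        if m:
            alerts.append(m.groupdict())
    return alerts


def critical_alerts(alerts):
    """Return (drive, media, flag) for every alert of type Critical."""
    return [(a["drive"], a["media"], a["flag"]) for a in alerts if a["type"] == "Critical"]


# Lines copied from this post.
SAMPLE = [
    "Apr  1 12:28:53 usgaub5500 bptm[22336]: TapeAlert Code: 0x03, Type: Warning, "
    "Flag: HARD ERROR, from drive IBM.ULTRIUM-TD4.257 (index 3), Media Id HS3954",
    "Apr  1 12:28:53 usgaub5500 bptm[22336]: TapeAlert Code: 0x06, Type: Critical, "
    "Flag: WRITE FAILURE, from drive IBM.ULTRIUM-TD4.257 (index 3), Media Id HS3954",
    "Apr  1 12:28:53 usgaub5500 bptm[22336]: TapeAlert Code: 0x27, Type: Warning, "
    "Flag: DIAGNOSTICS REQ., from drive IBM.ULTRIUM-TD4.257 (index 3), Media Id HS3954",
]

for drive, media, flag in critical_alerts(tape_alerts(SAMPLE)):
    print(drive, media, flag)
```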

 

Martin

 

Regards,  Martin
 
Setting Logs in NetBackup:
http://www.symantec.com/docs/TECH75805
 
SOLUTION
Marianne's picture

Seems 'someone' has been manually moving tapes around in the robot?

 

Apr  2 10:15:33 usgaub5500 tldcd[27744]: TLD(0) cannot dismount drive 6, slot 14 already is full

Apr  2 10:15:39 usgaub5500 tldcd[26783]: TLD(0) cannot dismount drive 5, slot 11 already is full

Apr  2 10:28:08 usgaub5500 tldcd[31000]: TLD(0) cannot dismount drive 7, slot 1 already is full

Apr  2 10:28:33 usgaub5500 kernel: st 7:0:3:0: reservation conflict

Apr  2 10:28:33 usgaub5500 ltid[31227]: Operator requested SCSI Release of Drive IBM.ULTRIUM-TD4.261 was successful

Apr  2 10:28:48 usgaub5500 tldcd[31147]: TLD(0) expected barcode (HS2790          ) in slot 7, found barcode (HS3985          )

Apr  2 10:48:39 usgaub5500 tldcd[877]: TLD(0) expected barcode (HS4057          ) in slot 1, found barcode (HS2790          )
 
Please suspend all backups, then remove the tapes from slots 1, 7, 11 and 14.
 
It may be possible to use robtest to move these tapes to the cap. 
 
Use robtest to 'unload' drives 5, 6 and 7, then move those tapes to their correct slots.
 
When you are sure that there are no tapes in any of the drives, power cycle the robot.
This will force an inventory of the robot when it starts up.
 
Wait for tld(0) to go to UP state in NBU, then do inventory with NBU. Select 'empty media access port' before you select 'Start'.
 
Let us know how it goes...
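
The slot list above can be derived from the tldcd messages with a sketch like the following. It assumes the two message formats quoted in this post ("cannot dismount drive N, slot M already is full" and "expected barcode ... in slot M, found barcode ..."); the function names are made up for illustration.

```python
import re

FULL_RE = re.compile(r"cannot dismount drive (\d+), slot (\d+) already is full")
BARCODE_RE = re.compile(
    r"expected barcode \((\S+)\s*\) in slot (\d+), found barcode \((\S+)\s*\)"
)


def stuck_dismounts(lines):
    """Return (drive, slot) pairs where the home slot was already occupied."""
    return [(int(m.group(1)), int(m.group(2)))
            for m in (FULL_RE.search(line) for line in lines) if m]


def barcode_mismatches(lines):
    """Return (slot, expected, found) where the wrong tape sits in a slot."""
    return [(int(m.group(2)), m.group(1), m.group(3))
            for m in (BARCODE_RE.search(line) for line in lines) if m]


def slots_to_check(lines):
    """Sorted, de-duplicated slot numbers named in either kind of message."""
    slots = {slot for _, slot in stuck_dismounts(lines)}
    slots |= {slot for slot, _, _ in barcode_mismatches(lines)}
    return sorted(slots)


# tldcd lines copied from this post.
SAMPLE = [
    "Apr  2 10:15:33 usgaub5500 tldcd[27744]: TLD(0) cannot dismount drive 6, slot 14 already is full",
    "Apr  2 10:15:39 usgaub5500 tldcd[26783]: TLD(0) cannot dismount drive 5, slot 11 already is full",
    "Apr  2 10:28:08 usgaub5500 tldcd[31000]: TLD(0) cannot dismount drive 7, slot 1 already is full",
    "Apr  2 10:28:48 usgaub5500 tldcd[31147]: TLD(0) expected barcode (HS2790          ) in slot 7, found barcode (HS3985          )",
    "Apr  2 10:48:39 usgaub5500 tldcd[877]: TLD(0) expected barcode (HS4057          ) in slot 1, found barcode (HS2790          )",
]

print("slots to check:", slots_to_check(SAMPLE))
print("drives that could not dismount:", sorted(d for d, _ in stuck_dismounts(SAMPLE)))
```

On the lines quoted here this gives slots 1, 7, 11 and 14 and drives 5, 6 and 7.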
 
 


wannawin's picture

Hello Marianne.

 

I am trying to unload the drives, but it gives an error:

unload d1
Opening /dev/nst0, on the local host, please wait...
Error - cannot open /dev/nst0 (Input/output error)
unload d5
Opening /dev/nst2, on the local host, please wait...
Error - cannot open /dev/nst2 (Input/output error)
unload 7
Opening /dev/nst3, on the local host, please wait...
Error - cannot open /dev/nst3 (Input/output error)

Please suggest..

Marianne's picture

Please show us output of 's d'

I think the tapes may be unloaded already, there is just nowhere to put them.

The only other alternative is to go open the robot door, manually unload all drives, take these tapes out of the robot, close robot, check that it does an inventory of itself at this point, then inventory NBU. 
Now add removed tapes back via the CAP.

 


wannawin's picture

Hello marianne..

 

  2)  none/quit
Enter choice: 1

Robot selected: TLD(0)   robotic path = /dev/sg2

Invoking robotic test utility:
/usr/openv/volmgr/bin/tldtest -rn 0 -r /dev/sg2

Opening /dev/sg2
MODE_SENSE complete
Enter tld commands (? returns help information)
s d
drive 1 (addr 256) access = 1 Contains Cartridge = yes
Barcode = HS3512
drive 2 (addr 257) access = 1 Contains Cartridge = no
drive 3 (addr 258) access = 1 Contains Cartridge = yes
Source address = 4111 (slot 16)
Barcode = HS3358
drive 4 (addr 259) access = 1 Contains Cartridge = yes
Source address = 4151 (slot 56)
Barcode = HS3239
drive 5 (addr 260) access = 1 Contains Cartridge = yes
Barcode = HS3496
drive 6 (addr 261) access = 1 Contains Cartridge = yes
Source address = 4134 (slot 39)
Barcode = HS3242
drive 7 (addr 262) access = 1 Contains Cartridge = yes
Barcode = HS4057
drive 8 (addr 263) access = 1 Contains Cartridge = yes
Source address = 4107 (slot 12)
Barcode = HS3237
READ_ELEMENT_STATUS complete

Marianne's picture

You can see that drives 1, 5 and 7 contain tapes, but that the robot does not list 'source' slot addresses for them.

Please go back to advice given over the last 2 days:

Suspend all backups - NBU will unmount all tapes that can be unmounted.

Go to the robot, open the door, manually unload all tapes that are still in drives. 

Take these tapes out of the robot, close robot,

Check that it does an inventory of itself at this point, then inventory NBU. 

Now add removed tapes back via the CAP.

Let us know how it goes....
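
A minimal sketch for finding loaded drives with no source slot, assuming the 's d' output layout from robtest posted earlier in this thread. The function names are made up for illustration, and the sample is that output copied as posted.

```python
import re

DRIVE_RE = re.compile(r"^drive (\d+) \(addr (\d+)\) access = \d+ Contains Cartridge = (yes|no)")
SOURCE_RE = re.compile(r"^Source address = \d+ \(slot (\d+)\)")
BARCODE_RE = re.compile(r"^Barcode = (\S+)")


def parse_drive_status(text):
    """Parse 's d' output into one dict per drive."""
    drives = []
    for raw in text.splitlines():
        line = raw.strip()
        m = DRIVE_RE.match(line)
        if m:
            drives.append({"drive": int(m.group(1)), "loaded": m.group(3) == "yes",
                           "source_slot": None, "barcode": None})
            continue
        if not drives:
            continue  # skip anything before the first drive line
        m = SOURCE_RE.match(line)
        if m:
            drives[-1]["source_slot"] = int(m.group(1))
            continue
        m = BARCODE_RE.match(line)
        if m:
            drives[-1]["barcode"] = m.group(1)
    return drives


def loaded_without_source(drives):
    """Return (drive, barcode) for loaded drives that report no source slot."""
    return [(d["drive"], d["barcode"]) for d in drives
            if d["loaded"] and d["source_slot"] is None]


# 's d' output copied from the post above.
SAMPLE = """\
drive 1 (addr 256) access = 1 Contains Cartridge = yes
Barcode = HS3512
drive 2 (addr 257) access = 1 Contains Cartridge = no
drive 3 (addr 258) access = 1 Contains Cartridge = yes
Source address = 4111 (slot 16)
Barcode = HS3358
drive 4 (addr 259) access = 1 Contains Cartridge = yes
Source address = 4151 (slot 56)
Barcode = HS3239
drive 5 (addr 260) access = 1 Contains Cartridge = yes
Barcode = HS3496
drive 6 (addr 261) access = 1 Contains Cartridge = yes
Source address = 4134 (slot 39)
Barcode = HS3242
drive 7 (addr 262) access = 1 Contains Cartridge = yes
Barcode = HS4057
drive 8 (addr 263) access = 1 Contains Cartridge = yes
Source address = 4107 (slot 12)
Barcode = HS3237
READ_ELEMENT_STATUS complete
"""

for drive, barcode in loaded_without_source(parse_drive_status(SAMPLE)):
    print(f"drive {drive}: tape {barcode} has no source slot")
```

On the output posted in this thread it reports drives 1, 5 and 7.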


wannawin's picture

There was an issue with the robot; the robot has been replaced.

mph999's picture

Please be generous enough to mark the post that helped you the most as the solution.

Martin

 

 
sazz.'s picture

I am surprised to see that you have marked your own post as solution, though in 24 replies there were multiple times it was mentioned that it is a hardware issue.

I guess the robot is hardware ;)

wannawin's picture

Hello Martin.

Sorry for this, I did it by mistake.

Your post should be marked as the solution, but how do I change this?

Marianne's picture

You need to clear existing solution first - only you can do it.

After that you should be able to mark a different solution.


wannawin's picture

Hello Marianne.

Thanks very much for explaining this to me.

Thanks Martin.....

mph999's picture

No problem, thank you for the solution .

M

 
