
Drive down Issue

Updated: 23 Sep 2010 | 4 comments
Parthiban

Hi All,

We are facing a drive-down issue. We have 5 media servers, all running Solaris, and all the drives are shared across all media servers. The drives are frequently going down: backup jobs are using only 4 of the 10 drives, and those 4 drives also go down frequently.

Library - IBM 3584 model.

I am not good with Unix. I have attached the /var/adm/messages file from one of our media servers.

Please advise.


Comments

Marianne van den Berg, 21 Mar 2010

There seem to be stuck tapes in drives 1, 2, 3, 4, 5, and 6.

DecodeMount: TLD(0) drive 3, Actual status: Unable to SCSI unload drive
TLD(0) DismountTape ****** from drive 3
DecodeDismount: TLD(0) drive 3, Actual status: Robotic dismount failure

These types of messages are repeated for drives 1-6.
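To see at a glance which drives are affected, a minimal sketch like the one below could tally the DecodeDismount lines quoted above per robot and drive. It assumes only that the relevant lines contain the text in exactly the format shown (any timestamp or host prefix in front of it is ignored); the function name is hypothetical and this is not a NetBackup tool.

```python
import re
from collections import Counter

# Matches the DecodeDismount lines quoted above, e.g.
# "DecodeDismount: TLD(0) drive 3, Actual status: Robotic dismount failure".
# re.search is used, so any prefix in front of this text is ignored.
DISMOUNT_RE = re.compile(
    r"DecodeDismount: TLD\((\d+)\) drive (\d+), Actual status: (.+)$"
)


def tally_dismount_status(lines):
    """Count DecodeDismount results per (robot number, drive number, status)."""
    counts = Counter()
    for line in lines:
        match = DISMOUNT_RE.search(line)
        if match:
            robot = int(match.group(1))
            drive = int(match.group(2))
            status = match.group(3).strip()
            counts[(robot, drive, status)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "DecodeMount: TLD(0) drive 3, Actual status: Unable to SCSI unload drive",
        "TLD(0) DismountTape ****** from drive 3",
        "DecodeDismount: TLD(0) drive 3, Actual status: Robotic dismount failure",
    ]
    for (robot, drive, status), n in sorted(tally_dismount_status(sample).items()):
        print(f"TLD({robot}) drive {drive}: {status} x{n}")
```

To run it against the real log, pass an open file instead of the sample list, e.g. `tally_dismount_status(open("/var/adm/messages", errors="replace"))`; reading that file may require root.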

If the tapes can be unloaded from the drives by pressing the eject button on the drive, you probably have a device mapping mismatch. If a robotic drive number is not mapped to the correct /dev/rmt device name, you will see dismount errors. Mappings that are initially correct can get out of sync if persistent binding is not configured at the OS level: the tape drives are assigned new /dev/rmt addresses when the server is rebooted.
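The mismatch described above can be illustrated with a small, hypothetical check: for each configured drive, compare the serial number the library reports for that robotic drive number with the serial number the OS currently reports behind the configured /dev/rmt path. The data classes, function name, paths and serial numbers are all made up for illustration, and how you collect the serials (library, NetBackup device config, OS tools) is left out as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfiguredDrive:
    """One drive entry as (hypothetically) exported from the device configuration."""
    robot_drive: int   # robotic drive number, as in "TLD(0) drive 3"
    device_path: str   # /dev/rmt name the media server uses for that drive


def find_mapping_mismatches(configured, robot_serials, os_serials):
    """Return entries whose /dev/rmt path does not point at the drive the robot loads.

    configured    -- iterable of ConfiguredDrive
    robot_serials -- {robotic drive number: serial number reported by the library}
    os_serials    -- {device path: serial number the OS currently reports for it}
    Each result is (robot_drive, device_path, expected_serial, actual_serial).
    """
    mismatches = []
    for cfg in configured:
        expected = robot_serials.get(cfg.robot_drive)
        actual = os_serials.get(cfg.device_path)
        # Unknown drives (expected is None) are reported too, since they cannot be verified.
        if expected is None or expected != actual:
            mismatches.append((cfg.robot_drive, cfg.device_path, expected, actual))
    return mismatches


if __name__ == "__main__":
    config = [ConfiguredDrive(1, "/dev/rmt/0cbn"), ConfiguredDrive(2, "/dev/rmt/1cbn")]
    robot = {1: "SER-A", 2: "SER-B"}
    # Hypothetical state after a reboot without persistent binding:
    # the OS enumerated the two drives in the opposite order.
    after_reboot = {"/dev/rmt/0cbn": "SER-B", "/dev/rmt/1cbn": "SER-A"}
    for drive, path, expected, actual in find_mapping_mismatches(config, robot, after_reboot):
        print(f"drive {drive}: {path} expected {expected}, found {actual}")
```

Comparing serial numbers rather than device names is the point of the sketch: device names can shift after a reboot, while a drive's serial number stays with the physical drive.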

First of all, I would use robtest to test unloading the tape drives and moving the tapes back to the robot slots. If you get errors, use the Robot GUI to unload the tape drives, or open the robot and press the unload button. Next, I'd delete all tape drives in the device configuration and use the wizard to reconfigure them.

If the tapes cannot be unloaded via robtest, the Robot GUI, or the unload/eject button, you need to call your hardware support to remove the stuck tapes and examine the tapes and drives.

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows.

schmaustech, 21 Mar 2010

SCSI timeouts: you seem to have a lot of SCSI timeouts. Do you know whether your shared drives pass over an ISL between Fibre Channel switches, or whether they all hang off the same switch? If they do, is that switch over-allocated?

I would also verify your drive paths on each media server and make sure that the drive a media server uses is the drive that the robotic control host is actually loading. I have seen situations where, after a Unix host is rebooted, the tape device paths come up in a different order because persistent bindings were not configured.

Regards,

Benjamin Schmaus

Deepak W, 21 Mar 2010

Parthiban,

I have faced this issue twice in our environment.

1. First time: the backup admin had loaded tapes into the TL slots but didn't check which slots belonged to the IN USE media that were currently in the drives. I had to empty the slots belonging to the tapes that were in the drives.

NetBackup was putting the drive into a DOWN state because the robot was unable to unload the tape from the TL drive.

2. Second time: everything else was fine, but the tape drive still went down intermittently. This issue was resolved when we upgraded our NBU from 6.5.3 to 6.5.5.
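The first scenario above can be sketched as a simple consistency check: for every tape currently mounted, see whether its home slot is now occupied by a different tape. This is a hypothetical illustration, assuming (as the scenario implies) that a dismounted tape must go back to its own home slot; the function name, media IDs and slot numbers are made up.

```python
def blocked_unloads(drives, slots):
    """Return mounted tapes whose home slot is occupied by another tape.

    drives -- {drive number: (media id, home slot)} for tapes currently in drives
    slots  -- {slot number: media id in that slot, or None if the slot is empty}
    Assumes a dismounted tape must return to its own home slot, so any other
    tape sitting there blocks the unload.
    Each result is (drive, media_id, home_slot, occupying_media_id).
    """
    blocked = []
    for drive, (media_id, home_slot) in sorted(drives.items()):
        occupant = slots.get(home_slot)
        if occupant is not None and occupant != media_id:
            blocked.append((drive, media_id, home_slot, occupant))
    return blocked


if __name__ == "__main__":
    # Hypothetical inventory: someone loaded B00001 into slot 10 while
    # A00001 (whose home slot is 10) was still mounted in drive 1.
    in_drives = {1: ("A00001", 10), 2: ("A00002", 11)}
    in_slots = {10: "B00001", 11: None}
    for drive, media, slot, other in blocked_unloads(in_drives, in_slots):
        print(f"drive {drive}: {media} cannot return to slot {slot} (holds {other})")
```

Emptying the flagged slots, as described in point 1, removes the conflict so the robot can complete the dismount.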

Hope my experience helps you resolve your problem.

-- Deepak W (Kindly close the thread if your query is resolved)

ayodeji, 22 Mar 2010

correct

I entirely agree with Deepak; those two points are very important. I have also experienced this in my environment, and these two points resolved it.

Ayodeji Oni
NetApp/VERITAS/IBM/SCS