Veritas NBU6.5-Driver I/O Error
Created: 27 Feb 2013 | 24 comments
Today, when I checked the Veritas NBU 6.5 backup status, I found that one policy shows the following error:
begin writing
Error bptm (pid=19808) cannot write image to media id 000054, drive index 1, I/O error
end writing
status code 84
Please give me some suggestions.
Thanks
Hi,
We would need more details:
1) What is the operating system of your media server?
2) What is your hardware? Tape library info?
3) Are you seeing this error for a particular media or drive, or is it random across drives and media?
4) Let us know the output of the commands below:
scan
vmoprcmd -d
tpconfig -d
tpautoconf -t
5) Also the logs of bptm and /usr/openv/netbackup/db/media/errors
6) Detailed status of the failed job.
Also, did you check whether the tape is write-protected?
Does it give the error after writing some data, or without writing any data?
Hi,
1) What is the operating system of your media server?
The media server is running Solaris 10 SPARC.
2) What is your hardware? Tape library info?
Tape library: SL500.
3) Are you seeing this error for a particular media or drive, or is it random across drives and media?
It is random; it can happen with any media.
4) Let us know the output of the commands below:
See the attached file "sl500_cmd_logs".
5) Also the logs of bptm and /usr/openv/netbackup/db/media/errors
See the attached file "errors".
6) Detailed status of the failed job.
See the attached pictures image005 and image 006.
Also, did you check whether the tape is write-protected?
Yes, I have checked; the tape is not write-protected.
Thank you.
NetBackup relies on the OS for I/O. This means that NBU is merely reporting the error, and we are not going to get a lot of info by looking at NBU alone.
If you are seeing regular status 84s, then /usr/openv/netbackup/db/media/errors will help us determine whether the I/O errors are experienced on a particular tape drive or a particular media.
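One possible way to get that per-drive and per-media picture is a small tally script. This is only a sketch: the function name `summarize_media_errors` is made up for illustration, and it assumes the errors file has the column layout seen in the excerpt posted later in this thread (date, time, media ID, drive index, error type, drive name). On Solaris you may need nawk or /usr/xpg4/bin/awk instead of the default awk.

```shell
#!/bin/sh
# Tally entries in the NetBackup media errors file per drive and per media ID.
# Assumed column layout (taken from the excerpt posted in this thread):
#   date time media_id drive_index error_type drive_name [flags...]
summarize_media_errors() {
    # $1: path to the errors file (hypothetical argument of this sketch)
    awk 'NF >= 6 && $5 ~ /^[A-Z_]+$/ {
             drive[$6 " " $5]++      # count per drive name and error type
             media[$3 " " $5]++      # count per media ID and error type
         }
         END {
             for (k in drive) print "drive", k, drive[k]
             for (k in media) print "media", k, media[k]
         }' "$1" | sort
}
```

Calling `summarize_media_errors /usr/openv/netbackup/db/media/errors` prints one line per drive/error-type pair and one per media/error-type pair with a count. If the counts pile up on one drive, suspect that drive; if they are spread over all drives and many tapes, look at what the drives have in common.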
You also need to enable the following logs on the media server:
Create /usr/openv/netbackup/logs/bptm folder
Add VERBOSE entry to /usr/openv/volmgr/vm.conf and restart NBU on media server.
Device-related messages and errors will now be logged to /var/adm/messages.
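The two logging steps above could be scripted roughly like this. It is a minimal sketch, not an official procedure: the function name `enable_nbu_device_logging` and its optional install-root argument are assumptions made for illustration, with the root defaulting to /usr/openv as used in this thread. It does not restart NetBackup; that still has to be done on the media server afterwards.

```shell
#!/bin/sh
# Create the bptm log directory and make sure vm.conf contains VERBOSE.
# The optional first argument (install root) is a hypothetical convenience
# of this sketch; it defaults to /usr/openv.
enable_nbu_device_logging() {
    root="${1:-/usr/openv}"
    # bptm only writes its log if this directory exists.
    mkdir -p "$root/netbackup/logs/bptm" || return 1
    conf="$root/volmgr/vm.conf"
    # Append VERBOSE only if it is not already there as a line of its own.
    # If vm.conf does not exist yet, grep fails and the file is created.
    grep '^VERBOSE$' "$conf" >/dev/null 2>&1 || echo "VERBOSE" >> "$conf"
    # NetBackup must then be restarted on the media server (not done here).
}
```

Running `enable_nbu_device_logging` twice is harmless, since the VERBOSE line is only added when missing.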
Some helpful TN's:
http://www.symantec.com/docs/TECH169477
http://www.symantec.com/docs/TECH43243
Please see this extract from above doc:
As an application, NetBackup has no direct access to a device, instead relying on the operating system (OS) to handle any communication with the device. This means that during a write operation NetBackup asks the OS to write to the device and report back the success or failure of that operation. If there is a failure, NetBackup will merely report that a failure occurred, and any troubleshooting should start at the OS level. If the OS is unable to perform the write, there are three likely causes; OS configuration, a problem on the SCSI path, or a problem with the device.
**** PS **** Are you aware that support for NBU 6.5 ended in Oct last year?
PLEASE upgrade!
Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links
Hi,
I have sent the logs you requested; please see the attached file. Also, I cannot find vm.conf in the directory /usr/openv/volmgr.
I have 6 tape drives. How can I tell which tape drive is faulty? I checked the SL500 service warning LED, but it shows no errors.
You will not necessarily see a warning light: the drive has no mechanical fault as such, it just can't read/write reliably.
From the error.txt file, covering the last 2 or 3 months, I find:
(I ran this through a script, the file alone does not contain the information in this format)
In older versions of NBU the vm.conf file does not exist by default.
Please create the file and insert
VERBOSE
in the file.
Save the file and restart NBU.
Please go ahead and clean the tape drives, then run a backup and see how it goes.
Also check when the drives were last cleaned by running the command below:
/usr/openv/volmgr/bin/tpclean -L
I know the SL500 tape library has its own functionality to clean the tape drives, but you need to check what cleaning frequency was set for the tape drives:
/usr/openv/volmgr/bin/tpclean -F drive_name cleaning_frequency
Hope it helps!!!
Hi all,
I have upgraded the firmware for the tape drives and the library and also applied patches to the OS, but the error is not fixed.
Please give me advice.
Have you created vm.conf with VERBOSE entry yet?
Can you see that Media Manager processes are running with -v?
Have you checked /var/adm/messages for hardware errors?
There is more to the data path than just library and tape drives - there is also the hba in the server, cable(s) that go to a switch, switch port(s), and cables that go to each of the drives.
As I've said before, looking at NBU only is not going to tell us much. You need to troubleshoot at OS level.
Switch logs may also help.
The error log is telling us that you are experiencing errors on basically all the drives and lots of tapes. Chances are slim that all of them are faulty. What is the common factor that links all drives to the OS? The hba comes to mind, right?
hba is also more than just a piece of hardware - there is firmware and drivers that must be checked along with the hardware. /var/adm/messages is a good starting point to look for device-related errors.
I have created vm.conf with VERBOSE entry.
In the policy that I ran, the job failed when using tape drives 001 and 003 (please see the attached file). But when I use the OS tar command on each of these drives, it works OK.
root@Nbmaster2 # tar cvf /dev/rmt/10 explominer.tar
a explominer.tar 24244 tape blocks
root@Nbmaster2 #
root@Nbmaster2 #
root@Nbmaster2 # tar cvf /dev/rmt/7 explominer.tar
a explominer.tar 24244 tape blocks
root@Nbmaster2 #
What is status of tape drives? Check with 'vmoprcmd -d'.
Have you checked bptm log and messages file for errors?
Ability to write with tar command confirms that I/O errors are intermittent.
Old firmware on hba is known for giving errors when load is high.
The status of the tape drives is OK.
root@Nbmaster2 # /usr/openv/volmgr/bin/vmoprcmd -d
PENDING REQUESTS
<NONE>
DRIVE STATUS
Drv Type   Control   User  Label  RecMID  ExtMID  Ready  Wr.Enbl.  ReqId
  0 hcart  TLD             -                      No     -         0
  1 hcart  TLD             -                      No     -         0
  2 hcart  TLD             -                      No     -         0
  3 hcart  TLD             -                      No     -         0
  4 hcart  DOWN-TLD        -                      No     -         0
  5 hcart  TLD             -                      No     -         0
ADDITIONAL DRIVE STATUS
Drv DriveName             Shared  Assigned  Comment
  0 HP.ULTRIUM4-SCSI.000  No      -
  1 HP.ULTRIUM4-SCSI.001  No      -
  2 HP.ULTRIUM4-SCSI.002  No      -
  3 HP.ULTRIUM4-SCSI.003  No      -
  4 HP.ULTRIUM4-SCSI.004  No      -
  5 HP.ULTRIUM4-SCSI.005  No      -
root@Nbmaster2 #
I will suggest to the hardware team that they upgrade the firmware on the hba card.
Drive 004 is DOWN. Have you checked bptm log and messages files as suggested previously?
You need to do some 'home work' before suggesting firmware upgrade.
Check messages file (or backup of messages file) for boot messages. (who -b will tell you when last the server was rebooted). You will find the hba make and model along with firmware and driver version.
While you have messages file open, look for hardware-related errors.
Look on hba vendor's web site for known issues with the firmware and driver versions.
About drives not getting used, check for stuck/orphaned device allocation:
nbrbutil -dump
Check the 'MDS Allocation' section at the bottom of the output for media or drive allocations that are not really in use, note the Allocation Key number, and release it with:
nbrbutil -releaseMDS <mdsAlocationKey>
/usr/openv/netbackup/db/media/errors
03/05/13 01:20:39 000144 3 WRITE_ERROR HP.ULTRIUM4-SCSI.003
03/05/13 01:20:44 000144 3 TAPE_ALERT HP.ULTRIUM4-SCSI.003 0x10000000 0x00000000
03/05/13 04:40:50 000017 1 WRITE_ERROR HP.ULTRIUM4-SCSI.001
03/05/13 04:40:55 000017 1 TAPE_ALERT HP.ULTRIUM4-SCSI.001 0x10000000 0x00000000
03/06/13 06:23:15 000141 1 WRITE_ERROR HP.ULTRIUM4-SCSI.001
03/06/13 06:23:20 000141 1 TAPE_ALERT HP.ULTRIUM4-SCSI.001 0x10000000 0x00000000
03/06/13 21:24:19 000051 1 WRITE_ERROR HP.ULTRIUM4-SCSI.001
03/06/13 21:24:24 000051 1 TAPE_ALERT HP.ULTRIUM4-SCSI.001 0x10000000 0x00000000
03/06/13 21:59:51 000130 3 WRITE_ERROR HP.ULTRIUM4-SCSI.003
03/06/13 21:59:56 000130 3 TAPE_ALERT HP.ULTRIUM4-SCSI.003 0x10000000 0x00000000
03/06/13 23:38:21 000010 2 TAPE_ALERT HP.ULTRIUM4-SCSI.002 0x10000000 0x00000000
03/07/13 08:43:50 000019 5 TAPE_ALERT HP.ULTRIUM4-SCSI.005 0x10000000 0x00000000
03/07/13 09:01:47 000143 1 WRITE_ERROR HP.ULTRIUM4-SCSI.001
03/07/13 09:01:52 000143 1 TAPE_ALERT HP.ULTRIUM4-SCSI.001 0x10000000 0x00000000
root@Nbmaster2 #
root@Nbmaster2 # /usr/openv/volmgr/bin/tpconfig -update -drive 4 -drstatus UP
Updated drive < HP.ULTRIUM4-SCSI.004 > of type hcart in configuration
root@Nbmaster2 #
root@Nbmaster2 #
root@Nbmaster2 # /usr/openv/volmgr/bin/tpconfig -d
Id  DriveName             Type   Residence
      Drive Path                         Status
****************************************************************************
0   HP.ULTRIUM4-SCSI.000  hcart  TLD(0)  DRIVE=3
      /dev/rmt/8cbn                      UP
1   HP.ULTRIUM4-SCSI.001  hcart  TLD(0)  DRIVE=5
      /dev/rmt/10cbn                     UP
2   HP.ULTRIUM4-SCSI.002  hcart  TLD(0)  DRIVE=6
      /dev/rmt/11cbn                     UP
3   HP.ULTRIUM4-SCSI.003  hcart  TLD(0)  DRIVE=2
      /dev/rmt/7cbn                      UP
4   HP.ULTRIUM4-SCSI.004  hcart  TLD(0)  DRIVE=1
      /dev/rmt/6cbn                      UP
5   HP.ULTRIUM4-SCSI.005  hcart  TLD(0)  DRIVE=4
      /dev/rmt/9cbn                      UP
Currently defined robotics are:
TLD(0) robotic path = /dev/sg/c1tw500104f000b88092l0
EMM Server = Nbmaster2
root@Nbmaster2 #
Seems you are ignoring my advice to check bptm log and /var/adm/messages.
I give up....
Good luck!
I sent the /var/adm/messages log to the hardware team for review. They concluded that the error only appears in the log when data is written to tape using the NetBackup software; when OS commands are used, no error is found.
I just want to confirm that the Veritas configuration is correct and whether this is a bug in Veritas 6.5.
Anyway, thank you for your support very much.
I'm repeating above extract from Status 84 Troubleshooting Guide:
Your 'tar' tests are writing small amounts of data (24244 tape blocks) to one tape drive at a time.
This proves nothing.
Repeat the test with more data (+/- 5 GB) and write to all 6 drives at the same time.
HBA firmware and/or driver issues normally show up when high I/O is experienced.
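A rough way to run such a simultaneous write test is sketched below. The function name `parallel_tar_write` and the test-data path in the usage note are hypothetical; the device paths would be the ones from your tpconfig -d output. Be careful: this overwrites whatever is on the loaded tapes, so use scratch media only, and make sure NetBackup is not using the drives at the time.

```shell
#!/bin/sh
# Write the same test data to several tape devices at once and report,
# per device, whether tar succeeded.
# WARNING: this overwrites the tapes loaded in the given drives.
parallel_tar_write() {
    # $1: file or directory to write (roughly 5 GB, as suggested above)
    # remaining arguments: tape device paths
    src="$1"
    shift
    for dev in "$@"; do
        (
            # Each writer runs in its own background subshell so that all
            # drives are under load at the same time.
            if tar cf "$dev" "$src"; then
                echo "OK $dev"
            else
                echo "FAILED $dev"
            fi
        ) &
    done
    # Block until every background writer has finished.
    wait
}
```

For example: `parallel_tar_write /path/to/testdata /dev/rmt/6cbn /dev/rmt/7cbn /dev/rmt/8cbn /dev/rmt/9cbn /dev/rmt/10cbn /dev/rmt/11cbn`. Keep an eye on /var/adm/messages while it runs; any FAILED line here points away from NetBackup and towards the OS, the hba or the path to the drives.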
Thank you for the advice, I will follow your instructions.
Also check the output of iostat -En and see whether any errors are reported for the drives.
If yes, check with the hardware vendor.
Run iostat -En on the media server and see if any drive reports an error.
If yes, check with the vendor.
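To save reading through the whole iostat -En listing, a small filter like the one below can pick out devices with non-zero counters. This is a sketch under an assumption: it expects each device summary line to look like "rmt/6 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0", which is how Solaris normally prints it, and the function name `report_iostat_errors` is made up for illustration.

```shell
#!/bin/sh
# Read `iostat -En` output on stdin and print only the device summary lines
# where at least one of the Soft/Hard/Transport error counters is non-zero.
report_iostat_errors() {
    awk '/Soft Errors:/ && /Hard Errors:/ && /Transport Errors:/ {
             total = 0
             # Every counter value follows the word "Errors:".
             for (i = 2; i < NF; i++)
                 if ($i == "Errors:") total += $(i + 1)
             if (total > 0) print
         }'
}
```

Use it as `iostat -En | report_iostat_errors`. Tape drives should show up as rmt/N entries; no output means all counters are zero.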
NetBackup does NOT write to drives - ever.
NBU sends the data to the operating system, the operating system then writes it to the drive, using the blocksize requested by NBU.
I/O errors are not caused by NBU.
Martin