
NDMP Backup fails with error requesting media (tpreq)(98)

Created: 30 Jan 2013 • Updated: 30 Jan 2013 | 12 comments

Hi community,

I hereby request your assistance for the following issue:

Master Server:
Host- srv-netbckp-201
OS - Windows Server 2008 R2
Version - Symantec Netbackup 7.1.0.4

Media Server:
Host- srv-netbckp-101
OS - Red Hat Enterprise Linux Server release 6.3
Version - Symantec Netbackup 7.1.0.4

NDMP Filer:
Host - netapp-a
[root@srv-netbckp-101 bin]# ./tpautoconf -verify netapp-a.onitelecom.pt
Connecting to host "netapp-a.onitelecom.pt" as user "backupuser"...
Waiting for connect notification message...
Opening session--attempting with NDMP protocol version 4...
Opening session--successful with NDMP protocol version 4
  host supports MD5 authentication
Getting MD5 challenge from host...
Logging in using MD5 method...
Host info is:
  host name "netapp-A"
  os type "NetApp"
  os version "NetApp Release 8.1P1 7-Mode"
  host id "1573839639"
Login was successful
Host supports LOCAL backup/restore
Host supports 3-way backup/restore

Activity Monitor logs:

 

30-01-2013 11:56:29 - Info nbjm(pid=3776) starting backup job (jobid=1395721) for client netapp-a.onitelecom.pt, policy AutoSuecoNDMPv2, schedule Full-Weekly  
30-01-2013 11:56:29 - Info nbjm(pid=3776) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=1395721, request id:{165D3498-6F64-4599-A4A7-FF436A88D4DF})  
30-01-2013 11:56:29 - requesting resource srv-netbckp-101-hcart3-robot-tld-0-netapp-a.onitelecom.pt
30-01-2013 11:56:29 - requesting resource srv-netbckp-201.intranet.onitelecom.pt.NBU_CLIENT.MAXJOBS.netapp-a.onitelecom.pt
30-01-2013 11:56:29 - requesting resource srv-netbckp-201.intranet.onitelecom.pt.NBU_POLICY.MAXJOBS.AutoSuecoNDMPv2
30-01-2013 11:56:31 - granted resource srv-netbckp-201.intranet.onitelecom.pt.NBU_CLIENT.MAXJOBS.netapp-a.onitelecom.pt
30-01-2013 11:56:31 - granted resource srv-netbckp-201.intranet.onitelecom.pt.NBU_POLICY.MAXJOBS.AutoSuecoNDMPv2
30-01-2013 11:56:31 - granted resource AAW715
30-01-2013 11:56:31 - granted resource IBM.ULTRIUM-TD3.013
30-01-2013 11:56:31 - granted resource srv-netbckp-101-hcart3-robot-tld-0-netapp-a.onitelecom.pt
30-01-2013 11:56:32 - estimated 0 Kbytes needed
30-01-2013 11:56:32 - Info nbjm(pid=3776) started backup job for client netapp-a.onitelecom.pt, policy AutoSuecoNDMPv2, schedule Full-Weekly on storage unit srv-netbckp-101-hcart3-robot-tld-0-netapp-a.onitelecom.pt
30-01-2013 11:56:33 - Info bpbrm(pid=20442) netapp-a.onitelecom.pt is the host to backup data from     
30-01-2013 11:56:33 - Info bpbrm(pid=20442) reading file list from client        
30-01-2013 11:56:33 - Info bpbrm(pid=20442) starting ndmpagent on client         
30-01-2013 11:56:33 - Info ndmpagent(pid=20446) Backup started           
30-01-2013 11:56:33 - Info bpbrm(pid=20442) bptm pid: 20448          
30-01-2013 11:56:33 - started process bpbrm (20442)
30-01-2013 11:56:33 - connecting
30-01-2013 11:56:33 - connected; connect time: 00:00:00
30-01-2013 11:56:34 - Info bptm(pid=20448) start            
30-01-2013 11:56:34 - Info bptm(pid=20448) using 30 data buffers         
30-01-2013 11:56:34 - Info bptm(pid=20448) using 131072 data buffer size        
30-01-2013 11:56:35 - Info bptm(pid=20448) start backup           
30-01-2013 11:56:35 - Info bptm(pid=20448) Waiting for mount of media id AAW715 (copy 1) on server srv-netbckp-101.intranet.onitelecom.pt. 
30-01-2013 11:56:35 - mounting AAW715
30-01-2013 11:56:39 - Error bptm(pid=20448) error requesting media, TpErrno = Robot operation failed     
30-01-2013 11:56:42 - Warning bptm(pid=20448) media id AAW715 load operation reported an error     
30-01-2013 11:56:42 - current media AAW715 complete, requesting next resource Any
30-01-2013 11:56:46 - Error ndmpagent(pid=20446) NDMP backup failed, path = UNKNOWN       
30-01-2013 11:56:46 - end writing
30-01-2013 11:56:47 - Info bptm(pid=20448) EXITING with status 98 <----------        
30-01-2013 11:56:47 - Info ndmpagent(pid=0) done. status: 150: termination requested by administrator      
error requesting media (tpreq)(98)
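
For long job details it can help to reduce the output to the Error and Warning lines only. The following is a minimal Python sketch, assuming the line layout shown in the log above (timestamp, " - ", level, process(pid=N), message). The function name problem_lines is hypothetical and not part of NetBackup.

```python
import re

# Matches job-detail lines such as:
# "30-01-2013 11:56:39 - Error bptm(pid=20448) error requesting media, ..."
# The layout is inferred from the Activity Monitor output above (assumption).
PROBLEM_LINE = re.compile(
    r"^(?P<time>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}) - "
    r"(?P<level>Error|Warning) "
    r"(?P<process>\w+)\(pid=(?P<pid>\d+)\) "
    r"(?P<message>.*)$"
)


def problem_lines(job_details: str):
    """Return (time, level, process, message) for each Error/Warning line."""
    found = []
    for line in job_details.splitlines():
        match = PROBLEM_LINE.match(line.strip())
        if match:
            found.append(
                (match["time"], match["level"], match["process"], match["message"].strip())
            )
    return found


# A few lines copied from the job details above.
sample = """\
30-01-2013 11:56:35 - mounting AAW715
30-01-2013 11:56:39 - Error bptm(pid=20448) error requesting media, TpErrno = Robot operation failed
30-01-2013 11:56:42 - Warning bptm(pid=20448) media id AAW715 load operation reported an error
30-01-2013 11:56:46 - Error ndmpagent(pid=20446) NDMP backup failed, path = UNKNOWN
"""

for entry in problem_lines(sample):
    print(entry)
```

In this job the first problem reported is the bptm "Robot operation failed" error at 11:56:39; the ndmpagent failure follows it.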
 
The bptm log from the media server is attached.
 
 
Thanks in Advance.
Best regards to all


Nagalla's picture

hi,

Please check whether a tape is stuck in a tape drive without NetBackup being aware of it.

Also make sure that robtest can mount and dismount the tapes without any issues.

pamorim's picture

 

Hi Nagalla,
 
The drives that the backup is using are virtual tape drives from our VTL, which are empty. I am also able to mount and dismount the tapes without any issues.
Another strange issue is that the drives often go offline for no apparent reason.
This only happens with NDMP backups; all other backups work fine.
 
Thank you for your attention.
Best regards
pamorim's picture

 

Should I check more logs? If so, which ones?

I hope you can help me with this issue.

 

Thanks in Advance.

Best regards
Nagalla's picture

hi,

Check whether you find anything under the paths below on the NDMP media server:

/usr/openv/netbackup/db/media/tpreq

/usr/openv/netbackup/db/media/drives

If you find any entries, delete them, restart NetBackup, and check how it works.

If that does not help, remove all NDMP drives from NetBackup and configure them again.

pamorim's picture

 

I have to wait for some important jobs to finish before I can safely restart NetBackup.
I'll post an update on this case tomorrow.
Thank you for your attention.
Guduru's picture

hi
Is this issue resolved, or are you still facing it?

pamorim's picture

 

Hi,
Sorry, I couldn't get back to you until today.

I did what you said above but unfortunately I still have the same error and same logs on NDMP backup.

Do you have more ideas/suggestions?

Best regards

 

Mark_Solutions's picture

There seem to be some communication issues here too: the log mentions an unexpected message from NDMP and a lot of name resolution going on, which seems to resolve host names as IP addresses rather than names.

It is worth making sure your Master and Media Servers can resolve each other and the filer by both short name and FQDN - use hosts files if necessary.
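
A quick way to run that check from each server is a small script that tries to resolve every short name and FQDN and lists the ones that fail. This is only a sketch: the host names are the ones that appear in this thread, check_resolution and unresolved are hypothetical helper names, and it tests forward lookups through the operating system resolver only, which may not match exactly how NetBackup resolves names.

```python
import socket

# Short names and FQDNs taken from the posts and logs in this thread.
HOSTS = [
    "srv-netbckp-201",
    "srv-netbckp-201.intranet.onitelecom.pt",
    "srv-netbckp-101",
    "srv-netbckp-101.intranet.onitelecom.pt",
    "netapp-a",
    "netapp-a.onitelecom.pt",
]


def check_resolution(names, resolve=socket.gethostbyname):
    """Map each name to its resolved IPv4 address, or None if the lookup fails."""
    results = {}
    for name in names:
        try:
            results[name] = resolve(name)
        except OSError:  # socket.gaierror and socket.herror are subclasses of OSError
            results[name] = None
    return results


def unresolved(results):
    """Return the sorted names whose lookup failed."""
    return sorted(name for name, address in results.items() if address is None)
```

Running unresolved(check_resolution(HOSTS)) on the master and on the media server should return an empty list on both. Any name it returns is a candidate for a hosts file entry.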

Odd to have robot / tape errors on a VTL so most likely a communications / configuration issue.

Run through your config again to be sure of everything, and ask the NDMP admin for the messages log from the filer to see if it shows any errors. Also make sure that the fibre switches involved are not showing any issues (some switches will reset a port if any error is detected, which can disconnect drives).

Finally take a look at the event log on the VTL to see if that is reporting anything

This may all be a communications or connection issue, but if it centres around the NDMP then that is where you need to look

Hope this helps

Authorised Symantec Consultant

Don't forget to "Mark as Solution" if someone's advice has solved your issue - and please bring back the Thumbs Up!!

Guduru's picture

Which NDMP method have you configured for the backups?

Local, remote, or 3-way?

And is the backup going to a VTL or a PTL?

pamorim's picture

What do you mean by local or remote or 3-way? I only know that it supports:

Host supports LOCAL backup/restore
Host supports 3-way backup/restore

Yes, the backup is going to VTL.

 

We are looking into what Mark_Solutions suggested together with the NetApp admin.

Mauro Gabriel Barberis's picture

A 98 has a lot of causes, but NBU itself is usually not one of them.
I wouldn't go insane looking at BPTM logs, or Daemon/LTID logs.. I'd rather check configurations, and HBA/FC Cables/SFPs.

 

The things I'd tell you to check when you get a 98 are:

1) Make sure the TLD robot inventory is up to date.
2) Make sure the drive path is correct and visible. Check "storage show tape" output on the filer, and match the serial numbers shown with the paths configured in NetBackup (or run "tpautoconf -probe <filer>" from your NDMP Media Server, then "tpconfig -dl" to match the devices).

3) Are you 100% positive that the correct DRIVE INDEX is assigned in NetBackup?
NetApp VTLs start on Drive Index 0, while NetBackup starts on Drive Index 1...

So if you created drives 0 and 1 in the VTL, you need to assign robdrnum 1 and 2 in NBU:
VTL Drive 0 = NBU robdrnum 1
VTL Drive 1 = NBU robdrnum 2
.
.
VTL Drive N = NBU robdrnum N+1
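
The offset rule above can be written as a one-line helper, which is handy for printing the expected mapping before comparing it with what is configured. This is a minimal sketch that only restates the rule from this post (zero-based VTL index, one-based NBU robdrnum). The function name nbu_robdrnum is hypothetical, and the rule itself should be confirmed against your own VTL and NetBackup configuration.

```python
def nbu_robdrnum(vtl_drive_index: int) -> int:
    """Translate a zero-based VTL drive index into a one-based NBU robot drive number.

    Assumes the off-by-one rule described in this post.
    """
    if vtl_drive_index < 0:
        raise ValueError("VTL drive index cannot be negative")
    return vtl_drive_index + 1


# Print the expected mapping for the first four VTL drives.
for vtl_index in range(4):
    print(f"VTL Drive {vtl_index} = NBU robdrnum {nbu_robdrnum(vtl_index)}")
```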

 

Hope this helps.