
Status 96 - job didn't restart according to retry delay

Created: 12 Feb 2012 | 6 comments
JBiggins

Hi,

We are running NetBackup 7.1 on a Windows 2003 server.

The client being backed up is a Windows server in an MS-Windows policy.  This is the only server in the policy.  Multistreaming is enabled and so are checkpoints.

Two streams out of four for this backup job failed over the weekend - error "unable to allocate new media for backup, storage unit has none available (96)".

Upon checking the volume pool, it seems not enough tapes were loaded over the weekend to accommodate a full backup of this server, which explains the failure.

However, the job didn't retry according to the retry settings in the master server host properties: the retry delay is currently 10 minutes and schedule backup attempts is set to 3 tries per 12 hours.

So today we added some more tapes, but then strangely enough about 5 minutes later the job restarted.

Can anyone explain why this job didn't restart according to the configured retry delay?  Also, why does a backup that fails with this error restart from the beginning instead of resuming from where it failed, even though checkpoints are enabled?

Thanks

JB

Comments

mph999 | 12 Feb 2012

I would need to look in the nbpem log to see what was going on; nbjm might show something too. Anything else would just be guessing, I think.

Run vxlogview with -d all -o 116 (nbpem), and then again with -o 117 (nbjm).
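As a rough sketch, the two runs could be scripted like this. The -p 51216 product ID and the output file names are assumptions added for illustration, not part of the suggestion above, and on a busy master you would probably also want to narrow the time range.

```shell
# Originator IDs: 116 = nbpem, 117 = nbjm.
# Assumption: 51216 is the NetBackup product ID; the output file names are arbitrary.
for oid in 116 117; do
  cmd="vxlogview -p 51216 -o $oid -d all"
  if command -v vxlogview >/dev/null 2>&1; then
    # Save each originator's log to its own file for later searching.
    $cmd > "vxlogview_$oid.txt"
  else
    # vxlogview is not on the PATH here, so only show what would be run.
    echo "would run: $cmd"
  fi
done
```

On a Windows master the same two vxlogview commands can be typed directly at the command prompt, redirecting each to a file.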

 

Martin

JBiggins | 12 Feb 2012

OK, thanks, I'll check out these logs.

Why is it that with this error the backup can't resume and instead restarts?

mph999 | 12 Feb 2012

Not sure without the logs.

NBU will not retry all jobs. Some are excluded (e.g. I think Oracle/RMAN jobs don't retry), and I suspect some will not retry depending on the cause of the failure.

You can trace a job in the nbpem/nbjm logs using the job ID, and then the TID (which is why you should always run vxlogview with -d all).

I expect we will see "job not eligible for retry" or similar.  Just an idea: if you do not have retry after runday enabled (and a suitable window), is it possible the window was small and had already closed?

Martin

JBiggins | 13 Feb 2012

Are there any KB articles that explain, for a particular failure, which jobs can resume and which will restart from the beginning?

mph999 | 13 Feb 2012

I've never seen one - a quick search didn't find anything.

Your best bet would be to log a support call. If you can have those log details ready when you do, that would be excellent.

The call will also need:

nbsu -c -t

name of the policy that failed

details of failure from activity monitor for the job (details tab)

 

If the log shows what I expect, that the job was not valid for retry, then it is reasonable to ask why.

As far as I can tell, the only way we can say why is to look at the details of the job and the failure, and then check the NBU code to see why it behaves in this manner.

That will need a BL engineer.

Martin

Sagar_Kolhe | 22 Feb 2012

Maybe. Most of the time it will retry only if you cancel that backup.