Jobs hung in active and queued state for hours

Updated: 18 Sep 2010 | 8 comments
m3lyan

We faced the same problem today.
Jobs in the Activity Monitor are static / hung / stuck / frozen in either an "Active" or "Queued" state.
I've checked:
nbdb_ping: EMM database online, and I ran a full validation
bpstulist ...: storage units viewable
tpconfig -l: all devices up
NBU services up
/usr/openv/netbackup/bin/admincmd/bpdbjobs -report shows that jobs are hung in the active or queued state
In the end we restarted the NetBackup services and reran the backup jobs.
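
For anyone repeating the checks above the next time jobs hang, here is a minimal Python sketch that runs a set of commands with a timeout and records each exit status. It is only an illustration: `run_checks`, `failed` and `DEFAULT_CHECKS` are hypothetical names, only the bpdbjobs path comes from the post, and the other commands are assumed to be on PATH. A zero exit status only shows that the command ran; it does not prove jobs are moving.

```python
import subprocess
import sys

# Checks named in the post. Only the bpdbjobs path appears in the original;
# the other commands are assumed to be on PATH (adjust for your install).
DEFAULT_CHECKS = {
    "EMM database": ["nbdb_ping"],
    "devices": ["tpconfig", "-l"],
    "jobs": ["/usr/openv/netbackup/bin/admincmd/bpdbjobs", "-report"],
}


def run_checks(checks, timeout=60):
    """Run each command; return {name: (exit_code_or_None, output_text)}."""
    results = {}
    for name, argv in checks.items():
        try:
            proc = subprocess.run(
                argv,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,
                timeout=timeout,
            )
            results[name] = (proc.returncode, proc.stdout)
        except OSError as exc:
            # Command missing or not executable on this host.
            results[name] = (None, str(exc))
        except subprocess.TimeoutExpired:
            # The command itself hung, which is worth knowing in this scenario.
            results[name] = (None, "timed out after %d s" % timeout)
    return results


def failed(results):
    """Names of checks that did not exit with status 0."""
    return sorted(name for name, (code, _) in results.items() if code != 0)


if __name__ == "__main__":
    outcome = run_checks(DEFAULT_CHECKS)
    for name, (code, text) in outcome.items():
        print("== %s (exit %s)" % (name, code))
        print(text)
    sys.exit(1 if failed(outcome) else 0)
```

Saving the output each time the hang occurs gives something to compare across incidents.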

No system error. No system core dump. No file system full. No memory leak.
No errors logged in the SL8500

Any help, please.

Comments

Andy Welburn, 17 Mar 2010

Anything on the Master?

eg:
problems report
/var/adm/messages
NetBackup logs (e.g. bptm)

Anything on the Client(s):
Are they all different O/S's that are hanging or all of a type (e.g. Win2003)?
Anything reported on client (logs/event viewer/process monitor)?
Processes still running on client (bpfis/bpbkar)?

Anything on the jobs:
Have these jobs worked before or is this a new set up?
If worked previously, anything changed recently?
All jobs hanging or just a few?
Anything in job details? (e.g. waiting for resources)

Regards Andy

"It's not too late to panic ..."

m3lyan, 17 Mar 2010

No errors at the OS level.
We have one master-media server and another 2 media servers across two sites, main and DR.
A problem can't happen on all servers at the same time.

Clients: different OSes (some FS, some database ...)

This problem happened suddenly.
All jobs are hanging.

Andy Welburn, 17 Mar 2010

So there's nothing at all in any logs, nothing at all in Job Details & nothing at all changed recently (not just NetBackup but at a corporate level)?

Seeing as nothing is working at the moment, have you tried restarting NetBackup services or, push comes to shove, restarting the Master/Media servers?

Regards Andy

"It's not too late to panic ..."

m3lyan, 17 Mar 2010

In the end we restarted the NetBackup services and reran the backup jobs.
This problem occurred on 9 March, then 15 March and 17 March.

rjrumfelt, 17 Mar 2010

I have this problem with several hosts, but they are all Windows 2k3 machines: no errors in the event logs, no errors in any of the NBU logs. But you said that this problem occurs across several different operating systems?

Andy Welburn, 17 Mar 2010

rjrumfelt, 17 Mar 2010

Yes

We've attempted both methods.  I've had a case open with Symantec for some time and we've not really gotten very far. 

The closest thing that I can find is that when looking at the bpbkar log, you can see the exact moment when the backup hangs, as it looks like the servers just pass keep_alives back and forth, without exchanging any actual data.

There's a technote out there for that issue; however, our keep_alive signals are the correct size, whereas the technote describes the keep_alives getting corrupted in size, which causes the hang-up. Nonetheless, I installed an EEB for the issue, and it did not fix the problem *sighs*

I've had every possible team here check the environment out and they can find no apparent issues.  Symantec is supposed to send our case to back line engineering.  We'll see if that gets us anywhere.

David McMullin, 18 Mar 2010

when all else fails - blame the network!

You might check if ANYTHING has changed -

I know I had an issue where I wanted to make two copies of an RMAN backup, and I set multiple copies = 2 on the application schedule, not the automatic one. My whole NetBackup environment went crazy and I had all kinds of issues.

I would never have thought that changing ONE policy would thrash my whole environment - but it did.

Ask everybody to check for changes - sometimes the smallest ones can cause the most issues.

NBU 7.0.1 on Solaris 10
writing to EMC 4206 VTL
duplicating to LTO2 in SL8500
(Soon to be LTO5)
using ACSLS 7.3.1