Jobs hung in active and queued state for hours
Updated: 18 Sep 2010 | 8 comments
We faced the same problem today.
Jobs in the Activity Monitor are static / hung / stuck / frozen in either an "Active" or "Queued" state.
I've checked:
nbdb_ping: the EMM database is online, and I ran a full validation
bpstulist ...: storage units are viewable
tpconfig -l: all devices are up
NBU services are up
/usr/openv/netbackup/bin/admincmd/bpdbjobs -report shows that the jobs are hung in the active or queued state
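The checks listed above could be scripted so they are quick to repeat the next time jobs hang. What follows is only a rough sketch, not a supported tool. Only the bpdbjobs path comes from the post; the other paths are assumed default UNIX install locations, bpstulist is run without the arguments elided in the post, and the names run_check and CHECKS are made up for illustration. A timeout is included because commands may themselves hang in this situation.

```python
import subprocess

# Commands named in the post. Only the bpdbjobs path is from the post;
# the other paths are ASSUMED default UNIX install locations.
CHECKS = [
    ("EMM database", ["/usr/openv/db/bin/nbdb_ping"]),
    ("storage units", ["/usr/openv/netbackup/bin/admincmd/bpstulist"]),
    ("devices", ["/usr/openv/volmgr/bin/tpconfig", "-l"]),
    ("job report", ["/usr/openv/netbackup/bin/admincmd/bpdbjobs", "-report"]),
]


def run_check(label, argv, timeout=60):
    """Run one command and return (label, status, stdout)."""
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout
        )
    except OSError:
        # Binary missing or not executable at the assumed path.
        return (label, "not runnable", "")
    except subprocess.TimeoutExpired:
        # The command itself did not return in time.
        return (label, "timeout", "")
    status = "ok" if proc.returncode == 0 else "exit %d" % proc.returncode
    return (label, status, proc.stdout)


def run_all(checks=CHECKS):
    """Run every check in order and collect the results."""
    return [run_check(label, argv) for label, argv in checks]
```

Calling run_all() on the master server and printing the label and status of each result gives a one-screen summary. A status of "ok" only means the command exited with 0, so the captured output still needs to be read.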
In the end we restarted the NetBackup services and reran the backup jobs.
No system error. No system core dump. No file system full. No memory leak.
No errors logged in the SL8500
Any help, please?
Comments
Anything on the Master?
eg:
problems report
/var/adm/messages
NetBackup logs (e.g. bptm)
Anything on the Client(s):
Are they all different O/S's that are hanging or all of a type (e.g. Win2003)?
Anything reported on client (logs/event viewer/process monitor)?
Processes still running on client (bpfis/bpbkar)?
Anything on the jobs:
Have these jobs worked before or is this a new set up?
If worked previously, anything changed recently?
All jobs hanging or just a few?
Anything in job details? (e.g. waiting for resources)
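For the question about processes still running on a client, a small sketch like the one below could list them on a UNIX client. It assumes that `ps -e -o pid,comm` is available there; the function names are hypothetical, and the process names are simply the two mentioned above. It does not apply to Windows clients.

```python
import os
import subprocess

# Process names taken from the question above.
BACKUP_PROCESSES = {"bpbkar", "bpfis"}


def find_backup_processes(ps_output, names=BACKUP_PROCESSES):
    """Return (pid, name) pairs found in `ps -e -o pid,comm` style output."""
    found = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 1)
        if len(fields) != 2:
            continue
        pid, command = fields
        # Some systems print a full path, so compare the base name only.
        name = os.path.basename(command.strip())
        if pid.isdigit() and name in names:
            found.append((int(pid), name))
    return found


def list_backup_processes():
    """Run ps on the local machine and filter for backup processes."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid,comm"],
        capture_output=True, text=True, check=True,
    )
    return find_backup_processes(result.stdout)
```

An empty list from list_backup_processes() while the job still shows as Active on the master would suggest the client side has already gone away.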
Regards Andy
"It's not too late to panic ..."
No errors at the OS level.
We have one master/media server and another 2 media servers across two sites, main and DR.
A problem like that can't happen on all servers at the same time.
Clients: different OSes (some file system backups, some database ...).
This problem happened suddenly.
All jobs are hanging.
So there's nothing at all in any logs,
nothing at all in Job Details & nothing at all changed recently (not just NetBackup but at a corporate level)?
Seeing as nothing is working at the moment, have you tried restarting NetBackup services or, push comes to shove, restarting the Master/Media servers?
Regards Andy
"It's not too late to panic ..."
In the end we restarted the NetBackup services and reran the backup jobs.
This problem occurred on 9 March, then again on 15 March and 17 March.
I have this problem with several hosts,
but they are all Windows 2k3 machines. There are no errors in the event logs and no errors in any of the NBU logs. But you said that this problem occurs across several different operating systems?
WOFB set up for these 2003 clients?
If so, set to VSS?
Backup jobs are hanging on Windows 2003 clients. <VSP> is still enabled for the client's backups in master client attributes.
Regards Andy
"It's not too late to panic ..."
Yes
We've attempted both methods. I've had a case open with Symantec for some time and we've not really gotten very far.
The closest thing that I can find is that when looking at the bpbkar log, you can see the exact moment when the backup hangs, as it looks like the servers just pass keep_alives back and forth, without exchanging any actual data.
There's a technote out there for that issue, but in our case the keep_alive signals are the correct size, whereas the technote describes the keep_alives getting corrupted in size, which causes the hang. Nonetheless, I installed an EEB for the issue, and it did not fix the problem *sighs*
I've had every possible team here check the environment out and they can find no apparent issues. Symantec is supposed to send our case to back line engineering. We'll see if that gets us anywhere.
when all else fails - blame the network!
You might check if ANYTHING has changed -
I know I had an issue where I wanted to make two copies of an RMAN backup, and I set multiple copies = 2 on the application schedule, not the automatic one. My whole NetBackup environment went crazy and I had all kinds of issues.
I would never have thought that changing ONE policy would thrash my whole environment - but it did.
Ask everybody to check for changes - sometimes the smallest ones can cause the most issues.
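One way to make "check for changes" less dependent on memory is to keep dated text snapshots of the configuration and diff them. The sketch below assumes you already save such snapshots somehow (any plain-text dump of policies or settings would do); the function name and the sample text in the usage note are hypothetical and not real NetBackup output.

```python
import difflib


def config_changes(old_text, new_text):
    """Return the added and removed lines between two text snapshots."""
    diff = list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="before",
            tofile="after",
            lineterm="",
            n=0,  # no context lines, only the changes themselves
        )
    )
    # The first two lines are the "---"/"+++" file headers; skip them,
    # then keep only added ("+") and removed ("-") lines.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]
```

For example, comparing a snapshot containing a made-up line `copies=1` against a later one containing `copies=2` would surface exactly that pair of lines, which is the kind of single-setting change described above.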
NBU 7.0.1 on Solaris 10
writing to EMC 4206 VTL
duplicating to LTO2 in SL8500
(Soon to be LTO5)
using ACSLS 7.3.1