BUG REPORT: At NetBackup versions 6.5.5, 6.5.6, 7.0, 7.0.1 and 7.1, ghost jobs or numerous jobs with status 50 that cannot be deleted may appear in the Activity Monitor.

Article:TECH146990  |  Created: 2010-12-27  |  Updated: 2011-11-10  |  Article URL http://www.symantec.com/docs/TECH146990
Article Type
Technical Solution


Issue



At NetBackup versions 6.5.5, 6.5.6, 7.0, 7.0.1 and 7.1, "ghost" jobs or numerous jobs with status 50 that cannot be deleted may appear in the Activity Monitor. This occurs more frequently with the Windows Administration Console than with the Java Console.

This can occur for four reasons:

Item 1.  A bug in NetBackup that manifests as numerous ghost jobs after the NetBackup processes are recycled. This is NetBackup bug number ET2155673.

Item 2.  Administrators who restart jobs in the Activity Monitor and then immediately delete the original jobs. This cannot be prevented, because in many cases the original job is deleted from the jobs database before the restart has completed. The same problem occurs if an administrator suspends a job, cancels it, and then immediately deletes it.

Item 3.  A bug in NetBackup that manifests in the Activity Monitor as jobs shown in a Waiting for Retry state even though they have already exited with status code 50. This is NetBackup bug number ET2222921.

Item 4.  Abrupt termination of the bpjobd and/or nbjm processes on the master server without recycling the media server processes. Examples include killing these processes with kill -9 (UNIX) or Windows Task Manager, or abruptly powering off the server.

 


Cause



For Item 1:
When the NetBackup processes are restarted, many jobs are typically resumed or restarted automatically. When this happens, the following sequence may occur:

00:00:03: JOBRESUME (jobid 1234)

00:00:07: JOBKILL (jobid 1234)

00:02:05: JOBRESTART (jobid 1234)
          - Try file not yet updated with 'RESTARTING job id 1234 as 5678'

00:02:07: JOBCLEAN (jobid 1234)
          - All files deleted (the try file is deleted from the netbackup\db\jobs directory)

00:02:11: The message 'RESTARTING job id 1234 as 5678' is added to the try file for jobid 1234 in netbackup\db\jobs, which was deleted 4 seconds earlier. Because the file no longer exists, a new file is created. This results in either a ghost job or a job with status 50.
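The last step fails because of ordinary file-append semantics: appending to a path that no longer exists creates a new, nearly empty file. The Python sketch below replays the JOBCLEAN-then-append order from the timeline against a temporary directory. The try-file name (try_<jobid>) and its contents are hypothetical placeholders, since this article does not document the real try-file format. The sketch does not call any NetBackup component.

```python
import os
import tempfile


def replay_item1_race(jobs_dir: str, jobid: int = 1234, new_jobid: int = 5678) -> str:
    """Replay the JOBCLEAN / late RESTARTING ordering and return the try file's contents."""
    # Hypothetical try-file name; not NetBackup's real naming scheme.
    try_path = os.path.join(jobs_dir, f"try_{jobid}")

    # Before the restart: the job has some history in its try file.
    with open(try_path, "w") as f:
        f.write(f"history for job {jobid}\n")

    # 00:02:07 JOBCLEAN: all files for the job are deleted.
    os.remove(try_path)

    # 00:02:11 the RESTARTING message arrives late. Append mode ("a")
    # creates the file if it is missing, so an orphaned try file reappears
    # holding only this one line.
    with open(try_path, "a") as f:
        f.write(f"RESTARTING job id {jobid} as {new_jobid}\n")

    with open(try_path) as f:
        return f.read()


with tempfile.TemporaryDirectory() as d:
    print(replay_item1_race(d), end="")
    # prints: RESTARTING job id 1234 as 5678
```

The recreated file holds only the RESTARTING line and none of the original job history. This matches the state the article describes after the JOBCLEAN step.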
 
For Item 2:
Job attribute updates are sent to bpjobd from nbjm every 10 seconds.  In the case where a job is restarted and then immediately deleted, the following scenario may occur:
 
A.  Jobs are restarted in the GUI.  (operator action)
   --- During this time, the restart travels through nbpem to nbjm (up to a 10 second wait here) to bpjobd.
 
B.  The old jobs are deleted in the GUI (operator action)
   --- When a job is deleted, it’s an immediate call made directly to bpjobd.
 
If operators restart jobs and then immediately delete them from the interface, the deletion can reach bpjobd before the restart does. The try log file is then deleted and recreated, containing the entry restarting <ctime> Old_jobid as New_jobid.

This is a current design limitation of the product.
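The race can be modeled as a comparison of arrival times at bpjobd. This Python sketch illustrates the idea and does not describe NetBackup internals. It assumes that nbjm forwards updates on fixed 10-second ticks. The article only says that updates are sent every 10 seconds and that the wait can be up to 10 seconds. The nbpem_delay parameter is a hypothetical allowance for the unmeasured nbpem hop. As stated above, a deletion is treated as reaching bpjobd immediately.

```python
import math

NBJM_UPDATE_INTERVAL = 10.0  # seconds between nbjm -> bpjobd updates (from the article)


def restart_arrival(restart_time: float, nbpem_delay: float = 0.0) -> float:
    """Time (seconds) at which a restart reaches bpjobd.

    Assumes nbjm flushes queued updates on fixed 10-second ticks; nbpem_delay
    is a hypothetical allowance for the nbpem -> nbjm hop.
    """
    at_nbjm = restart_time + nbpem_delay
    return math.ceil(at_nbjm / NBJM_UPDATE_INTERVAL) * NBJM_UPDATE_INTERVAL


def delete_wins_race(restart_time: float, delete_time: float, nbpem_delay: float = 0.0) -> bool:
    """True if the delete (a direct call to bpjobd) lands before the restart,
    i.e. the try log would be deleted and then recreated."""
    return delete_time < restart_arrival(restart_time, nbpem_delay)


# Restart at t=3 s, delete 2 s later: the delete arrives first.
print(delete_wins_race(3.0, 5.0))   # prints: True
# Restart at t=3 s, delete 60 s later: the restart arrives first.
print(delete_wins_race(3.0, 63.0))  # prints: False
```

Under these assumptions, the restart reaches bpjobd no later than one update interval plus the nbpem hop after it is issued. Waiting a full minute before deleting therefore leaves a wide margin.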

For Item 3:
This is caused by a garbage packet being sent from bptm to bpjobd after sending the start backup message.

For Item 4:
This issue occurs because the application is unaware that a job is still active on a media server when the nbjm or bpjobd processes on the master server are terminated abruptly and the media server processes are not recycled. The master server periodically cleans the jobs database. If a job is not registered as active in nbjm, its job information is removed.

When forgotten-job cleanup is performed, the nbjm log file shows messages similar to the following at DebugLevel 1:

02/22/11 10:49:55.602 [Debug] NB 51216 nbjm 117 PID:123 TID:3660 File ID:117 [No context] 1 [JobManager_i::doForgottenJobCleanup]  job has been forgotten, perform cleanup, jobid=12345  
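To find out which jobs were removed this way, you can filter the forgotten-job messages out of the nbjm log. The Python sketch below matches only the message format shown above. It assumes the nbjm unified log has already been rendered to plain text, for example with vxlogview. The file name nbjm_log.txt in the usage note is hypothetical.

```python
import re
from typing import Iterable, List

# Matches the doForgottenJobCleanup message shown above and captures the jobid.
FORGOTTEN_JOB_RE = re.compile(
    r"\[JobManager_i::doForgottenJobCleanup\]\s+job has been forgotten, "
    r"perform cleanup, jobid=(\d+)"
)


def forgotten_jobids(lines: Iterable[str]) -> List[int]:
    """Return jobids from forgotten-job cleanup messages, in log order, without duplicates."""
    seen = set()
    jobids = []
    for line in lines:
        match = FORGOTTEN_JOB_RE.search(line)
        if match:
            jobid = int(match.group(1))
            if jobid not in seen:
                seen.add(jobid)
                jobids.append(jobid)
    return jobids


sample = (
    "02/22/11 10:49:55.602 [Debug] NB 51216 nbjm 117 PID:123 TID:3660 File ID:117 "
    "[No context] 1 [JobManager_i::doForgottenJobCleanup]  job has been forgotten, "
    "perform cleanup, jobid=12345"
)
print(forgotten_jobids([sample]))  # prints: [12345]
```

For example, with open("nbjm_log.txt") as f: print(forgotten_jobids(f)) lists the cleaned-up jobids in the order they were logged.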

 


Solution



Ghost job Cleanup

If ghost jobs are seen, Administrators must manually remove the jobs by following the directions below:

If you have a large number of jobs to clean:
• Shut down NetBackup Master AND Media Server processes
• Verify no active NetBackup processes on the Master server
• Verify no active NetBackup processes on all Media servers
• Remove all files from the jobs directory, except “jobid”
          Windows: <install_path>\NetBackup\db\jobs
          UNIX: /usr/openv/netbackup/db/jobs
• Do not remove the “jobid” file, unless you want your job numbers to start from 1
• This procedure will cause job history to be lost in Activity Monitor.
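On a master server with many ghost jobs, you can script the file-removal step. The following Python sketch deletes every top-level regular file in the jobs directory except jobid. It does not stop NetBackup or check that processes are down, so run it only after completing the preceding steps. The article says to remove "files", so the sketch leaves subdirectories in place and reports them instead of deleting them. dry_run defaults to True, so the first call only lists what would be removed. The function name and behavior are illustrative; this is not a NetBackup tool.

```python
from pathlib import Path
from typing import List, Tuple

PRESERVED_FILE = "jobid"  # keeps job numbering from restarting at 1


def clean_jobs_dir(jobs_dir: str, dry_run: bool = True) -> Tuple[List[str], List[str]]:
    """Remove every top-level file in jobs_dir except 'jobid'.

    Returns (files removed, or that would be removed in dry-run mode,
    subdirectories that were skipped). Assumes NetBackup master and media
    server processes are already stopped; this function does not check.
    """
    removed: List[str] = []
    skipped_dirs: List[str] = []
    for entry in sorted(Path(jobs_dir).iterdir()):
        if entry.name == PRESERVED_FILE:
            continue
        if entry.is_dir() and not entry.is_symlink():
            skipped_dirs.append(entry.name)  # left for the administrator to review
            continue
        if not dry_run:
            entry.unlink()
        removed.append(entry.name)
    return removed, skipped_dirs
```

For example, clean_jobs_dir("/usr/openv/netbackup/db/jobs") on UNIX (or the <install_path>\NetBackup\db\jobs path on Windows) returns the planned deletions. Call it again with dry_run=False once the listing looks correct.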

Otherwise, if you have only a few jobs to clean, you may consider the following:
• Shut down NetBackup Master AND Media Server processes
• Verify no active NetBackup processes on the Master server
• Verify no active NetBackup processes on all Media servers
• Run bpjobd -r <jobid> for each ghost jobid.
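When the ghost jobids are already known, you can loop the per-job bpjobd -r step. This Python sketch only builds, and optionally runs, the command exactly as given above (bpjobd -r <jobid>). The bpjobd location is an assumption: the UNIX default below is a typical install path and should be checked. dry_run defaults to True, so the commands are printed rather than executed.

```python
import subprocess
from typing import Iterable, List

# Assumed UNIX location; on Windows, point this at the bpjobd executable under <install_path>\NetBackup.
DEFAULT_BPJOBD = "/usr/openv/netbackup/bin/bpjobd"


def build_removal_commands(jobids: Iterable[int], bpjobd_path: str = DEFAULT_BPJOBD) -> List[List[str]]:
    """Return one 'bpjobd -r <jobid>' argument list per unique jobid, in ascending order."""
    commands = []
    for jobid in sorted(set(jobids)):
        if jobid <= 0:
            raise ValueError(f"invalid jobid: {jobid}")
        commands.append([bpjobd_path, "-r", str(jobid)])
    return commands


def remove_ghost_jobs(jobids: Iterable[int], bpjobd_path: str = DEFAULT_BPJOBD,
                      dry_run: bool = True) -> None:
    """Print each command; run it only when dry_run is False (with NetBackup stopped)."""
    for command in build_removal_commands(jobids, bpjobd_path):
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)
```

For example, remove_ghost_jobs([12345]) prints /usr/openv/netbackup/bin/bpjobd -r 12345 without running it. check=True stops the loop at the first command that returns a non-zero exit status.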

Note: If NetBackup Media server processes are NOT terminated, Media server processes for a job remain active, and either of these workarounds is implemented, new ghost jobs may appear after the NetBackup Master server is restarted. This is the scenario described in Item 4 at the top of this document.

 

FORMAL RESOLUTION:

For Item 1:  This issue is addressed in the NetBackup 7.1.0.2 maintenance update.

For Item 2:  Operators should wait at least one minute after restarting jobs before deleting the original jobids.

For Item 3:  This issue is addressed in NetBackup 7.1.0.2.

For Item 4:   Administrators should recycle the media servers when a master server is recycled.

Information on obtaining the NetBackup 7.1.0.2 Maintenance Update is available using the link provided below.
7.1.0.2 fixes are also available in the 2.0.1 patch for NetBackup 5200/5220 appliances, available at this location.


Supplemental Materials

Source: ETrack
Value: 2155673
Description: Activity Monitor shows jobs terminated with status 50...


Source: ETrack
Value: 2222921
Description: Jobs started on 12/7 are still waiting for retry even though it says it ended on 12/13 with status 50.


Source: ETrack
Value: 2424555
Description: 7.1 CUMULATIVE EEB: Ghost jobs in 7.1 that have low jobids. Some of them contain garbage from bpbrm, others have nothing.


Source: ETrack
Value: 2487325
Description: 7.1.0.1 CUMULATIVE EEB: Ghost jobs in 7.1.0.1 that have low jobids. Some of them contain garbage from bpbrm, others have nothing.


Source: Error Code
Value: 50
Description: client process aborted





