BUG REPORT: In NetBackup 7.1, ghost jobs may occur that cannot be deleted and that show a state of "waiting for retry" and an exit status of 50.

Article:TECH163191  |  Created: 2011-06-24  |  Updated: 2011-10-31  |  Article URL http://www.symantec.com/docs/TECH163191
Article Type
Technical Solution


Issue



On NetBackup 7.1 Windows media servers that are running multiplexed (mpx) backup jobs, try log messages for child mpx jobs are sent using the job ID of the first mpx job in the mpx group.  This can result in ghost updates if the first mpx job in a group completes early and later mpx jobs continue to join its mpx group.
 

This behavior can be identified by reviewing the <install_path>\VERITAS\NetBackup\db\jobs\trylogs\<jobid>.t file on the master server.  If the issue is occurring, the file contains repeated updates stating "is the host to backup data from", similar to the following:

LOG 1306842023 4 bpbrm 1031876 CLIENT_A is the host to backup data from
LOG 1306842169 4 bpbrm 1032796 CLIENT_A is the host to backup data from
LOG 1306842238 4 bpbrm 1031408 CLIENT_A is the host to backup data from
LOG 1306842291 4 bpbrm 1035848 CLIENT_A is the host to backup data from
LOG 1306842364 4 bpbrm 1035568 CLIENT_A is the host to backup data from
LOG 1306842746 4 bpbrm 1028044 CLIENT_A is the host to backup data from
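
A quick way to check a try log for this pattern is to count the matching records and the distinct process IDs that sent them. The sketch below is illustrative only. It assumes each record has the layout `LOG <epoch seconds> <severity> <process> <pid> <message>`, which is inferred from the sample lines above rather than taken from a documented format. The function name and the `Hit` tuple are hypothetical.

```python
import re
from collections import namedtuple

# Assumed layout, inferred from the sample lines above (not a documented format):
#   LOG <epoch seconds> <severity> <process> <pid> <message>
RECORD = re.compile(r"^LOG (\d+) (\d+) (\S+) (\d+) (.*)$")
MARKER = "is the host to backup data from"

Hit = namedtuple("Hit", "timestamp process pid message")

def find_repeated_host_messages(lines):
    """Return the try log records whose message contains the marker text."""
    hits = []
    for line in lines:
        match = RECORD.match(line.strip())
        if match and MARKER in match.group(5):
            hits.append(Hit(int(match.group(1)), match.group(3),
                            int(match.group(4)), match.group(5)))
    return hits

# The six sample records from this article.
sample = [
    "LOG 1306842023 4 bpbrm 1031876 CLIENT_A is the host to backup data from",
    "LOG 1306842169 4 bpbrm 1032796 CLIENT_A is the host to backup data from",
    "LOG 1306842238 4 bpbrm 1031408 CLIENT_A is the host to backup data from",
    "LOG 1306842291 4 bpbrm 1035848 CLIENT_A is the host to backup data from",
    "LOG 1306842364 4 bpbrm 1035568 CLIENT_A is the host to backup data from",
    "LOG 1306842746 4 bpbrm 1028044 CLIENT_A is the host to backup data from",
]

hits = find_repeated_host_messages(sample)
distinct_pids = {hit.pid for hit in hits}
print(len(hits), "matching records from", len(distinct_pids), "distinct process IDs")
```

To scan a real try log, pass an open file object instead of the sample list. Many matching records from different process IDs in a single job's try log are consistent with the behavior described here, although they do not prove it on their own.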

 


Error



Status 50


Environment



7.1 Windows Media servers running MPX jobs.


Cause



This behavior can be confirmed by reviewing the nbjm, bpjobd and bpbrm log files.

The bpbrm log file shows the first backup job in the mpx group starting. (Note that the contents of these messages have been edited to make them easier to read.)  The example uses backupid CLIENT_A_1306796447, jobid 164429, and job groupid 164344:

The first backup job starts under pid 931908.931912:

19:00:47.711 [931908.931912] <2> logparams: -backup -S MASTER_SERVER -c CLIENT_A -b CLIENT_A_1306796447 -jobid 164429 -jobgrpid 164344
 

All subsequent jobs reference the first job's backupid (CLIENT_A_1306796447), jobid (164429) and job groupid (164344) in their backup start log messages.

The next backup starts; its backupid is CLIENT_C_1306796454, jobid is 164440, and job groupid is 164339:

19:01:03.436 [932444.932840] <2> logparams: -backup -S MASTER_SERVER -c CLIENT_A -b CLIENT_A_1306796447 -jobid 164429 -jobgrpid 164344 -cl POLICY_NAME -b CLIENT_C_1306796454 -jobid 164440 -jobgrpid 164339

The next job starts; its backupid is CLIENT_D_1306796490, jobid is 164449, and job groupid is 164386:

19:01:37.008 [934132.934136] <2> logparams: -backup -S MASTER_SERVER -c CLIENT_A -b CLIENT_A_1306796447 -jobid 164429 -jobgrpid 164344 -cl POLICY_NAME -b CLIENT_D_1306796490 -jobid 164449 -jobgrpid 164386
 

Etc.
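
When reviewing many bpbrm logparams lines, it can help to pull out every -b, -jobid and -jobgrpid value and see which job ID appears first on each line. The following sketch is an illustration, not a NetBackup tool. The function name is hypothetical, and the sample string is modelled on the edited log lines above.

```python
import re

# Match the three flags of interest. The lookbehind keeps "-b" from matching
# inside another token, and the required space keeps it from matching "-backup".
FLAG = re.compile(r"(?<!\S)-(b|jobid|jobgrpid) (\S+)")

def job_references(logparams_line):
    """Collect every -b, -jobid and -jobgrpid value, in order of appearance."""
    found = {"b": [], "jobid": [], "jobgrpid": []}
    for name, value in FLAG.findall(logparams_line):
        found[name].append(value)
    return found

# Sample modelled on the edited bpbrm log lines in this article.
line = ("logparams: -backup -S MASTER_SERVER -c CLIENT_A "
        "-b CLIENT_A_1306796447 -jobid 164429 -jobgrpid 164344 "
        "-cl POLICY_NAME -b CLIENT_D_1306796490 -jobid 164449 -jobgrpid 164386")

refs = job_references(line)
print("job IDs on this line:", refs["jobid"])
print("leading (shared) job ID:", refs["jobid"][0])
```

In the sample, the first job ID on the line is the job ID of the first mpx job, followed by the job ID of the job that is actually starting. Seeing the same leading job ID across many later logparams lines matches the pattern shown in the log excerpts above.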

Periodically, the bpjobd process initiates forgotten job cleanup.  All active job IDs within bpjobd are compared with the active jobs in nbjm.  If nbjm does not know a job ID as an active job, bpjobd cleans up that job.

This process triggers cleanup of jobid 164429:

[JobManager_i::doForgottenJobCleanup]  job has been forgotten, perform cleanup, jobid=164429
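
Conceptually, the comparison described above is a set difference: the jobs that bpjobd still tracks as active minus the jobs that nbjm knows as active. The sketch below only models that idea. It is not NetBackup's implementation, the function name is hypothetical, and the job ID sets are made up from the job IDs used in this article.

```python
def forgotten_jobs(bpjobd_active, nbjm_active):
    """Return job IDs that bpjobd tracks as active but nbjm no longer knows."""
    return sorted(set(bpjobd_active) - set(nbjm_active))

# Hypothetical state: the first mpx job (164429) has already completed in nbjm,
# while the later mpx jobs are still running.
bpjobd_active = {164429, 164440, 164449}
nbjm_active = {164440, 164449}

to_clean = forgotten_jobs(bpjobd_active, nbjm_active)
print("forgotten jobs:", to_clean)
```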

After the job is removed by the forgotten job cleanup process, the try log file no longer exists in the <install_path>\VERITAS\NetBackup\db\jobs\trylogs directory.

Because all the backup jobs associated with the mpx group are started referencing jobid 164429, when bpbrm sends subsequent updates, bpjobd enters "create_new_job" for this 'old' jobid:



07:40:23.130 [536392.535680] <2> create_new_job: Allocating new Active job (164429)
07:40:23.130 [536392.535680] <2> create_new_job: Adding provider(0) for job(164429) socket (980)
07:40:23.130 [536392.535680] <2> process_active_job: Begin to append (JOBTRYFILE) job (164429)

The result is updates to the try log file for a job that finished long ago:

07:40:23.130 [536392.535680] <2> process_active_job: LOG 1306842023 4 bpbrm 1031876 CLIENT_A is the host to backup data from
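
The sequence above can be modelled with a toy job store in which any update for an unknown job ID creates a new entry. This is a deliberately simplified sketch and not NetBackup's code. The class and method names are hypothetical, and the create-on-unknown-ID rule is an assumption based on the "create_new_job" messages shown above.

```python
class ToyJobDb:
    """Toy stand-in for a job database. NOT NetBackup's implementation."""

    def __init__(self):
        self.trylogs = {}   # job ID -> list of try log lines
        self.created = []   # every job ID for which an entry was created

    def append_trylog(self, jobid, line):
        # Assumption: an update for an unknown job ID allocates a new job entry.
        if jobid not in self.trylogs:
            self.trylogs[jobid] = []
            self.created.append(jobid)
        self.trylogs[jobid].append(line)

    def cleanup(self, jobid):
        # Forgotten job cleanup removes the job and its try log.
        self.trylogs.pop(jobid, None)

db = ToyJobDb()
db.append_trylog(164429, "first mpx job starts")
db.cleanup(164429)  # the first mpx job finished and was cleaned up

# A later mpx job in the same group still reports under the old job ID.
db.append_trylog(
    164429,
    "LOG 1306842023 4 bpbrm 1031876 CLIENT_A is the host to backup data from",
)

ghost_exists = 164429 in db.trylogs
print("ghost job present:", ghost_exists, "| creations:", db.created)
```

In this model the job entry is created twice, and the second creation is the ghost: a job entry that reappears only because a late update referenced an ID that had already been cleaned up.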

Solution



A binary for NetBackup 7.1 media servers is available under Etrack (ET) 2399149.

 

Formal Resolution:
Symantec Corporation has acknowledged that the above mentioned issue (Etrack 2399149) is present in the current version(s) of the product(s) mentioned at the end of this article. Symantec Corporation is committed to product quality and satisfied customers.

This issue is tentatively scheduled to be addressed in the following release:
 

  •  NetBackup 7.1.0.2

As fixes are released, please visit the following link for download and readme information: http://www.symantec.com/enterprise/support/overview.jsp?pid=15143


Please note that Symantec Corporation reserves the right to remove any fix from the targeted release if it does not pass quality assurance tests or introduces new risks to overall code stability. Symantec's plans are subject to change and any action taken by you based on the above information or your reliance upon the above information is made at your own risk.

Supplemental Materials

Source: ETrack
Value: 2399149
Description: ghost jobs in the activity monitor - appears to be related to bpbrm updates to bpjobd - ET2380076 binaries are installed.

  




