Vault generates zombie processes
Sometimes NetBackup Vault spawns a number of child processes that appear to be zombie (defunct) processes.
Has anyone faced this issue before?
# ptree 20904
20904 bpbrmvlt -bt 1238644819 -jobid 10601298 -jobgrpid 10601298 -masterversion 60000
20908 /usr/openv/netbackup/bin/vltrun 1/lib2-vault/vlt-mc-bc-daily-dup -jobid 1060129
2415 <defunct>
20982 <defunct>
26081 <defunct>
2581 <defunct>
26887 <defunct>
21137 /usr/openv/netbackup/bin/vltrun 1/lib2-vault/vlt-mc-bc-daily-dup -jobid 1060129
4929 /usr/openv/netbackup/bin/admincmd/bpduplicate -dstunit mediasrv2-hcart2-robot
14917 /usr/openv/netbackup/bin/admincmd/bpduplicate -dstunit dr-mediasrv-hcart2-rob
22933 /usr/openv/netbackup/bin/admincmd/bpduplicate -dstunit mediasrv2-hcart2-robot
23319 /usr/openv/netbackup/bin/admincmd/bpduplicate -dstunit mediasrv2-hcart2-robot
28582 /usr/openv/netbackup/bin/admincmd/bpduplicate -dstunit dr-mediasrv5-hcart2-ro
29543 /usr/openv/netbackup/bin/admincmd/bpduplicate -dstunit dr-mediasrv4-hcart2-ro
6166 /usr/openv/netbackup/bin/admincmd/bpduplicate -dstunit dr-mediasrv5-hcart2-ro
11587 /usr/openv/netbackup/bin/admincmd/bpduplicate -dstunit mediasrv2-hcart2-robot
14737 /usr/openv/netbackup/bin/admincmd/bpduplicate -dstunit dr-mediasrv4-hcart2-ro
17705 /usr/openv/netbackup/bin/admincmd/bpduplicate -dstunit dr-mediasrv-hcart2-rob
Comments
Which process do the defunct processes belong to?
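One way to see which parent is holding the zombies is to list every process in state `Z` together with its parent PID. This is a minimal sketch that assumes a `ps` supporting `-o pid,ppid,s,args` (Solaris and Linux procps both do). `list_zombies` is a hypothetical helper name, and it reads the `ps` output on stdin so it can also be run against captured text.

```shell
#!/bin/sh
# list_zombies (hypothetical helper): reads "PID PPID S ARGS" lines on stdin,
# skips the header line, and prints "pid ppid" for every process whose state
# column is Z (zombie/defunct).
list_zombies() {
    awk 'NR > 1 && $3 == "Z" { print $1, $2 }'
}

# Typical use on a live system:
#   ps -eo pid,ppid,s,args | list_zombies
```

The parent printed for each zombie is the process that has not yet reaped it. Judging by the ptree listing, that may be vltrun 20908, but the `ps` output is the place to confirm it.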
It looks like you are running a NetBackup 6.0 master server. Are you running any maintenance packs?
You might want to try to determine which process is leaving the defunct processes behind. My bet would be bpduplicate or, less likely, the vltrun process.
Create the log directories bpduplicate, vault, and admin under the /usr/openv/netbackup/logs directory.
Increase the verbose logging level to 5 in bp.conf and restart the NBU daemons.
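The first two steps could be scripted roughly like this. It is a minimal sketch, not a NetBackup-supplied tool: `make_debug_logdirs` and `set_verbose` are hypothetical helper names, the bp.conf edit assumes the legacy `VERBOSE = <level>` entry, and the awk call assumes an awk that accepts `-v` (on Solaris that means `nawk` or `/usr/xpg4/bin/awk` rather than `/usr/bin/awk`). Back up bp.conf first, and restart the daemons with your usual procedure afterwards (not shown).

```shell
#!/bin/sh
# make_debug_logdirs [LOGROOT] (hypothetical helper): creates the bpduplicate,
# vault and admin log directories. LOGROOT defaults to the path named in the
# thread; pass a scratch directory to try it out safely.
make_debug_logdirs() {
    logroot=${1:-/usr/openv/netbackup/logs}
    for d in bpduplicate vault admin; do
        mkdir -p "$logroot/$d" || return 1
    done
}

# set_verbose CONF LEVEL (hypothetical helper): rewrites any existing VERBOSE
# line in CONF to LEVEL, or appends one if none exists. It writes through a
# temp file because Solaris sed has no -i option.
set_verbose() {
    conf=$1 level=$2
    awk -v lvl="$level" '
        /^[ \t]*VERBOSE[ \t]*=/ { print "VERBOSE = " lvl; found = 1; next }
        { print }
        END { if (!found) print "VERBOSE = " lvl }
    ' "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"
}

# Example (as root, after backing up bp.conf):
#   make_debug_logdirs && set_verbose /usr/openv/netbackup/bp.conf 5
```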
Once you have some defunct processes, try grepping for their process IDs in those three log directories.
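The grep step might look like the following sketch. `find_pid_in_logs` is a hypothetical helper. It matches each PID as a whole number with a portable `grep -E` pattern, so 2415 does not also match 24150. It only reports which log files mention each PID and makes no assumption about the log line format.

```shell
#!/bin/sh
# find_pid_in_logs LOGROOT PID... (hypothetical helper): for each PID, prints
# "PID: file" for every file under LOGROOT/{bpduplicate,vault,admin} that
# mentions the PID as a whole number. Missing directories are skipped.
find_pid_in_logs() {
    logroot=$1; shift
    for pid in "$@"; do
        for d in bpduplicate vault admin; do
            [ -d "$logroot/$d" ] || continue
            grep -l -E -- "(^|[^0-9])$pid([^0-9]|\$)" "$logroot/$d"/* 2>/dev/null |
                sed "s|^|$pid: |"
        done
    done
}

# Example with the defunct PIDs from the ptree output:
#   find_pid_in_logs /usr/openv/netbackup/logs 2415 20982 26081 2581 26887
```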
This should at least start you on the right path.
Yes, you are right. They really do appear to be bpduplicate processes that terminated abnormally with status code 50. According to the bpduplicate log, the termination was caused by an ltid failure on the media server.
I have also noticed that even though a Vault job sometimes appears as terminated with status code 150 in the Activity Monitor, it actually keeps running. It looks like a second attempt of the Vault job.
I wonder what causes this false termination of Vault.
bpduplicate
I would download the Vault patch and read the readme file. It lists the ETRACKs for identified issues; if your issue is listed, the direction you want to take is to upgrade your Master and Media servers.
Otherwise, you may want to open a support ticket with SYMC and provide your findings. It might be an issue someone else has already seen, in which case SYMC support may be able to provide an updated binary.