
Master Server performance problem

Created: 10 Jun 2013 • Updated: 31 Jul 2013 | 12 comments
This issue has been solved. See solution.

Hello!

I need some help understanding a performance problem on my Master Server.

My Master Server manages more than 350k backup images and more than 1000 clients.

root@michelangelo #  /usr/openv/netbackup/bin/admincmd/bpimagelist -idonly -d "01/01/1970 00:00:00" | wc -l
  374437

Yesterday I had to restart it because scheduled jobs were not starting for any client. The "top" command showed the CPU at 0% idle, consumed by 71 running bpdbm processes. After the restart the scheduled jobs started again, and I investigated to identify the problem.

I noticed that the Automatic jobs started for one particular Oracle client were frozen in the last step, "Validating Image", after all of their "Default-Application" jobs had completed:

Info bpbrm(pid=18471) validating image for client

This client is a special case: it has a huge Oracle DB and archivelog backups scheduled every 30 minutes.

For each bpbrm process on the media server there was a corresponding bpdbm process on the Master Server searching the backup image catalog. So I counted the backup images for this client and realized that it has more than 90k images!

 

root@michelangelo # /usr/openv/netbackup/bin/admincmd/bpimagelist -client renetta.unix.t-systems.it -idonly -d "01/01/1970 00:00:00" | wc -l
  90918
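
To see which clients dominate the catalog, the backup IDs can be grouped by client name. The following is only a minimal sketch, assuming the IDs are already available one per line in the form <client>_<unix timestamp>. The function name count_by_client is hypothetical, and extracting the ID column from the bpimagelist output is left out because its exact layout is not shown in this thread.

```shell
#!/bin/sh
# count_by_client: hypothetical helper, not a NetBackup command.
# Reads backup IDs (assumed form: <client>_<unix timestamp>) on stdin,
# strips the trailing _<timestamp>, and prints "<count> <client>" lines
# with the busiest client first.
count_by_client() {
    sed 's/_[0-9][0-9]*$//' |
    sort |
    uniq -c |
    sort -rn
}
```

With a file of IDs, "count_by_client < ids.txt | head" would list the clients holding the most images.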

So I tried renaming the client in bp.conf and in all its policies, and I noticed that "validating image" then finished in a few seconds: the new client name has only a few backup images.

 

Some useful information about my environment:

Master Server version: 7.1.0.4

Master Server OS: Solaris 10

Master Server RAM: 32 GB

Master Server CPUs: 40

Media Server version: 7.1.0.4

 

The question is: how can I improve the performance of the backup image catalog? Is it possible that image cleanup is not working properly? The backup images for that client should expire after 6 months. Do I need to take this problem into account for the upgrade to version 7.5.0.5 that we are planning (with its backup image metadata migration)?
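
One way to check whether cleanup is keeping up is to look for backup IDs older than the retention period. This is a minimal sketch, assuming backup IDs of the form <client>_<unix timestamp>, one per line, and assuming the timestamp in the ID is the backup time. The name older_than is hypothetical, and the cutoff has to be computed separately (for 6 months, roughly the current epoch time minus 183 × 86400 seconds).

```shell
#!/bin/sh
# older_than CUTOFF: hypothetical helper, not a NetBackup command.
# Reads backup IDs on stdin and prints those whose trailing timestamp is
# earlier than CUTOFF (seconds since the Unix epoch).
older_than() {
    awk -v cutoff="$1" '{
        n = split($0, parts, "_")          # timestamp is the last "_" field
        if (parts[n] + 0 < cutoff + 0) print
    }'
}
```

IDs printed by this that still exist in the catalog would be candidates for a closer look, although a longer retention on some schedules would also explain them. On Solaris 10, nawk or /usr/xpg4/bin/awk may be needed in place of awk for the -v option.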

Thank you very much!

Simone

 

 


Nicolai's picture

Try running bpimage -cleanup -allclients

This will initiate the catalog cleanup process manually.

Assumption is the mother of all mess ups.

If this post answered your question - Please mark as a solution.

Nicolai's picture

See also "Catalog maintenance and performance optimization" in the Netbackup Admin manual vol 1:

http://www.symantec.com/docs/DOC5334


cimo's picture

Hello,

Thank you for your answer.

I noticed that during the "image cleanup" process there are a lot of errors in the bpdbm log file for the client mentioned above, renetta.unix.t-systems.it, like these:

 

09:15:50.029 [16352] <16> executeSql: Error executing UPDATE DBM_MAIN.DBM_ImageChangeLog SET CreatedDateTime = current utc timestamp , OperationID = 3 WHERE (MasterServerKey = 1000002) AND (BackupID = 'renetta.unix.t-systems.it_1364705777') AND (ClientType = 4) AND (OperationID = 2) (rc=100) ErrMsg , ErrCode 0, SqlState
09:15:50.031 [16352] <16> executeSql: Error executing UPDATE DBM_MAIN.DBM_ImageChangeLog SET CreatedDateTime = current utc timestamp  WHERE (MasterServerKey = 1000002) AND (BackupID = 'renetta.unix.t-systems.it_1364705777') AND (ClientType = 4) AND (OperationID = 3) (rc=100) ErrMsg , ErrCode 0, SqlState
09:15:50.337 [16352] <16> executeSql: Error executing UPDATE DBM_MAIN.DBM_ImageChangeLog SET CreatedDateTime = current utc timestamp , OperationID = 3 WHERE (MasterServerKey = 1000002) AND (BackupID = 'renetta.unix.t-systems.it_1364705762') AND (ClientType = 4) AND (OperationID = 2) (rc=100) ErrMsg , ErrCode 0, SqlState
09:15:50.339 [16352] <16> executeSql: Error executing UPDATE DBM_MAIN.DBM_ImageChangeLog SET CreatedDateTime = current utc timestamp  WHERE (MasterServerKey = 1000002) AND (BackupID = 'renetta.unix.t-systems.it_1364705762') AND (ClientType = 4) AND (OperationID = 3) (rc=100) ErrMsg , ErrCode 0, SqlState
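
To get a feel for how many distinct images these errors touch, the failing BackupID values can be counted. This is a rough sketch, assuming each executeSql entry is on a single line in the log file being read; the function name sql_error_ids is hypothetical.

```shell
#!/bin/sh
# sql_error_ids: hypothetical helper, not a NetBackup command.
# Reads a bpdbm log on stdin, keeps only the "executeSql: Error" entries,
# extracts the quoted BackupID value from each, and prints
# "<count> <BackupID>" lines, most frequent first.
sql_error_ids() {
    sed -n "/executeSql: Error/s/.*BackupID = '\([^']*\)'.*/\1/p" |
    sort |
    uniq -c |
    sort -rn
}
```

Piping the output through "wc -l" gives the number of distinct backup IDs involved.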

 

Could there be some kind of corruption?

Simone

revaroo's picture

You could try bpdbm -consistency to check the integrity of your backups; otherwise consider upgrading.

mph999's picture

With that many bpdbm processes all related to DB activities for the same client, it looks like there is some issue that needs addressing.

You are at 7.1.0.4, so the header files are still header files (they get imported into NBDB at 7.5).  bpdbm -consistency 2 will check the images DB, the header files being part of this.

It won't (well, at least I don't believe it does) check the NBDB ...  which is where these errors are coming from ...

09:15:50.337 [16352] <16> executeSql: Error executing UPDATE DBM_MAIN.DBM_ImageChangeLog SET CreatedDateTime = current utc timestamp , OperationID = 3 WHERE (MasterServerKey = 1000002)

AND (BackupID = 'renetta.unix.t-systems.it_1364705762') AND (ClientType = 4) AND (OperationID = 2) (rc=100) ErrMsg , ErrCode 0, SqlState

I'm not sure how vital DBM_MAIN.DBM_ImageChangeLog is - but are there other issues in the DB for this client that cause the hanging?

I can only suggest that it needs investigating.

I partly agree with revaroo: sometimes an upgrade to a later version is good, but as there are big DB changes in 7.5 I would not upgrade while there are potential DB errors.

Martin

 

Regards,  Martin
 
Setting Logs in NetBackup:
http://www.symantec.com/docs/TECH75805
 
revaroo's picture

Indeed. Are these SQL errors a cause or a symptom of the bpdbm processes?

I'd recommend suspending scheduling, firing a few manual jobs, and seeing if these messages continue. If they don't, THEN consider upgrading.

cimo's picture

I will open a case with support.

I will update you asap...

 

Thank you for the support.

Simone

 

cimo's picture

Hello!

Support is still analyzing the bpdbm log file...  In the meantime I tried the upgrade on my DR site, where I had the same errors in the bpdbm log files.

The upgrade was successful, and the errors disappeared. I'm fairly confident about upgrading my production site.

Any further hints?

 

Thank you very much

Simone

huanglao2002's picture

Hi Cimo

Can you check the Oracle backup scripts: does the backup command's format contain the _%t option? This option avoids a large catalog search after the Oracle backup completes.

 

Oracle Backup Format

Ensure that the format specified for all RMAN backup piece names, except for autobackups of the control file, ends with a _%t as documented in the NetBackup for Oracle manual. Failure to add the timestamp results in a series of extended queries that can cause significant performance degradation. These Oracle best practices and others can be found in the article below:

http://www.symantec.com/docs/TECH49868
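
A quick way to scan an RMAN script for this is to list the FORMAT strings that lack a trailing timestamp. This is a minimal sketch, assuming single-quoted FORMAT strings written on one line each. It accepts any string ending in %t, which is looser than the literal _%t rule, and it knows nothing about control file autobackups. The function name formats_missing_timestamp is hypothetical.

```shell
#!/bin/sh
# formats_missing_timestamp: hypothetical helper, not part of RMAN or NetBackup.
# Reads an RMAN script on stdin, extracts each single-quoted FORMAT string,
# and prints the ones that do not end in %t.
formats_missing_timestamp() {
    sed -n "s/.*[Ff][Oo][Rr][Mm][Aa][Tt] *'\([^']*\)'.*/\1/p" |
    while IFS= read -r fmt; do
        case "$fmt" in
            *%t) ;;                        # ends with the %t timestamp: fine
            *)   printf '%s\n' "$fmt" ;;   # no trailing timestamp: report it
        esac
    done
}
```

No output means every FORMAT string found ends with the timestamp.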

cimo's picture

It is correct:

 

BACKUP
    $BACKUP_TYPE
    FILESPERSET 1
    FORMAT 'bk-hot-$ORACLE_SID-$BCK_TYPE-$rman_time-s%s_p%p_t%t'
    database;

 

Thank you.

Simone

cimo's picture

Support closed the case; no corruption was detected.

 

As for the Oracle client backup: after replacing the client name with a new one, the backup was successful. In the meantime a long cleanup run deleted a large number of old backup images for that client.

 

Thank you.

SOLUTION
Omar Villa's picture

Try moving the image that the log complains about to another folder and run a catalog backup. Also, what did support do? Did they run NBCC?

Omar Villa

Netbackup Expert

Twitter: @omarvillaNBU