Recommended Oracle RMAN backup piece name format for efficient backup, restore, and crosscheck

Article:TECH20615  |  Created: 2002-01-11  |  Updated: 2014-07-15  |  Article URL http://www.symantec.com/docs/TECH20615
Article Type
Technical Solution


Subject

Issue



What RMAN FORMAT should be used for successful and efficient backup using NetBackup for Oracle?

 

The most common problem is a delay while performing RMAN operations.  The delay can range from a few minutes to over an hour in extreme cases.  Several symptoms are visible to the NetBackup administrator and the DBA.

 

A) The Oracle Application backup job completes with status 0 in the Activity Monitor and shows good throughput, but there is a long delay before the next Application job starts or before the Automatic backup job exits.  The bpbrm and bptm processes on the media server complete before the status 0 is reported, but the dbclient processes are still running, even if this was the last backupset piece in the overall backup of the Oracle database.

 

B) There may be a long delay between when RMAN initiates a restore and when the job appears in the NetBackup Activity Monitor.

 

C) RMAN crosscheck or delete expired operations progress very slowly.

 

D) In rare instances the delay may be long enough to cause an application timeout.

Less common problems are detailed in the related articles.


 


Error



For backups, the Job Details typically show that the job completed quickly with status 0, but the next job did not queue within a minute as expected.

 

For the first job:

 

07/15/2013 11:49:18 - end writing; write time: 0:00:10

the requested operation was successfully completed (0)

 

For the next job:

 

07/15/2013 12:44:56 - Info nbjm (pid=14599) starting backup job ...

 

 

The comm file (18700.0.1373989759) on the client confirms that the server status was updated and the job is complete.  This file is located in /usr/openv/netbackup/logs/user_ops/dbext/logs.

 

11:49:18 INF - Server status = 0

11:49:18 INF - Backup by oracle on client myclient using policy mypolicy, sched mysched: the requested operation was successfully completed

 

The dbclient debug log shows that the job is complete from both the dbclient and server perspective and that sbtclose2 processing returned control to Oracle.

 

11:49:18.684 [18700] <2> sbtclose2: INF - entering

11:49:18.684 [18700] <2> int_CloseImage: INF - Backup - closing <SID_160729794_20130715.ctl>

...snip...

11:49:18.697 [18700] <4> closeApi: INF - EXIT STATUS 0: the requested operation was successfully completed

...snip...

11:49:18.900 [18700] <4> closeApi: INF - server EXIT STATUS = 0: the requested operation was successfully completed

...snip...

11:49:18.901 [18700] <2> sbtclose2: INF - leaving

 

But the subsequent sbtinfo2 request from Oracle, which looks up the media ID to which the backup was written, is either still active or, as in this case, took nearly an hour to complete.  Oracle does not consider the backup successful until this lookup completes.

 

11:49:18.901 [18700] <2> sbtinfo2: INF - entering

11:49:18.901 [18700] <2> sbtinfo2: INF - requesting image info for <SID_160729794_20130715.ctl>

...snip...

11:49:18.902 [18700] <2> int_logDateRange: INF - Start Time = 12/26/95 00:00:00

11:49:18.902 [18700] <2> int_logDateRange: INF - End Time = 07/16/13 15:49:19

...snip...

11:49:18.905 [18700] <4> BuildBprdRequest: request_string=<7.1 myclient myclient *NULL* 4 819936000 1373989759 /SID_160729794_20130715.ctl>

...snip...

11:49:19.519 [18700] <2> logconnections: BPRD CONNECT FROM 10.x.x.3.39654 TO 10.x.x.1.1556 fd = 31

...long delay here...

12:44:42.584 [18700] <4> dbc_GetMediaListByName:        Media ID : </NBU/DSU/myclient_1373989763_C1_F1>

...snip...

12:44:42.597 [18700] <2> sbtinfo2: INF - leaving

 

 

The bprd debug log shows that the inbound request was forwarded to bpdbm, but bprd did not receive a response for nearly an hour.

 

11:49:19.531 [16623] <2> logconnections: BPRD ACCEPT FROM 10.x.x.3.39654 TO 10.x.x.1.1556 fd = 31

...snip...

11:49:19.532 [16623] <2> process_request: command C_MEDIA_LIST_BY_FILE_3_2 (67) received

11:49:19.532 [16623] <2> get_image_by_file: client = myclient

11:49:19.532 [16623] <2> get_image_by_file: pathname = /SID_160729794_20130715.ctl

11:49:19.532 [16623] <2> get_image_by_file: starttime = 819936000

11:49:19.532 [16623] <2> get_image_by_file: endtime = 1373989759

11:49:19.532 [16623] <2> get_image_by_file: client_type = 4

...snip...

11:49:19.534 [16623] <2> logconnections: BPDBM CONNECT FROM 10.x.x.1.55628 TO 10.x.x.1.1556 fd = 6

...long delay here...

12:44:42.298 [16623] <2> get_image_by_file: Sent to client @aaacX 1373903341 1376581741 myclient_1373903341 PDW

12:44:42.299 [16623] <2> process_request: EXIT STATUS 0
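
The starttime and endtime values in the bprd excerpt are epoch seconds.  As an illustration only, the following Python sketch renders them in UTC and reports the width of the search range.  It is not part of NetBackup, and the function name decode_search_range is hypothetical.  Rendered this way, the values match the 'Start Time' and 'End Time' lines in the dbclient log.

```python
from datetime import datetime, timezone

def decode_search_range(starttime, endtime):
    """Return the UTC start, UTC end, and width in days of a search range."""
    fmt = "%m/%d/%y %H:%M:%S"
    start = datetime.fromtimestamp(starttime, tz=timezone.utc)
    end = datetime.fromtimestamp(endtime, tz=timezone.utc)
    width_days = (endtime - starttime) / 86400
    return start.strftime(fmt), end.strftime(fmt), width_days

# starttime and endtime copied from the bprd log excerpt
start, end, days = decode_search_range(819936000, 1373989759)
print(start, "->", end, "(about %d days)" % days)
```

A range of this width covers more than 17 years of images for the client, which is why the lookup is slow on a busy master server.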

 

 

Older versions of NetBackup and Oracle may fail the lookup completely, possibly with little delay.  In those instances the RMAN output may show messages similar to these.

 

ORA-27016, 00000, "skgfcls: sbtinfo returned error"

ORA-27192: skgfcls: sbtclose2 returned error - failed to close file

 

ORA-19513: failed to identify sequential file

ORA-27206: requested file not found in media management catalog

 

The dbclient debug log may show a search range that does not include the time at which the backup occurred.  In this case, the backup took place on August 13, but the start and end times cover June 10-12.

 

17:10:54 [8663] <4> get_bfs_date_range: Start Time = 06/10/99 07:10:21

17:10:54 [8663] <4> get_bfs_date_range: End Time = 06/12/99 07:10:21

...snip...

17:10:54 [8663] <4> dbc_GetMediaListByName: Request String = <3.4 myclient myclient *NULL* 4 929013021 929185821 /i0h_CATAPRD24055367827021>

...snip...

17:10:55 [8663] <16> sbtinfo: No media found

 


Environment



Any NetBackup version

Any platform

 


Cause



Oracle does not consider a backup piece completely saved until the sbtbackup/sbtwrite/sbtclose/sbtinfo sequence is done.

 

In this case the site was not using the recommended RMAN format for the name of the backupset piece.  Because the piece name has no timestamp to key off, dbclient cannot request a narrow search of the image directory for the piece and must wait for bpdbm to search all of the client's images, as shown by the 'Start Time' of 12/26/95 above.  If the client has many images and the master server is under significant load, the search can take a long time.

 

The same delays shown above may be observed during sbtinfo requests associated with RMAN crosscheck, delete expired, and restore operations.  For crosscheck and delete expired operations, these delays are cumulative across all the pieces being checked or deleted.

 

Once an sbtinfo lookup has completed, the next sbtbackup/sbtrestore/sbtclose/sbtremove/sbtinfo should occur without delay.  Delays between SBT API calls indicate that NetBackup is waiting for Oracle.  Delays within SBT API calls indicate that Oracle is waiting for NetBackup.
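
To tell the two kinds of delay apart in a dbclient debug log, the 'entering' and 'leaving' lines of each SBT call can be timed.  The Python sketch below is illustrative only.  The function name sbt_timings is hypothetical, and it assumes the line layout shown in the dbclient excerpts earlier in this article, a single process ID, and a log that does not cross midnight.

```python
import re
from datetime import datetime

# Matches lines such as: 11:49:18.684 [18700] <2> sbtclose2: INF - entering
LINE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[\d+\] <\d+> (sbt\w+): INF - (entering|leaving)"
)

def sbt_timings(lines):
    """Return (within, between): seconds inside each SBT call and gaps between calls."""
    within, between = [], []
    entered = None      # (call name, entry time) of the call in progress
    last_leave = None   # time the previous call returned control to Oracle
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue
        stamp = datetime.strptime(m.group(1), "%H:%M:%S.%f")
        name, event = m.group(2), m.group(3)
        if event == "entering":
            if last_leave is not None:
                # Gap between calls: NetBackup is waiting for Oracle
                between.append((name, (stamp - last_leave).total_seconds()))
            entered = (name, stamp)
        elif entered and entered[0] == name:
            # Time within a call: Oracle is waiting for NetBackup
            within.append((name, (stamp - entered[1]).total_seconds()))
            last_leave, entered = stamp, None
    return within, between

# Lines copied from the dbclient excerpt in this article
sample = [
    "11:49:18.684 [18700] <2> sbtclose2: INF - entering",
    "11:49:18.901 [18700] <2> sbtclose2: INF - leaving",
    "11:49:18.901 [18700] <2> sbtinfo2: INF - entering",
    "12:44:42.597 [18700] <2> sbtinfo2: INF - leaving",
]
within, between = sbt_timings(sample)
for name, seconds in within:
    print("within", name, seconds)
```

For the sample lines, nearly all of the elapsed time falls within sbtinfo2, which points at the catalog lookup rather than at Oracle.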

 

Note: If the RMAN operation is using multiple channels, it will not issue an sbtend for any channel until the prior operations on all channels have completed.

 


Solution



Do not include any space characters in the format, and end the format with '_%t'.

 

The presence of the '_%t' allows NetBackup to search only the images created within +/- 24 hours, instead of the entire catalog for the client.
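
As a rough illustration of the +/- 24 hour window, the Python sketch below computes the range around a given time in epoch seconds.  The function name search_window is hypothetical, and how dbclient derives the time from the '_%t' value is not covered in this article, so the input here is simply the midpoint of the 06/10/99 to 06/12/99 range shown in the older dbclient log excerpt.

```python
def search_window(piece_time, hours=24):
    """Return (start, end) epoch seconds spanning +/- `hours` around piece_time."""
    delta = hours * 3600
    return piece_time - delta, piece_time + delta

# 929099421 is the midpoint of the request range 929013021 .. 929185821
start, end = search_window(929099421)
print(start, end, end - start)
```

The resulting window is 48 hours wide, compared with the multi-year range that is searched when no timestamp can be taken from the piece name.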

 

If there is anything in the piece name format after the '_%t', the RMAN FORMAT syntax should be modified so that it is placed elsewhere in the piece name and '_%t' is at the end of the piece name.  For example:

 

In the backup script:

 

  BACKUP FORMAT 'df_%s_%p_%t' ... DATABASE;

  BACKUP FORMAT 'al_%s_%p_%t' ... ARCHIVELOG ...;

  BACKUP FORMAT 'cf_%s_%p_%t' ... CONTROLFILE;

 

In the RMAN persistent configuration for the target database:

 

  CONFIGURE CHANNEL DEVICE TYPE sbt FORMAT 'bk_%s_%p_%t';

 

In the NetBackup template wizard:

 

  On the Backup Options panel, set the 'Backup file name format'

  bk_%s_%p_%t
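
The two rules above (no space characters, and '_%t' at the end) can be checked before a format is put into a script or template.  The Python sketch below is a minimal example of such a check.  The function name check_piece_format is hypothetical, and the check covers only the two rules stated in this article.

```python
def check_piece_format(fmt):
    """Return a list of problems found in an RMAN piece name format string."""
    problems = []
    if " " in fmt:
        problems.append("contains a space character")
    if not fmt.endswith("_%t"):
        problems.append("does not end with '_%t'")
    return problems

for candidate in ("bk_%s_%p_%t", "bk_%t_%s_%p", "bk %s_%p_%t"):
    print(candidate, "->", check_piece_format(candidate) or "ok")
```

An empty list means the format follows both recommendations.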

 

The '_%t' is still useful and should continue to be used after the master server is upgraded to NetBackup 7.6 and makes use of the quick lookup table.

 

Note: See the related articles for less common problem symptoms when the recommended format is not used.

 




Legacy ID



248105

