
RMAN Backup failing after backup "begin writing"

Created: 04 Mar 2013 • Updated: 04 Mar 2013 | 8 comments
Varma Chiluvuri's picture
This issue has been solved. See solution.

03/04/2013 05:02:50 - Info nbjm (pid=28573784) starting backup job (jobid=1002560) for client sapm6pdrv-nb, policy ORA_RMAN_GRV_M6P, schedule Default-Application-Backup
03/04/2013 05:02:50 - Info nbjm (pid=28573784) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=1002560, request id:{AC5A40B6-84B2-11E2-B1C5-02AA35650000})
03/04/2013 05:02:50 - requesting resource nbu-media-grv-nb-hcart3-robot-tld-3
03/04/2013 05:02:50 - requesting resource sun107-nb.NBU_CLIENT.MAXJOBS.sapm6pdrv-nb
03/04/2013 05:02:50 - requesting resource sun107-nb.NBU_POLICY.MAXJOBS.ORA_RMAN_GRV_M6P
03/04/2013 05:02:50 - Waiting for scan drive stop STK.T10000A.015, Media server: nbu-media-grv-nb
03/04/2013 05:02:51 - granted resource  sun107-nb.NBU_CLIENT.MAXJOBS.sapm6pdrv-nb
03/04/2013 05:02:51 - granted resource  sun107-nb.NBU_POLICY.MAXJOBS.ORA_RMAN_GRV_M6P
03/04/2013 05:02:51 - granted resource  I00184
03/04/2013 05:02:51 - granted resource  STK.T10000A.015
03/04/2013 05:02:51 - granted resource  nbu-media-grv-nb-hcart3-robot-tld-3
03/04/2013 05:02:52 - estimated 0 kbytes needed
03/04/2013 05:02:52 - Info nbjm (pid=28573784) started backup (backupid=sapm6pdrv-nb_1362391372) job for client sapm6pdrv-nb, policy ORA_RMAN_GRV_M6P, schedule Default-Application-Backup on storage unit nbu-media-grv-nb-hcart3-robot-tld-3
03/04/2013 05:02:55 - mounting I00184
03/04/2013 05:02:56 - Info bpbrm (pid=19923062) sapm6pdrv-nb is the host to backup data from
03/04/2013 05:02:56 - Info bpbrm (pid=19923062) telling media manager to start backup on client
03/04/2013 05:02:56 - Info bptm (pid=11206736) using 262144 data buffer size
03/04/2013 05:02:56 - Info bptm (pid=11206736) using 64 data buffers
03/04/2013 05:02:57 - Info bpbrm (pid=19923062) spawning a brm child process
03/04/2013 05:02:57 - Info bpbrm (pid=19923062) child pid: 16842958
03/04/2013 05:02:58 - Info bpbrm (pid=19923062) sending bpsched msg: CONNECTING TO CLIENT FOR sapm6pdrv-nb_1362391372
03/04/2013 05:02:58 - Info bpbrm (pid=19923062) listening for client connection
03/04/2013 05:02:58 - connecting
03/04/2013 05:03:05 - Info bpbrm (pid=19923062) INF - Client read timeout = 3000
03/04/2013 05:03:11 - Info bpbrm (pid=19923062) accepted connection from client
03/04/2013 05:03:11 - connected; connect time: 0:00:00
03/04/2013 05:03:30 - mounted I00184; mount time: 0:00:35
03/04/2013 05:03:31 - positioning I00184 to file 113
03/04/2013 05:03:42 - positioned I00184; position time: 0:00:11
03/04/2013 05:03:42 - begin writing
03/04/2013 05:18:16 - Error bpbrm (pid=16842958) client sapm6pdrv-nb EXIT STATUS = 6: the backup failed to back up the requested files
03/04/2013 05:18:16 - Info bpbkar (pid=0) done. status: 6
03/04/2013 05:18:16 - Info bpbrm (pid=19923062) sending message to media manager: STOP BACKUP sapm6pdrv-nb_1362391372
03/04/2013 05:18:17 - Info bpbrm (pid=19923062) media manager for backup id sapm6pdrv-nb_1362391372 exited with status 150: termination requested by administrator

03/04/2013 05:18:17 - end writing; write time: 0:14:35
the backup failed to back up the requested files  (6)
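For anyone reading a job details log like the one above, the following is a minimal Python sketch (not a NetBackup tool) that pulls out how long the job wrote before the first error and which status codes were reported. The function name `summarize` and the `MM/DD/YYYY` reading of the timestamps are assumptions made for illustration; only three lines of the log are reproduced.

```python
import re
from datetime import datetime

# Three lines copied from the job details above.
JOB_LOG = """\
03/04/2013 05:03:42 - begin writing
03/04/2013 05:18:16 - Error bpbrm (pid=16842958) client sapm6pdrv-nb EXIT STATUS = 6: the backup failed to back up the requested files
03/04/2013 05:18:17 - Info bpbrm (pid=19923062) media manager for backup id sapm6pdrv-nb_1362391372 exited with status 150: termination requested by administrator
"""

LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) - (.*)$")
STATUS_RE = re.compile(r"(?:EXIT STATUS = |exited with status )(\d+)")


def summarize(log_text):
    """Return (seconds from 'begin writing' to the first Error line, status codes)."""
    begin = None
    first_error = None
    statuses = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        # Assumption: timestamps are month/day/year, as the post date suggests.
        stamp = datetime.strptime(match.group(1), "%m/%d/%Y %H:%M:%S")
        message = match.group(2)
        if message == "begin writing":
            begin = stamp
        if message.startswith("Error") and first_error is None:
            first_error = stamp
        status = STATUS_RE.search(message)
        if status:
            statuses.append(int(status.group(1)))
    seconds = None
    if begin is not None and first_error is not None:
        seconds = int((first_error - begin).total_seconds())
    return seconds, statuses


seconds_to_error, status_codes = summarize(JOB_LOG)
print(seconds_to_error, status_codes)  # 874 [6, 150]
```

The job wrote for 874 seconds before status 6 appeared, and status 150 followed one second later when the media manager was told to stop.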
 


Varma Chiluvuri's picture

The OS backup completes successfully, but the database RMAN backup fails a few minutes after it starts writing. Please suggest a solution, and let me know if you need more details.

NetBackup Master Server Version: 7.5.0.4

NetBackup Media Server Version: 7.5.0.4

NetBackup Client Version: 7.0

 

Thanks & Regards

Varma

 

Marianne's picture

Questions:

  1. Is this a new or existing Oracle client?
  2. Have Oracle backups worked previously?
  3. Which steps were followed on the client to configure and link NBU and RMAN?
  4. Is the NBU policy type Oracle or SAP? (I'm wondering about the bpbkar process in the job details.)
  5. Any reason why the client is not on NBU 7.5.x as well?
  6. It seems you are using a backup network for the client. What is CLIENT_NAME in bp.conf on the client? Have you hard-coded the backup network name in NB_ORA_CLIENT in the RMAN script?

You need the following logs to troubleshoot:

On the Oracle client: dbclient (if the log folder does not exist, create it and remember to chmod 777), as well as the RMAN output file.

On Media server: bptm and bpbrm

Please rename logs to reflect process name (e.g. dbclient.txt) and post as File attachments.

Please also check the Client Connect and Client Read Timeouts on the media server. Big databases normally need increased timeouts (e.g. 1800).

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links

Varma Chiluvuri's picture

Answers:

1. Existing Oracle (11g) client.

2. Yes. This is a DR backup, so we run it only when required.

3. The RMAN backup policy is set up the same way as our other policies, and it is an old policy.

4. Oracle.

5. No. We are planning to upgrade the client soon.

6. # cat bp.conf
SERVER = sun107-nb
SERVER = nbu-media-grv-nb
SERVER = sun140-nb
CLIENT_NAME = sapm6pdrv-nb

Attached are the dbclient log from the client and the bptm and bpbrm logs from the media server.

Client Connect and Client Read Timeouts on the media server:

CLIENT_READ_TIMEOUT = 3000
CLIENT_CONNECT_TIMEOUT = 600
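The settings quoted above all follow a `KEY = value` layout, with `SERVER` repeated. A minimal Python sketch for reading such text is shown here; `parse_bp_conf` is a hypothetical helper written for this thread, not part of NetBackup, and the two sample strings simply reproduce the lines posted above (client bp.conf and media server timeouts).

```python
# Lines copied from the client's bp.conf as posted above.
CLIENT_BP_CONF = """\
SERVER = sun107-nb
SERVER = nbu-media-grv-nb
SERVER = sun140-nb
CLIENT_NAME = sapm6pdrv-nb
"""

# Timeout values reported for the media server.
MEDIA_SERVER_TIMEOUTS = """\
CLIENT_READ_TIMEOUT = 3000
CLIENT_CONNECT_TIMEOUT = 600
"""


def parse_bp_conf(text):
    """Map each key to the list of its values, keeping the order of repeats."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if "=" not in line:
            continue  # skip blank lines and anything that is not KEY = value
        key, _, value = line.partition("=")
        entries.setdefault(key.strip(), []).append(value.strip())
    return entries


client_conf = parse_bp_conf(CLIENT_BP_CONF)
timeouts = parse_bp_conf(MEDIA_SERVER_TIMEOUTS)

# The first SERVER entry is the one the client treats as its master server.
master_server = client_conf["SERVER"][0]
read_timeout = int(timeouts["CLIENT_READ_TIMEOUT"][0])

print(master_server, client_conf["CLIENT_NAME"][0], read_timeout)
```

Keeping every value in a list preserves the order of the `SERVER` lines, which matters because the first one names the master server.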

Attachment               Size
dbclient.txt             12.36 KB
bpbrm_mediaserver.txt    489.19 KB
bptm_mediaserver.txt     566.18 KB

Thanks & Regards

Varma

 

Marianne's picture

This looks like a comms issue between the client and the master server.
The client is trying to connect to bprd (via vnetd) on the master.
The client received no response from the master:

 

08:05:58.341 [19005588] <2> logconnections: BPRD CONNECT FROM 10.48.184.74.55164 TO 100.6.1.107.13724
08:21:33.399 [18612362] <16> readCommFile: ERR - timed out after 900 seconds while reading from /usr/openv/netbackup/logs/user_ops/dbext/logs/18612362.0.1362402358
08:21:33.400 [18612362] <32> serverResponse: ERR - could not read from comm file </usr/openv/netbackup/logs/user_ops/dbext/logs/18612362.0.1362402358>
08:21:33.400 [18612362] <16> CreateNewImage: ERR - serverResponse() failed

Please extract the bprd log entries on the master server between 08:00 and 08:21.

We need to see whether the backup request from client IP address 10.48.184.74 was received by the master server, and how the master server interpreted this connection request.

According to the client config, the master server is sun107-nb with IP address 100.6.1.107.
Is this correct?
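To pick the same details out of a longer dbclient log, a small Python sketch along these lines may help. It assumes the `address.port` endpoint format seen in the `CONNECT FROM ... TO ...` line quoted above; the names `split_endpoint` and `scan` are hypothetical.

```python
import re

# Two of the dbclient lines quoted above.
DBCLIENT_LINES = [
    "08:05:58.341 [19005588] <2> logconnections: BPRD CONNECT FROM "
    "10.48.184.74.55164 TO 100.6.1.107.13724",
    "08:21:33.399 [18612362] <16> readCommFile: ERR - timed out after 900 "
    "seconds while reading from "
    "/usr/openv/netbackup/logs/user_ops/dbext/logs/18612362.0.1362402358",
]

CONNECT_RE = re.compile(r"(\w+) CONNECT FROM (\S+) TO (\S+)")
TIMEOUT_RE = re.compile(r"timed out after (\d+) seconds")


def split_endpoint(endpoint):
    """Split 'a.b.c.d.port' into ('a.b.c.d', port) at the last dot."""
    host, _, port = endpoint.rpartition(".")
    return host, int(port)


def scan(lines):
    """Collect (service, source, destination) connects and timeout lengths."""
    connects = []
    timeouts = []
    for line in lines:
        connect = CONNECT_RE.search(line)
        if connect:
            connects.append((
                connect.group(1),
                split_endpoint(connect.group(2)),
                split_endpoint(connect.group(3)),
            ))
        timeout = TIMEOUT_RE.search(line)
        if timeout:
            timeouts.append(int(timeout.group(1)))
    return connects, timeouts


connects, timeouts = scan(DBCLIENT_LINES)
print(connects)
print(timeouts)
```

Here the client at 10.48.184.74 connects to port 13724 on 100.6.1.107, and the later line reports a 900-second wait with no answer.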


Varma Chiluvuri's picture

Attached is the master server bprd log between 08:00 and 08:22.

Yes, you are correct: the master server sun107-nb has IP 100.6.1.107 and the client IP is 10.48.184.74.

Attachment               Size
bprd_masterserver.txt    716.65 KB

Thanks & Regards

Varma

 

Varma Chiluvuri's picture

@Marianne I found the following lines in the master server bprd log:

 

logconnections: BPCD CONNECT FROM 100.6.1.107.35229 TO 10.48.184.74.13724 fd = 5
08:06:14.463 [42860712] <2> vnet_pbxConnect: ../../libvlibs/vnet_pbx.c.666: pbxSetAddrEx/pbxConnectEx return error 73:Connection reset by peer
08:06:14.463 [42860712] <8> do_pbx_service: [vnet_connect.c:2034] vnet_pbxConnect() failed 18 0x12
08:06:14.463 [42860712] <8> do_pbx_service: [vnet_connect.c:2035] save_errno 73 0x49
08:06:14.463 [42860712] <8> do_pbx_service: [vnet_connect.c:2036] use_vnetd 1 0x1
08:06:14.463 [42860712] <8> do_pbx_service: [vnet_connect.c:2037] cr->vcr_service vnetd
08:06:14.463 [42860712] <8> async_connect: [vnet_connect.c:1630] do_service failed 18 0x12
08:06:14.504 [42860712] <8>

Thanks & Regards

Varma

 

Marianne's picture

The above tells us that the master is trying to connect to bpcd on the client (via vnetd).

We need to see the incoming connection FROM 10.48.184.74.55164 TO 100.6.1.107.13724.

You can also check connectivity from the client to the master as follows:

bpclntcmd -pn

The client should get a response back from the master server with the client name that is known by the master server.
Evidence of this connection request can also be seen in the master's bprd log.

 


SOLUTION
Varma Chiluvuri's picture

Thanks Marianne. I checked the communication from the client and the output was unusual:

# ./bpclntcmd -pn
expecting response from server sun107-nb
sapm6pdrv *NULL* 10.48.184.74 40426

Afterwards I found that there were two entries pointing to the client on the master server:

10.48.184.74    sapm6pdrv

14.4.2.74       sapm6pdrv-nb

I replaced them with the single entry below, because the master server cannot communicate with the 14.x.x.x address (sapm6pdrv is on a different network):

10.48.184.74    sapm6pdrv       sapm6pdrv-nb
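The effect of that change can be checked with a short Python sketch. It assumes the entries are in the usual hosts-file layout (`address name [alias ...]`); `names_by_ip` and `ip_for` are hypothetical helpers written for this example.

```python
# The two entries found on the master server before the fix.
HOSTS_BEFORE = """\
10.48.184.74    sapm6pdrv
14.4.2.74       sapm6pdrv-nb
"""

# The single entry after the fix.
HOSTS_AFTER = """\
10.48.184.74    sapm6pdrv       sapm6pdrv-nb
"""


def names_by_ip(hosts_text):
    """Map each address to the names listed for it (hosts-file layout assumed)."""
    table = {}
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        table.setdefault(ip, []).extend(names)
    return table


def ip_for(hosts_text, name):
    """Return the first address that lists the given name, or None."""
    for ip, names in names_by_ip(hosts_text).items():
        if name in names:
            return ip
    return None


print(ip_for(HOSTS_BEFORE, "sapm6pdrv-nb"))  # 14.4.2.74
print(ip_for(HOSTS_AFTER, "sapm6pdrv-nb"))   # 10.48.184.74
```

Before the change, the policy client name sapm6pdrv-nb resolved to an address the master could not reach; afterwards both names share 10.48.184.74.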

Now the communication is good and the backup completed successfully.

# ./bpclntcmd -pn
expecting response from server sun107-nb
sapm6pdrv sapm6pdrv-nb 10.48.184.74 54313
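The two `bpclntcmd -pn` outputs can be compared with a small Python sketch. It assumes the four fields on the result line are, in order, the peer name the master resolved for the connection, the client name the master has configured, the client IP, and the source port; `parse_pn` is a hypothetical helper, not part of NetBackup.

```python
# Output copied from the two runs shown in this post.
PN_BEFORE = """\
expecting response from server sun107-nb
sapm6pdrv *NULL* 10.48.184.74 40426
"""

PN_AFTER = """\
expecting response from server sun107-nb
sapm6pdrv sapm6pdrv-nb 10.48.184.74 54313
"""


def parse_pn(output):
    """Return the result line as a dict, or None if no such line is found."""
    for line in output.splitlines():
        fields = line.split()
        # Assumption: the result line has exactly four fields, ending in a port.
        if len(fields) == 4 and fields[3].isdigit():
            peer, configured, ip, port = fields
            return {
                "peer_name": peer,
                "configured_name": None if configured == "*NULL*" else configured,
                "ip": ip,
                "port": int(port),
            }
    return None


before = parse_pn(PN_BEFORE)
after = parse_pn(PN_AFTER)
print(before["configured_name"], after["configured_name"])
```

A `*NULL*` in the second field is the symptom seen before the fix: the master could not match the incoming connection to a configured client name.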

Thank you for the quick response and for guiding me in the right direction.

 

Thanks & Regards

Varma