
Oracle RMAN backup failures with status 50

Created: 26 Oct 2012

Environment: I've got a dedicated RHEL 5 Linux master server and a RHEL 5 media server, both running 7.1.0.4, supporting about 20 large Oracle RAC servers (a mix of prod and dev). Clients are running 6.5.6.

Problem: Archive log and block-level backup jobs fail intermittently with status 50. This happens anywhere from 4 to 16 times a day, with no apparent pattern by client or time of day.

I've been looking into this for several days and have read some similar posts, but nothing that really matches what's happening for me. I've tried checking logs in various places and following the bread crumbs without success. At the moment, the best lead I have is from a dbclient log:

04:06:30.936 [4082] <2> int_StartJob: INF - copyID: 1 - 1351238758
04:06:30.957 [4082] <2> int_WriteData: INF - writing buffer # 1 of size 262144
04:06:30.957 [4082] <16> writeToServer: ERR - send() to server on socket failed: Broken pipe (32)
04:06:30.957 [4082] <16> dbc_put: ERR - failed sending data to server
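
To check whether these errors really are spread evenly across the day, something like the rough Python sketch below could tally error lines per hour from a dbclient log. The line layout (time, pid in brackets, severity in angle brackets, function name, message) is assumed from the excerpt above, as is the idea that severity 16 marks errors; the function names `parse_line` and `errors_by_hour` are made up for illustration.

```python
import re
from collections import Counter

# Layout assumed from the excerpt above:
#   HH:MM:SS.mmm [pid] <severity> function: message
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) \[(?P<pid>\d+)\] <(?P<sev>\d+)> "
    r"(?P<func>\w+): (?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["pid"] = int(rec["pid"])
    rec["sev"] = int(rec["sev"])
    return rec


def errors_by_hour(lines, min_sev=16):
    """Count lines at or above min_sev, keyed by the two-digit hour."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is not None and rec["sev"] >= min_sev:
            counts[rec["time"][:2]] += 1
    return counts


# The four lines from the excerpt above, used as sample input.
SAMPLE = [
    "04:06:30.936 [4082] <2> int_StartJob: INF - copyID: 1 - 1351238758",
    "04:06:30.957 [4082] <2> int_WriteData: INF - writing buffer # 1 of size 262144",
    "04:06:30.957 [4082] <16> writeToServer: ERR - send() to server on socket failed: Broken pipe (32)",
    "04:06:30.957 [4082] <16> dbc_put: ERR - failed sending data to server",
]

if __name__ == "__main__":
    for hour, n in sorted(errors_by_hour(SAMPLE).items()):
        print(hour, n)
```

To run it against a real log, replace `SAMPLE` with the lines read from the file, for example `open(path).readlines()` with `path` pointing at the client's dbclient log.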
 

The thing is, my bphdb log directory is empty even though it was created before last night's failures.

The job details themselves (one of the four jobs that failed last night):

10/26/2012 04:05:55 - Info nbjm (pid=9519) starting backup job (jobid=7354691) for client client.fqdn.com, policy ORACLE-BACKUP-POLICY-1, schedule Default-Application-Backup
10/26/2012 04:05:55 - Info nbjm (pid=9519) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=7354691, request id:{F7F52164-1F43-11E2-AEF8-A4CCDB7997A6})
10/26/2012 04:05:55 - requesting resource DD-SPPDUPFI13x-ORACLE-0
10/26/2012 04:05:55 - requesting resource masterserver.fqdn.com.NBU_CLIENT.MAXJOBS.client.fqdn.com
10/26/2012 04:05:55 - requesting resource masterserver.fqdn.com.NBU_POLICY.MAXJOBS.ORACLE-BACKUP-POLICY-1
10/26/2012 04:05:57 - granted resource  masterserver.fqdn.com.NBU_CLIENT.MAXJOBS.client.fqdn.com
10/26/2012 04:05:57 - granted resource  masterserver.fqdn.com.NBU_POLICY.MAXJOBS.ORACLE-BACKUP-POLICY-1
10/26/2012 04:05:57 - granted resource  MediaID=@aaae2;Path=/path/to/nfs/STU/Oracle;MediaServer=mediaserver.fqdn.com
10/26/2012 04:05:57 - granted resource  DD-STU-ORACLE-17
10/26/2012 04:05:58 - Info bpbrm (pid=30544) client.fqdn.com is the host to backup data from
10/26/2012 04:05:58 - Info bpbrm (pid=30544) reading file list from client
10/26/2012 04:05:58 - estimated 0 kbytes needed
10/26/2012 04:05:58 - Info nbjm (pid=9519) started backup job for client client.fqdn.com, policy ORACLE-BACKUP-POLICY-1, schedule Default-Application-Backup on storage unit DD-STU-ORACLE-17
10/26/2012 04:05:58 - started process bpbrm (pid=30544)
10/26/2012 04:05:58 - connecting
10/26/2012 04:05:59 - Info bpbrm (pid=30544) listening for client connection
10/26/2012 04:05:59 - Info bpbrm (pid=30544) INF - Client read timeout = 3600
10/26/2012 04:06:05 - Info bpbrm (pid=30544) accepted connection from client
10/26/2012 04:06:05 - Info bphdb (pid=0) Backup started
10/26/2012 04:06:05 - Info bpbrm (pid=30544) bptm pid: 30808
10/26/2012 04:06:05 - connected; connect time: 0:00:00
10/26/2012 04:06:06 - Info bptm (pid=30808) start
10/26/2012 04:06:06 - Info bptm (pid=30808) using 262144 data buffer size
10/26/2012 04:06:06 - Info bptm (pid=30808) setting receive network buffer to 1048576 bytes
10/26/2012 04:06:06 - Info bptm (pid=30808) using 16 data buffers
10/26/2012 04:06:25 - Info bptm (pid=30808) start backup
10/26/2012 04:06:25 - Info bptm (pid=30808) backup child process is pid 32016
10/26/2012 04:06:25 - begin writing
10/26/2012 04:06:28 - Error bptm (pid=30808) media manager terminated by parent process
10/26/2012 04:07:03 - Info bphdb (pid=0) done. status: 150: termination requested by administrator
10/26/2012 04:07:03 - end writing; write time: 0:00:38
client process aborted  (50)
 

Any additional insight would be most appreciated! I'm happy to provide additional data as required.