Oracle (RMAN) job fails, but NetBackup job is ok
Created: 22 Nov 2010 | Updated: 08 Mar 2011 | 9 comments
This issue has been solved. See solution.
Hi All
I have a strange situation. The RMAN backup job ends with this error:
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup plus archivelog command at 11/21/2010 02:50:37
ORA-00604: error occurred at recursive SQL level 1
ORA-06502: PL/SQL: numeric or value error
ORA-19506: failed to create sequential file, name="<name>", parms=""
ORA-27028: skgfqcre: sbtbackup returned error
ORA-19511: Error received from media manager layer, error text:
   VxBSACreateObject: Failed with error:
   Server Status: client process aborted
but the NetBackup job has a status of "0". In most cases this happens during the archive log backup.
The master, media server and clients are on RHEL 5.2. NetBackup is at 6.5.6. CLIENT_READ_TIMEOUT on the client is set to 5400. The backup job is launched from the client via cron. The backup data is written to BasicDisk.
In the dbclient log I can see this:
02:32:06.773 [29038] <2> int_CloseImage: INF - Backup - closing <name>
02:47:12.014 [29038] <16> readCommFile: ERR - timed out after 900 seconds while reading from /usr/openv/netbackup/logs/user_ops/dbext/logs/29038.0.1286324889
It looks as if the image cannot be closed and the timeout is hit, but why after 900 s when CLIENT_READ_TIMEOUT is 5400?
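To confirm which timeout actually fired, the gap between the two dbclient log lines can be computed from their timestamps. A minimal Python sketch, assuming both lines are from the same day (gap_seconds is just a helper name made up for this example):

```python
from datetime import datetime

def gap_seconds(start: str, end: str) -> float:
    """Seconds between two HH:MM:SS.mmm log timestamps on the same day."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()

# Timestamps copied from the two dbclient log lines above.
gap = gap_seconds("02:32:06.773", "02:47:12.014")
print(f"{gap:.3f} s between closing the image and the timeout")
```

The gap is a little over 900 s, so the timeout in effect matches the 900 seconds in the error message and not the 5400 set on the client.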
Do you have any idea what is wrong?
9 Comments
CLIENT_READ_TIMEOUT on the media server is used to determine the timeout.
The timeout error could be a 'red herring'. You will need to look at the entire dbclient log as well as the corresponding period in bprd log on the master server.
Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links
The backup job was created - I found it. I thought that the backup was not even initiated.
But in nbpem it looks like this
You need the bprd log on the master to troubleshoot this. Look for the incoming request from the client's IP.
Check whether the master resolves that IP correctly to the hostname used for the client in the Policy.
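A quick way to test this on the master is to resolve the client name to an IP and then the IP back to a name. The Python sketch below is only an illustration: check_name_resolution is a made-up helper, and the two resolver arguments exist only so the function can be tried without a live DNS.

```python
import socket

def check_name_resolution(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve hostname -> IP -> names and report whether they agree.

    Returns (ip, reverse_name, matches). matches is True when the
    original hostname appears among the names returned for the IP.
    """
    ip = forward(hostname)
    reverse_name, aliases, _addresses = reverse(ip)
    names = {reverse_name.lower()} | {alias.lower() for alias in aliases}
    return ip, reverse_name, hostname.lower() in names
```

Call it with the client name exactly as it is written in the Policy, for example check_name_resolution("myserver01") (a placeholder name). If the lookup itself fails, the socket module raises socket.gaierror or socket.herror, which is already an answer.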
Hi
Thank you for your suggestions.
I have looked into the bprd log on the master and the request was successfully processed.
I found a problem in nbpem, look at this:
Do you have any idea what it means? Is the server overloaded and unable to handle connections between nbpem and nbjm? The server statistics seem to be fine.
Konrad,
Did you ever get this resolved? I have at least 10 clients which started failing with a similar error. My timeout settings on the media server are
"I have at least 10 clients whi" --- is this a database-level backup or a file-level backup?
Make sure the /etc/hosts entries are correct on the master and the client, and check the database backup scripts.
192.168.100.15 myserver01 myserver01.mydomain.com
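To compare the hosts files on the master and the client, the entries can be read into a small lookup table. A rough Python sketch, using the example entry above as sample data (hosts_names is a made-up helper; on a real system you would pass it the contents of /etc/hosts):

```python
def hosts_names(hosts_text):
    """Map each IP in hosts-file text to the list of names on its line."""
    entries = {}
    for line in hosts_text.splitlines():
        # Drop comments, then split the line into IP and names.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            entries.setdefault(fields[0], []).extend(fields[1:])
    return entries

# Sample data only; the second line is the example entry from this post.
sample = (
    "127.0.0.1 localhost\n"
    "192.168.100.15 myserver01 myserver01.mydomain.com\n"
)
names = hosts_names(sample)
print(names["192.168.100.15"])
```

Running the same function on the master's and the client's file makes it easy to spot a name that is missing on one side.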
Before break-up, make sure you have a good backup..... ;-)
If the server is running Linux, check whether iptables is running.
If it is running, make sure the data owner between source and backup target is set to the owner of the data and the RMAN script (usually oracle). If it is set to root, the backups will fail because root does not own the RMAN backup and the transmission of backup data.
First 3 things I check when a system level backup is good but the oracle fails:
1. name resolution for master/media/client, forward AND reverse, between all of them. I also place entries for each in each of the local hosts files.
2. the very first line of the bp.conf file must contain
SERVER = masterservername
3. timeouts as described above.
I make sure those 3 things are done before retesting Oracle backups.
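Check 2 is easy to script. The sketch below is a rough Python illustration, assuming bp.conf sits at /usr/openv/netbackup/bp.conf on the client; first_line_is_master and the server names in the sample are made up for the example.

```python
def first_line_is_master(bp_conf_text, master_name):
    """True if the first non-blank line is 'SERVER = <master_name>'."""
    for line in bp_conf_text.splitlines():
        if not line.strip():
            continue  # skip leading blank lines
        key, sep, value = line.partition("=")
        return (bool(sep)
                and key.strip() == "SERVER"
                and value.strip().lower() == master_name.lower())
    return False  # empty file

# Sample contents only; the server names are placeholders.
sample_bp_conf = (
    "SERVER = masterservername\n"
    "SERVER = mediaservername\n"
    "CLIENT_READ_TIMEOUT = 5400\n"
)
print(first_line_is_master(sample_bp_conf, "masterservername"))
```

On a real client, pass in the text of bp.conf read from disk. The name comparison ignores case, which is an assumption of this sketch.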
There were NBRB performance problems: the backup jobs initiated from the client were not processed within the CLIENT_READ_TIMEOUT interval.
The solution was:
1. Increase the CLIENT_READ_TIMEOUT to 5400 to give more time
2. Do some TCP/IP tuning on the master server (RHEL 5.2)
After that the problems were gone.