Backup of OS files works, Oracle database fails on Sun Solaris 10 server

Created: 17 Mar 2011 • Updated: 18 Mar 2011 | 15 comments
This issue has been solved. See solution.

Hello everyone,

I have a problem I hope someone can help me with. I have a new database server and am trying to get Oracle backups to work. The server is a Sun SPARC server running Solaris 10, and my plan is to eventually migrate off my old server, which is also a Sun SPARC server running Solaris 10. My NetBackup Administration Console is version 6.5 and runs on a Windows server. My system administrator has installed the NetBackup 6.5 client and the Oracle 6.5 database agent (from the options tar file) and applied the 6.5.1 patch. On the Administration Console I can navigate to NetBackup Management > Host Properties > Client, successfully connect to my new server and have the Admin Console report that the client is running 6.5.1. I've completed the tests in the troubleshooting guide (ping, telnet to a port) and all have worked perfectly. I've even created a test policy on the Administration Console and successfully ran an OS backup of the >3400 files under /etc. This tells me that there are no networking or firewall issues between my NetBackup console and my new database server.

Next, I copied the Oracle database backup script from my old server to the new one, but when I try to run an Oracle database backup, it immediately fails with "the backup failed to back up the requested files(6)". I've confirmed that the permissions on the backup script on the new server match the permissions on the server where everything works. After attempting a database backup on the new server, I find that the backup script did NOT create any log file, so it appears it was never started. In addition, I've checked the following directories and they are all empty:

  /usr/openv/netbackup/logs

  /usr/openv/netbackup/logs/user_ops

  /usr/openv/netbackup/logs/user_ops/dbext

  /usr/openv/netbackup/logs/user_ops/dbext/oracle

  /usr/openv/netbackup/logs/user_ops/nbjlogs

For what it's worth, I've confirmed that all the above directories are set to 777 permissions so that my oracle account has write access. This is an extremely frustrating problem as 1) OS backups work, 2) DB backups fail and 3) there are absolutely zero log files showing what the source of the problem may be.

Any suggestion on how to investigate this further would be greatly appreciated.

khemmerl's picture

I tried adding VERBOSE to /usr/openv/netbackup/bp.conf.  Still no log files generated during a database backup.

Ken Hemmerling
Alberta Pensions Services Corporation
Database Administrator
5103 Windermere Blvd. SW
Edmonton, AB T6W 0S9

khemmerl's picture

I created the bpcd and bprd directories under /usr/openv/netbackup/logs and managed to get my first log file.  Unfortunately, all the exit codes are 0 indicating a normal successful completion.

The weirdness continues - I'll append a txt extension to the log file and attach it to this post.

Attachment: log.031711.txt (5.05 KB)

Marianne's picture

Have you executed oracle_link on the new client?

Please see Oracle agent Guide for more info: http://www.symantec.com/docs/TECH127053

Logs that you need on the new client: dbclient, bphdb, bpcd.

Ensure 777 permissions on dbclient and bphdb folders.

Also - please double-check NBU versions - your NBU master server should be the same or higher version as the client.
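
The log directories named above have to exist before the client writes anything into them. A minimal sketch of creating them, assuming the default log location /usr/openv/netbackup/logs mentioned earlier in this thread; the function name make_nbu_log_dirs is made up for illustration and is not part of NetBackup:

```shell
# Create the debug log directories suggested above and open up the two
# that the oracle account needs to write to.
# make_nbu_log_dirs is a hypothetical helper, not a NetBackup command.
make_nbu_log_dirs() {
    # Default path assumed from the directories listed in this thread.
    base=${1:-/usr/openv/netbackup/logs}
    for d in bpcd bphdb dbclient; do
        mkdir -p "$base/$d" || return 1
    done
    # 777 is what the advice above asks for on dbclient and bphdb.
    chmod 777 "$base/dbclient" "$base/bphdb"
}

# Example call (run as root on the client):
#   make_nbu_log_dirs
```

Passing a different base directory as the first argument makes it possible to try the function out somewhere harmless first.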

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links

Will Restore's picture

... is a generic message for a failed database backup. Check the RMAN output file on the client.

Or check the dbclient log directory on the client; create it if it does not exist and retry the job.

 

Will Restore -- where there is a Will there is a way

khemmerl's picture

Thanks for the recommendation.  I shut down all the Oracle processes and ran oracle_link.  Unfortunately it didn't make any difference.

I created dbclient, bphdb and set their permissions to 777 and tried a backup again.  It still failed but I finally have a log file that indicates a problem.  "/usr/openv/netbackup/logs/bphdb/obk_stdout.031711" contains the following:

> ERROR: Cannot recognize the host name s6320bl02

Strange. When I run the unix 'hostname' command I get:

> s6320bl02

Any ideas?

khemmerl's picture

I'm not sure where that error comes from.  On the windows-based Administration console I can ping the server without any problem:

> C:\>ping s6320bl02
>
> Pinging s6320bl02.apacorp.net [192.168.84.207] with 32 bytes of data:
>
> Reply from 192.168.84.207: bytes=32 time<1ms TTL=255
> Reply from 192.168.84.207: bytes=32 time<1ms TTL=255
> Reply from 192.168.84.207: bytes=32 time<1ms TTL=255
> Reply from 192.168.84.207: bytes=32 time<1ms TTL=255
>
> Ping statistics for 192.168.84.207:
>     Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
> Approximate round trip times in milli-seconds:
>     Minimum = 0ms, Maximum = 0ms, Average = 0ms

Not surprisingly, the new database server can ping itself:

> bash-3.00# ping s6320bl02
> s6320bl02 is alive

Any other ideas?

khemmerl's picture

Apologies, I forgot to answer your final question regarding NetBackup versions:  I have confirmed that the Administration Console is 6.5.1 and the client is 6.5.1 so there should not be a problem there.  As mentioned in my original post, the same version of this software has been backing up the databases on my old server for about 3 years.

Will Restore's picture

Ensure that the client and NetBackup servers have consistent and correct hostname resolution.  Use the NetBackup command “bpclntcmd”, as documented in the NetBackup Troubleshooting Guide, for help in diagnosing hostname resolution problems.

Ensure that the master and all media servers are listed in the client’s server list.  Also, ensure that the client name used in the policy on the master server is the same name used to configure the client and is the name resolved by the NetBackup servers when translating the client’s IP address to a hostname.
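
One way to make the name comparison described above explicit is a small helper that reports whether two names agree exactly, agree only on the short name, or differ. This is only a sketch: compare_names is a made-up function, and the names to feed it (for example the client name in the policy and the name the client reports for itself) have to be collected by hand.

```shell
# Lower-case a name so that the comparison ignores case.
lower() { printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'; }

# Report how closely two host names agree:
#   exact       identical, ignoring case
#   short-only  same short name, but the domain part differs or is missing
#   mismatch    different short names
# compare_names is a hypothetical helper, not a NetBackup command.
compare_names() {
    a=$(lower "$1")
    b=$(lower "$2")
    if [ "$a" = "$b" ]; then
        echo "exact"
    elif [ "${a%%.*}" = "${b%%.*}" ]; then
        echo "short-only"
    else
        echo "mismatch"
    fi
}

# Example call:
#   compare_names "name-from-policy" "name-reported-by-client"
```

A "short-only" result is worth a second look, since a short name and a fully qualified name may not be treated as the same client everywhere.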

khemmerl's picture

As recommended in the NetBackup Troubleshooting guide, I have confirmed that the appropriate entries are in /etc/inetd.conf.

> # grep bp /etc/inetd.conf
> bpcd    stream  tcp     nowait  root    /usr/openv/netbackup/bin/bpcd bpcd
> bpjava-msvc     stream  tcp     nowait  root    /usr/openv/netbackup/bin/bpjava-msvc bpjava-msvc -transient

I ran bpclntcmd -hn on the NetBackup admin console and it completes normally:

> D:\Program Files\Veritas\NetBackup\bin>bpclntcmd -hn s6320bl02
> host s6320bl02: s6320bl02.apacorp.net at 192.168.84.207 (0xcf54a8c0)
> aliases:

I ran bpclntcmd -hn on the client machine (my database server) and it completes normally:

> bash-3.00# ./bpclntcmd -hn s6320bl02
> host s6320bl02: s6320bl02 at 192.168.84.207 (0xc0a854cf)
> aliases:     s6320bl02.apacorp.net     loghost

I ran bpclntcmd -pn on the client machine (my database server) and it completes normally:

> # ./bpclntcmd -pn
> expecting response from server inf-srv17.apacorp.net
> s6320bl02.apacorp.net s6320bl02 192.168.84.207 59706

All this is not unexpected as file backups through the OS work yet the Oracle backups continue to fail.  logs/bphdb/obk_stdout.031711 continues to report:

> ERROR: Cannot recognize the host name s6320bl02

 

khemmerl's picture

This seems to be really fubar. Other than the fact I can now see the obk_stdout error message, there's been no real progress. I've worked through the first 63 pages of the Troubleshooting guide, I've confirmed the entries in /etc/inetd.conf and /etc/services, and I've run bpclntcmd on the client and on the NetBackup server running the admin console. By all accounts everything checks out, yet I still can't back up my databases. I think I'll have to get my sysadmin to uninstall the NetBackup client and start all over again. I'm not sure what will happen if THAT doesn't fix the problem.

As always, any recommendations are appreciated. 

Marianne's picture

Please post the following:

Rman output (check the script for location of OUTFILE)

dbclient log

Please double-check the script as well - the client name is sometimes hard-coded (NB_ORA_CLIENT). If the script was copied from another client, it needs to be changed. Also double-check policy name and other parameters in the script.

khemmerl's picture

There is no RMAN output and my dbclient log is empty.  I don't have a hard coded value for NB_ORA_CLIENT in my script but I do have a call to hostname which I use to set a parameter indicating development or production.  Since this is a new server, the script needs to be updated to recognize the name of the new host.  I'll make the change and update this post with the results.
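
A hostname check like the one described here can be written so that an unknown host leaves a message behind instead of failing silently. The following is only a sketch of that idea, not the actual backup script: the host names olddbhost and newdbhost, the environment labels and the log path are all placeholders.

```shell
# Map a host name to an environment label.
# The host names and labels are placeholders, not the real servers.
classify_host() {
    case "$1" in
        olddbhost)  echo "production" ;;
        newdbhost)  echo "development" ;;
        *)
            # Unknown host: say so on stderr and signal failure.
            echo "ERROR: host '$1' is not listed in the backup script" >&2
            return 1
            ;;
    esac
}

# Resolve the environment for a host, appending any error to a log file
# so that a failed run always leaves a trace.
resolve_env() {
    host=$1
    log=$2
    if env_name=$(classify_host "$host" 2>>"$log"); then
        echo "$env_name"
    else
        echo "$(date) backup aborted for host '$host'" >>"$log"
        return 1
    fi
}

# Example call (the log path is a placeholder):
#   DB_ENV=$(resolve_env "$(hostname)" /tmp/ora_backup.log) || exit 1
```

The default branch of the case statement is the important part: any host that has not been added to the list produces a logged error and a non-zero exit status.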

Ken

Yogesh9881's picture

Since you say you will install the NBU client again, I'd suggest that you don't copy the script from the other server again; you can edit the sample script instead.

Or you can create a custom script on the client with jbpSA.

If this post has helped you, please vote or mark as solution.

Before break-up, make sure you have a good backup.....  ;-)

khemmerl's picture

My backups are running as I write this.  It turns out there were three problems:

1) My system administrator (who has only been with the company for two months) originally only installed the NetBackup Client and not the Oracle Client.

2) My backup script had a bug related to a call to the hostname command.  When an unknown hostname was encountered (as was the case with this new server), the script failed without generating an informative log file.

3) Since it's been over 3 years since I created a new database, I had forgotten that my backups rely on configuration stored within RMAN to set the device type, redundancy and parallelism. I needed to run the following commands in the new database:

 rman catalog=rman10@rmandb target=backup_dba

configure retention policy to redundancy=2;
configure controlfile autobackup on;
configure controlfile autobackup format for device type sbt_tape to '%d_%F';
configure default device type to SBT_TAPE;

configure DEVICE TYPE sbt_tape PARALLELISM 2;
configure channel 1 device type SBT_TAPE maxopenfiles=1 send='NB_ORA_POLICY=ORA_<SID>, NB_ORA_SCHED=Oracle_Online_Backup';
configure channel 2 device type SBT_TAPE maxopenfiles=1 send='NB_ORA_POLICY=ORA_<SID>, NB_ORA_SCHED=Oracle_Online_Backup';

The benefits of putting this configuration in RMAN are:

1) My large (500GB) databases back up to 4 tapes, my medium (150GB) databases back up to 2 tapes and my small (<50GB) databases back up to a single tape automatically, without a lot of extra logic within the backup script.

2) The commands in my backup script that actually do the backup are much simpler and can be executed without the need to specify a "run" block.  They are now:

backup tag=${DB_TAG} filesperset=1 format='%d_df%f_%T_%s' database;
sql 'alter system archive log current';
backup tag=${DB_TAG} filesperset=1 format='%d_al%e_%T_%s' archivelog all not backed up 3 times;
delete noprompt copy of archivelog all completed before 'SYSDATE-${ARCH_RETAIN}';

Instead of:

 run {
  allocate channel t1 type 'SBT_TAPE' maxopenfiles=1;
  allocate channel t2 type 'SBT_TAPE' maxopenfiles=1;
  send 'NB_ORA_POLICY=${NB_ORA_CLASS}, NB_ORA_SCHED=${NB_ORA_SCHED}';
  backup
    full
    tag ${DB_TAG}
    skip inaccessible
    filesperset=1
    format='%d_df%f_%T_%s'
    database;
  sql 'alter system archive log current';
  backup
    tag ${DB_TAG}
    filesperset=1
    format '%d_al%e_%T_%s'
    archivelog all not backed up 3 times;
  delete noprompt copy of archivelog all completed before 'SYSDATE-${ARCH_RETAIN}';
  release channel t1;
  release channel t2;
}
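
Since the stored channel configuration shown earlier differs between databases only in the SID and the number of channels, the configure commands could be generated rather than typed by hand. A rough sketch, assuming that the number of tapes corresponds to the PARALLELISM value and that the policy is named ORA_<SID> as in the commands above; gen_rman_config is a made-up helper name:

```shell
# Print the RMAN channel configuration for one database.
#   $1  database SID (used in the policy name ORA_<SID>)
#   $2  number of channels, assumed to equal the number of tapes
# gen_rman_config is a hypothetical helper, not an Oracle or NetBackup tool.
gen_rman_config() {
    sid=$1
    channels=$2
    # Reject anything that is not a positive integer.
    case "$channels" in
        ''|*[!0-9]*)
            echo "ERROR: channel count must be a positive integer" >&2
            return 1
            ;;
    esac
    if [ "$channels" -lt 1 ]; then
        echo "ERROR: channel count must be a positive integer" >&2
        return 1
    fi
    echo "configure DEVICE TYPE sbt_tape PARALLELISM ${channels};"
    i=1
    while [ "$i" -le "$channels" ]; do
        echo "configure channel ${i} device type SBT_TAPE maxopenfiles=1 send='NB_ORA_POLICY=ORA_${sid}, NB_ORA_SCHED=Oracle_Online_Backup';"
        i=$((i + 1))
    done
}

# Example call: print the commands for a 4-channel database, to be
# reviewed and then pasted into an RMAN session.
#   gen_rman_config MYSID 4
```

The output is plain text, so it can be checked before it is run against a database.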

 

 

SOLUTION
Marianne's picture

Glad to see you followed advice to "double-check the script as well".......
