
Netbackup 7 hangs on storage device mgmnt

Created: 11 Jan 2011 • Updated: 02 May 2011 | 7 comments
This issue has been solved. See solution.

Hi,

I have a new install of NB7 on a Solaris 10 x86 global zone with the required patch level for the devices involved.  Using either the command-line nbadm menu or the Java GUI, NB just hangs when I select storage device management (to add new devices, in my case).  I've read the device configuration guide and am using the Veritas/Symantec 'sg' driver along with the related sg/st config files.  During the initial install only the drives showed up, not the library (SL48).  Following one of the threads here, I disabled multipathing (I'm not using it anyway) and added the new external devices/robots text files.

./sgscan all

...

/dev/sg/c0tw5001438003300582l0: Tape (/dev/rmt/0): "HP      Ultrium 5-SCSI"
/dev/sg/c0tw5001438003300582l1: Changer: "HP      MSL G3 Series"
/dev/sg/c1tw5001438003300588l0: Tape (/dev/rmt/1): "HP      Ultrium 5-SCSI"

The library and drives show up in sgscan but don't seem to be properly identified by volmgr/bin/scan, possibly because they are not configured in NB yet:

./scan
************************************************************
*********************** SDT_TAPE    ************************
*********************** SDT_CHANGER ************************
************************************************************
Unable to intialize the device mappings table, status = 1
------------------------------------------------------------
Device Name  : "/dev/sg/c0tw5001438003300582l1"
Passthru Name: "/dev/sg/c0tw5001438003300582l1"
Volume Header: ""
Port: -1; Bus: -1; Target: -1; LUN: -1
Inquiry    : "HP      MSL G3 Series   G.20"
Vendor ID  : "HP      "
Product ID : "MSL G3 Series   "
Product Rev: "G.20"
Serial Number: "1036BRZ00R"
WWN          : ""
WWN Id Type  : 0
Device Identifier: "HP      MSL G3 Series   1036BRZ00R"
Device Type    : SDT_CHANGER
NetBackup Robot Type: Not Found(6)
Removable      : Yes
Device Supports: SCSI-5
Number of Drives : 2
Number of Slots  : 48
Number of Media Access Ports: 0
Flags : 0x0
Reason: 0x0

...

There are a couple of errors that lead me to think I'll need to re-install NB now that the devices are seen properly by the OS.  Before I take that course I thought I'd ask for pointers here.

./tpconfig -d
EMM interface initialization failed, status = 77

dmesg: Vault License validation FAILED. Error=159

./nbdb_ping
Database [NBDB] is not available.
 

./netbackup/bin/bpps -x
NB Processes
------------
    root 15727     1   0 06:41:38 ?           0:00 /usr/openv/netbackup/bin/admincmd/bpstsinfo -UPDATE
    root  1753     1   0 16:36:27 ?           0:16 /usr/openv/db//bin/NB_dbsrv @/usr/openv/var/global/server.conf @/usr/openv/var/
    root  1756     1   0 16:36:28 ?           0:02 /usr/openv/netbackup/bin/nbevtmgr
    root  1707     1   0 16:36:27 ?           0:00 /usr/openv/netbackup/bin/bpcd -standalone
    root  1789     1   0 16:36:29 ?           0:01 /usr/openv/netbackup/bin/nbpem
    root  1775     1   0 16:36:28 ?           0:04 /usr/openv/netbackup/bin/bprd
    root  1798     1   0 16:36:29 ?           0:05 /usr/openv/netbackup/bin/nbrmms
    root  2247  1789   0 16:52:29 ?           0:13 /usr/openv/netbackup/bin/nbproxy dblib nbpem
    root 11776  1789   0 02:40:07 ?           0:00 /usr/openv/netbackup/bin/nbproxy dblib nbpem_cleanup
    root  1817     1   0 16:36:30 ?           0:00 /usr/openv/netbackup/bin/nbsl
    root  1859  1784   0 16:36:32 ?           0:00 /usr/openv/netbackup/bin/bpjobd
    root  1781     1   0 16:36:29 ?           0:00 /usr/openv/netbackup/bin/bpcompatd
    root  1786     1   0 16:36:29 ?           0:22 /usr/openv/netbackup/bin/nbjm
    root  1784     1   0 16:36:29 ?           0:00 /usr/openv/netbackup/bin/bpdbm
    root 15807     1   0 06:46:38 ?           0:00 /usr/openv/netbackup/bin/admincmd/bpstsinfo -UPDATE
    root  1704     1   0 16:36:27 ?           0:00 /usr/openv/netbackup/bin/vnetd -standalone
    root  1838     1   0 16:36:31 ?           0:02 /usr/openv/netbackup/bin/nbsvcmon
    root  1794     1   0 16:36:29 ?           0:20 /usr/openv/netbackup/bin/nbstserv

MM Processes
------------
    root  1768     1   0 16:36:28 ?           0:15 /usr/openv/volmgr/bin/ltid
    root  1856     1   0 16:36:31 ?           0:01 vmd

Shared Symantec Processes
-------------------------
    root  1695     1   0 16:36:02 ?           1:21 /opt/VRTSpbx/bin/pbx_exchange

(/usr/openv is a symlink to /opt/openv)

There are three licenses that are reported from the admincmd/get_license_key command.  All are current, it's a brand new install. 

Any thoughts?

Regards,

George Gist


Nicolai's picture

I think you have an issue with the EMM database within NetBackup.

These two commands usually go away shortly after NetBackup starts. If they persist, you may have an EMM issue.

/usr/openv/netbackup/bin/admincmd/bpstsinfo -UPDATE

/usr/openv/netbackup/bin/nbproxy dblib nbpem_cleanup

What does "nbemmcmd -listhosts"  return ?

update:

This really says it all:

./tpconfig -d
EMM interface initialization failed, status = 77

./nbdb_ping
Database [NBDB] is not available.

It is an EMM issue. EMM plays a central part in NetBackup, and NetBackup will not function at all without it. What does [install_path]/db/bin/nbdbms_start_server return? I expect some sort of error.

Assumption is the mother of all mess ups.

If this post answered your question, please mark it as a solution.

GeorgeGist's picture

Thanks for the quick reply.  There are two networks on the Solaris box, I'm going to run some quick packet captures to see if NetBackup is trying to communicate over the secondary (non netbackup) interface for any odd reason...  Kind of doubt it but need to be sure...

 ./db/bin/nbdbms_start_server

NB_dbsrv is already running.

The following just hangs; I did not let it time out.

 /usr/openv/netbackup/bin/admincmd/bpstsinfo -UPDATE
^C

The following seems to hang too.  Probably trying to resolve something.

./nbemmcmd -listhosts
NBEMMCMD, Version:7.0

.... still running

Thanks again,

Regards,

George

GeorgeGist's picture

Sorry, misread.  Yeah, the following two hang.

/usr/openv/netbackup/bin/admincmd/bpstsinfo -UPDATE

/usr/openv/netbackup/bin/nbproxy dblib nbpem_cleanup

Both FQDN and shortnames are in bp.conf and /etc/hosts. 

-g

Nicolai's picture

Two active networks can be the cause. NetBackup can work with many active network cards IF you did the homework first. If not, NetBackup gets totally confused :-D

All IPs must resolve forward and reverse to a unique name. No aliases or DNS pointers are allowed.

Make sure the name of the master server is the server's own hostname (you can use different names, but then you really need to be in control).

You may also need to set REQUIRED_INTERFACE = HOSTNAME_YOU_WANT_TO_USE in bp.conf

I suggest disabling one NIC and restarting NetBackup once IP and name resolution are in place.

If you know tcpdump, use it.
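The uniqueness rule above can be checked offline against a hosts file. The following is only a sketch under stated assumptions: the function name `check_hosts` and the sample addresses are hypothetical, and it only looks for a name that is attached to more than one non-loopback address. It does not query DNS or verify reverse lookups.

```python
from collections import defaultdict


def check_hosts(hosts_text):
    """Return a list of problems found in /etc/hosts-style text."""
    ips_by_name = defaultdict(set)
    for raw in hosts_text.splitlines():
        # Drop comments and blank lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        # Loopback entries legitimately share names such as "localhost".
        if ip.startswith("127.") or ip == "::1":
            continue
        for name in names:
            ips_by_name[name.lower()].add(ip)

    problems = []
    for name, ips in sorted(ips_by_name.items()):
        if len(ips) > 1:
            problems.append(f"{name} maps to several addresses: {sorted(ips)}")
    return problems


if __name__ == "__main__":
    # Hypothetical sample; on the server you would read /etc/hosts instead.
    sample = (
        "127.0.0.1 localhost\n"
        "192.0.2.10 backup.noc.somedomain.com backup\n"
        "198.51.100.11 backup\n"
    )
    for problem in check_hosts(sample):
        print(problem)
```

For a live check, Python's `socket.gethostbyname` and `socket.gethostbyaddr` could be used to compare forward and reverse answers for each name, at the cost of depending on the resolver configuration of the machine it runs on.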


Marianne's picture

emm is not starting up - bpps should have displayed this entry:

/usr/openv/netbackup/bin/nbemm

You will probably find that emm starts up and then 'dies'.
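A quick way to confirm this from a saved `bpps -x` listing is to collect the command names and look for `nbemm`. This is a minimal sketch, assuming the column layout shown in the listing earlier in this thread (command in the eighth column, start time as a single token). The function names are hypothetical, and the default list of expected daemons contains only `nbemm` because that is the only one named here.

```python
import os


def running_daemons(bpps_output):
    """Collect the base names of the commands listed by `bpps -x`."""
    names = set()
    for line in bpps_output.splitlines():
        fields = line.split()
        # Process rows look like: user pid ppid c stime tty time command [args]
        if len(fields) < 8 or not fields[1].isdigit():
            continue
        names.add(os.path.basename(fields[7]))
    return names


def missing_daemons(bpps_output, expected=("nbemm",)):
    """Return the expected daemon names that do not appear in the listing."""
    running = running_daemons(bpps_output)
    return [name for name in expected if name not in running]


if __name__ == "__main__":
    listing = """\
NB Processes
------------
    root  1753     1   0 16:36:27 ?           0:16 /usr/openv/db//bin/NB_dbsrv
    root  1784     1   0 16:36:29 ?           0:00 /usr/openv/netbackup/bin/bpdbm

MM Processes
------------
    root  1768     1   0 16:36:28 ?           0:15 /usr/openv/volmgr/bin/ltid
    root  1856     1   0 16:36:31 ?           0:01 vmd
"""
    print(missing_daemons(listing))
```

Running the check twice, a few seconds apart after a restart, would show whether the daemon starts and then exits, which a single snapshot cannot tell you.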

I agree with Nicolai - each NIC/IP should be associated with its own hostname.

The EMM and master server names should ideally match the entry in /etc/nodename.

Create admin log dir (/usr/openv/netbackup/logs/admin). Stop NBU. Check that all processes/daemons go down. Restart NBU.

Check admin log - if 2nd interface is used for comms, you will find evidence in this log.

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows

GeorgeGist's picture

I've removed the second interface completely (rebooted, etc).  Only one NIC is active, with one IP only.  Prior to that each NIC had a specific hostname.  (The hostname in /etc/nodename was always the SERVER/EMMSERVER name.)

/etc/{nodename,hostname.interface, hosts} entries match bp.conf's SERVER and EMMSERVER entries.  DNS is not in use, only host file entries.  So, if there is some need to resolve PTR RR's on IP's that could be problematic but I doubt that's the case.
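The comparison described above can be scripted so it is easy to repeat after each change. This is a sketch only: the function names are hypothetical, it assumes bp.conf uses `KEY = value` lines, and it treats the first SERVER entry as the master server name.

```python
def parse_bp_conf(text):
    """Map each bp.conf key to the list of its values, in file order."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        entries.setdefault(key.strip(), []).append(value.strip())
    return entries


def name_mismatches(bp_conf_text, nodename):
    """List SERVER/EMMSERVER entries that differ from the node name."""
    entries = parse_bp_conf(bp_conf_text)
    wanted = nodename.strip().lower()
    problems = []

    servers = entries.get("SERVER", [])
    if not servers:
        problems.append("no SERVER entry found")
    elif servers[0].lower() != wanted:
        problems.append(
            f"first SERVER entry {servers[0]!r} differs from nodename {nodename!r}"
        )

    for emm in entries.get("EMMSERVER", []):
        if emm.lower() != wanted:
            problems.append(
                f"EMMSERVER entry {emm!r} differs from nodename {nodename!r}"
            )
    return problems


if __name__ == "__main__":
    # Inline sample; on the server you would read /usr/openv/netbackup/bp.conf
    # and /etc/nodename instead.
    conf = "SERVER = backup\nEMMSERVER = backup\n"
    print(name_mismatches(conf, "backup"))
    print(name_mismatches(conf, "backup.noc.somedomain.com"))
```

The second call is included because the nbemm log in this post reports LocalHost = <backup.noc.somedomain.com> and EmmServer = <backup>. Whether that short-name versus FQDN difference matters is not established in this thread.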

Still have the same issue.  Looking at the log in /usr/openv/logs/nbemm I see some errors and the following complaint.  Running truss on the startup of /usr/openv/netbackup/bin/nbemm does not show that it's looking for vxdbms.conf so this may not be relevant.

Can't open configuration file:  /usr/openv/db/data/vxdbms.conf,21:CSettings::Initialize,1

Yet the log below hints at a credentials problem, followed by a connection refused error.

[root@backup:/opt/openv/logs/nbemm]# /etc/init.d/netbackup start
NetBackup network daemon started.
NetBackup client daemon started.
NetBackup SAN Client Fibre Transport daemon started.
NetBackup Database Server started.
NetBackup Event Manager started.
NetBackup Enterprise Media Manager started.
NetBackup Resource Broker started.
Media Manager daemons started.
NetBackup request daemon started.
NetBackup compatibility daemon started.
NetBackup Job Manager started.
NetBackup Policy Execution Manager started.
NetBackup Storage Lifecycle Manager started.
NetBackup Remote Monitoring Management System started.
NetBackup Key Management daemon started.
NetBackup Service Layer started.
NetBackup Agent Request Server started.
NetBackup Bare Metal Restore daemon not started.
NetBackup Vault daemon started.
NetBackup Service Monitor started.
NetBackup Bare Metal Restore Boot Server daemon started.

[root@backup:/opt/openv/logs/nbemm]# ls -la
total 14
drwxr-xr-x   2 root     root           4 Jan 11 14:29 .
drwxr-xr-x  17 root     root          17 Dec 16 13:44 ..
-rw-rw-r--   1 root     root        5077 Jan 11 14:29 51216-111-737367621-110111-0000000001.log
[root@backup:/opt/openv/logs/nbemm]# more 51216-111-737367621-110111-0000000001.log
$Header 65543,51216,111,1294784956,28800,backup.noc.somedomain.com
1,51216,111,111,1,1294784956569,2143,1,0:,0:,21:CSettings::Initialize,1,(1008|)
0,51216,111,111,2,1294784956570,2143,1,0:,62:Can't open configuration file:  /usr/openv/db/data/vxdbms.conf,21:CSettings::Initialize,1
0,51216,111,111,3,1294784956571,2143,1,0:,1: ,20:EMMServer::EMMServer,1
0,51216,111,111,4,1294784956571,2143,1,0:,44:Setting to default DSM ORB thread count <12>,32:DSMFSMORBConfig::DSMFSMORBConfig,1
0,51216,111,111,5,1294784956571,2143,1,0:,25:Setting ORB count to <10>,26:REMORBConfig::REMORBConfig,1
0,51216,111,111,6,1294784956571,2143,1,0:,25:Setting ORB count to <10>,38:FATClientORBConfig::FATClientORBConfig,1
0,51216,137,111,1,1294784956577,2143,1,0:,34:switched to a new logging callback,16:log_set_callback,1
2,51216,137,111,2,1294784956582,2143,1,0:,0:,0:,0,(74|)
2,51216,137,111,3,1294784956583,2143,1,0:,0:,0:,0,(13|A3:EMM|)
2,51216,137,111,4,1294784956583,2143,1,0:,0:,0:,0,(74|)
0,51216,111,111,7,1294784956620,2143,1,0:,7:<ENTER>,7:EMMMain,1
0,51216,111,111,8,1294784956623,2143,1,0:,1: ,31:EMMServer::isLocalHostEMMServer,1
0,51216,111,111,9,1294784956623,2143,1,0:,58:LocalHost = <backup.noc.somedomain.com>, EmmServer = <backup>,31:EMMServer::isLocalHostEMMServer,1
0,51216,111,111,10,1294784956623,2143,1,0:,12:Should start,23:NBEmmSvc::doShouldStart,1
0,51216,137,111,5,1294784956623,2143,1,0:,50:Passed service start criteria.(OrbService.cpp:976),25:OrbService::controlledRun,1
0,51216,137,111,6,1294784956623,2143,1,0:,107:successfully set max data limit: current=18446744073709551613, max=18446744073709551613(OrbService.cpp:242),27:OrbService::setMaxDataLimit,1
2,51216,137,111,1,1294784956655,2145,1,0:,0:,0:,0,(74|)
2,51216,137,111,2,1294784956655,2145,1,0:,0:,0:,0,(20|A3:EMM|)
2,51216,137,111,3,1294784956655,2145,1,0:,0:,0:,0,(74|)
0,51216,137,111,4,1294784956660,2145,1,0:,49:endpointvalue is : pbxiop://1556:EMM(Orb.cpp:630),9:Orb::init,1
0,51216,137,111,5,1294784956661,2145,1,0:,673:initializing ORB EMM with: EMM -ORBSvcConfDirective "-ORBDottedDecimalAddresses 0" -ORBSvcConfDirective "static PBXIOP_Factory '-enable_keepalive'" -ORBSvcConfDirective "static EndpointSelectorFactory ''" -ORBSvcConfDirective "static Resource_Factory '-ORBProtocolFactory PBXIOP_Factory'" -ORBSvcConfDirective "static Resource_Factory '-ORBProtocolFactory IIOP_Factory'" -ORBSvcConfDirective "static PBXIOP_Evaluator_Factory '-orb EMM'" -ORBSvcConfDirective "static Resource_Factory '-ORBConnectionCacheMax 1024 '" -ORBEndpoint pbxiop://1556:EMM -ORBSvcConf /dev/null -ORBSvcConfDirective "static Server_Strategy_Factory '-ORBMaxRecvGIOPPayloadSize 268435456'"(Orb.cpp:741),9:Orb::init,1
0,51216,137,111,6,1294784961695,2145,1,0:,94:../../libvlibs/vnet_vxss.c.780: could not find any credential for host: backup.noc.somedomain.com,41:vnet_get_machine_credential_path_for_name,1
0,51216,137,111,7,1294784961695,2145,1,0:,62:../../libvlibs/vnet_vxss.c.781: Function failed: 35 0x00000023,41:vnet_get_machine_credential_path_for_name,1
1,51216,137,111,8,1294784961695,2145,1,0:,0:,37:ClientNbacEvaluator::getOurCredential,2,(76|)
2,51216,137,111,9,1294784961701,2145,1,0:,0:,0:,3,(6|)
1,51216,137,111,10,1294784961701,2145,1,0:,0:,9:Orb::init,6,(103|A170:system exception, ID 'IDL:omg.org/CORBA/BAD
TAO exception, minor code = 5 (endpoint initialization failure in Acceptor Registry; ECONNREFUSED), completed = N
|)
2,51216,111,111,1,1294784961701,2145,1,0:,0:,0:,0,(1046|A7:EMMMain|A9:BAD_PARAM|)
2,51216,111,111,2,1294784961701,2145,1,0:,0:,0:,2,(1049|A53:Exception caught attempting EMM orb startup BAD_PARAM|)
2,51216,137,111,11,1294784961701,2145,1,0:,0:,0:,0,(74|)
2,51216,137,111,12,1294784961701,2145,1,0:,0:,0:,0,(12|A3:EMM|)
2,51216,137,111,13,1294784961701,2145,1,0:,0:,0:,0,(74|)
0,51216,111,111,3,1294784961702,2145,1,0:,28:Calling fini() on FAT Client,31:FATClientORBConfig::shutServant,1
0,51216,111,111,4,1294784961702,2145,1,0:,25:Done fini() on FAT Client,31:FATClientORBConfig::shutServant,1
0,51216,111,111,5,1294784961702,2145,1,0:,21:Calling fini() on REM,25:REMORBConfig::shutServant,1
0,51216,111,111,6,1294784961702,2145,1,0:,18:Done fini() on REM,25:REMORBConfig::shutServant,1
0,51216,111,111,7,1294784961702,2145,1,0:,21:Calling fini() on FSM,29:DSMFSMORBConfig::shutServants,1
0,51216,111,111,8,1294784961703,2145,1,0:,18:Done fini() on FSM,29:DSMFSMORBConfig::shutServants,1
0,51216,111,111,9,1294784961703,2145,1,0:,21:Calling fini() on DSM,29:DSMFSMORBConfig::shutServants,1
0,51216,111,111,10,1294784961703,2145,1,0:,18:Done fini() on DSM,29:DSMFSMORBConfig::shutServants,1
0,51216,111,111,11,1294784961703,2145,1,0:,24:EMM Server shutting down,15:EMMServer::fini,1
2,51216,111,111,12,1294784961703,2145,1,0:,0:,0:,0,(1003|A15:EMMServer::fini|)
0,51216,111,111,13,1294784961703,2145,1,0:,29:EMM Server shut down complete,15:EMMServer::fini,1
0,51216,111,111,14,1294784961703,2145,1,0:,6:<EXIT>,7:EMMMain,1
0,51216,111,111,15,1294784961703,2145,1,0:,1: ,11:Log::deinit,1
0,51216,111,111,16,1294784961703,2145,1,0:,1: ,21:EMMServer::~EMMServer,1
[root@backup:~]#
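The raw log above is easier to follow with readable timestamps. The sketch below assumes, from the sample alone, that the sixth comma-separated field of each entry is a Unix timestamp in milliseconds. That reading is consistent with the file's Jan 11 14:29 modification time and the 28800 value in the $Header line, which appears to be a UTC offset in seconds, but it is an inference and not documented behaviour. The function name `log_time` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def log_time(raw_line):
    """Return the UTC time of a raw log entry, or None for other lines.

    Header lines and wrapped continuation lines have no numeric sixth
    field, so they yield None.
    """
    fields = raw_line.split(",")
    if len(fields) < 6 or not fields[5].isdigit():
        return None
    # Integer milliseconds avoid float rounding in the conversion.
    return EPOCH + timedelta(milliseconds=int(fields[5]))


if __name__ == "__main__":
    orb_init = "0,51216,137,111,4,1294784956660,2145,1,0:,49:endpointvalue is : pbxiop://1556:EMM(Orb.cpp:630),9:Orb::init,1"
    failure = "1,51216,137,111,8,1294784961695,2145,1,0:,0:,37:ClientNbacEvaluator::getOurCredential,2,(76|)"
    print(log_time(orb_init).isoformat())
    print(log_time(failure) - log_time(orb_init))
```

Read this way, about five seconds pass between the Orb::init entries and the "could not find any credential" entries in the log above. The sketch only measures that gap; it does not explain it.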

I should mention that when stopping with 'goodies/netbackup stop', the following processes don't die and need to be killed manually.

./bpps
    root  1385     1   0 14:19:09 ?           0:00 /usr/openv/netbackup/bin/nbproxy dblib nbpem
    root  1363     1   0 14:18:11 ?           0:00 /usr/openv/netbackup/bin/admincmd/bpstsinfo -UPDATE

I may scrap this and reinstall with only the NetBackup network in place, but I see nothing that is trying to call anything other than localhost and the NetBackup network.

Any other thoughts or suggestions?  There must be something else that I've overlooked.

Thanks in advance and regards,

George

GeorgeGist's picture

Too much time spent on a seemingly simple issue.  I re-installed and this issue appears to be resolved.  My thanks for the answers above.  All NetBackup internal black magic aside, I can think of only two things that I may have done to cause the troubles initially reported this morning.

1. The initial install was done while multiple NICs/networks were enabled.  While packet captures and system internals (truss, etc) did not show proof of this being an issue, it's possible that I missed something in those wonderful volumes of data.

2. Updating the openv/var/global/*.txt files (external_robotics, etc) with some random tar file from Symantec that came from a forum link might have brought some undesired files into the picture.  The originals were saved but subsequently lost before I got a chance to diff them (yeah, should have done that first :).
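For the second point, keeping a backup and producing the diff in one step avoids losing the originals. This is a minimal sketch, not part of NetBackup: the function name `backup_then_diff` and the `.orig` suffix are hypothetical choices, and it only reports differences without installing anything.

```python
import difflib
import shutil
from pathlib import Path


def backup_then_diff(original, replacement):
    """Save a one-time backup of `original`, then return a unified diff.

    The diff compares the saved backup against `replacement`, so it
    still works after the original file has been overwritten.
    """
    original, replacement = Path(original), Path(replacement)
    backup = original.with_name(original.name + ".orig")
    if not backup.exists():
        # copy2 also preserves the timestamps of the original file.
        shutil.copy2(original, backup)
    old = backup.read_text().splitlines(keepends=True)
    new = replacement.read_text().splitlines(keepends=True)
    return "".join(difflib.unified_diff(old, new, str(backup), str(replacement)))


if __name__ == "__main__":
    import tempfile

    # Demonstration on throwaway files with made-up contents.
    with tempfile.TemporaryDirectory() as tmp:
        current = Path(tmp) / "external_robotics.txt"
        downloaded = Path(tmp) / "downloaded.txt"
        current.write_text("line one\nline two\n")
        downloaded.write_text("line one\nline three\n")
        print(backup_then_diff(current, downloaded))
```

An empty result means the downloaded file matches what was already installed, in which case there is nothing to copy over.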

Anyway thanks for the support above.

Regards,

George

SOLUTION