NBU7.5.0.6: nbemmcmd hangs

Created: 16 Oct 2013

Hi,

I have a problem that has tortured me for several days. Thanks in advance for your patience!

We have a Linux host, lx0024nbumast (CentOS 5.9).

1) It was a remote media server and worked well, but for some reason we needed to convert it into an NBU master server.

2) At first we just re-installed the NBU server software, but the following error showed up in the installation log:

Populating the database tables.  This will take some time.
EMM interface initialization failed, status = 77

After the install, the command "nbemmcmd -listhosts" hung as below:

[root@lx0024nbumast netbackup]# nbemmcmd  -listho
NBEMMCMD, Version: 7.5.0.6
Failed to initialize EMM connection.  Verify that network access to the EMM server is available and that the services nbemm and pbx_exchange are running on the EMM server. (195)
Command did not complete successfully.

3) We opened a case with Symantec, and they suggested that we add "REQUIRED_INTERFACE = 10.124.2.39" to bp.conf and restart the NBU services as well as PBX, since we have multiple active NICs on the Linux NBU master. We tried that, but it didn't work.

4) Then I un-installed and re-installed NBU 7.5 again and again, but with no luck.

5) I had no choice but to re-install the OS and then set the box up as an NBU master in a completely fresh OS environment.

I did this yesterday: re-installed the OS, installed NBU 7.5 with a single active NIC, upgraded to NBU 7.5.0.6, enabled all NICs, restarted the OS, and verified that everything seemed good.

6) But today the issue appeared again! Only 1 of 3 policies succeeded and the others failed. I looked into NBU, only to find that "nbemmcmd" hangs again.

Can anyone show me how to fix this annoying issue?

Thank you very much!

 

Comments

16 Oct 2013

1) If I stop the NBU services, "nbemm" can't be killed by "netbackup stop" or bp.kill_all.

 

2) this is my bp.conf

[root@lx0024nbumast netbackup]# cat bp.conf
SERVER = lx0024nbumast.active.local
CLIENT_NAME = lx0024nbumast.active.local
CONNECT_OPTIONS = localhost 1 0 2
USE_VXSS = PROHIBITED
VXSS_SERVICE_TYPE = INTEGRITYANDCONFIDENTIALITY
EMMSERVER = lx0024nbumast.active.local
HOST_CACHE_TTL = 3600
VXDBMS_NB_DATA = /usr/openv/db/data
OPS_CENTER_SERVER_NAME = ws0034nbops01.active.tan
LIST_FS_IMAGE_HEADERS = NO
VERBOSE = 0
SERVER_CONNECT_TIMEOUT = 60
CLIENT_CONNECT_TIMEOUT = 500
CLIENT_READ_TIMEOUT = 500
BPSTART_TIMEOUT = 500
BPEND_TIMEOUT = 500
TELEMETRY_UPLOAD = YES

3) this is my /etc/hosts

# that require network functionality will fail.
127.0.0.1       localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6

10.124.2.39     lx0024nbumast.active.local lx0024nbumast
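
A quick way to confirm that the names used in bp.conf are covered by /etc/hosts is sketched below. This is only an illustration, not a NetBackup tool: the function names `parse_hosts` and `missing_from_hosts` are hypothetical, the sample text is the hosts file posted above, and it checks the hosts file only, so a name missing here may still resolve through DNS.

```python
# Hypothetical helper, not part of NetBackup: checks /etc/hosts-style text only.
HOSTS_TEXT = """\
127.0.0.1       localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6

10.124.2.39     lx0024nbumast.active.local lx0024nbumast
"""


def parse_hosts(text):
    """Map every host name and alias in the text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # The first entry for a name wins, as a top-down scan would.
            mapping.setdefault(name.lower(), ip)
    return mapping


def missing_from_hosts(text, names):
    """Return the names that have no entry in the hosts text."""
    known = parse_hosts(text)
    return [name for name in names if name.lower() not in known]


if __name__ == "__main__":
    wanted = ["lx0024nbumast.active.local", "lx0024nbumast"]
    print(missing_from_hosts(HOSTS_TEXT, wanted))
```

With the file as posted, both the FQDN and the short name map to 10.124.2.39, so the hosts file itself looks consistent with bp.conf.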

16 Oct 2013

I also ran these:

[root@lx0024nbumast netbackup]# /usr/openv/db/bin/dbadm
                1)  Select/Restart Database and Change Password
                2)  Database Space and Memory Management
                3)  Transaction Log Management
                4)  Database Validation Check and Rebuild
                5)  Move Database
SQL Anywhere Validation Utility Version 11.0.1.2958
VALIDATE DATABASE
VALIDATE TABLE "ADTR_MAIN"."ADTR_AuditRecord"
VALIDATE TABLE "ADTR_MAIN"."ADTR_AuditRecord_Details"
VALIDATE TABLE "ADTR_MAIN"."ADTR_AuditRecord_Placeholder"
VALIDATE TABLE "DARS_MAIN"."DARS_Oracle_ArchiveLog"
VALIDATE TABLE "DARS_MAIN"."DARS_Oracle_Backup"
VALIDATE TABLE "DARS_MAIN"."DARS_Oracle_Configuration"
VALIDATE TABLE "DARS_MAIN"."DARS_Oracle_ControlFile"
VALIDATE TABLE "DARS_MAIN"."DARS_Oracle_Controlfile_Schema"
VALIDATE TABLE "DARS_MAIN"."DARS_Oracle_Database"
VALIDATE TABLE "DARS_MAIN"."DARS_Oracle_Datafile"
VALIDATE TABLE "DARS_MAIN"."DARS_Oracle_Datafile_Backups"
VALIDATE TABLE "DARS_MAIN"."DARS_Oracle_Init_Params"
VALIDATE TABLE "DARS_MAIN"."DARS_Oracle_Instance_Params_Map"
VALIDATE TABLE "DARS_MAIN"."DARS_Oracle_RedoLog"
VALIDATE TABLE "DARS_MAIN"."DARS_Oracle_Spfile"
VALIDATE TABLE "DARS_MAIN"."DARS_Oracle_Tablespace"
....
VALIDATE TABLE "dbo"."sync_passthrough_script" 
VALIDATE TABLE "dbo"."sync_passthrough_status"
VALIDATE TABLE "rs_systabgroup"."rs_lastcommit"           
VALIDATE TABLE "rs_systabgroup"."rs_threads"       
No errors reported
(1) Down 5 (2) Bottom (3) Up 5 (4) Top (q) Quit

[root@lx0024nbumast netbackup]# /usr/openv/db/bin/nbdb_ping
Database [NBDB] is alive and well on server [NB_lx0024nbumast].

[root@lx0024nbumast netbackup]# /usr/openv/netbackup/bin/nbdbms_start_stop stop
[root@lx0024nbumast netbackup]# /usr/openv/netbackup/bin/nbdbms_start_stop start

Marianne
Trusted Advisor
Accredited
Certified
16 Oct 2013

Best to reinstall OS with RHEL or some other supported OS.

CentOS is only supported as media server, not Master. I am surprised that Support did not point this out to you...

See NetBackup 7 Operating System (CL) :    http://www.symantec.com/docs/TECH76648

 

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links

16 Oct 2013

Thank you Marianne, but we currently have 9 NBU masters, all on CentOS, so I don't think this is the reason. Thank you for the reminder, though; we will look into it.

As I mentioned, this box used to be an NBU media server. I looked into the log and found the error "Machine name lx0024nbumast.active.local is not recognizable as media server, NDMP filer or cluster".

So I jumped to another NBU master, only to find that this box still shows up in the "nbemmcmd -listhosts" output of that NBU master. Not sure if this is the issue.

 

[root@lx0024nbumast home]# vxlogview -o 111 -t 24:00:00 | grep Error
10/16/2013 04:00:45.184 [Error] V-111-1049 EMMServer generic error = machineName can't be null or empty
10/16/2013 04:00:48.644 [Error] V-111-1125 Machine name lx0024nbumast.active.local is not recognizable as media server, NDMP filer or cluster
10/16/2013 04:00:48.651 [Error] V-111-1049 EMMServer generic error = Configuration does not exist, retval = < 2001045 >
10/16/2013 05:58:24.235 [Error] V-111-1089 Time based query failed because some records were delete

 

 

 

Marianne
Trusted Advisor
Accredited
Certified
16 Oct 2013

Please post output of:

nbemmcmd -listhosts -verbose

nbemmcmd -getemmserver

And contents of these .conf files:

/usr/openv/db/data/vxdbms.conf 
/usr/openv/var/global/server.conf
/usr/openv/var/global/databases.conf

 


16 Oct 2013

[root@lx0024nbumast /]# nbemmcmd -getemmserver
NBEMMCMD, Version: 7.5.0.6
Failed to initialize EMM connection.  Verify that network access to the EMM server is available and that the services nbemm and pbx_exchange are running on the EMM server. (195)
Command did not complete successfully.

[root@lx0024nbumast /]# nbemmcmd -listhosts -verbose
NBEMMCMD, Version: 7.5.0.6
Failed to initialize EMM connection.  Verify that network access to the EMM server is available and that the services nbemm and pbx_exchange are running on the EMM server. (195)
Command did not complete successfully.

 

[root@lx0024nbumast ~]# cat /usr/openv/db/data/vxdbms.conf
VXDBMS_NB_SERVER = NB_lx0024nbumast
VXDBMS_NB_PORT = 13785
VXDBMS_NB_DATABASE = NBDB
VXDBMS_AZ_DATABASE = NBAZDB
VXDBMS_NB_DATA = /usr/openv/db/data
VXDBMS_NB_INDEX = /usr/openv/db/data
VXDBMS_NB_TLOG = /usr/openv/db/data
VXDBMS_NB_STAGING = /usr/openv/db/staging
VXDBMS_NB_PASSWORD = 4c9896bf030687f895c7f3090d7368bf7c89fc9fbe23411c
AZ_DB_PASSWORD = Jj8mkP3sKTo=

[root@lx0024nbumast ~]# cat /usr/openv/var/global/server.conf
 -n NB_lx0024nbumast
   -x tcpip(LocalOnly=YES;ServerPort=13785)  -gp 4096 -gd DBA -gk DBA -gl DBA -ti 0 -c 100M -ch 1024M -cl 100M -zl -os 1M -m -o /usr/openv/db//log/server.log
 -ud

[root@lx0024nbumast ~]# cat /usr/openv/var/global/databases.conf
"/usr/openv/db/data/NBDB.db" -n NBDB
"/usr/openv/db/data/NBAZDB.db" -n NBAZDB

Marianne
Trusted Advisor
Accredited
Certified
17 Oct 2013

I see inconsistency as far as hostnames are concerned.

bp.conf contains FQDN:

SERVER = lx0024nbumast.active.local
EMMSERVER = lx0024nbumast.active.local

EMM .conf files have all shortnames:

VXDBMS_NB_SERVER = NB_lx0024nbumast

 -n NB_lx0024nbumast

How did this happen?

Please check the responses in the installation log: /usr/openv/tmp/install_trace.####

It seems nbemm is not running. Please do the following:

Stop NBU. Check that all processes terminate.

Start NBU. Check processes. When nbemm stops, run 'vxlogview -o 111 -t 00:10:00' and post the output.

I suspect the problem is caused by the inconsistent naming convention.
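
The comparison described above can be sketched in a few lines. This is a rough illustration under stated assumptions, not a Symantec utility: the function names are hypothetical, and stripping a leading "NB_" from VXDBMS_NB_SERVER is an assumption based only on the files posted in this thread. It reports whether each bp.conf name is an FQDN and whether its short form matches the database server name; whether a mismatch in form is actually a problem is exactly what is being debated here.

```python
# Hypothetical sketch: compares host names in bp.conf and vxdbms.conf text.

def read_conf(text):
    """Parse 'KEY = value' lines, keeping the first value seen for each key."""
    entries = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            # In bp.conf the first SERVER entry names the master server.
            entries.setdefault(key.strip(), value.strip())
    return entries


def short_name(host):
    """Return the part of a host name before the first dot, lower-cased."""
    return host.split(".", 1)[0].lower()


def compare_names(bp_text, vxdbms_text):
    """Report the form of SERVER/EMMSERVER and compare with the DB server."""
    bp = read_conf(bp_text)
    db_server = read_conf(vxdbms_text).get("VXDBMS_NB_SERVER", "")
    # Assumption: the database server name is "NB_" plus the host name.
    db_host = db_server[3:] if db_server.startswith("NB_") else db_server
    report = {}
    for key in ("SERVER", "EMMSERVER"):
        name = bp.get(key, "")
        report[key] = {
            "name": name,
            "is_fqdn": "." in name,
            "same_short_name": short_name(name) == short_name(db_host),
        }
    return report
```

For the files in this thread, both SERVER and EMMSERVER would be reported as FQDNs whose short form matches NB_lx0024nbumast, which is the difference in form pointed out above rather than a difference in the host itself.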
 


17 Oct 2013

During the last installation I saw "Database server is NB_lx0024nbumast" in the installation log.

But I didn't realize that it would cause an inconsistency issue.

Let me re-install and use short names in all the prompts.

 

Marianne
Trusted Advisor
Accredited
Certified
17 Oct 2013

So, shortname was specified as database server but FQDN as master server, right?

Names need to be consistent during installation.

I prefer shortnames during installation. So, when the installation prompts to use the FQDN, I say no, then provide the shortname. It is always easy to add the FQDN as an alias later on.
We see way too many users on this forum needing to change their domain name. If the master was installed with an FQDN, that is seen as a hostname change, which needs consulting.


trv
Certified
Certified
17 Oct 2013

I'm not sure you are right. We always install NBU with FQDNs, and the installer puts the shortname in the vxdbms.conf and server.conf files every time. It may be because of Sybase DB name requirements or something, for example no dots allowed.

17 Oct 2013

"shortname was specified as database server but FQDN as master server, right?"  ======> Right.

But I am not sure if this is the root cause, since we use FQDNs often.

Since CentOS is not supported, I plan to reinstall the box with Oracle Linux and then install NBU again.

I will also change the IP so as to cut all ties with the other NBU master (remember I mentioned that this box was a media server).

Thanks, everybody, for your help! Really appreciated.

Marianne
Trusted Advisor
Accredited
Certified
17 Oct 2013

On a side note - was this server properly decommissioned from original master server?


17 Oct 2013

Not yet.

So we decided to change both the hostname and the IP address, to cut all ties with the original master server.

Then we will start installing NBU on a supported OS instead of CentOS.

Thanks, everybody.

Jaime_Vazquez
Symantec Employee
17 Oct 2013

Having just worked a similar support case, try this:

On the hanging client, run "ifconfig -a" to get the IP address information of all interfaces.

Do a host name lookup of each IP and ensure they are valid. Then do an IP lookup of the host names returned from the reverse lookup. This is called the "endpoint selection" process. From what I can understand, nbemm does this at startup time. If any of this is 'bad', then it will fail with rc=77.

nslookup $ip_address

Using the returned hostname:  nslookup $hostname

They must match up.

What is happening is that nbemm is checking to see whether something is local to it or not. If it decides the address is not local, it will try to contact the hostname, assuming it is external. If the value is invalid, the end-to-end connection is not made and the initialization fails.

In my case I had a customer who configured a VLAN with an IP address that was not set up in their '/etc/hosts' file or in DNS. The DNS response did not fail, but rather returned a bogus host name value: the IP address in reverse order plus the domain name.

"nslookup 1.2.3.4" returned "4.3.2.1.domain_name". An effort to contact server "4.3.2.1.domain_name" naturally failed. rc=77.

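The reverse-then-forward check described above can be scripted roughly as follows. This is a minimal sketch, not what nbemm itself does internally: the function names are hypothetical, and Python's `socket` calls use the system resolver (which may consult /etc/hosts as well as DNS), so results can differ from plain nslookup. The resolvers are passed in as parameters so the logic can be tried without touching the network.

```python
import socket


def reverse_lookup(ip):
    """Return the primary host name for an IP, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(name):
    """Return the IPv4 addresses for a host name, or [] if the lookup fails."""
    try:
        return socket.gethostbyname_ex(name)[2]
    except OSError:
        return []


def check_ip(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Reverse-resolve an IP, then confirm the name resolves back to it."""
    name = reverse(ip)
    if name is None:
        return (ip, None, "no reverse entry")
    if ip in forward(name):
        return (ip, name, "ok")
    return (ip, name, "forward lookup does not return this IP")
```

To use it, call `check_ip()` once for every address reported by "ifconfig -a" and look at any result whose status is not "ok". The bogus reverse answer from the VLAN example would show up as a name whose forward lookup does not return the original IP.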
 

 

 

Marianne
Trusted Advisor
Accredited
Certified
22 Oct 2013

Suspect Jaime's post may be very relevant - some incorrect DNS entry perhaps?

This server was also never decommissioned from original master server which could add to issues seen, not to mention unsupported OS for master server....

Curious to see if new OS installation with new hostname and IP address fixed the issue.


KBN
01 Nov 2013

Hi Marianne,

Here I am attaching ipconfig /all screenshots.

Please find the attachments.

This is my mail ID: kbnaidu13@gmail.com

Please may I know your mail ID?

 

Thanks

Naidu 

 

hostname :KBN-PC

 

 

ipconfigallscreen1.jpg ipconfigscreen2.jpg ipconfigscr3.jpg
Marianne
Trusted Advisor
Accredited
Certified
01 Nov 2013

Above screenshots belong to
