NetBackup 7.5 on SUSE Linux: startup hangs after Agent Request Server

Created: 10 Feb 2014 • Updated: 05 Jun 2014 | 17 comments
This issue has been solved. See solution.

NetBackup startup hangs after Agent Request Server. What could cause this?

Any recommendations on where to go from here?

Marianne's picture

More info please... There is no known issue that would cause this.

Is this a new installation that has never worked?

If it has worked before - what happened in between working and now hanging?

Which 7.5 patch level?

Please show us the output of NBU startup, plus, from another window, run 'bpps -x' and copy the output.

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links

Robert_GI's picture

I installed a media server, using a reference to a master server... the customer came back with a different master server.  Using the Symantec NetBackup™ Installation Guide for UNIX and Linux Release 7.5, I removed the installation and reinstalled with the new master server name.

The install and patch application looked okay, but when I do a "netbackup start", the startup proceeds to "NetBackup Agent Request Server started", where it hangs and eventually comes back with a "^C".

Something is not right with my re-install of the media server... 

The removal process went like:

  546  2014-02-11 19:09:30 echo '==== Begin NBU remove ====='
  547  2014-02-11 19:09:50 /usr/openv/netbackup/bin/bp.kill_all
  548  2014-02-11 19:10:35 /usr/openv/netbackup/bin/vxlogcfg -r -p 51216
  549  2014-02-11 19:11:03 /usr/openv/netbackup/bin/nblu_registration -r
  550  2014-02-11 19:21:11 find / -name vxdbms.conf
  551  2014-02-11 19:21:54 /usr/openv/netbackup/bin/admincmd/nbftsrv_config -d
  552  2014-02-11 19:24:35 rpm -e SYMCnetbp
  553  2014-02-11 19:25:25 rpm -e SYMCnbjava
  554  2014-02-11 19:25:50 rpm -e SYMCnbjre
  555  2014-02-11 19:26:33 rpm -e SYMCnbclt
  556  2014-02-11 19:26:46 rpm -e VRTSpbx
  557  2014-02-11 19:28:00 cd /usr/openv
  558  2014-02-11 19:28:12 pwd
  559  2014-02-11 19:28:16 ls
  560  2014-02-11 19:28:50 rm -rf *
  561  2014-02-11 19:29:03 cd /
  562  2014-02-11 19:29:18 rm -f /usr/openv
  563  2014-02-11 19:31:29 rm -f /etc/init.d/netbackup*
  564  2014-02-11 19:31:29 rm -f /etc/init.d/rc0.d/K01netbackup
  565  2014-02-11 19:31:29 rm -f /etc/init.d/rc2.d/S77netbackup
  566  2014-02-11 19:31:30 rm -f /etc/init.d/rc3.d/S77netbackup
  567  2014-02-11 19:31:30 rm -f /etc/init.d/rc5.d/S77netbackup
  568  2014-02-11 19:31:30 rm -f /etc/init.d/rc6.d/K01netbackup
  569  2014-02-11 19:31:30 rm -f /etc/init.d/nbclient*
  570  2014-02-11 19:31:30 rm -f /etc/init.d/rc0.d/K01nbclient
  571  2014-02-11 19:31:30 rm -f /etc/init.d/rc2.d/S95nbclient
  572  2014-02-11 19:31:30 rm -f /etc/init.d/rc3.d/S95nbclient
  573  2014-02-11 19:31:30 rm -f /etc/init.d/rc5.d/S95nbclient
  574  2014-02-11 19:31:30 rm -f /etc/init.d/rc6.d/K01nbclient
  587  2014-02-11 19:33:05 /opt/Symantec/LiveUpdate/uninstall.sh -a
  588  2014-02-11 19:33:33 rm -f /etc/Symantec.conf
  589  2014-02-11 19:33:54 rm -f /etc/Product.Catalog.JavaLiveUpdate
  590  2014-02-11 19:34:43 /bin/rm -rf /.nbjava
  591  2014-02-11 19:34:43 /bin/rm -rf /.java/.userPrefs/vrts
  592  2014-02-11 19:34:51 rm -rf etc/vx
  593  2014-02-11 19:34:51 rm -rf /opt/VRTS*
  594  2014-02-11 19:34:58 rm -rf /root/.java/.userPrefs/vrts
  595  2014-02-11 19:34:58 rm -rf /root/.veritas
  596  2014-02-11 19:35:27 echo '==== End NBU remove ====='
 
Robert_GI's picture

I have attached the section from the "netbackup" script where I am getting hung up...

NBU.JPG
Robert_GI's picture

Hmmm...  Next thing in the startup of netbackup was BMR, and I did not do step 9 in the removal process:

9 If BMR is supported and enabled on the server, remove the associated files with the following command:
/usr/openv/netbackup/bin/bmrsetupmaster -undo -f
 
Could that be my problem? 
Robert_GI's picture

First install was fine, the removal process may not have been what was needed... close, not quite right.

7.5.0.5

arcphlbar2-4:/ # /var/opt/teradata/openv/netbackup/bin/bpps -x
NB Processes
------------
root      9436     1  0 11:21 ?        00:00:00 /usr/openv/netbackup/bin/vnetd -standalone
root      9439     1  0 11:21 ?        00:00:00 /usr/openv/netbackup/bin/bpcd -standalone
root      9692     1  0 11:22 ?        00:00:00 /usr/openv/netbackup/bin/bpcompatd
root      9737     1  0 11:22 ?        00:00:05 /usr/openv/netbackup/bin/nbrmms
root      9825     1  0 11:22 ?        00:00:08 /usr/openv/netbackup/bin/nbsl
root      9935     1  0 11:22 ?        00:00:00 /usr/openv/netbackup/bin/nbcssc -a NetBackup
root      9999     1  0 11:22 ?        00:00:01 /usr/openv/netbackup/bin/nbsvcmon
 
 
MM Processes
------------
root      9685     1  0 11:22 ?        00:00:00 vmd
 
 
Shared Symantec Processes
-------------------------
root      9171     1  0 11:21 ?        00:00:00 /opt/VRTSpbx/bin/pbx_exchange
 
Robert_GI's picture
arctdatbar2_95-2:~ # /var/opt/teradata/openv/netbackup/bin/bpps -x
NB Processes
------------
root     19927     1  0 21:07 ?        00:00:00 /usr/openv/netbackup/bin/vnetd -standalone
root     19930     1  0 21:07 ?        00:00:00 /usr/openv/netbackup/bin/bpcd -standalone
root     20100     1  0 21:07 ?        00:00:00 /usr/openv/netbackup/bin/bpcompatd
root     20115     1  0 21:07 ?        00:00:00 /usr/openv/netbackup/bin/nbrmms
root     20167     1  0 21:07 ?        00:00:00 /usr/openv/netbackup/bin/nbsl
root     20202 20194  0 21:07 pts/1    00:00:00 /usr/openv/netbackup/bin/bmrd
 
 
MM Processes
------------
root     20090     1  0 21:07 pts/1    00:00:00 /usr/openv/volmgr/bin/ltid
root     20205     1  0 21:07 pts/1    00:00:00 vmd
 
 
Shared Symantec Processes
-------------------------
root     13847     1  0 20:19 ?        00:00:00 /opt/VRTSpbx/bin/pbx_exchange
 
Marianne's picture

"/usr/openv/netbackup/bin/bmrsetupmaster -undo -f" is for a master server only. So, it depends on whether the media server was installed as a master instead of a media server.

Please help us to understand the difference between the output from arcphlbar2-4 and arctdatbar2_95-2?

The output from the 1st one looks fine - a newly installed media server with no devices configured yet.
ltid and vmd running on the 2nd one with no other Media Manager daemons/processes looks like comms with EMM server has not been established.

Please show us bp.conf on master and both media servers.

We have not yet seen output from 'netbackup start' on the problematic media server?
The screenshot shows the script itself, not the output.

 


SOLUTION
Robert_GI's picture

"arctdatbar2_95-2" is a newly installed media server with no devices yet, and it has not been added to the master yet. "arcphlbar2-4" is an existing media server that a master knows about and that has devices.

arctdatbar2_95-2:/ #  cat /var/opt/teradata/openv/netbackup/bp.conf

SERVER = phlbserver01
SERVER = arctdatbar2_95-2
CLIENT_NAME = arctdatbar2_95-2
CONNECT_OPTIONS = localhost 1 0 2
USE_VXSS = PROHIBITED
VXSS_SERVICE_TYPE = INTEGRITYANDCONFIDENTIALITY
EMMSERVER = phlbserver01
HOST_CACHE_TTL = 3600
TELEMETRY_UPLOAD = NO
 
arcphlbar2-4:/ # cat /var/opt/teradata/openv/netbackup/bp.conf
SERVER = bserver01
SERVER = arcphlbar2-4
CLIENT_NAME = arcphlbar2-4
CONNECT_OPTIONS = localhost 1 0 2
USE_VXSS = PROHIBITED
VXSS_SERVICE_TYPE = INTEGRITYANDCONFIDENTIALITY
EMMSERVER = bserver01
HOST_CACHE_TTL = 3600
ALLOW_MEDIA_OVERWRITE = DBR
ALLOW_MEDIA_OVERWRITE = TAR
ALLOW_MEDIA_OVERWRITE = CPIO
ALLOW_MEDIA_OVERWRITE = ANSI
VERBOSE = 5
CLIENT_READ_TIMEOUT = 18000
TELEMETRY_UPLOAD = NO
 
You can see two different master servers in the bp.conf files. The customer is using a new master server in this "phl" location, which is the one for the media server I am having a problem with.   "arctdatbar2_95-2" is the problem media server.
 
As for the output of a "netbackup start" on "arctdatbar2_95-2"... it just stops and hangs.

The "netbackup start" for "arctdatbar2_95-2" worked after the initial install, but has failed since.
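
A quick way to confirm which master each host points at is to pull the first SERVER entry (NetBackup treats the first SERVER line in bp.conf as the master) and the EMMSERVER entry out of each file. The following is only a minimal sketch, not a NetBackup tool: `bpconf_value` is a hypothetical helper name, and the path in the example is the one shown in the listings above.

```shell
#!/bin/sh
# Hypothetical helper (not part of NetBackup): print the value of the first
# "KEY = value" line for the given key in a bp.conf-style file.
bpconf_value() {
    # $1 = key (e.g. SERVER), $2 = path to bp.conf
    awk -v key="$1" -F'[ \t]*=[ \t]*' \
        '$1 == key { print $2; exit }' "$2"
}

# Example, using the bp.conf path from the media servers in this thread:
# conf=/var/opt/teradata/openv/netbackup/bp.conf
# echo "master: $(bpconf_value SERVER "$conf")"
# echo "EMM:    $(bpconf_value EMMSERVER "$conf")"
```

If the two values differ between hosts that are supposed to share a master, that is worth sorting out before looking at anything else.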

Marianne's picture

The ltid and vmd processes on the media server suggest to me that there is a comms problem between the media server and the master.

Was the media server added to master's SERVER entries followed by NBU restart?

Have you verified forward and reverse name lookup between master and media server?

Have you verified that port 1556 is open in both directions between master and media?
Is iptables stopped and disabled on media server?

Can master communicate with the media server when you issue these commands on the master:

nbemmcmd -listhosts -verbose
nbemmcmd -getemmserver


Robert_GI's picture
arctdatbar2_95-2:/var/opt/teradata/BAR/SWu_Q12014/NBU/Patches # bpclntcmd -hn phlbserver01 -self
host phlbserver01: phlbserver01 at 10.130.1.33
aliases:     phlbserver01     10.130.1.33
current domain =
NIS does not seem to be running: (1) Request arguments bad
gethostname() returned: arctdatbar2_95-2
host arctdatbar2_95-2: arctdatbar2_95-2 at 10.130.108.139
aliases:     arctdatbar2_95-2     10.130.108.139
 getfqdn: Success
 
arctdatbar2_95-2:/var/opt/teradata/BAR/SWu_Q12014/NBU/Patches # bpclntcmd  -pn
expecting response from server phlbserver01
arctdatbar2_95-2 *NULL* 10.130.108.139 33295
 
 
arctdatbar2_95-2:/var/opt/teradata/BAR/SWu_Q12014/NBU/Patches #  bptestbpcd -client phlbserver01
0 1 2
10.130.108.139:698 -> 10.130.1.33:13782
10.130.108.139:1556 <- 10.130.1.33:55633
<16>bptestbpcd main: Function bpcr_new_standard_socket_rqst(phlbserver01) failed: 41
<16>bptestbpcd main: Function bpcr_new_standard_socket_rqst(phlbserver01) failed: 41
socket close failed
 
 
 
Robert_GI's picture

The first install for the failing media server was to master "bserver01". The customer identified after the initial install that this needed to be changed, so the media server software was removed per the NetBackup manual and reinstalled with the new master server name of "phlbserver01".  Since that was done, "netbackup start" hangs.

Adding the media server to the "phlbserver01" master has not succeeded yet either.

I don't have direct access to the NetBackup master; I will engage the admin.

Robert_GI's picture

From the master server:

We restarted the master at 14:07 yesterday, with the following bp.conf in place.
 
# pwd
/usr/openv/netbackup
# cat bp.conf
SERVER = phlbserver01
SERVER = arctdatbar2_95-2
SERVER = arctdatbar2_95-2.arc.com
MEDIA_SERVER = arctdatbar2_95-2
MEDIA_SERVER = arctdatbar2_95-2.arc.com
CLIENT_NAME = phlbserver01
CONNECT_OPTIONS = localhost 1 0 2
USE_VXSS = PROHIBITED
VXSS_SERVICE_TYPE = INTEGRITYANDCONFIDENTIALITY
EMMSERVER = phlbserver01
HOST_CACHE_TTL = 3600
VXDBMS_NB_DATA = /usr/openv/db/data
OPS_CENTER_SERVER_NAME = gpadm02
TELEMETRY_UPLOAD = NO
#
 
I have the BAR server in the master’s host file; but forward and reverse DNS are working fine.
 
# grep arctdat /etc/hosts
10.130.108.139 arctdatbar2_95-2 arctdatbar2_95-2.arc.com
#
 
# nslookup arctdatbar2_95-2
Server:         192.168.130.30
Address:        192.168.130.30#53
 
Name:   arctdatbar2_95-2.arc.com
Address: 10.130.108.139
 
# nslookup 10.130.108.139
Server:         192.168.130.30
Address:        192.168.130.30#53
 
139.108.130.10.in-addr.arpa     name = arctdatbar2_95-2.arc.com.
 
#
# grep hosts /etc/nsswitch.conf
# DNS for hosts lookups, otherwise it does not use any other naming service.
# "hosts:" and "services:" in this file are used only if the
hosts:      files dns
# before searching the hosts databases.
#
 
I can communicate from the master to the BAR on 1556:
 
# telnet arctdatbar2_95-2 1556
Trying 10.130.108.139...
Connected to arctdatbar2_95-2.
Escape character is '^]'.
^]
 
telnet> quit
Connection to arctdatbar2_95-2 closed.
#
 
Here is the output of the nbemmcmd commands.
 
# nbemmcmd -listhosts -verbose
NBEMMCMD, Version: 7.5.0.4
The following hosts were found:
phlbserver01
        MachineName = "phlbserver01"
        FQName = "phlbserver01.drsite.net"
        MachineDescription = ""
        MachineNbuType = server (6)
phlbserver01
        ClusterName = ""
        MachineName = "phlbserver01"
        FQName = "phlbserver01.drsite.net"
        GlobalDriveSeed = "VEND:#.:PROD:#.:IDX"
        LocalDriveSeed = ""
        MachineDescription = ""
        MachineFlags = 0x37
        MachineNbuType = master (3)
        MachineState = active for disk jobs (12)
        NetBackupVersion = 7.5.0.4 (750400)
        OperatingSystem = solaris (2)
        ScanAbility = 5
dd670-2.backupnet.arc.com
        MachineName = "dd670-2.backupnet.arc.com"
        FQName = "dd670-2.backupnet.arc.com"
        MachineDescription = "DataDomain"
        MachineFlags = 0x2
        MachineNbuType = ndmp (2) (storage_server)
Command completed successfully.
#
 
# nbemmcmd -getemmserver
NBEMMCMD, Version: 7.5.0.4
These hosts were found in this domain: phlbserver01
 
Checking with the host "phlbserver01"... 
 
Server Type    Host Version        Host Name                     EMM Server          
MASTER         7.5                 phlbserver01                  phlbserver01        
 
Command completed successfully.
#
 
Marianne's picture

My gut-feel is that the media server startup is hanging because comms cannot be established with the master server.

Please add SERVER entry on master for the media server, ensure forward and reverse lookup as well as port connectivity in both directions, then try again.

If media server startup is still hanging after this, I suggest you go through removal of software once more.
Ensure /usr/openv is totally wiped.
Install again. Check startup after base 7.5 installation. Only when startup is good, stop NBU and patch.
Take note of location of installation logs for troubleshooting purpose.
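
To check that nothing is left behind before reinstalling, one could test for the paths that the removal commands earlier in this thread touched. This is only a sketch: `leftovers` is a hypothetical helper, not a NetBackup command, and the path list in the example is taken from that removal history, not from any official checklist. It also reports dangling symbolic links, which would matter if /usr/openv is a symlink on the host (an assumption; the thread does not say).

```shell
#!/bin/sh
# Hypothetical helper: print each given path that still exists on disk.
# Tests -e or -L so that dangling symbolic links are reported too.
leftovers() {
    for p in "$@"; do
        if [ -e "$p" ] || [ -L "$p" ]; then
            printf '%s\n' "$p"
        fi
    done
}

# Example, using paths from the removal commands earlier in this thread:
# leftovers /usr/openv /opt/VRTSpbx /etc/init.d/netbackup \
#           /etc/init.d/nbclient /root/.veritas
```

No output means none of the listed paths remain.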

PS:

Unless there is a very good reason for CONNECT_OPTIONS in bp.conf, please get rid of it... the NBU defaults work well.


Robert_GI's picture

In the end... it was the customer's firewalls that were in the way of the new master server.

Once all denials were taken care of... everything worked, including netbackup start / stop.

All commands began to work after the firewall was fixed:

bpclntcmd -pn
bpclntcmd -hn <target server name -- master or media>
bpclntcmd -ip <IP_From_Command_Above>
 
grep hosts /etc/nsswitch.conf
 
nbemmcmd -listhosts
nbemmcmd -getemmserver
 
Enabling logging (mklogdir or specific components) with increased verbosity helped quite a bit.
 
All ports initially worked from the media server to the master, but apparently not the other way.
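
Since the block was in one direction only, the same port test has to be run from each side. A minimal sketch, assuming bash and the coreutils `timeout` command are available on both hosts; `check_port` is a hypothetical helper, not a NetBackup command. The ports in the example, 1556 (PBX) and 13782 (bpcd), are the ones that appear earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a TCP connection to host:port can be opened
# within the timeout (default 5 seconds). Relies on bash's /dev/tcp redirection.
check_port() {
    # $1 = host, $2 = port, $3 = optional timeout in seconds
    timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Example (run once on the media server against the master, then once on the
# master against the media server, swapping the host name for the peer):
# for port in 1556 13782; do
#     if check_port phlbserver01 "$port"; then
#         echo "port $port reachable"
#     else
#         echo "port $port blocked or closed"
#     fi
# done
```

A successful connect only shows that this one port accepts connections from this side; it does not rule out firewall denials on other ports or in the other direction.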
 
As Marianne pointed out, when things don't make sense... Go back to the basics.
 
Revisiting what should have already been done.
Marianne's picture

So, my gut-feel was right??

My gut-feel is that the media server startup is hanging because comms cannot be established with the master server.


Robert_GI's picture

Yes.  Your gut-feel was right on.

Did all I could from the media server to the master as it was recommended...

I did not have direct or consistent indirect access to the master server. In hindsight (20-20), I should have stopped my activities until I got results from the customer's NBU admin.  When the customer did engage in running communications diagnostics, firewall denials were seen... once those were taken care of, all worked.

Thanks again for your counsel.  Much appreciated.