
Storage server create fails

Created: 04 Oct 2013

NetBackup version 7.5.0.4 on the master server, all media servers and all clients

Master server: Solaris 10 x86

Existing media server: Solaris 10 SPARC, currently set up as a Storage Server with a PureDisk deduplication pool (in use for over two years)

New media server: Solaris 10 SPARC

I am attempting to create a Storage Server on a new media server in order to create a PureDisk deduplication pool, using the Storage Server Configuration Wizard. When the wizard attempts to create the Storage Server, it fails with "Creating storage server <media server name> cannot connect on socket(25)". However, the bpclntcmd command shows connectivity from the master to the media server and from the media server to the master, and each server can ping the other. The new media server shows up correctly under Media Servers in the Administration Console, and the bptestbpcd command also shows connectivity between the two servers. I'm not sure what I am missing.


randes2000

The bp.conf files on the master and the new media server have the correct SERVER, MEDIA_SERVER and CLIENT_NAME entries.
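One way to double-check the bp.conf entries, and the short-name versus FQDN question, is to parse the file and compare the names it lists. This is a minimal sketch, assuming bp.conf holds one `KEY = value` entry per line (as it does on Unix); the function names are hypothetical and this is not a NetBackup tool. Names are compared ignoring case only, so a short name and an FQDN for the same host are reported as different.

```python
def parse_bp_conf(text):
    """Map each bp.conf key (SERVER, MEDIA_SERVER, CLIENT_NAME, ...) to its values, in file order."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and anything that is not a KEY = value entry
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        entries.setdefault(key.strip().upper(), []).append(value.strip())
    return entries


def missing_servers(entries, hostnames):
    """Return the names in hostnames that appear in neither SERVER nor MEDIA_SERVER.

    Matching ignores case only, so a short name and an FQDN for the same
    host are treated as different names.
    """
    known = {v.lower() for k in ("SERVER", "MEDIA_SERVER") for v in entries.get(k, [])}
    return [h for h in hostnames if h.lower() not in known]
```

For example, `missing_servers(parse_bp_conf(open("/usr/openv/netbackup/bp.conf").read()), ["master", "new-media-2"])` lists whichever of those names is not present as a SERVER or MEDIA_SERVER entry. Running it on both the master and the new media server shows whether each side lists the other under the same name form.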

mph999

Is the media server configured with a short name while the other servers use FQDNs, or vice versa?

Do any of the servers (master or media) have multiple NICs or a bonded NIC?

Do you have the correct licenses installed?

Regards, Martin
Setting Logs in NetBackup:
http://www.symantec.com/docs/TECH75805

huanglao2002

You can check the admin logs for help:

/usr/openv/netbackup/logs/admin/XX.XXX

mph999

Logs: you would probably need some of the VxUL logs as well. From memory (so no promises until I check), that is OID 202, which logs into 222 and 230.

revaroo

Check the licenses on both the working and non-working servers and compare them.

Marianne

Double-check communications. What is returned for the new media server when you run these commands on the master?

nbemmcmd -listhosts -verbose

nbemmcmd -getemmserver

bptestbpcd -host <new-media-server> -verbose -debug

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows

randes2000

Thank you all for your help. All licenses on the master server were applied to this new media server. All server names are short names; there are no FQDNs. The master server has two NICs and the new media server has one. On all my other servers, connectivity between servers and clients is over a private (non-routed) network. Connectivity to this media server is over our public network, because it is located in our new data center, which is still being built (local site). I checked with our networking team and they said that no ports are blocked between the sites. The new media server is listening on ports 13724 and 1556. On our private network, we designate our servers' hostnames with a "-2" at the end.
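Since the new media server is reached over the public network, a plain TCP connect test from the master to the two listening ports mentioned above (1556 and 13724) can confirm the networking team's answer independently of NetBackup. A minimal sketch, assuming Python 3 is available on the host running it; the function name is hypothetical, and a successful result only proves that a TCP handshake completes, not that NetBackup will accept the connection.

```python
import socket

# Ports the new media server is reported to listen on in this thread:
# 1556 (PBX) and 13724 (vnetd).
NBU_PORTS = (1556, 13724)


def check_ports(host, ports=NBU_PORTS, timeout=5.0):
    """Return {port: bool}: True if a TCP connection to host:port completes."""
    results = {}
    for port in ports:
        try:
            # create_connection resolves host and tries each returned address
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Refused connections, timeouts and name-resolution errors all land here
            results[port] = False
    return results
```

Running `check_ports("new-media-2")` on the master returns a dict mapping each port to True or False. Trying both the short name and the "-2" name shows whether the two names reach the server the same way.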

The output of the suggested commands is below:

We are a dark site, so I cannot provide the output of nbemmcmd -listhosts -verbose, but the master server and all my media servers are listed, including the new media server. No errors.

The same goes for nbemmcmd -getemmserver: the master server and all my media servers are listed, including the new media server. No errors.

bptestbpcd -host new-media-2 -verbose -debug
08:30:07.835 [12353] <2> bptestbpcd: VERBOSE = 0
08:30:07.875 [12353] <2> vnet_pbxConnect: pbxConnectEx Succeeded
08:30:07.875 [12353] <2> logconnections: BPCD CONNECT FROM 192.168.114.144.40889 TO 10.112.5.7.1556 fd = 4
08:30:07.898 [12353] <2> vnet_pbxConnect: pbxConnectEx Succeeded
08:30:07.915 [12353] <8> do_pbx_service: [vnet_connect.c:2108] via PBX VNETD CONNECT FROM 192.168.114.144.40890 TO 10.112.5.7.1556 fd = 5
08:30:07.915 [12353] <8> vnet_vnetd_connect_forward_socket_begin: [vnet_vnetd.c:443] VN_REQUEST_CONNECT_FORWARD_SOCKET 10 0xa
08:30:07.972 [12353] <8> vnet_vnetd_connect_forward_socket_begin: [vnet_vnetd.c:460] ipc_string /tmp/vnet-09074381149157078651000000094-wBaqUr
1 1 1
192.168.114.144:40889 -> 10.112.5.7:1556
192.168.114.144:40890 -> 10.112.5.7:1556
08:30:08.073 [12353] <8> file_to_cache_item: [vnet_addrinfo.c:6555] fopen() failed ERRNO=2 FILE=/usr/openv/var/host_cache/1ff/ffffffff+vnetd,1,8,0,2,0+.txt
08:30:08.091 [12353] <2> bpcr_get_peername_rqst: Server peername length = 5
08:30:08.110 [12353] <2> bpcr_get_hostname_rqst: Server hostname length = 10
08:30:08.127 [12353] <2> bpcr_get_clientname_rqst: Server clientname length = 12
08:30:08.143 [12353] <2> bpcr_get_version_rqst: bpcd version: 07500004
08:30:08.160 [12353] <2> bpcr_get_platform_rqst: Server platform length = 9
08:30:08.177 [12353] <2> bpcr_get_version_rqst: bpcd version: 07500004
08:30:08.194 [12353] <2> bpcr_patch_version_rqst: theRest == > <
08:30:08.210 [12353] <2> bpcr_get_version_rqst: bpcd version: 07500004
08:30:08.267 [12353] <2> bpcr_patch_version_rqst: theRest == > <
08:30:08.283 [12353] <2> bpcr_get_version_rqst: bpcd version: 07500004
08:30:08.320 [12353] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1583] in failed file cache ERR=2 NAME=new-media SVC=NULL
08:30:08.320 [12353] <8> vnet_cached_getaddrinfo: [vnet_addrinfo.c:1273] vnet_cached_getaddrinfo_and_update() failed 6 0x6
08:30:08.320 [12353] <8> vnet_same_host_and_update: [vnet_addrinfo.c:2832] vnet_cached_getaddrinfo() failed STAT=6 RV=2 NAME1=new-media
08:30:08.320 [12353] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1530] in failed cache ERR=2 NAME=new-media SVC=NULL
08:30:08.320 [12353] <8> vnet_cached_getaddrinfo: [vnet_addrinfo.c:1273] vnet_cached_getaddrinfo_and_update() failed 6 0x6
08:30:08.320 [12353] <8> vnet_same_host_and_update: [vnet_addrinfo.c:2832] vnet_cached_getaddrinfo() failed STAT=6 RV=2 NAME1=new-media
08:30:08.320 [12353] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1530] in failed cache ERR=2 NAME=new-media SVC=NULL
08:30:08.320 [12353] <8> vnet_cached_getaddrinfo: [vnet_addrinfo.c:1273] vnet_cached_getaddrinfo_and_update() failed 6 0x6
08:30:08.320 [12353] <8> vnet_same_host_and_update: [vnet_addrinfo.c:2832] vnet_cached_getaddrinfo() failed STAT=6 RV=2 NAME1=new-media
08:30:08.320 [12353] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1530] in failed cache ERR=2 NAME=new-media SVC=NULL
08:30:08.320 [12353] <8> vnet_cached_getaddrinfo: [vnet_addrinfo.c:1273] vnet_cached_getaddrinfo_and_update() failed 6 0x6
08:30:08.320 [12353] <8> vnet_same_host_and_update: [vnet_addrinfo.c:2832] vnet_cached_getaddrinfo() failed STAT=6 RV=2 NAME1=new-media
08:30:08.320 [12353] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1530] in failed cache ERR=2 NAME=new-media SVC=NULL
08:30:08.320 [12353] <8> vnet_cached_getaddrinfo: [vnet_addrinfo.c:1273] vnet_cached_getaddrinfo_and_update() failed 6 0x6
08:30:08.320 [12353] <8> vnet_same_host_and_update: [vnet_addrinfo.c:2832] vnet_cached_getaddrinfo() failed STAT=6 RV=2 NAME1=new-media
PEER_NAME = master
HOST_NAME = new-media
CLIENT_NAME = new-media-2
VERSION = 0x07500004
PLATFORM = solaris10
PATCH_VERSION = 7.5.0.4
SERVER_PATCH_VERSION = 7.5.0.4
MASTER_SERVER = master
EMM_SERVER = master-2
NB_MACHINE_TYPE = CLIENT
<2>bptestbpcd: EXIT status = 0
08:30:08.343 [12353] <2> bptestbpcd: EXIT status = 0
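The debug lines above show vnet_cached_getaddrinfo() failing for NAME1=new-media, while the summary reports HOST_NAME = new-media and CLIENT_NAME = new-media-2. The thread does not establish whether that is related to the wizard failure, but it is easy to check which of the reported names resolve on a given host. A minimal sketch, assuming the `KEY = VALUE` summary format shown above; the function names are hypothetical, and resolution uses whatever the local resolver (hosts file, DNS) returns.

```python
import socket


def parse_bptestbpcd_summary(text):
    """Collect the 'KEY = VALUE' summary lines printed at the end of bptestbpcd output."""
    summary = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        # Summary keys are single upper-case tokens (HOST_NAME, CLIENT_NAME, ...);
        # timestamped debug lines contain spaces and lower-case text, so they are skipped.
        if sep and " " not in key and key.isupper():
            summary[key] = value.strip()
    return summary


def resolves(name):
    """True if the local resolver (hosts file, DNS, ...) returns an address for name."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except socket.gaierror:
        return False


def unresolved_names(summary, keys=("HOST_NAME", "CLIENT_NAME", "MASTER_SERVER", "EMM_SERVER")):
    """List the host names reported under keys that do not resolve on this host."""
    return [summary[k] for k in keys if k in summary and not resolves(summary[k])]
```

Saving the bptestbpcd output to a file and running `unresolved_names(parse_bptestbpcd_summary(text))` on both the master and the media server lists any reported names that do not resolve on that host.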

randes2000

I made some progress, although I'm not really sure how. I was able to create the Storage Server on the new media server by simply going through the wizard again. The wizard completed the Storage Server creation and then launched the Disk Pool Configuration Wizard. The last time I created a disk pool for a Storage Server, this wizard took me through the process and let me choose the file system on the server to be used for the disk pool. This time, after choosing PureDisk as the type of disk pool to create and selecting my Storage Server from the list, I get no choices in the Select volumes block, so I can go no further. I'm not sure where I should have created the volume on this media server in order to move forward.

The NetBackup Deduplication Guide just says to use the wizard and follow its instructions, which is not very helpful if you run into a problem.

watsons

I remember having a similar issue a while back, but in my case it failed at the "creating disk pool" stage with a different error message, so I am not sure how helpful this is.

The error was:

"database system error - RDSM has encountered an STS Error: failed to update storage server configuration due to unsupported platform, invalid configuration or system error."

It was a Windows media server, the only one with this error; all the other media servers set up fine. There was not much difference between the affected server and the others except the hostname. In the end, strangely, the following steps resolved the issue.

- We enabled nbrmms (OID=222) logs, which showed that a file xxxxx.cfg (where xxxxx is the hostname) was missing from \bin\ost-plugins\ (on Unix, this is /usr/openv/lib/ost-plugins).
- We checked the other media servers that were working, and they had the config file.
- Since a non-MSDP media server does not have that file, we were sure the "create storage server" wizard creates it at some point, and that if creating the file fails, every subsequent retry fails as well.
- In the end, we uninstalled NetBackup from that media server and reinstalled it. After that everything went well, and we could create the storage server, disk pool and disk volume we wanted.
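The missing-file check described in these steps can be scripted when comparing a failing media server with a working one. A minimal sketch, assuming the file is named `<hostname>.cfg` (the exact naming convention is not confirmed in this thread) and using the Unix directory given above; pass `plugin_dir` explicitly for the Windows `\bin\ost-plugins\` location. The function names are hypothetical.

```python
import os
import socket

# Unix location given in this thread; on Windows it was <install path>\bin\ost-plugins\
OST_PLUGIN_DIR = "/usr/openv/lib/ost-plugins"


def cfg_files(plugin_dir=OST_PLUGIN_DIR):
    """List the .cfg files in the OST plug-in directory (empty if the directory is absent)."""
    if not os.path.isdir(plugin_dir):
        return []
    return sorted(f for f in os.listdir(plugin_dir) if f.lower().endswith(".cfg"))


def has_host_cfg(filenames, hostname=None):
    """True if '<hostname>.cfg' is among filenames (naming convention assumed, case-insensitive)."""
    hostname = hostname or socket.gethostname()
    return f"{hostname}.cfg".lower() in {f.lower() for f in filenames}
```

On each media server, `has_host_cfg(cfg_files())` reports whether a file for the local hostname is present, and `cfg_files()` on a working server shows what to compare against.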
 

watsons

One more note I found, although this one was not a PureDisk pool, just AdvancedDisk. The error was:

The wizard is not able to obtain Storage servers information , cannot connect on socket 
RDSM has encountered an issue with STS CORBA exception: getDiskVolumeInfoList 

In this case the media server was the storage server, and "bpclntcmd -pn" returned nothing. So we checked the hosts file, copied the hosts file from another working media server, and disabled the IPv6 setting. It seemed that 127.0.0.1 was pointing to another hostname instead of localhost, so we changed that as well.

After restarting NetBackup on it, everything started to work again.

Hope it helps.
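The hosts-file problem described above (127.0.0.1 pointing to a hostname other than localhost) can be spotted by scanning the file for loopback addresses mapped to other names. A minimal sketch; the function name is hypothetical, and the set of accepted localhost names is an assumption that may need adjusting for your platform's defaults.

```python
import ipaddress

# Names that normally sit on loopback lines; anything else is flagged for review.
# This allow-list is an assumption; adjust it for your platform's defaults.
LOCALHOST_NAMES = {"localhost", "localhost.localdomain", "localhost4",
                   "localhost6", "ip6-localhost", "ip6-loopback"}


def suspicious_loopback_entries(hosts_text):
    """Return (address, name) pairs where a loopback address maps to a non-localhost name."""
    findings = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # not an address line
        if addr.is_loopback:
            findings.extend((fields[0], name) for name in fields[1:]
                            if name.lower() not in LOCALHOST_NAMES)
    return findings
```

For example, `suspicious_loopback_entries(open("/etc/hosts").read())` returns an empty list when only localhost-style names sit on loopback lines.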

randes2000

watsons,

I have seen the RDSM error on a home system I was experimenting with. I wasn't aware that Solaris 10 x86 would not do deduplication; I went with Oracle Unbreakable Linux on a VM instead. Anyway, your descriptions do not quite fit what I am seeing. Thanks.

girishsj@symantec

Try the bptestnetconn utility to test the connection between the master and media server.

Reference: http://www.symantec.com/docs/TECH205471