Socket Write Failed on RHEL 5 Client
I've been troubleshooting backup errors on a RHEL 5 client for a while and am now getting status 24, socket write failed. Initially we were getting status 57, but we fixed that by installing xinetd, starting the service, and then reinstalling the NBU client software. All the services that need to run (xinetd, bpcd, and bpjava-msvc) are running, and I don't really know where to go from here. Everything I've done immediately followed the initial install; backups on this client have never worked. With the same configuration on my workstation (iptables rules are correct as well), everything works fine.
ps -ef | grep xinetd
returns
root 3246 1 0 Mar18 ? 00:00:00 xinetd -stayalive -pidfile /var/run/xinetd.pid
and all the necessary xinetd-based services show as on in the output of chkconfig --list
At this point after checking iptables, checking the running processes, and ensuring that the services are turned on, I don't know where to look next, so any help would be appreciated.
Comments
Have you checked out
the myriad tech notes regarding socket errors?
Before getting into the technotes, you'll want to enable verbose 5 logging by adding VERBOSE = 5 to the bp.conf file, and creating the following directories on the client in question:
<install_path>/openv/netbackup/logs/bpcd
<install_path>/openv/netbackup/logs/bpbkar
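For anyone following along, here is a minimal sketch of the logging setup described above. It assumes bp.conf lives at <install_path>/openv/netbackup/bp.conf (typically /usr/openv/netbackup/bp.conf on UNIX clients), and the function name enable_nbu_debug_logs is made up for illustration.

```shell
#!/bin/sh
# Sketch: enable client-side debug logging for NetBackup.
# enable_nbu_debug_logs is a hypothetical helper name. Its one argument is
# the <install_path> placeholder from the post (assumed to be /usr on most
# UNIX clients, so that the files live under /usr/openv).
enable_nbu_debug_logs() {
    install_path="$1"
    nbu_dir="$install_path/openv/netbackup"

    # Create the two log directories named in the post.
    mkdir -p "$nbu_dir/logs/bpcd" "$nbu_dir/logs/bpbkar" || return 1

    # Add VERBOSE = 5 to bp.conf unless a VERBOSE entry is already present.
    touch "$nbu_dir/bp.conf" || return 1
    if grep -q '^VERBOSE[[:space:]]*=' "$nbu_dir/bp.conf"; then
        echo "VERBOSE already set in $nbu_dir/bp.conf; edit it by hand" >&2
    else
        echo 'VERBOSE = 5' >> "$nbu_dir/bp.conf"
    fi
}
```

Run it as root with the install path, for example enable_nbu_debug_logs /usr, then rerun the backup and look in the new directories. Remember to lower the verbosity again afterwards, since level 5 logs grow quickly.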
If you have not checked the t/n's out yet, below are some good places to start:
http://seer.entsupport.symantec.com/docs/347002.htm
http://seer.entsupport.symantec.com/docs/336452.htm
As soon as you get some logs, you can paste snippets here to help us determine exactly where and why socket connections are giving you trouble.
Will do
Thanks for the input
More to-do's
Let us know the NBU and OS version of your master server. Are there any media servers?
Run the following command and paste the output here. Strip any IPs or hostnames, of course, and replace them with something that still lets us see what is going on. (I'm assuming your master is a UNIX variant; if not, the same command exists on Windows master servers as well.)
<install_path>/openv/netbackup/bin/admincmd/bptestbpcd -client <client_hostname> -verbose -debug
The output might help us out.
So here's what I've got so far
Master server is NBU 6.5, Red Hat 2.6, we've got 2 media servers. As for the output from <install_path>/openv/netbackup/bin/admincmd/bptestbpcd -client <client_hostname> -verbose -debug:
sdbckmstr[/usr/openv/netbackup/bin/admincmd]sudo ./bptestbpcd -client
xxxxxxxxxx -verbose -debug
16:26:59.439 [16523] <2> bptestbpcd: VERBOSE = 0
16:26:59.462 [16523] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2046:
VN_REQUEST_SERVICE_SOCKET: 6 0x00000006
16:26:59.462 [16523] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2060:
service: bpcd
16:26:59.502 [16523] <2> logconnections: BPCD CONNECT FROM
xxx.xxx.xxx.xxx.xxxxx TO xxx.xxx.xxx.xxx13724
16:26:59.502 [16523] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.180:
msg: VNETD CONNECT FROM xxx.xxx.xxx.xxx.xxxxx TO xxx.xxx.xxx.xxx.13724
fd = 4
16:26:59.515 [16523] <2> vnet_vnetd_connect_forward_socket_begin:
vnet_vnetd.c.533: VN_REQUEST_CONNECT_FORWARD_SOCKET: 10 0x0000000a
16:26:59.556 [16523] <2> vnet_vnetd_connect_forward_socket_begin:
vnet_vnetd.c.550: ipc_string: /tmp/vnet-18656270499306868983000000000-a9KUEo
16:26:59.596 [16523] <2> put_long: (11) network write() error:
Connection reset by peer (104); socket = 3
16:26:59.596 [16523] <2> bpcr_put_vnetd_forward_socket: put_string
/tmp/vnet-18656270499306868983000000000-a9KUEo failed: 104
16:26:59.596 [16523] <2> local_bpcr_connect:
bpcr_put_vnetd_forward_socket failed: 24
16:26:59.597 [16523] <2> ConnectToBPCD:
bpcd_connect_and_verify(hostname, hostname) failed: 24
<16>bptestbpcd main: Function ConnectToBPCD(hostname) failed: 24
16:26:59.597 [16523] <16> bptestbpcd main: Function
ConnectToBPCD(hostname) failed: 24
<2>bptestbpcd: socket write failed
16:26:59.597 [16523] <2> bptestbpcd: socket write failed
<2>bptestbpcd: EXIT status = 24
16:26:59.597 [16523] <2> bptestbpcd: EXIT status = 24
socket write failed
Also, both of our main NBU admins are out this week, so a couple of others and I are helping out while they're gone. My knowledge in this field is very limited, so I really appreciate the help.
SELinux may have caused the trouble
Take a look at this TechNote and see if it helps.
http://seer.entsupport.symantec.com/docs/337118.htm
You should have
some good logging after last night's backup run. Can you parse through them (bpcd primarily for now) and see if you see any errors?
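A rough sketch of one way to skim a bpcd log for trouble spots. It assumes the usual legacy-log convention that <16> and <32> mark error and critical lines, and it also catches the words "failed" and "error", as seen in the bptestbpcd output earlier in this thread. The function name nbu_log_errors is hypothetical.

```shell
#!/bin/sh
# Sketch: print only the lines of a NetBackup debug log that look like errors.
# nbu_log_errors is a hypothetical helper, not a NetBackup command.
# Assumption: <16> and <32> are the error and critical severity markers.
nbu_log_errors() {
    # Reads the named log file(s), or stdin when no file is given.
    grep -E -i '<(16|32)>|failed|error' "$@"
}
```

Point it at whichever file was written in the logs/bpcd directory during the failed backup, or pipe the log into it.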
Also have you done the usual name resolution checks?
From the client:
bpclntcmd -pn
(If you have a bprd log directory enabled on the master, the output of bpclntcmd -pn and any errors will be logged there.)
bpclntcmd -self
bpclntcmd -hn <hostname_of_master>
bpclntcmd -ip <ip_of_master>
From the master server:
bpclntcmd -hn <hostname_of_client>
bpclntcmd -ip <ip_of_client>
You can paste the results of those commands here as well if they return any errors.
I know I had an issue last week where someone had switched around the nsswitch.conf file on one of my clients, which resulted in backup failures. I didn't get socket errors, but sometimes these socket errors can be masked as other types of connection errors. Are you relying on DNS in this environment, or on hosts files? Check the nsswitch.conf file and make sure it tells your client to consult the appropriate source first, either hosts files or DNS.
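Independent of the bpclntcmd checks, here is a minimal sketch of a forward/reverse consistency test using getent hosts, which follows the hosts: order in /etc/nsswitch.conf. The helper name check_fwd_rev and the RESOLVER override are made up for illustration, and the comparison is strict, so pass the name exactly as the reverse lookup should return it (usually the FQDN).

```shell
#!/bin/sh
# Sketch: check that forward and reverse name resolution agree for a host.
# check_fwd_rev is a hypothetical helper, not a NetBackup command.
# RESOLVER defaults to "getent hosts"; override it to use another lookup tool
# that prints "address name" on its first output line.
check_fwd_rev() {
    name="$1"
    resolver="${RESOLVER:-getent hosts}"

    # Forward lookup: the first field of the first line is the address.
    ip=$($resolver "$name" | awk 'NR==1 {print $1}')
    if [ -z "$ip" ]; then
        echo "forward lookup failed for $name" >&2
        return 1
    fi

    # Reverse lookup: the second field of the first line is the name.
    back=$($resolver "$ip" | awk 'NR==1 {print $2}')
    if [ "$back" != "$name" ]; then
        echo "mismatch: $name -> $ip -> ${back:-<none>}" >&2
        return 1
    fi
    echo "ok: $name <-> $ip"
}
```

Run it on the client for the master's name and on the master for the client's name. A mismatch in either direction is worth fixing before digging further into the socket error.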
yes bprd
Also make sure the bprd log is enabled on the master server. Try a test backup with verbose 5 set on the master server, but be careful: the logs grow quickly, so turn it back down after the test.
I'll check SELinux,
and if no joy there, I'll post some snippets from the commands that were suggested. Also, I'm not sure if this has any bearing, but we had an issue with DNS where the IP for the client in question was resolving to the wrong hostname. Our network technicians resolved the issue, and everything seems to be in order, but I'm wondering if that has anything to do with this.
I can't get to the client until after noon, but as soon as I can, I'll get all of the information that is needed to troubleshoot this further.
I'm wondering if that has anything to do with this.
Probably. NetBackup is very picky about name resolution, both forward and reverse.
The bpclntcmd commands
will help determine if there is an issue with forward/reverse name resolution
Are you using
your hosts file at all?
Definitely not using the hosts file
It's all DNS
SELinux was the issue
I've disabled it for the time being and am getting a full backup. I appreciate all of the input.
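For others who land here with the same status 24, a small sketch for checking the SELinux state on a client. It assumes the RHEL default config path /etc/selinux/config; the two function names are hypothetical.

```shell
#!/bin/sh
# Sketch: report the SELinux mode on a client.
# Both function names are hypothetical helpers for illustration.

# Persistent mode: the value of the SELINUX= line in the config file
# (assumed default path: /etc/selinux/config).
selinux_config_mode() {
    cfg="${1:-/etc/selinux/config}"
    # SELINUXTYPE= lines do not match because "=" must follow SELINUX directly.
    sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "$cfg" | tail -n 1
}

# Runtime mode: whatever getenforce reports, if the tool is installed.
selinux_runtime_mode() {
    if command -v getenforce >/dev/null 2>&1; then
        getenforce
    else
        echo "unknown (getenforce not found)"
    fi
}
```

If the runtime mode is Enforcing, running setenforce 0 as root switches to permissive mode until the next reboot, which is a quick way to confirm whether SELinux is behind the failure before changing the config file.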
You should mark CY's post
as the solution.