SQL Backups fail with status 2, 25 and 58. Filesystem backups OK

Created: 21 Oct 2010 | 23 comments

Hi all,

As the title suggests, we have a DB backup that has been failing systematically. The file system backup runs fine, but when we try to back up the MSSQL DB we get a status 2 on the parent job and status 25 and 58 on the spawned default backup jobs.

I've tested communication between master/media and the client in both directions using bpclntcmd (-hn / -ip) and everything resolves correctly. bpgetconfig -M <client> -x returns correct information, so I'm pretty sure it's not connection related.

Any ideas? Thanks in advance for any and all help.

Specs

Master: Windows Server 2003 with NB 6.5.5
Media: RHEL 5 with NB 6.5.5
Client: Windows Server 2003 with NB 6.5.5

 

Cheers,
PL


Riaan.Badenhorst's picture

Is your SQL DB running in a cluster? It's almost certainly network; we just need to figure out where. Can you telnet to bpcd from master > client, media > client, client > media, and client > master? Try with both name and IP.
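
For anyone who wants to script the telnet checks above, here is a minimal sketch in Python. It assumes the default NetBackup ports (13782 for bpcd, 13724 for vnetd); adjust them if your site uses different ones. The function names are made up for illustration. Note that it only proves a TCP connection can be opened, which is all telnet proves as well.

```python
import socket

# Default NetBackup service ports (assumption: not changed at your site).
NETBACKUP_PORTS = {"bpcd": 13782, "vnetd": 13724}


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_host(host, ports=NETBACKUP_PORTS):
    """Map each service name to whether its port accepted a connection."""
    return {name: can_connect(host, port) for name, port in ports.items()}
```

Run `check_host("<client name>")` and `check_host("<client IP>")` from the master and the media server, then the reverse from the client, and compare the results.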

 

Do you have more than one NIC/IP configured on the client?

Regards,

Riaan Badenhorst

You need an OpenVision to see the truth about Backups. Restores are a plus. But that's just Semantics ;)

ITs easy :)

palobo's picture

Thanks Riaan for the quick reply.

The DB is not running in a cluster. I've tried telnet between master/media and the client in both directions, using both name (hostname and FQDN) and IP, and I can connect just fine.

As far as NICs go the client has just 1. Both master/media and client are going through a firewall directly with just 1 hop between them. 

Like I said before, the filesystem backup runs just fine and I can access the client properties in the "Host Properties" view.

Cheers,
PL

Riaan.Badenhorst's picture

Interesting. Come to think of it, if the child jobs spawn, it means that it connected and then kicked off the default application streams.

 

Can you post the bpbrm log from the media server, and the bpcd and dbclient logs from the client? For a start....

Regards,

Riaan Badenhorst


Riaan.Badenhorst's picture

Did this work, and then start failing, or did it never work?

Regards,

Riaan Badenhorst


Marianne's picture

Please post the output of the following command from the client:

bpclntcmd -pn

Please ensure that you have a bprd log on the master, and check the log for the incoming request from the client.

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links

MOHAMED PATEL's picture

From the client side:

Connect to the BAR GUI and recheck your SQL login information.

If this post is helpful in any way, please vote or mark as solution.
A thumbs UP will do as well....
Regards
Mo

scerreti's picture

The problem was a corrupted nblog.conf file on the master server. Verify this file.

Sergio Cerreti
T-Systems Italia
Using Netbackup 7.0.1; Master server on SUNOS 5.10

 

MOHAMED PATEL's picture

@scerreti

Just out of interest: were any modifications made to the nblog.conf file manually, or were vxlog... commands run to change log settings?

I'm trying to ascertain whether this is a bug in NB7.

Regards
Mo

Zahid.Haseeb's picture

To narrow down the problem, if possible, create a test DB in your SQL Server instance and try to back it up.

Any comment will be appreciated. Mark as Solution if your query is resolved
__________________
Thanks in Advance
Zahid Haseeb

zahidhaseeb.wordpress.com

palobo's picture

Hi all and thanks for the replies so far.

Here's a quick update on my progress up till now. 

I'm not quite sure what happened overnight, but the filesystem backups that were previously working now give me a status 58 too. We've seen that after a reboot everything works OK for a few days and then this behaviour creeps back in.

@Riaan
Sorry can't get those logs for you because it's not connecting at all now and yesterday's logs were rotated out.

@Marianne
The output of bpclntcmd -pn on the client is:

expecting response from server ecbck01.echiron.com
fcgssm01.gulbenkian.pt fcgssm01.gulbenkian.pt 10.11.11.60 4258

and the bprd log on the master shows:

09:05:49.067 [28932.29868] <2> logconnections: BPRD ACCEPT FROM 10.11.11.60.3047 TO 10.123.15.107.13724
09:05:49.067 [28932.29868] <2> process_request: setsockopt SO_LINGER on 416 succeeded.
09:05:49.067 [28932.29868] <2> connected_peer: Connection from host fcgssm01.gulbenkian.pt, 10.11.11.60, on non-reserved port 3047
09:05:49.067 [28932.29868] <2> db_valid_master_server: fcgssm01.gulbenkian.pt is not a valid server
09:05:49.176 [28932.29868] <2> process_request: command C_CLIENT_ID (45) received
09:05:49.192 [28932.29868] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2048: VN_REQUEST_SERVICE_SOCKET: 6 0x00000006
09:05:49.192 [28932.29868] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2062: service: bpdbm
09:05:49.207 [28932.29868] <2> logconnections: BPDBM CONNECT FROM 10.123.15.107.2468 TO 10.123.15.107.13724
09:06:34.004 [18660.20680] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2048: VN_REQUEST_SERVICE_SOCKET: 6 0x00000006
09:06:34.004 [18660.20680] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2062: service: bpdbm
09:06:34.004 [18660.20680] <2> logconnections: BPDBM CONNECT FROM 10.123.15.107.2549 TO 10.123.15.107.13724 

@Scerreti
I'm assuming nblog.conf is OK; otherwise other backups on this master would fail in a similar manner. I tried vxlogview and vxlogmgr, and both work as expected.

Trying bptestbpcd -verbose -debug -client fcgssm01 I get:

10:41:46.837 [30476.29600] <2> bptestbpcd: VERBOSE = 0
10:41:46.884 [30476.29600] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2048: VN_REQUEST_SERVICE_SOCKET: 6 0x00000006
10:41:46.884 [30476.29600] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2062: service: bpcd
10:41:47.056 [30476.29600] <2> logconnections: BPCD CONNECT FROM 10.123.15.107.1871 TO 10.11.11.60.13724
10:41:47.071 [30476.29600] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.182: msg: VNETD CONNECT FROM 10.123.15.107.1872 TO 10.11.11.60.13724 fd = 1784
10:41:47.087 [30476.29600] <2> vnet_pop_byte: vnet.c.186: errno: 10054 0x00002746
10:41:47.087 [30476.29600] <2> vnet_pop_byte: vnet.c.188: Function failed: 43 0x0000002b
10:41:47.087 [30476.29600] <2> vnet_pop_string: vnet.c.268: Function failed: 43 0x0000002b
10:41:47.087 [30476.29600] <2> vnet_pop_signed: vnet.c.312: Function failed: 43 0x0000002b
10:41:47.087 [30476.29600] <2> version_connect: vnet_vnetd.c.1817: Function failed: 43 0x0000002b
10:41:47.087 [30476.29600] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.186: status: 43 0x0000002b
10:41:48.087 [30476.29600] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.182: msg: VNETD CONNECT FROM 10.123.15.107.1873 TO 10.11.11.60.13724 fd = 1784
10:41:48.087 [30476.29600] <2> vnet_pop_byte: vnet.c.186: errno: 10054 0x00002746
10:41:48.087 [30476.29600] <2> vnet_pop_byte: vnet.c.188: Function failed: 43 0x0000002b
10:41:48.087 [30476.29600] <2> vnet_pop_string: vnet.c.268: Function failed: 43 0x0000002b
10:41:48.087 [30476.29600] <2> vnet_pop_signed: vnet.c.312: Function failed: 43 0x0000002b
10:41:48.087 [30476.29600] <2> version_connect: vnet_vnetd.c.1817: Function failed: 43 0x0000002b
10:41:48.087 [30476.29600] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.186: status: 43 0x0000002b
10:41:49.087 [30476.29600] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.182: msg: VNETD CONNECT FROM 10.123.15.107.1874 TO 10.11.11.60.13724 fd = 1784
10:41:49.087 [30476.29600] <2> vnet_pop_byte: vnet.c.186: errno: 10054 0x00002746
10:41:49.087 [30476.29600] <2> vnet_pop_byte: vnet.c.188: Function failed: 43 0x0000002b
10:41:49.087 [30476.29600] <2> vnet_pop_string: vnet.c.268: Function failed: 43 0x0000002b
10:41:49.087 [30476.29600] <2> vnet_pop_signed: vnet.c.312: Function failed: 43 0x0000002b
10:41:49.087 [30476.29600] <2> version_connect: vnet_vnetd.c.1817: Function failed: 43 0x0000002b
10:41:49.087 [30476.29600] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.186: status: 43 0x0000002b
10:41:50.087 [30476.29600] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.182: msg: VNETD CONNECT FROM 10.123.15.107.1880 TO 10.11.11.60.13724 fd = 1784
10:41:50.149 [30476.29600] <2> vnet_pop_byte: vnet.c.186: errno: 10054 0x00002746
10:41:50.149 [30476.29600] <2> vnet_pop_byte: vnet.c.188: Function failed: 43 0x0000002b
10:41:50.149 [30476.29600] <2> vnet_pop_string: vnet.c.268: Function failed: 43 0x0000002b
10:41:50.149 [30476.29600] <2> vnet_pop_signed: vnet.c.312: Function failed: 43 0x0000002b
10:41:50.149 [30476.29600] <2> version_connect: vnet_vnetd.c.1817: Function failed: 43 0x0000002b
10:41:50.149 [30476.29600] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.186: status: 43 0x0000002b
10:41:50.149 [30476.29600] <2> vnet_connect_to_vnetd: vnet_vnetd.c.152: Function failed: 43 0x0000002b
10:41:50.149 [30476.29600] <2> nb_vnetd_connect: comm.c.1887: vnet_connect_to_vnetd failed: 43
10:41:50.149 [30476.29600] <2> bpcr_vnetd_connect_forward_socket_begin: nb_vnetd_connect(10.11.11.60) failed: 25
10:41:50.149 [30476.29600] <2> local_bpcr_connect: bpcr_vnetd_connect_forward_socket_begin failed: 25
10:41:50.149 [30476.29600] <2> ConnectToBPCD: bpcd_connect_and_verify(fcgssm01, fcgssm01) failed: 25
<16>bptestbpcd main: Function ConnectToBPCD(fcgssm01) failed: 25
10:41:50.149 [30476.29600] <16> bptestbpcd main: Function ConnectToBPCD(fcgssm01) failed: 25
<2>bptestbpcd: cannot connect on socket
10:41:50.149 [30476.29600] <2> bptestbpcd: cannot connect on socket
<2>bptestbpcd: EXIT status = 25
10:41:50.149 [30476.29600] <2> bptestbpcd: EXIT status = 25

with the bpcd log on the client showing:

10:41:46.622 [5856.3152] <2> bpcd main: offset to GMT 0
10:41:46.622 [5856.3152] <2> bpcd main: Got socket for input 456
10:41:46.622 [5856.3152] <2> logconnections: BPCD ACCEPT FROM 10.123.15.107.1871 TO 10.11.11.60.13724
10:41:46.622 [5856.3152] <2> bpcd main: setup_sockopts complete
10:41:46.622 [5856.3152] <2> bpcd peer_hostname: Connection from host ecbck01.echiron.com (10.123.15.107) port 1871
10:41:46.622 [5856.3152] <2> bpcd valid_server: comparing ecbck01.echiron.com and ecbck01.echiron.com
10:41:46.622 [5856.3152] <4> bpcd valid_server: hostname comparison succeeded
10:41:46.778 [5856.3152] <2> bpcd main: output socket port number = 1
10:41:49.653 [5856.3152] <2> get_long: (2) premature end of file (byte 1)
10:41:49.653 [5856.3152] <2> get_vnetd_forward_socket: get_string ipc_string failed: 5
10:41:49.653 [5856.3152] <16> bpcd main: get_vnetd_forward_socket failed: 23
10:48:00.666 [7012.7984] <2> bpcd main: offset to GMT 0
10:48:00.666 [7012.7984] <2> bpcd main: Got socket for input 460
10:48:00.666 [7012.7984] <2> logconnections: BPCD ACCEPT FROM 10.123.15.107.2294 TO 10.11.11.60.13724
10:48:00.666 [7012.7984] <2> bpcd main: setup_sockopts complete
10:48:00.682 [7012.7984] <2> bpcd peer_hostname: Connection from host ecbck01.echiron.com (10.123.15.107) port 2294
10:48:00.682 [7012.7984] <2> bpcd valid_server: comparing ecbck01.echiron.com and ecbck01.echiron.com
10:48:00.682 [7012.7984] <4> bpcd valid_server: hostname comparison succeeded
10:48:00.838 [7012.7984] <2> bpcd main: output socket port number = 1
10:48:03.651 [7012.7984] <2> get_long: (2) premature end of file (byte 1)
10:48:03.651 [7012.7984] <2> get_vnetd_forward_socket: get_string ipc_string failed: 5
10:48:03.651 [7012.7984] <16> bpcd main: get_vnetd_forward_socket failed: 23 

Any ideas, guys and gals?

Cheers,
PL
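
Debug logs like the two above get long quickly, so a small filter can help pick out the failures. The following Python sketch is an illustration only: it assumes the line layout seen in the excerpts (`HH:MM:SS.mmm [pid.tid] <severity> message`) and treats severity 16 and above as errors, which matches how `<16>` is used here. The function names are made up for this example.

```python
import re
from collections import Counter

# Matches lines such as:
# 10:41:50.149 [30476.29600] <16> bptestbpcd main: Function ... failed: 25
LOG_LINE = re.compile(
    r"^\s*(?P<time>\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"\[(?P<pid>\d+)\.(?P<tid>\d+)\]\s+"
    r"<(?P<sev>\d+)>\s+(?P<msg>.*)$"
)
FAILED = re.compile(r"failed: (\d+)")


def parse_line(line):
    """Return a dict of fields for a timestamped log line, else None."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["sev"] = int(entry["sev"])
    return entry


def summarize(lines, min_severity=16):
    """Collect entries at or above min_severity and count 'failed: N' codes."""
    errors = []
    codes = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skips untimestamped duplicates such as "<16>bptestbpcd ..."
        for code in FAILED.findall(entry["msg"]):
            codes[int(code)] += 1
        if entry["sev"] >= min_severity:
            errors.append(entry)
    return errors, codes
```

Feeding it the bptestbpcd output above surfaces the single `<16>` line plus a tally of status 43 and 25 failures. For reference, `errno: 10054` in the vnet_pop_byte lines is the Winsock code WSAECONNRESET (connection reset by peer), so the TCP connection is opened and then dropped rather than refused outright.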

MOHAMED PATEL's picture

There are network connection errors - have a look at technote

http://www.symantec.com/docs/TECH45382

 

10:41:50.149 [30476.29600] <16> bptestbpcd main: Function ConnectToBPCD(fcgssm01) failed: 25
<2>bptestbpcd: cannot connect on socket
10:41:50.149 [30476.29600] <2> bptestbpcd: cannot connect on socket
<2>bptestbpcd: EXIT status = 25
10:41:50.149 [30476.29600] <2> bptestbpcd: EXIT status = 25

Regards
Mo

Venkatesh K's picture

Hi Pal,

Check the following:

  • Verify that the NetBackup client services are running on the client; try restarting them and test the backup again.
  • Check that the bpcd and vnetd ports are open on the client and that you are able to connect to them. From the master, execute:
        telnet client bpcd
        telnet client vnetd
  • Check that you are able to resolve the hostname and IP. From the master:
        bpclntcmd -hn client
        bpclntcmd -ip <ip address of client>
    From the client:
        bpclntcmd -pn
        bpclntcmd -hn master
        bpclntcmd -ip <ip address of master>
  • To verify that bpcd and vnetd are open, run the following on the client (on a Windows client, use findstr in place of grep):
        netstat -a | grep bpcd
        netstat -a | grep vnetd
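
As a rough cross-check of the name-resolution steps in the list above, the forward and reverse lookups can be compared in a few lines of Python. This is a sketch under assumptions: it uses the operating system's resolver, not NetBackup's own lookup path, so it complements bpclntcmd rather than replacing it. The function name `resolution_report` is hypothetical.

```python
import socket


def resolution_report(name, forward=socket.gethostbyname,
                      reverse=socket.gethostbyaddr):
    """Forward-resolve name, reverse-resolve the result, and compare.

    The forward and reverse callables are parameters so that the check
    can be exercised without touching a real DNS server.
    """
    report = {"name": name, "ip": None, "reverse_name": None,
              "consistent": False}
    try:
        report["ip"] = forward(name)
        report["reverse_name"] = reverse(report["ip"])[0]
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        return report
    # Host names are case-insensitive, so compare in lower case.
    report["consistent"] = report["reverse_name"].lower() == name.lower()
    return report
```

Run it on the master against the client name and on the client against the master name. A result with `consistent` set to False (for example, a short name that reverse-resolves to an FQDN) is worth comparing against the names used in the policy and in the server list.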

 

Thanks & Regards,
Venki

Don't forget to vote + if you think the suggestion is good, and mark as solution if it has resolved your problem.

Zahid.Haseeb's picture

 

10:41:49.653 [5856.3152] <16> bpcd main: get_vnetd_forward_socket failed: 23

See the link below. It might help you.

http://www.symantec.com/business/support/index?page=content&id=TECH128319


palobo's picture

Hi all once again.

I managed to schedule some downtime at lunch and rebooted the client server. As I expected, both filesystem and DB backups are working just fine now. If the current trend continues, they will work fine for a few days and then just stop working with the errors stated so far.

 

I'm stumped, I must admit. If it were a communication problem, a simple reboot wouldn't solve it, so I'm guessing something more obscure is going on that I haven't managed to put my finger on yet.

I'll keep you all updated!

 

Thanks once again,
PL

Andy Welburn's picture

I'd suggest monitoring this client very closely over the next few days until it starts to fail as you envisage. Maybe resources are getting exhausted for one reason or another (space? memory leak? NIC errors?) until a point where NB can no longer perform. Don't just look at it from a NetBackup point of view; maybe there are other performance issues or failing components.

Just a suggestion....Good Luck!

Zahid.Haseeb's picture

No problem palobo. Hope it will be fixed :)


MOHAMED PATEL's picture

I would agree with Andy - to monitor other resources as well -

Have a look through the event log in the interim and check if anything stands out.

Regards
Mo

Marianne's picture

The TN that Zahid posted yesterday is for a Linux client, but if one reads it carefully, it seems that a similar problem is experienced on this Windows client - 'something' was preventing socket setup or the client ran out of available socket connections. That's why the reboot worked.

This section in the TN seems very relevant (although there's no xinetd.conf on Windows):

********************************************

In this case, the xinetd.conf file on the client host was limiting the number of concurrent socket instances with other hosts.  The default value of 60 had been lowered to a value of 2.  

At least 3 concurrent sockets must be allowed per concurrent multiplex job.  

User-directed jobs require an additional socket for the server-bound request and two additional sockets if progress file updates are requested.

Jobs associated with DB2/Informix/Oracle/SAP/SQL-Server/Sybase/Teradata/XBSA-DataStore may utilize up to 10 concurrent sockets per concurrent job.

*********************************************
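
To put rough numbers on the quoted figures, here is a back-of-the-envelope estimate in Python. It is only a sketch of my reading of the technote text: 3 sockets per standard concurrent job, up to 10 per database-agent job, 1 extra per user-directed job and 2 more per job with progress file updates. The function and parameter names are invented for this example.

```python
def sockets_needed(standard_jobs=0, db_agent_jobs=0,
                   user_directed_jobs=0, progress_update_jobs=0):
    """Estimate an upper bound on concurrent client sockets.

    Figures follow the technote quoted above; treating the 10 sockets
    for a database-agent job as that job's total is an assumption.
    """
    return (3 * standard_jobs          # at least 3 per concurrent job
            + 10 * db_agent_jobs       # up to 10 per SQL Server etc. job
            + 1 * user_directed_jobs   # server-bound request socket
            + 2 * progress_update_jobs)  # progress file updates


def fits_limit(limit, **jobs):
    """Return True if the estimate stays within a per-host socket limit."""
    return sockets_needed(**jobs) <= limit
```

With the lowered limit of 2 from the technote, `fits_limit(2, standard_jobs=1)` is False, while a SQL Server policy spawning four streams could want up to `sockets_needed(db_agent_jobs=4)`, that is 40, sockets. That would fit a pattern where database backups fail before filesystem backups do.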

If the same problem returns, check output of 'netstat -a'.
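
If it helps, the `netstat` output can be tallied by connection state, which makes a pile-up of TIME_WAIT or CLOSE_WAIT sockets easy to spot. The sketch below is a minimal illustration in Python; it assumes the usual text layout of `netstat -an` (a TCP line containing a state keyword) and the function name is made up.

```python
from collections import Counter

# State names as printed by Windows and by Linux/Unix netstat.
KNOWN_STATES = {
    "LISTENING", "LISTEN", "ESTABLISHED", "TIME_WAIT", "CLOSE_WAIT",
    "SYN_SENT", "SYN_RECEIVED", "SYN_RECV", "FIN_WAIT_1", "FIN_WAIT_2",
    "FIN_WAIT1", "FIN_WAIT2", "LAST_ACK", "CLOSING", "CLOSED",
}


def count_tcp_states(netstat_output):
    """Count TCP sockets per state in the text output of netstat -an."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if not fields or not fields[0].lower().startswith("tcp"):
            continue  # skip headers, blank lines and UDP entries
        for field in fields[1:]:
            if field in KNOWN_STATES:
                counts[field] += 1
                break
    return counts
```

Save the output on the client while backups are healthy (`netstat -an > netstat_ok.txt`) and again when status 58 returns, then compare `count_tcp_states(open("netstat_ok.txt").read())` with the failing sample.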


MOHAMED PATEL's picture

I had an issue with a BMR restore previously; it turned out to be a Windows problem after Symantec pointed me to this technote:

Network connectivity tuning to avoid network read/write failures and increase performance

http://www.symantec.com/docs/TECH60844

 

Regards
Mo

palobo's picture

Hey all. Once again thanks for all the help so far.

 

As you may or may not have guessed, my problem is back after a few days of working OK, so I'm inclined to believe that something is wrong with this client. It's running out of sockets for some reason. I will be looking into this closely over the next few days and will report back as soon as I've got some info.

 

Cheers,

PL

Suryakiran21's picture

Did you check your registry entry? In my environment OS backups work fine, but when it comes to SQL I have to make sure the registry is updated with the correct client name, whatever my DNS resolves it to.

Cheers

SuryaKiran

Zahid.Haseeb's picture

Hi Palobo,

Any update?

One more thing I would like to ask: is this problem limited to a specific client? Are your backups going fine with other clients?
