SQL jobs randomly failing and locking databases (7.5.0.1)

Created: 18 May 2012 • Updated: 20 Feb 2014 | 21 comments
This issue has been solved. See solution.

I have a single SQL policy with 5 SQL servers in the job. Since the upgrade to 7.5.0.1 I have noticed that almost every day 1 or 2 of these servers has a stream that is stuck on active for a single database. When this happens, the database on the SQL server locks because it reports that a pending backup is running. You cannot detach the database, take it offline, update it, or run a manual SQL backup when this happens.

In the job I am seeing this same error for each database that has the failures:

5/17/2012 6:43:27 PM - Info dbclient(pid=1588) ERR - Error in VxBSACreateObject: 3.      
5/17/2012 6:43:27 PM - Info dbclient(pid=1588)     CONTINUATION: - System detected error, operation aborted. 
5/17/2012 6:43:28 PM - Info dbclient(pid=1588) ERR - Error in CPipeServer::CreateAgentMetadata: 6.      
5/17/2012 6:43:28 PM - Info dbclient(pid=1588)     CONTINUATION: - The system cannot find the file specified.
5/17/2012 6:43:28 PM - Info dbclient(pid=1588) ERR - Error in VxBSACreateObject: 6.      
5/17/2012 6:43:28 PM - Info dbclient(pid=1588)     CONTINUATION: - The handle used to associate this call with a previous VxBSAInit() call is invalid.

To correct the problem I have to cancel the jobs and then stop the dbbackex.exe process from Task Manager before the database becomes fully functional again.

I am then able to run a normal backup, but this doesn't correct it permanently. It may be a day or two before the issue is back with a different database.
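
For anyone who wants to check clients for leftover dbbackex.exe processes without opening Task Manager on each one, here is a minimal sketch. It assumes the client is Windows and that `tasklist /FO CSV /NH` and `taskkill /PID <pid> /F` are available; the function names are made up for illustration and are not part of NetBackup. It only builds the kill commands and does not run them, so you can review them first.

```python
import csv
import io
import subprocess
from typing import List


def find_pids(tasklist_csv: str, image_name: str = "dbbackex.exe") -> List[int]:
    """Return the PIDs of processes named image_name.

    Expects the text produced by `tasklist /FO CSV /NH`, where each row is
    "Image Name","PID","Session Name","Session#","Mem Usage".
    """
    pids = []
    for row in csv.reader(io.StringIO(tasklist_csv)):
        if len(row) >= 2 and row[0].lower() == image_name.lower():
            pids.append(int(row[1]))
    return pids


def list_stale_backup_processes() -> List[int]:
    """Run tasklist on the local Windows machine and return dbbackex.exe PIDs."""
    result = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return find_pids(result.stdout)


def kill_commands(pids: List[int]) -> List[List[str]]:
    """Build (but do not execute) one taskkill command per PID."""
    return [["taskkill", "/PID", str(pid), "/F"] for pid in pids]
```

On a client you would call `kill_commands(list_stale_backup_processes())` after cancelling the stuck job in the Activity Monitor, and run the resulting commands only once you are sure no healthy backup is still in progress.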

Any ideas on what could be the issue? 

Mark_Solutions's picture

Are your SQL Clients also at V7.5.0.1? I have seen issues in the past with mixed versions.

Is there anything logged in the Application or System event logs on the client?

Authorised Symantec Consultant

Don't forget to "Mark as Solution" if someone's advice has solved your issue - and please bring back the Thumbs Up!!

Stanleyj's picture

The clients are also 7.5.0.1, and the event logs do not show any error messages until I manually cancel the job. At that point they start reporting sqlserver and sqlvdi errors:

BackupIoRequest::ReportIoError: write failure on backup device 'VNBU0-3876-5104-1337293838'. Operating system error 995(The I/O operation has been aborted because of either a thread exit or an application request.).

SQLVDI: Loc=WaitForResource. Desc=Partner process aborted. ErrorCode=(0). Process=1604. Thread=1332. Server. Instance=MSSQLSERVER. VD=Global\VNBU0-3140-2288-1337293839_SQLVDIMemoryName_0.

This is really strange because once I manually cancel the job and then stop dbbackex.exe, the backups run fine until the next random episode.

Mark_Solutions's picture

Do you have any antivirus on your SQL Servers?

If so, have you excluded the NetBackup processes? It may be AV locking NBU, which in turn locks SQL.

Hope this helps

Stanleyj's picture

None of our SQL machines have antivirus. I guess a ticket is my next step.

It is strange because this weekend 3 out of 5 SQL servers had jobs from Saturday that were still stuck this morning but also had jobs that kicked off on Sunday and ran just fine.

Here is something else: out of 25 or so policies covering system state, Exchange, SharePoint and file servers, only my SQL jobs are showing errors.

mlockwood's picture

Experiencing similar issues here. I wasn't sure why my SQL backups were failing until I discovered recently that there were a dozen or so dbbackex.exe processes running on the clients, left as remnants of failed or killed jobs. When I killed these processes, backups proceeded as expected. I've been trying to track down the cause of the failures (other than the ones killed by an admin) and also the reason why, when a job is killed, the process remains running on the client, preventing any further backups from succeeding. Our master/media is at 7.5.0.1 and we have a mix of clients on 7.5 and 7.5.0.1. We are working on upgrading all of them to 7.5.0.1.

NB_Martin's picture

Same errors with 7.5.0

In a job with 70 databases, the transaction log backup fails with no discernible pattern. Sometimes all backups run OK, sometimes one fails, sometimes two or more fail.

The database with the problem is left with three active SQL processes, two in state SUSPENDED and one RUNNABLE.

spid  ecid  status     loginame             hostname  blk  dbname        cmd         request_id
190   0     suspended  NT AUTHORITY\SYSTEM            0    The_Database  BACKUP LOG  0
190   1     runnable                                  0    The_Database  BACKUP LOG  0
190   2     suspended                                 0    The_Database  BACKUP LOG  0

I have to kill the SQL Server process ID, at which point the backup fails with the error below. In this case I execute kill 190.
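
If you need to find these stuck sessions on several servers, the check can be scripted. The following is only a rough sketch that assumes you have saved sp_who-style output (the same columns as the listing above) to text. The regular expression and the function names are my own, not anything shipped with NetBackup or SQL Server. It lists the spids whose command is a BACKUP and prints the matching KILL statements for review rather than executing them.

```python
import re
from typing import Dict, List

# spid, ecid, status, then anything, then dbname, "BACKUP <type>", request_id
ROW = re.compile(
    r"^\s*(?P<spid>\d+)\s+(?P<ecid>\d+)\s+(?P<status>\w+)\s+.*?"
    r"(?P<dbname>\S+)\s+(?P<cmd>BACKUP \w+)\s+(?P<request_id>\d+)\s*$"
)


def backup_sessions(sp_who_text: str) -> Dict[int, dict]:
    """Group sp_who rows that are running a BACKUP command by spid."""
    sessions: Dict[int, dict] = {}
    for line in sp_who_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header row or a session doing something else
        entry = sessions.setdefault(
            int(match["spid"]),
            {"dbname": match["dbname"], "cmd": match["cmd"], "statuses": []},
        )
        entry["statuses"].append(match["status"])
    return sessions


def kill_statements(sessions: Dict[int, dict]) -> List[str]:
    """Build T-SQL KILL statements for review; nothing is executed here."""
    return ["KILL {};".format(spid) for spid in sorted(sessions)]
```

Killing the session makes the running backup fail, as described above, so this is a way to free the database and not a fix for the underlying problem.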

Then NB shows

......

INF - Created VDI object for SQL Server instance <INSTANCE>. Connection timeout is <300> seconds.
15:19:39.731 [7432.6928] <4> : 15:19:39 INF - Server status = 25
ERR - Error in VxBSACreateObject: 3.
    CONTINUATION: - System detected error, operation aborted.
ERR - Error in CPipeServer::CreateAgentMetadata: 6.
    CONTINUATION: - The system cannot find the file specified.
ERR - Error in VxBSACreateObject: 6.
    CONTINUATION: - The handle used to associate this call with a previous VxBSAInit() call is invalid.
 

Moreover, it sometimes behaves like a denial-of-service attack, driving the CPU to 100%. This happened twice. In one of those cases I could see three instances of the bpclntcmd process, each taking 25% of the CPU, with another process, bpcd, taking the other 25%. SQL Server stops accepting requests because of this.

All name resolution is OK. I reinstalled the clients on the SQL Servers, and nothing changed.

Regards

Wiriadi Wangsa's picture

Hi Guys,

You might want to call NetBackup Tech Support and obtain the hotfix. Just mention E-track 2813345.

Currently the hotfix is only available for NetBackup client version 7.5.0.1, so if you are at 7.5GA you will need to install 7.5.0.1 patch first. 

Hope it helps.

Stanleyj's picture

I received the hotfix from support last week, and yes, it did stop the dbbackex.exe process from hanging, but now I'm getting error (13) "File read failed" randomly on databases.

One day it will be database A and then the next it will be database B.

We tried running a native SQL backup on the databases that failed, and that seems to correct it for a couple of days before the 13s start showing up again.

vholmes's picture

I am having this same issue, and it has been going on for weeks without a resolution. Has anyone else experienced this? I am looking to consolidate efforts to get to a permanent resolution.

Marianne's picture

Have you seen this post above? https://www-secure.symantec.com/connect/forums/sql-jobs-randomly-failing-and-locking-databases-7501#comment-7464631

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows

Stanleyj's picture

The EEB did fix the issue with locking the databases. But now, instead of locking the database, it just skips it. So we fixed one issue only to have another to troubleshoot.

On the bright side, at least only a couple of databases are failing instead of almost all of them.

Just to let you know, the trouble now is that I am receiving error 13s (file read failed) and at times error 1. And of course this happens randomly. Every day it's a toss-up as to which server and database it might be.

Support seems to believe it's a timeout issue between the media server and the client, but we have adjusted the timeouts and it's still happening.

mtsheposm's picture

Hi everyone

I am experiencing errors 6 and 2 in my new environment (NBU 7.5.0.3, Windows 2003/2008, SQL DB backups). dbbackex hangs on the client side, and sometimes the production servers reboot when I try to kill the process. I have applied the NB_7.5.0.3 SQL EEB client agent from support on my SQL servers, with no luck.

All of the SQL DBs are in full recovery mode.

If anyone has a solution for this issue, please help. I have two servers, one last backed up on 18/08/2012 and the other on 29/08/2012.

Log from one of the servers:

2012/09/06 06:44:17 AM - Info dbclient(pid=4688) USER - Operation inhibited by NetBackup for Microsoft SQL Server: Only a full or incremental database backup can be performed on database <A2i_xCat_DBs> because it uses the simple recovery model or has 'truncate log on checkpoint' set.
2012/09/06 06:44:19 AM - Info dbclient(pid=4688) INF - OPERATION #1 of batch C:\Program Files\Veritas\NetBackup\dbext\mssql\temp\SAPSQLTransaction[0].bch1346907079[0].bch FAILED with STATUS 1 (0 is normal). Elapsed time = 5(5) seconds.
2012/09/06 06:44:32 AM - Info dbclient(pid=4688) USER - Operation inhibited by NetBackup for Microsoft SQL Server: Only a full or incremental database backup can be performed on database <MDMCATALOGUE_Z000> because it uses the simple recovery model or has 'truncate log on checkpoint' set.
2012/09/06 06:44:34 AM - Info dbclient(pid=4688) INF - OPERATION #2 of batch C:\Program Files\Veritas\NetBackup\dbext\mssql\temp\SAPSQLTransaction[0].bch1346907079[0].bch FAILED with STATUS 1 (0 is normal). Elapsed time = 3(3) seconds.
2012/09/06 06:44:48 AM - Info dbclient(pid=4688) USER - Operation inhibited by NetBackup for Microsoft SQL Server: Only a full or incremental database backup can be performed on database <MDMCATALOGUE_m000> because it uses the simple recovery model or has 'truncate log on checkpoint' set.
2012/09/06 06:44:50 AM - Info dbclient(pid=4688) INF - OPERATION #3 of batch C:\Program Files\Veritas\NetBackup\dbext\mssql\temp\SAPSQLTransaction[0].bch1346907079[0].bch FAILED with STATUS 1 (0 is normal). Elapsed time = 3(3) seconds.
2012/09/06 06:45:03 AM - Info dbclient(pid=4688) INF - BACKUP STARTED USING       
2012/09/06 06:45:03 AM - Info dbclient(pid=4688) Microsoft SQL Server 2005 - 9.00.4035.00 (X64)     
2012/09/06 06:45:03 AM - Info dbclient(pid=4688) Nov 24 2008 16:17:31        
2012/09/06 06:45:03 AM - Info dbclient(pid=4688) Copyright (c) 1988-2005 Microsoft Corporation       
2012/09/06 06:45:03 AM - Info dbclient(pid=4688) Enterprise Edition (64-bit) on Windows NT 5.2 (Build 3790: Service Pack 2)
2012/09/06 06:45:03 AM - Info dbclient(pid=4688) Batch = C:\Program Files\Veritas\NetBackup\dbext\mssql\temp\SAPSQLTransaction[0].bch1346907079[0].bch, Op# = 4     
2012/09/06 06:45:03 AM - Info dbclient(pid=4688) INF - Using backup image C75FSRM003.MSSQL7.C75FSRM003.trx.SRP.~.7.001of001.20120906065308..C      
2012/09/06 06:45:03 AM - Info dbclient(pid=4688) INF - backup log "SRP" to VIRTUAL_DEVICE='VNBU0-4688-4540-1346907189' with  stats = 10, blocksize = 65536, maxtransfersize = 4194304, buffercount = 2
2012/09/06 06:45:03 AM - Info dbclient(pid=4688) INF - Number of stripes: 1, Number of buffers per stripe 2.
2012/09/06 06:45:04 AM - Info dbclient(pid=4688) INF - Created VDI object for SQL Server instance <C75FSRM003>. Connection timeout is <300> seconds.
2012/09/06 06:50:04 AM - Info dbclient(pid=4688) ERR - Error in GetConfiguration: 0x80770003.      
2012/09/06 06:50:04 AM - Info dbclient(pid=4688)     CONTINUATION: - The api was waiting and the timeout interval had elapsed.
2012/09/06 06:50:04 AM - Info dbclient(pid=4688) DBMS MSG - ODBC return code <-1>, SQL State <37000>, SQL Message <4214><[Microsoft][SQL Native Client][SQL Server]BACKUP LOG cannot be performed because there is no current database backup.>.
2012/09/06 06:50:04 AM - Info dbclient(pid=4688) DBMS MSG - SQL Message <3013><[Microsoft][SQL Native Client][SQL Server]BACKUP LOG is terminating abnormally.>
2012/09/06 06:50:04 AM - Info dbclient(pid=4688) ERR - Error found executing <backup log "SRP" to VIRTUAL_DEVICE='VNBU0-4688-4540-1346907189' with  stats = 10, blocksize = 65536, maxtransfersize = 4194304, buffercount = 2>.
2012/09/06 06:50:06 AM - Info dbclient(pid=4688) ERR - Error in VDS->Close: 0x80770004.      
2012/09/06 06:50:06 AM - Info dbclient(pid=4688)     CONTINUATION: - An abort request is preventing anything except termination actions.
2012/09/06 06:50:06 AM - Info dbclient(pid=4688) INF - OPERATION #4 of batch C:\Program Files\Veritas\NetBackup\dbext\mssql\temp\SAPSQLTransaction[0].bch1346907079[0].bch FAILED with STATUS 1 (0 is normal). Elapsed time = 305(305) seconds.
2012/09/06 06:50:20 AM - Info dbclient(pid=4688) USER - Operation inhibited by NetBackup for Microsoft SQL Server: Only a full backup can be performed on the master database.
2012/09/06 06:50:22 AM - Info dbclient(pid=4688) INF - OPERATION #5 of batch C:\Program Files\Veritas\NetBackup\dbext\mssql\temp\SAPSQLTransaction[0].bch1346907079[0].bch FAILED with STATUS 1 (0 is normal). Elapsed time = 3(3) seconds.
2012/09/06 06:50:35 AM - Info dbclient(pid=4688) USER - Operation inhibited by NetBackup for Microsoft SQL Server: Only a full or incremental database backup can be performed on database <model> because it uses the simple recovery model or has 'truncate log on checkpoint' set.
2012/09/06 06:50:37 AM - Info dbclient(pid=4688) INF - OPERATION #6 of batch C:\Program Files\Veritas\NetBackup\dbext\mssql\temp\SAPSQLTransaction[0].bch1346907079[0].bch FAILED with STATUS 1 (0 is normal). Elapsed time = 3(3) seconds.
2012/09/06 06:50:50 AM - Info dbclient(pid=4688) USER - Operation inhibited by NetBackup for Microsoft SQL Server: Only a full or incremental database backup can be performed on database <msdb> because it uses the simple recovery model or has 'truncate log on checkpoint' set.
2012/09/06 06:50:52 AM - Info dbclient(pid=4688) INF - OPERATION #7 of batch C:\Program Files\Veritas\NetBackup\dbext\mssql\temp\SAPSQLTransaction[0].bch1346907079[0].bch FAILED with STATUS 1 (0 is normal). Elapsed time = 3(3) seconds.
2012/09/06 06:50:55 AM - Info dbclient(pid=4688) INF - Results of executing <C:\Program Files\Veritas\NetBackup\dbext\mssql\temp\SAPSQLTransaction[0].bch1346907079[0].bch>:     
2012/09/06 06:50:55 AM - Info dbclient(pid=4688) <0> operations succeeded. <7> operations failed.      
2012/09/06 06:50:55 AM - Info dbclient(pid=4688) INF - The following object(s) were not backed up successfully.  
2012/09/06 06:50:55 AM - Info dbclient(pid=4688) INF - A2i_xCat_DBs         
2012/09/06 06:50:55 AM - Info dbclient(pid=4688) INF - MDMCATALOGUE_Z000         
2012/09/06 06:50:55 AM - Info dbclient(pid=4688) INF - MDMCATALOGUE_m000         
2012/09/06 06:50:55 AM - Info dbclient(pid=4688) INF - SRP         
2012/09/06 06:50:55 AM - Info dbclient(pid=4688) INF - master         
2012/09/06 06:50:55 AM - Info dbclient(pid=4688) INF - model         
2012/09/06 06:50:55 AM - Info dbclient(pid=4688) INF - msdb         
2012/09/06 06:50:55 AM - Info dbclient(pid=4688) INF - This batch was run 2 times and resulted in 0 successful backup(s) and 7 failure(s) after repeated attempts.

Thank you
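
A note on reading a log like the one above: the messages themselves say that the transaction log backup was refused for several databases because they use the simple recovery model (or have 'truncate log on checkpoint' set), and that BACKUP LOG on SRP failed with SQL message 4214 because no current database backup exists. The sketch below pulls those two kinds of lines out of a dbclient job log so they can be separated from genuine hangs. The regular expressions are written against the lines shown in this thread only, and the function name is hypothetical.

```python
import re
from typing import Dict, List

INHIBITED = re.compile(
    r"Operation inhibited by NetBackup for Microsoft SQL Server: (?P<reason>.+)"
)
DATABASE = re.compile(r"on database <(?P<db>[^>]+)>")
SQL_MESSAGE = re.compile(r"SQL Message <(?P<code>\d+)><(?P<text>.*)>")


def summarize(log_text: str) -> Dict[str, List]:
    """Classify dbclient log lines into refused operations and SQL errors."""
    summary: Dict[str, List] = {
        "simple_recovery": [],   # databases where a log backup was refused
        "other_inhibited": [],   # any other "Operation inhibited" reason
        "sql_messages": [],      # (code, text) pairs reported by SQL Server
    }
    for line in log_text.splitlines():
        inhibited = INHIBITED.search(line)
        if inhibited:
            reason = inhibited["reason"].strip()
            database = DATABASE.search(reason)
            if database and "simple recovery model" in reason:
                summary["simple_recovery"].append(database["db"])
            else:
                summary["other_inhibited"].append(reason)
            continue
        message = SQL_MESSAGE.search(line)
        if message:
            summary["sql_messages"].append(
                (int(message["code"]), message["text"])
            )
    return summary
```

Databases that land in `simple_recovery` cannot take a transaction log backup at all, so they would need to be excluded from the log backup batch or switched to the full recovery model. A 4214 message usually means a full database backup has to complete before log backups can succeed.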

Stanleyj's picture

Sorry buddy, but I have been working with support for almost 2 months on this and I still get errors 83, 6, 2, and 1 almost every single day. Just this morning I sent 300 MB worth of logs to them.

Sorry to say this, but at least I now know that upgrading to 7.5.0.3 does not resolve the problem, because I had been wondering if it would.

As you can see, there are LOTS of people having this very same issue, and I am ashamed to admit that I have been putting up with this for 2 months. A fix should have been released by now.

The EEB did resolve my issue with dbbackex.exe, so I don't know if that's a 7.5.0.1 thing or not.

I will definitely update this post as soon as support provides me with something.

Dyneshia's picture

Please open a case with support and reference etrack 2846239

Please mark this as solution if this resolves your issue.

Stanleyj's picture

Does anyone have any new findings to report? 

I ask because support has instructed me to call Microsoft and have them look into performance tuning my SQL instances to try to resolve the error 6s.

I find it odd that I went from no SQL failures to every SQL server having failures, and performance tuning is what I should do to fix it.

I opened the Microsoft ticket this morning, so I will report back whatever they suggest.

TheCricket's picture

I'm seeing the same issue.  I'll be opening a call with Symantec likely next Monday or Tuesday.  Sounds like 7.5.0.3 patch resolves some of the errors but not all of them?  Is that correct?

I'm still running Server 7.0 - this is the only client I've taken to 7.5.

Dyneshia's picture

For 7.5.0.3 please open a case with support and reference etrack 2846239.

Please mark as solution if this resolves your issue.

TheCricket's picture

Dyneshia,

Is the 7.5.0.3 you are referring to the client version or the server version? I'm currently on Server 7.0 with Client 7.5.0.1.

Thank you!

Dyneshia's picture

You are stating that your Master server is 7.0 and the client is 7.5.0.1?

If so, you must upgrade your Master server to at least 7.5, preferably 7.5.0.1, although the base install of 7.5 will work.

If you mean your Master server is 7.5, then yes, you can have your SQL server at 7.5.0.1 or 7.5.0.3. When I referred to 7.5.0.3 I was referring to the client.

Stanleyj's picture

Quick update. 

I was one of the lucky/unlucky ones that upgraded my 5200 appliance to the 2.5.1 code before it was pulled from release. This, along with upgrading to 7.5.0.4, seems to have corrected all of the SQL job failures described here, at least when the appliance is functioning properly. Once I get the stability issue with the appliance worked out, I will come back and confirm whether the SQL failures are truly corrected after upgrading to the most current releases.

SOLUTION
TheCricket's picture

I went ahead and upgraded the server to 7.5.0.4 and the client to 7.5.0.4 and this resolved my issues.  Thank you!