
Excessive SEPM Database I/O after a few weeks

  • 1.  Excessive SEPM Database I/O after a few weeks

    Posted Feb 15, 2015 02:35 AM

    Hi Everyone

    I'm running a network of 15 machines with SEP 12.1 RU5 Small Business Edition on Windows Server 2012 R2 Essentials.

    Every couple of weeks, the SEPM embedded database goes I/O crazy and hammers my server's disks.  The server hasn't got a great RAID controller, so long periods of heavy I/O tend to eventually bottleneck the network shares too :(

    The symptoms are little CPU activity but 100% disk I/O activity, with heavy reads/writes on the files C:\Users\SQLANYs_sem5\AppData\Local\Temp\sqla0000.tmp and C:\Program Files (x86)\Symantec\Symantec Endpoint Protection Manager\db\sem5.log.  Both files are upward of 500 MB in size, and for all I know the disks have been getting hammered for hours before users report it.  It might settle by itself, but who knows how long that would take.

    To work around the problem I do the following (a scripted sketch of these steps follows the list):

    1. Stop the SEPM service.
    2. Kill the SQL Anywhere network server (dbsrv16.exe) process, as the service cannot be stopped gracefully.
    3. Start the engine manually with dbeng16.exe -m "C:\Program Files (x86)\Symantec\Symantec Endpoint Protection Manager\db\sem5.db"; it starts cleanly and truncates the transaction log sem5.log back to a practical size.
    4. Shut down the manually started engine from step 3.
    5. Delete the sqla0000.tmp file.
    6. Start the Symantec Embedded Database service.
    7. Start the SEPM service.
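
    If it helps anyone, here's the same sequence as a rough batch sketch.  It's a sketch only: the service names (semsrv for SEPM, SQLANYs_sem5 for the embedded database) are what they're called on my install, so check services.msc on yours before running anything.

        @echo off
        rem Semi-automated version of the workaround above. Service names
        rem and paths are from my install; verify yours in services.msc first.
        set "SEPM=C:\Program Files (x86)\Symantec\Symantec Endpoint Protection Manager"

        rem 1. Stop the SEPM service
        net stop semsrv

        rem 2. Kill the SQL Anywhere network server (it won't stop gracefully)
        taskkill /f /im dbsrv16.exe

        rem 3. Start the engine manually (run this from the folder that holds
        rem    dbeng16.exe); -m truncates sem5.log after a checkpoint
        dbeng16.exe -m "%SEPM%\db\sem5.db"

        rem 4. Shut the engine down from its window once it has started
        rem    cleanly, then press a key to continue
        pause

        rem 5. Delete the temp file
        del "C:\Users\SQLANYs_sem5\AppData\Local\Temp\sqla0000.tmp"

        rem 6 and 7. Restart the embedded database service, then SEPM
        net start SQLANYs_sem5
        net start semsrv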

    By then, everything has returned to normal, with disk utilisation averaging less than 15%.

    Presumably this is a problem with LiveUpdate updating the content repository.  I think this because, shortly after, I will see in the console log that LU has performed a run and completed.  The files will then have quickly grown to exactly the same size again, but the database will behave nicely, put no strain on the disk, and keep going unattended for a few weeks.  Eventually (after about 4 hours, I think) the transaction log will shrink back to normal too.

    So, has anyone had similar experiences, before I contact tech support?  I have seen articles where similar issues affected older versions, but nothing relevant.  I am thinking of doing a nightly restart of the two services, but I am not convinced that would fix the problem if LU is involved.

    Marty

     



  • 2.  RE: Excessive SEPM Database I/O after a few weeks

    Posted Feb 17, 2015 02:21 PM

    It could be an issue with definitions being updated.  You may want to engage support, though, since you're on the latest version.



  • 3.  RE: Excessive SEPM Database I/O after a few weeks

    Posted Feb 24, 2015 05:21 AM

    I have noticed something similar.  Our EP Manager was using a sustained 1000+ IOPS (on a 40-client database).  Any progress/updates here?  Or should we open a case on our own?

    Thanks



  • 4.  RE: Excessive SEPM Database I/O after a few weeks

    Posted Feb 24, 2015 07:58 AM

    Probably best to open a case, especially if you're on the latest version.  There could be a few different causes, and they'll need to enable some advanced logging and go through the logs.



  • 5.  RE: Excessive SEPM Database I/O after a few weeks

    Posted Feb 24, 2015 05:42 PM

    Hi,

    I am also seeing the same issue - ridiculous amounts of I/O on the DB process during definition updates.  This started after updating to 12.1 RU5.  I suspect it might be the same issue as the one in https://www-secure.symantec.com/connect/forums/sep-1215-question-about-content-optimization.

    Did anyone else get anywhere with this?  Or should I open a case with support?

    Phil



  • 6.  RE: Excessive SEPM Database I/O after a few weeks

    Posted Feb 25, 2015 04:43 AM

    Phil

    I am also seeing the same issue - ridiculous amounts of I/O on the DB process during definition updates.  This started after updating to 12.1 RU5.  I suspect it might be the same issue as the one in https://www-secure.symantec.com/connect/forums/sep....

    I had a look at that thread and it sounds exactly like what's going on, but sadly it seems unsolved.  Interestingly, I also have a Dell server, like they mention in that thread; I wonder if that's common among the people having this problem.  Do you get the problem every time LiveUpdate runs?  I suspect it is limited to a particular type of update.

    My server has 6 cores, so the idea of capping the cores used with scm.content.incoming.delta.cpu.limit=1 is an interesting thought and makes some sense given the effect on I/O, particularly on a host with no write-back cache that isn't expected to do a lot of I/O (which usually describes a Domain Controller), so I might try it (where to set it is sketched below).  I'd like to hear whether others have had positive results doing this.  SEPM only requires a Pentium 4-class processor and isn't hard on CPU, so the effect on a machine with ordinary disks should be positive.
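
    For anyone else tempted to try it, this is where I'd expect the setting to live.  A sketch only: the conf.properties path below is from my install, so verify it on yours, and restart the SEPM service afterwards so the change is picked up.

        # In C:\Program Files (x86)\Symantec\Symantec Endpoint Protection Manager\tomcat\etc\conf.properties
        # (path from my install; verify on yours), add or edit this line,
        # then restart the SEPM service so it takes effect:
        scm.content.incoming.delta.cpu.limit=1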

    I haven't seen the problem since posting this thread, and haven't logged a support case yet either (I find Symantec support a massive waste of time), but if it is impacting you frequently then I would definitely suggest logging a case.

    It seems like RU5 is the common denominator.  Rumor has it RU6 is coming soon (anyone know when?), so I might wait, or find a less resource-intensive competitor product.  My site is very small (15 hosts) and I'm considering just using client LiveUpdate, which wouldn't impact downloads significantly.  In the meantime I have changed my LiveUpdate window to after-hours, 4 hours max, to at least minimise the potential impact on client network access.  A once-daily update may not suit your site size and policy, so use discretion.

     



  • 7.  RE: Excessive SEPM Database I/O after a few weeks

    Posted Mar 01, 2015 05:03 AM

    I've noticed absolutely no difference after setting scm.content.incoming.delta.cpu.limit=1.  In fact I still see multiple CPUs being used, so I am not sure exactly which process is throttled by this.

    I took a closer look at my logs and I can see that LiveUpdate "hangs" for around 20-40 minutes every time a run succeeds in finding an AV definition update, sticking on the line "Deleting Property for Moniker" pretty much every time.  I suspect this is LU waiting for the database processing to finish before moving on to the next entry, which otherwise happens immediately with no hang.

    I ran LU manually to see what was going on, and as expected, during the update I was at 100% disk utilisation with a mass of data going through SQL Anywhere.  But the key to my problem, I think, is that sometimes during the run my Disk Queue Length will jump to over 100 for a couple of seconds!  I am not sure if this is down to hardware or bad programming, but I believe this is when I get the network client problems (i.e. disk wait time gets too long).
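
    If anyone wants to watch for those spikes themselves, the built-in typeperf tool can log the counter to a CSV while LU runs (adjust the disk instance and sample interval to suit):

        rem Sample the disk queue length once a second and log it to dql.csv;
        rem swap _Total for a specific disk instance if you want one volume.
        typeperf "\PhysicalDisk(_Total)\Current Disk Queue Length" -si 1 -o dql.csv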

    30 minutes is too long for a single update to process (note I said process, not download), regardless of how long the disk queue length stays over 100 (even 10 is worrisome).  I thought more frequent LU checks might mean fewer definitions being processed in one go, but concluded it was pointless, since the log showed it "hung" on one line before moving on to the next, meaning any run has the potential to hang if a new definition is waiting.

    From what I read online, a number of people have issues with RU5 and the embedded database, SQL Anywhere v16 (up from v12), and we are just more victims of it.  I couldn't say for sure, since I had RU4 on an old server and this new server went straight to RU5.  Instead of downgrading I am going to log a service call.  In the meantime my answer is running LU between 1:30am and 4:30am; nobody is online at that time, so there is no risk of network client issues.  If they don't solve it without a downgrade, or promise a solution in RU6, I'm going to a competitor's product to regain my network client reliability.



  • 8.  RE: Excessive SEPM Database I/O after a few weeks

    Posted Apr 17, 2015 06:05 AM

    After waiting a week for someone to contact me after opening a case, then doing multiple log captures and a 5 GB xperf capture (not easy to upload that much data), it's now been 4 weeks since opening the case and I still haven't received acknowledgement that they even see an issue!

    I feel like I've wasted my time gathering data for them.  The latest contact was a request for more information, but gave no indication of what they needed (I can't get on the phone at the time they want, and I fail to see how me sitting at the console will reveal anything that isn't already in a log).

    So I've given up.  I have 3 weeks left on my license and I'm switching to a cloud-managed console.  Sadly they don't offer a crossgrade discount, just a non-profit licensing band which only saves $5 a license.  The competitor's product is looking better anyway, because I've been thoroughly unimpressed by the quality of both the product and the tech support from Symantec.

    For what it's worth, I think it's specifically the embedded DB engine causing the issue, as I never noticed a problem with 12.1 RU4 (RU5 included a new engine version) and the workaround involves tinkering with the engine.  RU6 might solve it, and if it comes out in the next 2 weeks I'll give it one last chance to dazzle me.

    Good luck anyone else, I hope you find an answer :)



  • 9.  RE: Excessive SEPM Database I/O after a few weeks

    Posted Apr 17, 2015 05:11 PM

    Hi Marty,

    My experience has almost exactly mirrored yours.  My maintenance is up in 4 weeks and I'm looking at alternatives.  Regardless of whether it is resolved with a patch, the poor product quality and support on this and other occasions made the decision to move to another product very easy.

    Phil



  • 10.  RE: Excessive SEPM Database I/O after a few weeks

    Posted May 23, 2015 08:06 AM

    I don't think we should hope for a solution from Symantec anytime in the near future.  This problem has been going on for years, and it is clear to me that Symantec is either unable or unwilling to solve it.

    I am running SBS 2011 with 11 users, and Symantec is consuming more resources than some business applications use for 200 users.  This is simply ridiculous.

    The database for 11 users is reaching 8 GB of disk space, for WHAT?  The temp file fires up and takes up to 1.5 GB of disk space, for WHAT?  This is simply bad programming.