
Network Shares Stop Responding Randomly on Windows Server 2008 R2

Created: 16 Jun 2014 • Updated: 16 Jun 2014 | 35 comments
roger_silva

Dear friends, we are running Symantec Endpoint Protection 12.1 on a Windows Server 2008 R2 file server, but, randomly, the network shares hang and network applications stop responding.

The current configuration is:

  • Symantec Endpoint Protection
    • Enabled Features: File System Antivirus only
    • Version: 12.1.4100.4126
    • Exceptions: Correctly defined to exclude important folders, files and processes from any kind of scan
    • Scan: Only modified files are being scanned
  • Windows Server 2008 R2 x64 with Service Pack 1
    • Hyper-V Enabled
    • File Server Resource Management is Enabled
    • Symantec Endpoint Protection Manager is Installed (ASA database is defragmented)
  • Server: IBM System x3550 M4
    • 32GB RAM
    • 2 Processors
    • RAID Volumes: RAID 1 - Operating System, RAID 5 - User Files, RAID 1 - Virtual Machines (all disks are 6Gbps SAS)

Besides the slow access to the network shares, when the server is restarted it gets stuck during shutdown, and a hard shutdown is needed.

When Symantec Endpoint Protection is removed from server, the problem does not happen.

Right now, after a reboot last Saturday (14 June 2014), the server is fully functional, and I'm waiting for the problem to happen again so I can use Process Monitor to capture server activity and confirm that the problem is caused by SEP (http://blogs.technet.com/b/markrussinovich/archive...). I also enabled VPDebug logging (http://www.symantec.com/business/support/index?pag...) to check whether SEP shows some sort of error message in its working log.
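While waiting for the hang to recur, a small watchdog that times a directory listing of a share can flag the moment it starts, so the Process Monitor capture can begin promptly. The following is a minimal Python sketch under stated assumptions: the share path is a hypothetical placeholder, the 10-second timeout is an arbitrary choice, and "hung" only means the listing did not return in time, not that SEP is the cause.

```python
import concurrent.futures
import os
import time


def probe_share(path, timeout=10.0, lister=os.listdir):
    """Time a directory listing of `path`; return (status, elapsed_seconds).

    status is "ok", "error" (share unreachable, access denied, missing path)
    or "hung" (no answer within `timeout`). `lister` is injectable so the
    probe can be tested without a real share.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    start = time.monotonic()
    future = pool.submit(lister, path)
    try:
        future.result(timeout=timeout)
        status = "ok"
    except concurrent.futures.TimeoutError:
        # The listing is still blocked. A hung SMB call cannot be cancelled
        # from Python, so the worker thread is simply abandoned.
        status = "hung"
    except OSError:
        status = "error"
    finally:
        pool.shutdown(wait=False)
    return status, time.monotonic() - start
```

Usage, with a hypothetical UNC path: `probe_share(r"\\fileserver\share")`, called from a scheduled task every minute and logged with a timestamp. Because abandoned threads stay blocked until the SMB call returns, running each probe in a fresh short-lived process is safer than one long-running loop.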

Any help will be appreciated.

Best regards to everyone! 



.Brian

Disable only the SEP firewall and try again.

The other thing you'll need to do is call support. They will have you run the SymHelp tool and reproduce the issue; it will collect the info and send it to support. You will also need to run a packet trace while reproducing the issue.

Please click the "Mark as solution" link at bottom left on the post that best answers your question. This will benefit admins looking for a solution to the same problem.

Grshpr75

We are having this EXACT same issue right now and have been fighting it. We have gone so far as to create a VM dump for Microsoft and have a case open with Symantec as well. We are not running the SEP firewall, just the default Windows Firewall.

Microsoft suggested a hotfix specifically for the srv2.sys driver, but we tried it on one of our failing servers late last week and that server just crashed tonight.

One thing I bet you will find on these servers, as we did: if you revert to SMB1 (connect to the shares from XP or Win2K3), you'll find that all the shares are there and still work. This seems to be limited to SMB2 shares.

 

If we find anything out, I will post here as well.

.Brian

This has been an issue in the past with SEP. I remember one case related to the firewall for sure, and Auto-Protect being another.

Calling Symantec is a good route; they should be able to find the root cause. However, it would most likely take a code fix to truly correct the issue.


markcloud

We have had this EXACT same problem, and we have logged support tickets with Symantec to no avail. They want to reproduce the problem, but we cannot; it is random.

We have had the problem occur on 2 different Windows 2008 R2 clusters running the file services role, as well as on a standalone 2008 R2 server.

We have a near-identical setup to Roger's above in terms of Symantec version and Windows OS. We are also ONLY running the AV scanning features. The only difference is that we are running VMware instead of Hyper-V.

Due to the critical nature of our file services clusters, and the fact that it happened 3 times on one of the clusters, we temporarily uninstalled Symantec, and the problem has not occurred since.

We noticed the problem right after installing the latest version, which for us was a small version upgrade from what we were running before.

GadJeff

Mark, yeah, we went from 12.1.4013.4013 to 12.1.4100.4126 last Sunday during our production server Windows patching (a staged install pushed from the SEPM as an upgrade, restart needed) and started having the issue with SMB2 shares on most of our Windows 2008 R2 file servers.

SMLatCST

Can I ask what changes you guys have tried in your troubleshooting?

As it's been narrowed down to the AV only component, have you tested:

  • enabling/disabling the network scan option within Auto-Protect, and the further options it contains (i.e. trust remote auto-protect and the network caching)
  • disabling all unscheduled scans (active scan on def updates, file cache scans, etc)

What I find interesting is the different environments.  The OP's setup suggests this is happening on a physical Hyper-V VM Host (please correct me if I'm wrong), while markcloud's description suggests his file server is a VMWare virtual guest.

I'll be keen to see how this progresses.

GadJeff

We've sent a full memory dump to Microsoft when the issue occurred. From that memory dump on an affected Windows 2008 R2 server, they were able to determine that "...Most of the srv2 worker threads are waiting for an oplock. Hence new SMB2 request from the clients are not being served...It looks like a deadlock is in the symantec driver SRTSP64.sys..".

Again, we only went from v12.1.4013.4013 to v12.1.4100.4126 and started having these issues. There were no changes made to SEPM policies, settings, etc., just a slight version upgrade of the client.

In the interim, we've done one of the following:
1. Stop the SMC service on the server.
2. Remove the SEP client entirely.
3. Run CleanWipe, restart, and install the previous version, 12.1.4013.4013.

We are only running the 'Basic Install Feature Set' (Virus, Spyware, and Basic Download Protection) features on our Windows servers.
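The Microsoft dump analysis quoted above (most srv2 worker threads waiting on an oplock, so new SMB2 requests are not served) describes a worker-pool starvation pattern. The toy model below is only an analogy in Python, not how srv2.sys or SRTSP64.sys actually work: a fixed thread pool stands in for the srv2 worker threads, and a `threading.Event` that is never set stands in for the oplock release that, per the dump, is stuck in the Symantec driver.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def simulate_worker_starvation(workers=4, timeout=0.5):
    """Toy model of pool starvation; NOT a model of srv2.sys internals.

    Every worker blocks on an "oplock release" that is never signalled, so a
    cheap request queued behind them cannot be served within `timeout`.
    Returns (served_while_blocked, result_after_release).
    """
    oplock_released = threading.Event()  # stands in for the stuck oplock release
    pool = ThreadPoolExecutor(max_workers=workers)
    # Occupy every worker with a request that waits on the "oplock".
    for _ in range(workers):
        pool.submit(oplock_released.wait)
    # A new client request: trivial work, but no worker is free to run it.
    new_request = pool.submit(lambda: "served")
    try:
        new_request.result(timeout=timeout)
        served_while_blocked = True
    except FutureTimeout:
        served_while_blocked = False
    # Releasing the wait frees the workers, and the queued request completes.
    oplock_released.set()
    result_after_release = new_request.result(timeout=5)
    pool.shutdown(wait=True)
    return served_while_blocked, result_after_release
```

In this model, even a trivial request times out while every worker is blocked, yet nothing is using CPU or I/O; the queued request completes only once the wait is released. That matches the symptom of shares hanging while the server otherwise looks idle.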

markcloud

As a note about Hyper-V vs VMWare:

Our Windows Clusters are on VMWare and we have the issue arise there.

Our standalone Windows 2008 R2 machine was a physical server and we saw the issue there as well.

We tried disabling Auto-Protect, definition update scans, cache scans, etc. Early on we thought that was the problem, due to high I/O on definition files during the outages, but after disabling all of the above we continued to see the problem, so we did a full uninstall of the SEP client on our critical file clusters (not happy about doing that, but we saw no other option).

We've also seen the problem as soon as 8 hours after a reboot on a very lightly used server (maybe 5 open SMB sessions), as well as on a heavily used server with thousands of SMB connections. So it doesn't appear to be load- or uptime-related, which was another of our theories.

roger_silva

Dear friends, before saying anything, I'd like to thank you for your opinions.

In a situation quite similar to Jeff's, we upgraded from version 12.1.4013.4013 to 12.1.4100.4126 and started having these issues.

The server spent almost a month without Symantec Endpoint Protection installed and nothing happened. The server worked very well during this period.

I applied every single best practice documented by Symantec and other software developers (Microsoft and Computer Associates) to avoid performance problems. For example:

  1. Important folders and files from the operating system and applications are excluded from all types of scans.
  2. Network scan is disabled on file servers and workstations.
  3. Antivirus file cache is disabled on file servers.
  4. Only file system antivirus is enabled on servers.
  5. Only modified files are scanned.

As I told you, since last Saturday it hasn't happened again.

We are now waiting for the problem to happen again so we can use the procedures I described at the beginning of this thread. With a Procmon analysis in hand proving that the problem is caused by some sort of Symantec file system driver (which, given the tests we've made, I have no doubt about), I'll be able to call Symantec and open a case with evidence for their analysis.

I'll keep you informed about any updates.

Best regards to everyone!

Rogerio Renato da Silva
CompuNext

roger_silva

Dear friends, I have found the following article on the Microsoft Support site: http://support.microsoft.com/kb/2582112/en-us. This article says that Windows Server 2008 R2 and Windows 7 may stop responding during high network I/O SMB operations.

Details about it: this hotfix is from 2011(!!!).

I believe that if Symantec has some sort of issue in its file system driver, the combination with this Windows platform issue can amplify the effects of high network I/O on the operating system side and make things worse.

Last week we uninstalled SEP 12.1.4a from the file server and installed Microsoft Security Essentials. Last Saturday, 08/23/2014, we experienced a very short outage (5 or 10 seconds) on the file server.

I applied this hotfix, since it is not included in any Service Pack, Update, or Update Rollup (it's not completely tested yet).

I'll keep everyone informed if I have any update about this issue.

Best regards!

Rogerio Renato da Silva
CompuNext

roger_silva

We didn't observe whether it happens during definition updates.

Rogerio Renato da Silva
CompuNext

jason_ossi

Just seeing this post and checking to see if anyone has an update on this.  We are in the EXACT same scenario with multiple file and RDS servers and I have gone through 3 months of this and have suffered DFS corruption from dirty shutdowns and shares going offline (millions of files).  We just opened a call with Microsoft, and they're suggesting memory dumps etc - but it looks like someone else has gone through this rodeo.  We're going to open up a case with Symantec and ask to immediately escalate.  Thanks in advance...

glenn.ward

We are also having the same issues, I have removed 12.1.4100.4126 from one server to test and we are not having the issue.

What I am looking at doing is pushing the old version of the client (12.1.4013.4013) back out. My only concern is going backwards, since I have not done this before. Does anyone have input on this?

Symantec – do you have a fix for this?

.Brian

As long as the newer version is already removed, it should be as easy as installing the "old" version.


jason_ossi

Glenn, regarding the 'downgrade': we have been working with a Symantec engineer, and they suggest that this may have been a result of upgrading from a previous version. For this reason, they're suggesting we run CleanWipe and then install the SAME version (4126), although I hate being the guinea pig. I don't know that I've had any success doing an inline downgrade.

jason_ossi

Gotcha, not looking forward to this. We have about 800 nodes with SEP, including about 150 servers. While the file share issue only affects our core file servers, the 'shutting down' hang is spread throughout our servers and strikes randomly: servers not coming back up after updates, etc. I'm going to have to script CleanWipe and then push the old version. That's a lot of work for something that is a clear bug we'd expect to be fixed.

Grshpr75

See GadJeff notes above, he is a co-worker of mine and our primary SEP administrator (and when I use 'We' for the rest of this it was all him doing set-up and testing).

We automated the process with SCCM: we used the CleanWipe utility to remove SEP and do an initial reboot, then let it clean up leftover files, install the older version, and do a final reboot. I think we rolled back about 700 servers in total over 3 weekends.

We had about a 5% failure rate with the automated process. Most of those failures were due to SCCM not detecting that the removal steps had finished, and maybe 10 were servers that would not reboot because the SEP services did not stop properly, so we had to hard-reset the machine.

warmachinerox

Hi,

We also have the SAME ISSUE with Windows Server 2008 R2. Share access is intermittent, and we cannot replicate the issue since it is "intermittent". Shares and printers become inaccessible: you can RDP, ping, and telnet to port 445, but you cannot browse the shares. We've been sending logs to support, but in 2 months they haven't figured anything out. We are running version 12.1.4100.4126 without the Firewall component, using a basic package with only the AV component, and it is still occurring randomly.
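The symptom described here (port 445 answers, but the shares cannot be browsed) means a plain TCP port check is not a sufficient health test for these file servers: the listener still accepts connections even when SMB requests are not being served. A minimal Python sketch of such a check follows; the host name in the usage note is a hypothetical placeholder.

```python
import socket


def port_open(host, port=445, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    A True result only proves that the listener accepts connections; it says
    nothing about whether SMB requests on that port are actually served.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out.
        return False
```

For example, `port_open("fileserver")` can return True during an outage while a directory listing of a share on the same server still hangs, so monitoring should pair it with a timed listing of an actual share rather than rely on it alone.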

We have removed SEP client temporarily from the file servers that are having issues.

Has symantec support found anything yet?

GadJeff

No, Symantec has yet to recreate the issue in their 'own' lab (to my knowledge), although I keep insisting that they do so. We (as well as you and other customers) have removed that version and reverted back to v12.1.4013.4013.

warmachinerox

Thanks for the response, GadJeff. I will try the lower version (12.1.4013.4013), since we have completely removed the SEP client (CleanWipe) from the server that is having issues.

glenn.ward

I have upgraded to 12.1.4112.4156.105, and my file servers have stopped having this issue.

but,

On one of my networks, the new version was pushed to the clients, but it did not uninstall the older version (12.1.3001.165.105). Oh, the joy...

SMLatCST

How do you mean it did not uninstall it?

By default, a SEP upgrade puts all the files for each version in its own folder (titled 12.1.3001.165.105 and 12.1.4112.4156.105 in your case) and swaps the active files (i.e. the ones SEP is running on) during a reboot.

Only after this mandatory reboot and swap-over is the new version up and running. With the older version's files now free, SEP will eventually delete them when the machine is idle (this can take a few hours, but the 12.1.3001.165.105 folder will eventually be deleted).

Assuming the upgrade and swap over didn't bomb out, you just need to wait...

glenn.ward

On some servers I see both version folders, and after multiple reboots the new version is still not installing.
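To check across servers whether the swap-over described above has happened, it can help to list the version-named folders in version order. The sketch below assumes the SEP 12.1 program folder on a 64-bit server is `C:\Program Files (x86)\Symantec\Symantec Endpoint Protection` (an assumption; verify the path on your own systems) and treats any subfolder whose name is made only of dot-separated numbers as a version folder. It shows which folders are present, not which version is actually running.

```python
import os

# Assumed SEP 12.1 install location on 64-bit Windows; adjust as needed.
SEP_DIR = r"C:\Program Files (x86)\Symantec\Symantec Endpoint Protection"


def parse_version(name):
    """'12.1.4112.4156.105' -> (12, 1, 4112, 4156, 105); None if not a version."""
    parts = name.split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts):
        return None
    return tuple(int(p) for p in parts)


def version_folders(base=SEP_DIR):
    """Return the version-named subfolders of `base`, oldest first."""
    found = []
    with os.scandir(base) as entries:
        for entry in entries:
            if entry.is_dir():
                version = parse_version(entry.name)
                if version is not None:
                    found.append((version, entry.name))
    # Tuples compare numerically, so 3001 sorts before 4112.
    return [name for _, name in sorted(found)]
```

If `version_folders()` still returns both 12.1.3001.165.105 and 12.1.4112.4156.105 several reboots after the upgrade, the swap-over has probably not completed, and the client install logs are the next place to look.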

SMLatCST

If the swap-over itself is not happening, then it won't free up the older version's files for deletion.

I'd recommend checking the article below on the logs for troubleshooting the client install:

http://www.symantec.com/docs/TECH164067

glenn.ward

Thanks for the info, but it did not help. When I look at the properties of the client in SEPM, I see that 'Deployment target version' is 12.1.4112.4156 and 'Deployment running version' is 12.1.3001.165. The only install package on the SEPM server is 12.1.4112.4156; I have removed 12.1.3001.165.

jason_ossi

Hello, all. We've been going back and forth trying to get a response from Symantec. I saw the notes about the 'b' version, 12.1.4112.4156; we've downloaded it and have begun testing. We called Symantec Engineering and asked for release notes for this version, to see whether the share/shutdown hang was addressed in it. They told us there were NO release notes and pointed us to a page that referenced this upgrade as a vulnerability fix. All that said, I have the smoking gun: specifically for our hang on shutdown, Microsoft has analyzed a memory dump and has clearly identified Symantec as the issue:

Debug summary:

SRTSP64.sys is preventing the computer from shutting down 

To say I'm frustrated would be an understatement.  We have now traced this issue back to failings in our file servers, DFS (because of dirty shutdown), fax server (zetafax), document management server (Worldox), site file servers, and more.  This has literally caused us to spend hundreds upon hundreds of hours troubleshooting with vendors only to find it was Symantec.  I'm passing this on to our case manager asking for a supervisor to provide a response if we cannot get an engineer.

We're ready to move forward with the new version, but would like some kind of acknowledgement that this has been resolved.

glenn.ward

Good luck with that; I have been banging my head against the wall of "Symantec" about this issue.

glenn.ward

IT'S BACK!

I installed 12.1.4112.4156 two weeks ago, and along with other ongoing issues, we are back to the same problem of network shares randomly stopping responding on Windows Server 2008. I tried to remove 12.1.4112.4156 from one server to test, but I'm having trouble removing SEP, so I am getting HBSS to push McAfee A/V to the server to see if that will uninstall SEP, and then I'll watch whether the network shares stop responding.

Symantec – do you have a fix for this? or will my fix be HBSS???????

roger_silva

Dear friends, I found a Microsoft article (KB2582112 - http://support.microsoft.com/kb/2582112), which reports that Windows may stop responding during high network I/O operations through SMB shares.

I believe that if SEP has some sort of issue in its file system driver, this Windows problem may be making things worse, and this hotfix probably needs to be applied to the server and workstations to solve part of the problem.

This hotfix is not included in any Service Pack, update, or update rollup from Microsoft.

We removed Symantec Antivirus from the file server last week. Today it stopped responding, but it was a very short outage (5 seconds, not as long as with Symantec installed).

I applied this hotfix and we will be monitoring server to check if it is going to happen again.

Best regards to everyone!

Rogerio Renato da Silva
CompuNext

warmachinerox

Dear All,

I installed KB2582112 on one of our file servers, but the issue recurred after 2 weeks. Does anyone have a fix yet?

I think my open case with Symantec support will just age... (nothing's happening; they want me to replicate the issue. <crap>)

Don @ F&M

We have been having this same problem with version 12.1.4100.4126. By pure coincidence, I noticed one day, while this was happening on a particular server, that our management server was sending out a virus definition update. I've now limited the management server's LiveUpdate runs to a nighttime period, and it seems to have improved things. I'm still waiting to see whether it happens again.
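To move from a single coincidence to evidence, one option is to line up outage timestamps against definition-update timestamps. The sketch below assumes you already have both lists as `datetime` values (for example, collected by hand from the SEPM logs and from whatever records the outages); the 30-minute window is an arbitrary assumption, and a match shows correlation, not cause.

```python
from datetime import datetime, timedelta


def outages_after_updates(outages, updates, window=timedelta(minutes=30)):
    """Map each outage to the most recent update that started no more than
    `window` before it, or to None if there is no such update."""
    updates = sorted(updates)
    result = {}
    for outage in sorted(outages):
        recent = [u for u in updates if timedelta(0) <= outage - u <= window]
        result[outage] = recent[-1] if recent else None
    return result
```

If most outages map to an update, that supports restricting LiveUpdate to a quiet window as described above; if they mostly map to None, definition updates are probably not the trigger.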

Invitel

We are having the same issue. On two heavily used file servers, the shares start to slow down and eventually stop working. The server shows nothing in the resource explorer, no I/O, memory, or CPU usage; the shares just stop working.

KB2582112 did not solve this, nor did a restart, because after about an hour the shares slow down and stop again.

Another symptom: the server gets stuck in the "Shutting down..." phase forever, until a manual reset.

 

We rolled back to 4013.4013, and all of the problems went away.

jason_ossi

Update: I got a Symantec engineer on the phone and provided them with a copy of the dump and the diagnosis from Microsoft. He mentioned that this week is jammed because of staff cuts and some training, and that we should expect to hear back next week. At this point, we've basically identified the issue; they just need to fix it.