Advice: How to tell if EV server is "over-populated"?
Running EV 2007 sp4 - Exch 2003 sp2
Running on an HP DL380, dual-xeon 3.0ghz, 4gb ram.
Windows 2k3 R2 - sp1
dual teamed 1gbit nics.
Vault store is on CIFS share on Netapp SAN.
indexes are on local drive.
SQL 2005 is on the local drives and backed up to SAN.
SQL reporting services running and have a single "scheduled" job just after the archive window closes.
2701 vaults.
Strange, strange server behavior over the past few weeks outside of the archive window. Trying to get a grip on whether or not we've exceeded the "capacity" of the server. Performance numbers during non-archive time are minimal. She gets really busy during archive times - 1000 items, 5 threads. The server has been running since mid 2008, and was implemented 11/2008.
No real problems until about a month ago, when it locked up. You could ping it and run the management MMC from a remote machine (could stop services okay), but RDP wouldn't answer...couldn't log in via KVM (hung after entering credentials)...only ILO allowed a restart. Absolutely NOTHING showing in any of the logs. My gut was that she was maxed out...too busy to answer.
This has happened multiple times, with "odd" differences. One time, no MAPI was working to the Exch server (same subnet) but clients could still open archives. I was able to remote to the server okay. Today, RPC traffic worked from another machine in the same subnet but not from different subnets. Opening archives failed, and trying to map to the admin share failed from different subnets. In EVERY case, restarting clears it up until the next time. Since this is "off archive times", I don't think that a dtrace would help.
Ran HP diags, which showed nothing, so this is why I'm posting.
QUESTION: Can you get to a point where the vault population is "too much" for the hardware? The most concurrent Exchange users I've seen is about 300 or so...and many don't go after archives much...they're still not sure what they are!!
Now, I do not manage the SQL database...that's done by another team member, so I don't currently have the insight into what I should look for in SQL. I've read about locked processes and will look further into that.
I've turned on the EV perf counters along with server counters in hopes that I'll get some evidence of something when next this occurs, but I thought that many of you out there might have had such a fight and might have learned something.
I appreciate any insight.
Thanks.
Comments
are you running SQL 2005 and SRS on your EV server?
Andy Becker | Authorized Symantec Consultant | Trace3 | Symantec Platinum Partner | www.trace3.com
Yes. Logs on D: and databases on E:
Here's something you could have your DBA do to see if SQL is hanging the box.
Start (and let run) SQL Profiler:
- Open SQL Server Profiler
- Select File then New Trace
- Use a Blank Template and expand the 'Locks' Events under "Events Selection" tab.
- Select Lock:Deadlock and Lock:Deadlock Chain
- Run
Also, from a query window (Query Analyzer, or Management Studio on SQL 2005):
- Type the following in the Query window:
DBCC TRACEON (3605,-1)
DBCC TRACEON (1204,-1)
- Press F5 to execute the script.
This should enable detailed deadlock logging to the SQL error log, which they can review for you.
Andy Becker | Authorized Symantec Consultant | Trace3 | Symantec Platinum Partner | www.trace3.com
Will do.
At this point, I'm game for most anything.
But, if we're "outside" of archive time, isn't the database just read only? I ask because we've turned off manual (client) archiving and not "advertised" the restore (copy to) functionality in AE, so most all access is via Outlook 2k3/2k7 with EV add-ins to open archived items. No big searches...THAT I'M AWARE OF!
Thanks.
The database is not read only, no, but I see where you are going with this. If you have disallowed manual archiving then yes, outside of the archiving window no items will be archived. There are other operations, though, that can run and do modify the databases, so the databases are not read only. Operations such as:
1. Collection/Migration runs, if configured
2. Monitoring
3. Delete operations
4. OWA Extensions. Have you switched off the ability to archive and delete there?
With regard to the number of users, you state you have 2701 archives, which is well under what one server should be able to handle. Not that we talk in users anymore, as the bottleneck is about how many items an hour you need to process versus how many you are actually processing.
If the latter is smaller than the former then you are going to get backlog problems. A good sign of backlog problems is items still left on your A5 MSMQ queue when you come out of your archiving schedule, as this indicates that there are still items to be processed. In that case you would need to look at extending your archiving schedule, or just accept the backlog.
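To put rough numbers on the backlog idea, here is a minimal Python sketch. The figures (12,000 items/hour needed, 10,000 achieved, an 8-hour window) are made up purely for illustration and are not from any real server, and the function names are hypothetical. It also assumes constant rates across the window, which real archiving runs won't have.

```python
def backlog_after_window(required_per_hour, actual_per_hour, window_hours):
    """Items still waiting when the archiving window closes.

    Assumes both rates are constant across the window, which is a
    simplification; real throughput varies with message size.
    """
    shortfall_per_hour = max(0, required_per_hour - actual_per_hour)
    return shortfall_per_hour * window_hours


def extra_hours_to_clear(backlog_items, actual_per_hour):
    """Extra schedule time needed to drain the backlog at the current rate."""
    if actual_per_hour <= 0:
        raise ValueError("actual_per_hour must be positive")
    return backlog_items / actual_per_hour


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    backlog = backlog_after_window(12000, 10000, 8)
    print("Backlog at end of window:", backlog)
    print("Extra hours needed:", extra_hours_to_clear(backlog, 10000))
```

If the backlog comes out at zero, the schedule is keeping up and leftover queue items are more likely down to the odd large message than to a capacity problem.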
The fact that your server is hanging is strange and may not be an EV issue. Do you have a DR server that you could fail over to using Building Blocks, to see if you get the same problems? Otherwise, things to take into consideration are:
1. Anything in server logs at all
2. Does this happen around the same time/day?
3. As suggested check for deadlocks but SQL deadlocks should not really hang your server.
4. Performance counters as you say
5. Maybe run a chkdsk to check for disk issues
EV Backline Technical Support Engineer APJ Region
Agreed.
I agree that my circumstances seem to be non-EV related. I've garnered some good info in these forums since my adventure with EV began, and your input bears that out.
For the record:
1. Anything in server logs at all
---absolutely nothing. In the EV log the last entry was at 4 a.m. (just prior to the end of the window). The next was after my restart. System was the usual WMI Performance chatter. Application was at 8:00 this morning, group policy then the restart late in the afternoon.
2. Does this happen around the same time/day?
--nope. So far, random...but outside of the archiving window. I walked in at 7:00 a.m. one morning (two hours after the end) and couldn't get to the server to check things out. With nothing in the logs, I can't be exactly sure when it happened.
3. As suggested check for deadlocks but SQL deadlocks should not really hang your server.
--QUESTION: Would deadlocks just affect response times?
4. Performance counters as you say
--checking on those...
5. Maybe run a chkdsk to check for disk issues
--another good suggestion that I haven't gotten to just yet.
The subnet thing today had me going...talking with the networking guys. There's no firewall involved, and it "appeared" that something was interfering BUT after it dawned on me that a restart cleared it all up...NEVER MIND.
I'll take your suggestions to heart and will make a point of posting anything I trip over...in hopes that it might prove useful.
Thanks.
Oh, and...YES
I switched off the manual archive feature in OWA...but based on what I could find I had to leave the restore feature turned on so that the OWA fetch-to-hidden-pst function would work. Anyway, I'd be very surprised if my users are restoring. I just hid the Restore icons/buttons...because they were SURE that their old emails were going to be removed forever and would make haste to put them back...vicious circle.
Thanks.
I would suggest what Paul said,
Also please take a look at my latest blog:
https://www-secure.symantec.com/connect/blogs/micr...
That will help you with MSMQ...
www.quadrotech-it.com - All your EV Tools
Queues are cleared
I have come in to find messages backed up, but I attributed that to processing large messages, which your blog seems to validate. The last four days they have all been at zero. There were two "hang ups" during that time.
But, kudos to you for the document and query!
Thanks.
FOLLOWUP
Since my previous posts, the old girl has been up and flawless for 6 days! (JINXED!!!!)
I did three things:
1. stopped any manual archiving. Although, I will have to resurrect this eventually.
2. per Paul's suggestion, ran a CHKDSK...but no errors were found
3. "fine tuned" the anti-virus exclusion list per Symantec doc 284807.
For #3, I had already excluded the indexes (local D: drive), and the vault stores are on a CIFS share, but I had not excluded the message queue folder, the Shopping folder (restore options are disabled), the PST temp folder (not being used), the EV temp folder, or the Export folder.
Now, AV has been running on this server since Dec, and the first "stall" happened in May...about the time I started manual archiving.
No conclusions drawn from all this...just passing it along.
SQL reporting services?
After a hiatus of 17 or so days, the "event" happened again. It's as if the server...which is pingable...is so busy it cannot process requests to open archives. You can remote to it and enter credentials, but nothing happens afterwards.
I was able to remotely stop all the EV services and SQL...but not the SQL reporting service. A reboot is required to correct it.
I'm off now to try and find out how to troubleshoot reporting services...if not just turn them off for a while. I use it to run a scheduled job after each morning's archiving finishes.
Just updating this thread.
Thanks.
Bobby
Just as a note:
It's best practice to have SQL on a separate box. This could be another existing SQL Server.
As for your issue... that's probably not that easy to remediate.
You could try to run a "pslist" remotely, to see which processes are running and whether any of them is behaving abnormally.
Also, as you said, performance counters for CPU, Memory, and so on could prove valuable.
Cheers
Michel
www.quadrotech-it.com - All your EV Tools | www.techfreak.ch
Reporting service question
Michel,
I hear what you're saying about a separate box...but unfortunately, right now, I'm stuck as is.
I have a single scheduled SQL report at 6:45, and then a batch file scheduled to run at 7:00 on the server. The 7:00 job sends me an email...which I didn't get that particular morning.
Nosing around, I found the following in a reportingservices log file on that date, and about the time I had to restart:
ReportingServicesService!servicecontroller!1a!7/4/2009-08:52:03:: i INFO: RPC Server stopped
ReportingServicesService!servicecontroller!1a!7/4/2009-08:52:19:: e ERROR: Can't unload domain, trying again
(there were about 7 more of the can't unload domain entries and then nothing...until the next log was started when the server restart occurred)
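For anyone wanting to pull the errors out of a reporting services log like the one quoted above, a rough Python sketch follows. The line layout (process!component!thread!timestamp:: flag LEVEL: message) is inferred only from the two lines quoted here, and the month/day/year date order is an assumption. `Entry`, `parse_line` and `first_error` are hypothetical names, not part of any SQL or EV tool.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Layout inferred from the two quoted log lines; other lines may differ.
LINE_RE = re.compile(
    r"^(?P<process>[^!]+)!(?P<component>[^!]+)!(?P<thread>[^!]+)!"
    r"(?P<stamp>\d{1,2}/\d{1,2}/\d{4}-\d{2}:\d{2}:\d{2}):: "
    r"(?P<flag>\w) (?P<level>[A-Z]+): (?P<message>.*)$"
)


class Entry(NamedTuple):
    component: str
    when: datetime
    level: str
    message: str


def parse_line(line: str) -> Optional[Entry]:
    """Return a parsed Entry, or None if the line does not fit the layout."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    # Assumes month/day/year order in the timestamp.
    when = datetime.strptime(match["stamp"], "%m/%d/%Y-%H:%M:%S")
    return Entry(match["component"], when, match["level"], match["message"])


def first_error(lines):
    """Return the first ERROR entry in the given lines, or None."""
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry.level == "ERROR":
            return entry
    return None
```

Feeding it the log lines for a given day gives the time of the first ERROR, which can then be lined up against the perf counter data and the time the archive window closed.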
A quick BING... ;-) ... showed a few hits on reporting services hanging, but no real solutions.
QUESTION: Is the reporting service optional in terms of EV functionality? Can I disable it?
Thanks in advance.
You only need it if you want the EV reporting functionality; apart from that, no.
EV Backline Technical Support Engineer APJ Region
We had the exact issue with EV Server freezing
Hi Bobby,
We had a single EV server with SQL installed locally on the same server, and the server gradually became slower and slower as the load increased. We started experiencing issues similar to what you described above (unable to RDP, user searches timing out or running very slowly, etc.).
A month ago I migrated SQL Server to a separate dedicated server and I haven't seen the lockups (touch wood) since then, and EV performance for end-user searches has improved heaps.
Apparently SQL Server on the same server as the EV server is only for demo/pilot environments as per Symantec. Please refer to the below link.
http://seer.entsupport.symantec.com/docs/314694.htm
Also SQL Server Reporting seems to be a killer on the SQL Server.
Cheers,