VMware - Virtual Infrastructure Health Monitor Pack
Please find my Virtual Infrastructure Health Monitor Pack included in this posting.
So what exactly does this Monitoring Policy do?
- Its first rule is a Metric Collect (no alerts) to gather N+1 statistics for memory and CPU, across any cluster managed by the vCenter server you target in the policy.
- You can easily turn on alerting based on how many consecutive times the N+1 threshold is exceeded.
- You can add and remove clusters freely without having to modify this metric or script. New clusters will automatically be picked up and monitored.
- The equation used for determining the N+1 threshold is…
- 90% utilization of CPU or memory on all “heads” (hosts) in the cluster, minus 1 head.
- The remaining rules are self-explanatory, but to summarize, they monitor…
- ESX datastores for “percent provisioned” and “percent used”.
- I left my “Ticket creation script” in place for these datastore alerts, just to show you the data I gather for tickets. This script also fires another PowerCLI script on my Notification Server to gather LUN/datastore info for the ticket in a legible format.
- You will notice that the “Percent Provisioned” rule has a very high repeat count. This is necessary to account for nightly StorageAPI backups, because that process will show LUNs as over-provisioned.
- Host connection status, specifically “NotResponding”; no alerts for Maintenance Mode.
- The number of snapshot files on a per-VM basis (VMs that need consolidation).
- I have removed my ticket creation scripts from the previous two rules, so you will need to put your own notifications in place.
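To make the N+1 rule described above concrete, here is a minimal sketch of the arithmetic in Python. It is not the PowerCLI script from the pack. It assumes that “heads” means hosts, that “minus 1 head” means removing the largest host from the capacity total (the worst case), and that the 90% limit is applied to what remains. The names Host, n_plus_one_threshold, exceeds_n_plus_one and should_alert are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    capacity: float  # CPU (e.g. MHz) or memory (e.g. MB), same unit as `used`
    used: float


def n_plus_one_threshold(hosts, utilization_limit=0.90):
    """Capacity the cluster may use and still survive losing one host.

    Assumption: the lost host is the largest one, so its capacity is
    dropped before the utilization limit is applied.
    """
    if len(hosts) < 2:
        # A single-host cluster has no spare head, so any usage breaches.
        return 0.0
    capacities = sorted(h.capacity for h in hosts)
    surviving_capacity = sum(capacities[:-1])  # all heads minus the largest
    return utilization_limit * surviving_capacity


def exceeds_n_plus_one(hosts, utilization_limit=0.90):
    """True when total usage is above the N+1 threshold."""
    total_used = sum(h.used for h in hosts)
    return total_used > n_plus_one_threshold(hosts, utilization_limit)


def should_alert(samples, required_consecutive):
    """True once `required_consecutive` breaches occur in a row.

    `samples` is an ordered list of booleans, one per polling interval.
    """
    streak = 0
    for breached in samples:
        streak = streak + 1 if breached else 0
        if streak >= required_consecutive:
            return True
    return False
```

For example, four hosts of 100 units each give a threshold of 0.90 x 300 = 270 units, so a cluster using 280 units in total is over the line. With a repeat count of 3, the samples breach, breach, OK, breach would not raise an alert, because the streak is reset by the OK sample.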
I use a custom inventory job and a subsequent automation policy to monitor for “orphaned VMDKs” (not included in this posting).
In this Monitor Pack, you will notice that it calls upon scripts stored on the vCenter Server. I believe I wrote those PowerCLI scripts on version 4.x, so they should work on 4 and above. I will not go into the details of the scripts; I will, however, highlight the following…
- Please open the Command Metrics and edit the script paths to match wherever you place the scripts included in this posting.
- Keep the timeout and polling intervals high, as PowerCLI extensions take some time to load.
- This Monitoring Policy should apply to your vCenter server(s). It does not talk directly to any ESXi host.
Things you must change in the scripts…
- In the ticket creation script (A Task Server Job within the rule), at the very least…
- Your SQL Server name in place of “sqlalias001”
- Your path to the ESXLUNinfo.ps1 script in place of “e:\CustomScripts\ESXLUNinfo.ps1” (two places in the VBScript)
- Your Altiris server name in place of “AltirisSMSAlias” (two places in the VBScript)
- To and from addresses in the email subroutine and your SMTP server in place of “smtpmailAlias”
- In all of the PowerCLI scripts…
- Your vCenter server name in place of “vCenterServerAlias”
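Since the placeholders listed above are spread over several scripts, a quick check for leftovers can save a failed first run. The following is only a sketch in Python, not part of the pack. It assumes the scripts are available as text (the ticket creation script lives inside the rule as a Task Server Job, so you would need to paste or export it first), and that the files end in .ps1 or .vbs. The names PLACEHOLDERS, find_placeholders and scan_folder are made up for illustration.

```python
from pathlib import Path

# Placeholder values named in this posting that must be replaced.
PLACEHOLDERS = [
    "sqlalias001",
    "AltirisSMSAlias",
    "smtpmailAlias",
    "vCenterServerAlias",
    r"e:\CustomScripts\ESXLUNinfo.ps1",
]


def find_placeholders(text):
    """Return {placeholder: [line numbers]} for placeholders still in `text`.

    Matching is case-insensitive, because neither PowerShell nor VBScript
    cares about the case of these names.
    """
    hits = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        lowered = line.lower()
        for placeholder in PLACEHOLDERS:
            if placeholder.lower() in lowered:
                hits.setdefault(placeholder, []).append(lineno)
    return hits


def scan_folder(folder):
    """Scan .ps1 and .vbs files under `folder` (extensions are an assumption)."""
    results = {}
    for path in Path(folder).rglob("*"):
        if path.is_file() and path.suffix.lower() in {".ps1", ".vbs"}:
            hits = find_placeholders(path.read_text(errors="ignore"))
            if hits:
                results[str(path)] = hits
    return results
```

Calling scan_folder on the folder where you saved the scripts returns an empty dictionary once every placeholder has been replaced. Otherwise it lists each file with the placeholders and line numbers that still need editing.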
The scripts must be placed on the following servers…
- ESXLUNinfo.ps1 on your Notification Server (NS)
- All others on your vCenter server
If you have any questions, please feel free to post them here and I will do my best to respond.