
Add alerting for CPU and disk utilization

jmock's picture
12 Agree, 2 Disagree
Status: In Review

We have had several instances where our scanners were inoperable without any notification. In every case, either the CPU was at 100% or the root file system was 100% utilized. Threshold monitoring of these two items would be greatly appreciated.

phhowe17's picture

Have you looked at using SNMP monitoring?

We are using SNMP and feeding the data to our alerting systems. We are monitoring outbound/deferred queues, memory, load average (we can't find CPU; the SNMP counter doesn't seem to be updated), disk usage (root, data, opt), and network traffic.

Ian McShane's picture

Reviewed.

Thanks, this request has been noted.

Please do continue to add further details as necessary. 

phhowe17's picture

See Case 411-245-831

We have been monitoring ssCPURaw[Nice|System|User|Idle]. It appears you have integer overflow on these gauges.

To correctly plot the various ssCPURaw values, you need to compute the difference between current and previous, sum these differences and then divide each by the sum to get % utilization for each gauge.
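The calculation described above can be sketched roughly as follows. This is only an illustration, not anything taken from the appliance or its MIBs: the function names, the dictionary keys, and the sample values are made up, and it assumes the agent wraps 32-bit counters to zero the way a standard Counter32 is supposed to.

```python
# Hypothetical sketch: convert two samples of the ssCPURaw* counters
# into percentage utilization per counter.

COUNTER32_MODULUS = 2**32  # assumption: standard 32-bit counter wrap


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Difference between two samples, tolerating a single wrap to zero."""
    return (current - previous) % modulus


def cpu_percentages(previous, current):
    """Map each counter name to its share of the elapsed ticks, in percent."""
    deltas = {
        name: counter_delta(previous[name], current[name])
        for name in current
    }
    total = sum(deltas.values())
    if total == 0:
        # No ticks elapsed between samples, so nothing can be computed.
        return {name: 0.0 for name in deltas}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Made-up sample values; the "user" counter wraps between the two polls.
previous = {"user": 4294967000, "nice": 10, "system": 500, "idle": 9000}
current = {"user": 204, "nice": 10, "system": 600, "idle": 9400}

usage = cpu_percentages(previous, current)
for name, percent in usage.items():
    print(f"{name}: {percent:.1f}%")
```

The modulo step only helps if the counter really does roll over to zero. If the agent gets stuck or misbehaves at 2^32-1, the deltas will be wrong no matter how the poller does the arithmetic.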

It appears that when the counter gets to 2^32-1 you don't correctly wrap to zero. Restarting the SNMP service doesn't fix this, but rebooting the box does. Also, Tech Support doesn't want to support this monitoring since the on-box MIBs don't include these counters. However, they are documented in document # 2008010311490354, "SNMP OIDs and description that can be queried on Symantec Brightmail Gateway appliances".

AdnanH's picture

How long had your system been up before the counter reached its max value?

Can you not use ssCpuUser, ssCpuSystem, ssCpuIdle instead of their raw counterparts?  I guess these are not prone to overflow.

Regards,

Adnan