KRIs Stink, KRIs Save the Day
While managing Operational Risk for a large IT organization, one of my responsibilities was to work with Corporate Operational Risk to define Key Risk Indicators (KRIs). KRIs were monitored at a corporate level. We took the easy route by using canned reports that were already in production rather than taking the time to evaluate what might be useful to measure. We looked at things such as spam activity and external firewall activity. These KRIs provided very little value, as they were not actionable. If blocked spam activity went up or down, what could be done about it? If the firewalls were being scanned more frequently, was there much, if anything, we could do? When I speak with clients today about reporting and KRIs, I encourage them to measure and report on areas that are useful to the organization and where action can be taken.
I recently dealt with a number of customers who experienced major Severity 1 issues. The impact and duration of the issues could have been drastically reduced if simple monitoring and reporting had been in production. For this exercise, let’s look at the endpoint protection environment: what can be measured and monitored with KRIs, and how KRIs can help reduce major issues and alert you to other issues within your environment. How often you review KRIs and take corrective action is an organizational preference, but I would recommend reviewing them at least monthly.
Metrics to Review
Viruses detected - Is virus activity being detected within the environment? Even if there is only a small amount of activity, you need to ask the question, “How is the malware getting in?” Email, web downloads, USB? Are other layers of protection not working, and should they be addressed to stop this activity from entering your environment?
Definition age - How old are the definitions? Are there clients with definitions that are out of date by 3, 5, 7 days or longer? What is the root cause? Server issues? Connectivity issues? If you need to rapidly release definitions to your clients and they are not being updated regularly, this will cause additional issues and may compound things in a time of crisis.
Versions of clients/Features enabled - Are all of your clients on a consistent version, with a consistent feature set enabled? If not, why? If there is an outbreak, having inconsistent versions can lead to more confusion when trying to correct the issue.
Number of clients with protection running - Do the majority of your clients have endpoint protection installed and running? Compare the numbers from your report to sources such as Active Directory and look for the gaps. If there are major gaps, determine why, then correct them.
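To make two of these metrics concrete, here is a minimal Python sketch of how definition age and protection coverage might be calculated from exported data. Everything in it is an assumption for illustration: the hostnames, the dates, the shape of the exports, and the function names `definition_age_report` and `coverage_gap` are hypothetical and do not come from any particular endpoint protection product or directory tool.

```python
from datetime import date

# Hypothetical export from the endpoint protection console:
# hostname -> date of the definitions currently installed on that client.
client_definitions = {
    "ws-001": date(2024, 5, 20),
    "ws-002": date(2024, 5, 14),
    "ws-003": date(2024, 5, 17),
    "srv-001": date(2024, 5, 10),
}

# Hypothetical list of computer accounts exported from the directory.
directory_hosts = {"ws-001", "ws-002", "ws-003", "ws-004", "srv-001", "srv-002"}

# Age buckets in days, matching the 3, 5, 7 day question above.
AGE_THRESHOLDS = (3, 5, 7)


def definition_age_report(definitions, today, thresholds=AGE_THRESHOLDS):
    """For each threshold N, list clients whose definitions are N or more days old."""
    ages = {host: (today - released).days for host, released in definitions.items()}
    return {
        n: sorted(host for host, age in ages.items() if age >= n)
        for n in thresholds
    }


def coverage_gap(directory, protected):
    """List hosts the directory knows about that are not reporting protection."""
    return sorted(set(directory) - set(protected))


today = date(2024, 5, 21)  # fixed date so the example is repeatable
stale = definition_age_report(client_definitions, today)
missing = coverage_gap(directory_hosts, client_definitions)
coverage_pct = 100 * (len(directory_hosts) - len(missing)) / len(directory_hosts)

for n, hosts in stale.items():
    print(f"Definitions {n}+ days old: {len(hosts)} client(s) {hosts}")
print(f"Unprotected hosts: {missing}")
print(f"Coverage: {coverage_pct:.1f}%")
```

The point of the sketch is that both KRIs are actionable: each number comes with a list of hosts that someone can investigate, rather than a count that simply goes up or down.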
When discussing endpoint protection environments with customers, I typically make a couple of other recommendations outside the area of monitoring/KRIs. They have to do with the feature set that is enabled as well as the version of protection you are running. If you are not running the full feature set, you are severely limiting the protection provided and may experience issues. When was the last time you had a health check conducted on your endpoint protection environment? Symantec offers this service, which can identify issues before you are in an emergency situation.
In closing, what I’ve mentioned are just a few of the many metrics/KRIs that can be monitored to add value right away and potentially reduce the severity of, or even eliminate, incidents within your environment. Are there other metrics or KRIs that you find value in monitoring? If so, I would love to hear about what you are monitoring.