Security Community Blog

The Causes of Data Breach

Created: 07 Oct 2008 • Updated: 02 Mar 2009

The Verizon 2008 Data Breach Investigation Supplemental Report just came out, and its results shed new light on the real causes of data breach. It's difficult to get hard evidence on what's behind the high and rising breach rates globally, so when substantial new in-depth analysis comes out, I pay careful attention.

In past posts, I discussed methodological problems with other surveys and tried to reconcile the inconsistencies between what is publicly reported and what we (the DLP Division of Symantec) see on enterprise networks.

In short, I believe this report contains evidence that many data breach risks stem from conditions that DLP can prevent. Since this aligns closely with our (substantial) anecdotal evidence from the field, it's exciting to see a third party confirm the same trends.

First off: where did this data come from?

Verizon ran a four-year study across hundreds of breach investigations, collecting a wide range of details about the underlying reasons for each loss of data. This supplemental report and the original release of the analysis are both based on "500 forensic engagements handled by the Verizon Business Investigative Response team." If you want to know the causes of data breach, post-mortem analysis of breached enterprises is a good place to look, and this report provides that data at length. Admittedly, there is no 'control group' in this study, nor are there any longitudinal or prospective components comparing the quality of defensive practices over time and across organizations.

Having said that, it's still an impressive piece of work, remarkable in both breadth and rigor, and the security community should be grateful for hard data that is very much in keeping with "New School" thinking, which seeks to quantify and rationalize the practice of security.

Takeaway #1: Partners and insiders represent the highest risk

I was disappointed by much of the early blog and press coverage of this report, which seemed to think Verizon's analysis had finally "punctured the myth" that insider threats were more serious than the problem of hackers. In this updated analysis, the Verizon team is clearly trying to emphasize and elaborate on the original report's findings, which said no such thing. Evidently many critics didn't read the original report in detail, but in case there's any remaining doubt, the Supplemental now states plainly: "Though data breaches are more likely to originate outside the organization, insiders tend to cause larger breaches."

In a further elaboration of the results from the original report, the Supplemental summarized the aggregate harm created by various classes of threat agents via a "Simplified Risk Calculation" on page 5.  The three classes of threat agents in this report are "External," "Internal," and "Partner."  A concise summary of that table follows:



Notice the trend across the major industry verticals: in each vertical, the top-ranked origin of risk is "Partner" or "Internal," and in most categories "External" (i.e., hackers) ranks last. My reading of this chart is that data breach risk stems primarily from poorly managed use and abuse of data by partners and insiders. In fact, although hackers account for a higher number of incidents, the aggregate harm (in terms of record count) is substantially higher from insiders and partners.
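The "Simplified Risk Calculation" weighs how often each class of threat agent shows up against how much harm it does when it does. A minimal Python sketch of that style of scoring follows, assuming risk is computed as the share of breaches involving an agent multiplied by the median number of records lost per breach. The `ThreatAgent` class, the `pseudo_risk` and `rank_by_risk` helpers, and every number below are hypothetical, chosen only for illustration; they are not the report's figures, and the exact formula on page 5 may differ.

```python
from dataclasses import dataclass


@dataclass
class ThreatAgent:
    name: str
    breach_share: float   # likelihood: fraction of breaches involving this agent
    median_records: int   # impact: median records compromised per breach

    def pseudo_risk(self) -> float:
        # Simplified risk score: likelihood multiplied by impact.
        return self.breach_share * self.median_records


def rank_by_risk(agents: list[ThreatAgent]) -> list[ThreatAgent]:
    """Return agents ordered from highest to lowest pseudo-risk."""
    return sorted(agents, key=lambda a: a.pseudo_risk(), reverse=True)


# Hypothetical inputs for illustration only (NOT taken from the Verizon report).
agents = [
    ThreatAgent("External", breach_share=0.70, median_records=20_000),
    ThreatAgent("Internal", breach_share=0.20, median_records=200_000),
    ThreatAgent("Partner", breach_share=0.30, median_records=100_000),
]

for agent in rank_by_risk(agents):
    print(f"{agent.name:<9} {agent.pseudo_risk():>10,.0f}")
```

With these made-up inputs, "External" appears in the most breaches yet ranks last, the same shape of result the table shows: counting incidents alone understates the harm done by insiders and partners.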

In my opinion, there is undue emphasis on hackers as the source of data breach. No doubt, externally driven breaches are a problem; but if you are managing data breach risk, this report indicates that your top-of-mind concern should be what's happening inside your LAN. Your first concern likely ought to be what your employees and partners are doing with your most critical data.

DLP solutions answer exactly these questions. Content-aware detection and blocking across the range of routes by which data can leave the organization is a core DLP value proposition. In my reading of this report, DLP solutions could do a great deal of good in reducing the primary risks behind data breach.
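To make "content-aware detection and blocking" concrete, here is a minimal Python sketch of one egress check: scan an outbound message for payment card numbers (a digit pattern confirmed by the Luhn checksum) and block the message if any are found. The function names and the block/allow policy are hypothetical, and this is not how Symantec DLP or any particular product implements detection; a real policy would cover far more than card numbers.

```python
import re

# Candidate card numbers: 13-16 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return Luhn-valid, card-like numbers found in text, separators removed."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


def egress_decision(message: str) -> str:
    """Hypothetical policy: block an outbound message that appears to contain card data."""
    return "BLOCK" if find_card_numbers(message) else "ALLOW"
```

The Luhn check is there to cut false positives: only about one in ten random digit strings of the right length passes it, so most order numbers and tracking IDs do not trigger a block.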

Takeaway #2: Data stored online is the primary target of breach

More interesting data is found in Table 7, "Compromised Assets (Percentage of Records)." There you see evidence backing a claim I've made previously on this blog: smash-and-grab laptop theft is an over-reported form of breach that represents low risk compared to exposure of data stored on the LAN. By vertical, the percentage of compromised records (not publicly reported breaches) attributed to online data is Financial (74%), Food (98%), Retail (87%), and Tech (73%). Clearly, online data accounts for the vast majority of compromised records, dwarfing the other asset types: "end-user devices," "offline data," and "networks and devices."

Remember, this report summarizes cases where a known breach was perceived to have caused enough harm that a full forensic investigation team was retained. In these cases, online assets are clearly the primary target.

If you are trying to manage data breach risk, the takeaway is that online data assets are (unsurprisingly) the most common target.

Takeaway #3: The most common cause of data breach is "unknown unknowns"

On p. 20, the Supplemental report frames a key theme with Donald Rumsfeld's famous quote about the danger of "unknown unknowns."

"Nine out of 10 data breaches involved one of the following:

+ A system unknown to the organization (or business group affected)

+ A system storing data that the organization did not know existed on that system

+ A system that had unknown network connections or accessibility

+ A system that had unknown accounts or privileges

We refer to these recurring situations as "unknown unknowns," and they appear to be the Achilles heel in the data protection efforts of every organization."

This closely reflects our experience in DLP risk assessments. Every organization where we've run a risk assessment is surprised by how widely these 'unknown unknowns' have proliferated. In fact, Symantec DLP is about the best tool on the market for finding the single most dangerous 'unknown unknown': sensitive data that the organization did not know existed on a system.
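Finding sensitive data nobody knew was on a system is, at its simplest, a content scan of data at rest. The Python sketch below walks a directory tree and reports every text-like file containing strings shaped like US Social Security numbers. The pattern, the file-type filter, and the `scan_for_sensitive_files` helper are illustrative assumptions, not a description of how Symantec DLP performs discovery.

```python
import re
from pathlib import Path

# US Social Security number shape (NNN-NN-NNNN): a stand-in for whatever
# content a real DLP policy would define as sensitive.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scan_for_sensitive_files(
    root: str, suffixes: tuple[str, ...] = (".txt", ".csv", ".log")
) -> dict[str, int]:
    """Map each matching file under root to the number of SSN-like strings it contains."""
    findings: dict[str, int] = {}
    for path in Path(root).rglob("*"):
        # Only inspect regular files with a text-like extension.
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the whole scan
        count = len(SSN_PATTERN.findall(text))
        if count:
            findings[str(path)] = count
    return findings
```

Pointing `scan_for_sensitive_files` at a file share returns each matching path with its match count, a rough inventory of where sensitive data actually lives as opposed to where the organization believes it lives.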

It's for this reason that we believe DLP systems play such a crucial role in reducing the overall risk of data breach. If, in nearly seven out of every ten cases, breach events involve data exposure that DLP could have remediated, it seems fair to claim that a very large percentage of current data breach problems are in fact preventable conditions.

Tying it all together

In summary, this study indicates that the generic profile of a high-risk data breach is:

* Committed by partners or insiders.

* Perpetrated against data stored online.

* In a system on your own LAN where you had no idea it was stored.

DLP systems specialize in detecting precisely these kinds of events. DLP platforms detect exposure of confidential data in all its manifold forms (on rogue servers on the local LAN, in email sent to partners, in channels used by hackers for exfiltration, in files copied to thumb drives by malicious insiders, and so on). This Verizon report indicates that these protections are a critical form of risk management that no enterprise can any longer afford to ignore.

It's my hope that as we see Data Loss Prevention systems deployed more widely, we can slow down or even roll back the wave of data breach events that now dominate the headlines in the security trade press.

Kevin Rowney