In light of the controversy surrounding the release of classified U.S. military documents by the organization known as WikiLeaks, it’s not surprising that Data Loss Prevention (DLP) is top-of-mind for today’s Boards of Directors, CIOs, and CISOs.
As always, the challenge for today’s organizations is to manage their risk of data loss, particularly of intellectual property (IP), their most valuable information, and to proactively prevent breaches. But to protect their IP, they must first know where it is. The problem is that many organizations lack visibility into where that information lives, what exactly it looks like, and how to separate it from non-sensitive data.
Continue reading to learn about a new DLP detection technology that helps organizations protect their IP from being leaked by malicious as well as well-meaning insiders.
Protecting sensitive information through deep content inspection and analysis using DLP is usually the first step to preventing data loss. Current DLP solutions rely on two classes of detection technologies:
- Describing technology protects confidential data by matching keywords, expressions, and patterns, recognizing file types, and applying other signature-based detection techniques.
- Fingerprinting technology works by looking for exact matches of whole or partial files. Data to be protected is first collected in a variety of formats (such as Microsoft Word files, Excel files, and PDFs) and is then fingerprinted with a hashing algorithm to produce an index that can be deployed as part of a DLP policy.
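To make the two detection classes concrete, here is a minimal, hypothetical sketch (not Symantec’s implementation): a described policy is a set of keyword and pattern rules, while a fingerprint index hashes overlapping chunks of a protected document so that even partial copies can be matched. All rules, documents, and thresholds below are invented for illustration.

```python
import hashlib
import re

# --- Describing: signature/pattern-based detection (illustrative rules only) ---
DESCRIBED_RULES = [
    re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b"),  # card-number-like pattern
    re.compile(r"\bconfidential\b", re.IGNORECASE),           # keyword match
]

def describe_match(text):
    """Return True if any described rule matches the text."""
    return any(rule.search(text) for rule in DESCRIBED_RULES)

# --- Fingerprinting: exact whole/partial matching via hashing ---
def fingerprint(text, chunk_words=8):
    """Hash overlapping word chunks so partial copies are still detected."""
    words = text.split()
    chunks = [" ".join(words[i:i + chunk_words])
              for i in range(max(1, len(words) - chunk_words + 1))]
    return {hashlib.sha256(c.encode()).hexdigest() for c in chunks}

# Index built from a document to protect (deployed as part of a policy).
protected_index = fingerprint("Project Falcon design spec: the rotor uses a "
                              "dual-stage compressor with ceramic bearings")

def fingerprint_match(text):
    """Return True if the text shares any hashed chunk with the index."""
    return not protected_index.isdisjoint(fingerprint(text))

outbound = "FYI, the rotor uses a dual-stage compressor with ceramic bearings."
print(describe_match("This document is CONFIDENTIAL"))  # True
print(fingerprint_match(outbound))                      # True (partial copy)
```

The chunked hashing is what lets Fingerprinting flag an excerpt pasted into an email, not just a verbatim copy of the whole file.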
While effective at protecting much of an organization’s sensitive information, Describing and Fingerprinting technologies have limitations when addressing growing volumes of unstructured data and IP such as product formulas, sales and marketing reports, and source code. For example, Fingerprinting can be challenging for organizations with widely dispersed data, and policies that describe data can be both time-consuming to create and less accurate than fingerprinting.
Vector Machine Learning is a new DLP detection technology that overcomes the limitations of Describing and Fingerprinting. It is “trained” using sample documents to recognize the defining features of sensitive data and to identify the subtle differences between sensitive and non-sensitive data. This eliminates the need to create keyword-based policies or try to fingerprint new documents as they are created.
Vector Machine Learning is best used to protect unstructured data such as proprietary source code, trading models at financial services firms, or actuarial algorithms at insurance companies. In the case of source code, the technology can discern whether the code is proprietary (and in need of protection) or open source (and free for a software engineer to distribute).
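Symantec’s Vector Machine Learning implementation is proprietary, but the general idea of training on sample documents can be sketched with a toy nearest-centroid classifier over bag-of-words vectors. Everything below, including the sample snippets, is invented for illustration and stands in for, rather than reproduces, the actual technology.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def centroid(vectors):
    """Average the term frequencies of a set of training vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {term: count / len(vectors) for term, count in total.items()}

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm = (math.sqrt(sum(x * x for x in a.values()))
            * math.sqrt(sum(x * x for x in b.values())))
    return dot / norm if norm else 0.0

# "Training" on sample documents: sensitive positives and benign negatives.
sensitive_samples = [
    "internal pricing model margin threshold q3 forecast",
    "pricing model internal margin assumptions confidential forecast",
]
benign_samples = [
    "apache licensed utility functions string parsing helpers",
    "open source string helpers utility parsing examples",
]

sensitive_centroid = centroid([vectorize(t) for t in sensitive_samples])
benign_centroid = centroid([vectorize(t) for t in benign_samples])

def classify(text):
    """Label new content by whichever training centroid it lies closer to."""
    v = vectorize(text)
    return ("sensitive"
            if cosine(v, sensitive_centroid) > cosine(v, benign_centroid)
            else "benign")

print(classify("draft q3 pricing forecast with margin model"))  # sensitive
print(classify("refactor open source parsing helpers"))         # benign
```

The key property the sketch shares with the real technology is that no keyword policy or fingerprint is written by hand: new documents are judged by their statistical resemblance to the training samples.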
Vector Machine Learning complements existing Describing and Fingerprinting technologies and improves the ability of organizations to protect IP, particularly unstructured data that resides in highly distributed environments.
Companies often ask Symantec where and how to start protecting their IP. With DLP, it is possible to measurably reduce data loss risk. Symantec recommends using a multi-phase methodology to find and fix exposed confidential data:
- First, talk to business executives to decide what data is most important to protect.
- Next, define content-aware data loss policies to find and fix exposed data.
- Use three types of detection technologies to find data: Describing, Fingerprinting, and the newest, Learning.
The next release of Symantec Data Loss Prevention introduces the first and only machine learning technology to be incorporated into a DLP product. It will reduce the time and expertise needed to develop data security policies for IP. This next release will also incorporate new Symantec Data Insight 2.0 technology to make it easier and faster to find and fix exposed data. This is made possible through the addition of two new capabilities:
- Risk Scoring combines content usage and access information to give business users what they need to know to address risk “hot spots.”
- Data Owner Remediation directly notifies business data owners about exposed files and gives them options to secure those files before they’re leaked or stolen.
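Symantec has not published a scoring formula, but the idea behind Risk Scoring can be sketched as combining a file’s content sensitivity with how broadly it is accessible and how actively it is used. The function name, weights, and caps below are invented for illustration only.

```python
def risk_score(sensitive_matches, users_with_access, accesses_last_30d):
    """Higher score = hotter spot: sensitive, widely accessible, actively used.
    Weights and normalization caps are illustrative assumptions."""
    sensitivity = min(sensitive_matches / 10.0, 1.0)   # cap each contribution at 1.0
    exposure = min(users_with_access / 100.0, 1.0)
    activity = min(accesses_last_30d / 50.0, 1.0)
    return round(100 * (0.5 * sensitivity + 0.3 * exposure + 0.2 * activity), 1)

files = {
    "finance/q3_forecast.xlsx":  risk_score(8, 250, 40),  # sensitive, open to many
    "eng/readme.txt":            risk_score(0, 250, 40),  # widely read, benign
    "legal/contract_draft.docx": risk_score(10, 5, 2),    # sensitive, locked down
}
# Rank files so the riskiest "hot spots" surface first.
for path, score in sorted(files.items(), key=lambda kv: -kv[1]):
    print(path, score)
```

Ranking by such a score is what lets a business user triage thousands of exposed files and fix the riskiest ones first.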
Why invest in data loss prevention? There are several compelling reasons:
- Data loss is everywhere. According to one recent study, 88% of companies have experienced data loss in the last 12 months.[1]
- Breaches are expensive. The average cost of a data breach has been estimated to be $6.75 million.[2]
- More data, more risk. Unstructured data (email, word processing documents, spreadsheets, etc.) is growing at a rate of more than 60% per year.[3]
- Confidential data is going to competitors. It’s been estimated that 67% of ex-employees have taken confidential data to leverage a new job.[4]
- Data loss is coming from within organizations. An estimated 59% of ex-employees leave with company data.[5]
Information is at greater risk today than ever before, as demonstrated by recent high-profile breaches that involve the theft of proprietary data from large organizations. That’s why more accurate detection of intellectual property and the ability to stop sensitive data in transit need to be fundamental elements of an effective information protection program. Symantec Data Loss Prevention provides a content-aware solution to discover, monitor, protect, and manage confidential data wherever it is stored or used.
[1] Ponemon Institute, “2010 Annual Study: U.S. Enterprise Encryption Trends.”
[2] Ponemon Institute, “2009 Annual Study: Cost of a Data Breach.”
[3] International Data Corp., April 2008.
[4] Ponemon Institute, “Data Loss Risks During Downsizing: As Employees Exit, So Does Corporate Data,” February 2009.
[5] Ibid.