Data Sharing

Worldwide Intelligence Network Environment

Symantec’s Worldwide Intelligence Network Environment (WINE) is a platform for repeatable experimental research, which provides NSF-supported researchers access to sampled security-related data feeds that are used internally at Symantec Research Labs. The data sets available today are often insufficient for computer security research. WINE was created to fill this gap by enabling external research on field data collected at Symantec and by promoting rigorous experimental methods. WINE allows researchers to define reference data sets, for validating new algorithms or for conducting empirical studies, and to establish whether a data set is representative of the current landscape of cyber threats. The platform also enables the reproduction of experimental results and allows the performance of different algorithms to be compared against the reference data sets. Moreover, the field data included in WINE has not been analyzed in depth, beyond its operational use, and will likely provide key insights for the fields of security, dependability, machine learning and software engineering.

Availability of Field Data

Symantec has established some of the most comprehensive sources of Internet threat data in the world through the Symantec Global Intelligence Network. More than 240,000 sensors in over 200 countries monitor attack activity through a combination of Symantec products and services such as Symantec DeepSight Threat Management System, Symantec Managed Security Services and Norton consumer products, as well as additional third-party data sources. Symantec also gathers malicious code intelligence from more than 130 million client, server, and gateway systems that have deployed its antivirus products. Additionally, Symantec’s distributed honeypot network collects data from around the globe, capturing previously unseen threats and attacks and providing valuable insight into attacker methods. Spam and phishing data is captured through a variety of sources including the Symantec Probe Network, a system of more than 2.5 million decoy accounts; MessageLabs Intelligence, a respected source of data and analysis for messaging security issues, trends and statistics; and other Symantec technologies. Data is collected in more than 86 countries. Over 8 billion email messages and over 1 billion Web requests are processed per day across 16 major data centers. These resources give Symantec’s analysts unparalleled sources of data with which to identify, analyze, and provide informed commentary on emerging trends in attacks, malicious code activity, phishing, and spam.

Description of Sample Data Set

Symantec collects telemetry data from over 75 million machines. This data set records occurrences of all the known host- and network-based attacks, allowing researchers to map the spread of cyber threats around the world. For example, each record includes the signature of the attack, the OS version of the attack’s target, the name of the compromised process and the URL or file the attack came from.
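
As an illustration of how records of this kind might be aggregated, the following is a minimal sketch in Python. The class name, field names and sample values are all hypothetical and are not taken from the WINE schema; they only mirror the four attributes listed above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class AttackRecord:
    """Hypothetical record layout; names are illustrative, not the WINE schema."""
    signature: str     # signature of the attack
    os_version: str    # OS version of the attack's target
    process_name: str  # name of the compromised process
    source: str        # URL or file the attack came from


def count_by_signature_and_os(records: Iterable[AttackRecord]) -> Counter:
    """Count attack occurrences per (signature, OS version) pair."""
    return Counter((r.signature, r.os_version) for r in records)


# Made-up sample records, for illustration only.
SAMPLE = [
    AttackRecord("sig-A", "os-1", "browser.exe", "http://example.com/a"),
    AttackRecord("sig-A", "os-1", "mail.exe", "attachment.doc"),
    AttackRecord("sig-B", "os-2", "browser.exe", "http://example.org/b"),
]

if __name__ == "__main__":
    for (signature, os_version), n in count_by_signature_and_os(SAMPLE).items():
        print(signature, os_version, n)
```

Grouping on the signature and OS version is only one possible choice; the same pattern applies to any combination of the recorded attributes.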

Operational Model

To protect the sensitive information included in the data sets (e.g., URLs that have been compromised), Symantec requires researchers to visit our Culver City, CA or Herndon, VA location to access the WINE system. Researchers will have access to the raw data collected. Symantec Research Labs will accept proposals that briefly explain the research question under investigation and that request access to the data sets. A snapshot of the requested data will be frozen for future reference, and all analysis and experimentation will be conducted on the infrastructure provided by Symantec Research Labs.

How to Participate

  1. Send an email requesting a non-disclosure agreement, which allows you to view confidential descriptions of the available data, then sign and return this document.
  2. Send a project proposal (one page maximum) that includes the problem studied, the proposed research approach, the data needed and an estimate of the visit duration. In return, the researcher will receive a contract specifying general-use guidelines, IP policies, the publication process and data privacy rules.
  3. If the proposal is approved, Symantec Research will provide a letter of collaboration and a fee schedule that can be attached to an NSF funding proposal.
  4. Symantec Research will prepare the requested data sample, and the researcher will be able to schedule visits to the Culver City, CA or Herndon, VA locations, where the WINE data sets will be available.