Publications

Peer-Reviewed Publications from Symantec Research Labs

Academic Papers

2019

  • ATTACK2VEC: Leveraging Temporal Word Embeddings to Understand the Evolution of Cyberattacks
    Yun Shen, Gianluca Stringhini
    To appear at the 28th USENIX Security Symposium (USENIX 2019)

    We present ATTACK2VEC, a system that uses temporal word embeddings to model how attack steps are exploited in the wild, and track how they evolve.

  • Can I Opt Out Yet? GDPR and the Global Illusion of Cookie Control
    Iskander Sanchez-Rola, Matteo Dell’Amico, Platon Kotzias, Davide Balzarotti, Leyla Bilge, Pierre-Antoine Vervier, Igor Santos.
    To appear at the 14th ACM Asia Conference on Computer and Communications Security (ACM ASIACCS 2019)

    We evaluate both the information presented to users and the actual tracking implemented through cookies; we find that the GDPR has impacted website behavior in a truly global way, both directly and indirectly. On the other hand, we find that tracking remains ubiquitous.

  • Collaborative and Privacy-Preserving Machine Teaching via Consensus Optimization
    Yufei Han, Yuzhe Ma, Christopher Gates, Kevin A. Roundy and Yun Shen
    To appear at the 2019 International Joint Conference on Neural Networks (IJCNN)

    In this work, we define a collaborative and privacy-preserving machine teaching paradigm with multiple distributed teachers. The focus is to find strategies to organize distributed agents to jointly select a compact subset of data that can be used to train a global model. The global model should achieve nearly the same performance as if the central learner had access to all the data, but the central learner only has access to the selected subset, and each agent only has access to their own data. The goal of this research is to find good strategies to train global models while giving some control back to agents.

  • Entrust: Regulating Sensor Access by Cooperating Programs via Delegation Graph
    Giuseppe Petracca, Yuqiong Sun, Ahmad-Atamli Reineh, Jens Grossklags, Patrick McDaniel and Trent Jaeger
    To appear in proceedings of the 28th USENIX Security Symposium (USENIX 2019)

  • A Field Study of Computer-Security Perceptions Using Anti-Virus Customer-Support Chats
    Mahmood Sharif, Kevin A. Roundy, Matteo Dell'Amico, Christopher Gates, Daniel Kats, Lujo Bauer, Nicolas Christin
    In Proceedings of the 2019 Conference on Human Factors in Computing Systems (CHI 2019)

    To identify needs for improvement in security products, we study security concerns raised in Norton Security customer support chats. We find that many consumers face technical support scams and are susceptible to them. Our findings also show the value of customer support centers: 96% of customers who reached out for support in relation to scams had not paid the scammers.

  • IoT Security and Privacy Labels
    Yun Shen, Pierre-Antoine Vervier
    In Proceedings of the ENISA Annual Privacy Forum (APF 2019)

    We devise a concise, informative IoT labelling scheme that conveys high-level security and privacy facts about an IoT device to consumers in order to raise their security and privacy awareness.

  • Looking from the Mirror: Evaluating IoT Device Security through Mobile Companion Apps
    Xueqiang Wang, Yuqiong Sun, Susanta Nanda and XiaoFeng Wang
    To appear in the proceedings of the 28th USENIX Security Symposium (USENIX 2019)

  • Making Machine Learning Forget
    Saurabh Shintre, Kevin A. Roundy, and Jasjeet Dhaliwal
    In Proceedings of the 2019 ENISA Annual Privacy Forum (APF 2019)

    We specifically analyze how the “right-to-be-forgotten” provided by the European Union General Data Protection Regulation can be implemented on current machine learning models and which techniques can be used to build future models that can forget. This document also serves as a call-to-action for researchers and policy-makers to identify other technologies that can be used for this purpose.

  • Utility-Driven Graph Summarization
    K. Ashwin Kumar, Petros Efstathopoulos
    To appear at the 45th International Conference on Very Large Data Bases (VLDB 2019)

    In this work, we present a novel approach to summarizing a complex graph, driven by the objective of maximizing the utility of the resulting graph summary. We then propose a utility-driven summarization algorithm that allows a user to query a graph summary with a specified utility value.

  • Waves of Malice: A Longitudinal Measurement of the Malicious File Delivery Ecosystem on the Web
    Colin C. Ife, Yun Shen, Steven J. Murdoch, Gianluca Stringhini
    To appear at the 14th ACM Asia Conference on Computer and Communications Security (ACM ASIACCS 2019)

    We present a longitudinal measurement of malicious file distribution on the Web.

2018

2017

  • Smoke Detector: Cross-Product Intrusion Detection With Weak Indicators
    Kevin A. Roundy, Acar Tamersoy, Michael Spertus, Michael Hart, Daniel Kats, Matteo Dell'Amico, Robert Scott
    In Proceedings of the Annual Computer Security Applications Conference (ACSAC 2017)

    Smoke Detector significantly expands upon limited collections of hand-labeled security incidents by framing event data as relationships between events and machines, and by performing random walks to rank candidate security incidents. It substantially increases incident detection coverage for mature Managed Security Service Providers.

  • Large-Scale Identification of Malicious Singleton Files
    Bo Li, Kevin Roundy, Chris Gates, Yevgeniy Vorobeychik
    In Proceedings of the 7th ACM Conference on Data and Application Security and Privacy (CODASPY)

    94% of the software files that Symantec saw in a 1-year dataset appeared only once on a single machine. We examine the primary reasons for which both benign and malicious software files appear as singletons, and design a classifier to distinguish between these two classes of singleton software files.

  • RiskTeller: Predicting the Risk of Cyber Incidents
    Leyla Bilge, Yufei Han, Matteo Dell'Amico
    In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security

    We present a system, RiskTeller, that can predict to-be-infected machines in an enterprise environment.

  • Lean On Me: Mining Internet Service Dependencies From Large-Scale DNS Data
    Matteo Dell'Amico, Leyla Bilge, Ashwin Kayyoor, Petros Efstathopoulos, Pierre-Antoine Vervier
    In Proceedings of the 33rd Annual Computer Security Applications Conference (ACSAC 2017)

    To assess the security risk for a given entity, and motivated by the effects of recent service disruptions, we perform a large-scale analysis of passive and active DNS datasets including more than 2.5 trillion queries in order to discover the dependencies between websites and Internet services.

  • Mini-Batch Spectral Clustering
    Yufei Han and Maurizio Filippone
    In Proceedings of the International Joint Conference on Neural Networks (IJCNN 2017)

    This paper proposes a practical approach to learning spectral clustering based on adaptive stochastic gradient optimization. Crucially, the proposed approach recovers the exact spectrum of Laplacian matrices in the limit of the iterations, and the cost of each iteration is linear in the number of samples. Extensive experimental validation on data sets with up to half a million samples demonstrates its scalability and its ability to outperform state-of-the-art approximate methods for learning spectral clustering for a given computational budget.

  • Predicting Cyber Threats with Virtual Security Products
    Shang-Tse Chen, Yufei Han, Duen Horng Chau, Christopher Gates, Michael Hart, Kevin A. Roundy
    In Proceedings of the 33rd Annual Computer Security Applications Conference (ACSAC 2017)

    We set out to predict which security events and incidents a security product would have detected had it been deployed, based on the events produced by other security products that were in place. We discovered that the problem is tractable, and that some security products are much harder to model than others, which makes them more valuable.

  • Marmite: Spreading Malicious File Reputation Through Download Graphs
    Gianluca Stringhini, Yun Shen, Yufei Han, Xiangliang Zhang
    Annual Computer Security Applications Conference (ACSAC 2017)

    We present Marmite, a system that detects malicious files by leveraging a global download graph and label propagation with Bayesian confidence.

  • Aware: Preventing Abuse of Privacy-Sensitive Sensors via Operation Bindings
    Giuseppe Petracca, Ahmad-Atamli Reineh, Yuqiong Sun, Jens Grossklags and Trent Jaeger
    In Proceedings of the 26th USENIX Security Symposium (Aug 2017)

  • Automatic Application Identification from Billions of Files
    Kyle Soska, Chris Gates, Kevin Roundy, and Nicolas Christin
    In Proceedings of the 23rd SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2017)

    Mapping binary files into software packages enables malware detection and other tasks, but is challenging. By combining installation data with file metadata that we summarize into sketches, from millions of machines and billions of files, we can use efficient approximate clustering techniques to map files to applications automatically and reliably.

  • Scalable and flexible clustering solutions for mobile phone-based population indicators
    Alessandro Lulli, Lorenzo Gabrielli, Patrizio Dazzi, Matteo Dell'Amico, Pietro Michiardi, Mirco Nanni, Laura Ricci
    International Journal of Data Science and Analytics 4.4 (2017): 285-299

    We use distributed and scalable clustering techniques to estimate population indicators, including mobility, from mobile phone call data.

2016

  • Generating Graph Snapshots from Streaming Edge Data
    Sucheta Soundarajan, Acar Tamersoy, Elias B. Khalil, Tina Eliassi-Rad, Duen Horng Chau, Brian Gallagher, Kevin Roundy
    In Proceedings of the 25th International World Wide Web Conference (WWW), 2016

    We study the problem of determining the proper aggregation granularity for a stream of time-stamped edges. To this end, we propose ADAGE and demonstrate its value in automatically finding the appropriate aggregation intervals on edge streams for belief propagation to detect malicious files and machines.

  • Efficient Routing for Cost Effective Scale-out Data Architectures
    Ashwin Narayan, Vuk Markovic, Natalia Postawa, Anna King, Alejandro Morales, K. Ashwin Kumar, Petros Efstathopoulos
    In Proceedings of the IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS'16)

    In the context of large-scale data architectures, we propose an efficient technique to speed up the routing of a large number of real-time queries while minimizing the number of machines that each query touches (query span).

  • Measuring PUP Prevalence and PUP Distribution through Pay-Per-Install Services
    P Kotzias, L Bilge, J Caballero
    USENIX Security Symposium

    We perform the first systematic study of PUP prevalence and its distribution through pay-per-install (PPI) services, which link advertisers that want to promote their programs with affiliate publishers willing to bundle those programs with offers for other software.

  • Improving population estimation from mobile calls: a clustering approach
    A Lulli, L Gabrielli, P Dazzi, M Dell'Amico, P Michiardi, M Nanni, L Ricci
    2015 IEEE Symposium on Computers and Communications

    We use distributed and scalable clustering techniques to estimate population indicators, including mobility, from mobile phone call data.

  • NG-DBSCAN: Scalable Density-Based Clustering for Arbitrary Data
    Alessandro Lulli, Matteo Dell'Amico, Pietro Michiardi, Laura Ricci
    In Proceedings of the VLDB Endowment, Vol. 10, No. 3, 2016

    A scalable and distributed implementation of the DBSCAN clustering algorithm. What distinguishes NG-DBSCAN is that it scales while supporting arbitrary data and arbitrary distance functions.

  • PSBS: Practical Size-Based Scheduling
    M Dell'Amico, D Carra, P Michiardi
    IEEE Transactions on Computers, 2016

    Size-based scheduling algorithms can perform disastrously with skewed workloads and incorrect size information. PSBS is a scheduling discipline that performs very well even when job sizes are incorrect.

  • Accurate spear phishing campaign attribution and early detection
    Y Han, Y Shen
    ACM Sig SAC 2016

    In this paper, we introduce four categories of email profiling features that capture various characteristics of spear phishing emails. Building on these features, we implement and evaluate an affinity graph-based semi-supervised learning model for campaign attribution and detection.

  • Content-Agnostic Malware Detection in Heterogeneous Malicious Distribution Graph
    I M Alabdulmohsin, Y Han, Y Shen, X Zhang
    CIKM 2016

    We propose a novel Bayesian label propagation model to unify multi-source information, including content-agnostic features of different node types and topological information of the heterogeneous network. Our approach does not need to examine source code or inspect the dynamic behaviour of a binary. Instead, it estimates the maliciousness of a given file through a semi-supervised label propagation procedure whose time complexity is linear in the number of nodes and edges. An evaluation on 567 million real-world download events shows that our approach efficiently detects malware with high accuracy.

  • Partially Supervised Graph Embedding for Positive Unlabelled Feature Selection
    Yufei Han and Yun Shen
    IJCAI 2016

    We propose to encode the weakly supervised information in PU learning tasks as pairwise constraints between training instances. Violations of these pairwise constraints are measured and incorporated into a partially supervised graph embedding model.

  • Insights into rooted and non-rooted Android mobile devices with behavior analytics
    Y Shen, N Evans, A Benameur
    ACM SAC 2016

    We present the first quantitative analysis of mobile devices that compares rooted devices to non-rooted devices. We attempt to map high-level intuitions about the characteristics of users who root their devices to the low-level data at our disposal.

2015

2014

  • Guilt by Association: Large Scale Malware Detection by Mining File-relation Graphs (AESOP)
    A Tamersoy, K Roundy, DH Chau
    In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2014

    We present AESOP, a scalable algorithm that identifies malicious executable files by leveraging a novel combination of locality-sensitive hashing and belief propagation. AESOP attained early labeling of 99% of benign files and 79% of malicious files with a 0.9961 true positive rate at 0.0001 false positive rate.

  • Some Vulnerabilities Are Different Than Others: Studying Vulnerabilities and Attack Surfaces in the Wild
    Kartik Nayak, Daniel Marino, Petros Efstathopoulos, Tudor Dumitras
    In Proceedings of the 17th International Symposium on Research in Attacks, Intrusions and Defenses (RAID'14)

    This empirical study of intrusion-prevention field data collected from millions of hosts illuminates differences in how often different software vulnerabilities are exploited in the wild.  We study several factors that may influence whether vulnerable software will be attacked and introduce new field-data-based security metrics that help quantify the real-world impact of a vulnerability.

  • Ethics in Data Sharing: Developing a Model for Best Practice
    Sven Dietrich, Jeroen van der Ham, Aiko Pras, Roland van Rijswijk-Deij, Darren Shou, Anna Sperotto, Aimee van Wynsberghe, Lenore D. Zuck
    IEEE Symposium on Security and Privacy Workshops

  • Quality Estimation of English-French Machine Translation: A Detailed Study of the Role of Syntax
    Rasoul Kaljahi, Jennifer Foster, Johann Roturier, Raphael Rubino
    25th International Conference on Computational Linguistics, Dublin, Ireland, 2014.

  • Syntax and Semantics in Quality Estimation of Machine Translation
    Rasoul Kaljahi (Dublin City University / Symantec Research Labs), Jennifer Foster (Dublin City University), Johann Roturier (Symantec Research Labs)
    Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-8), 2014, Doha, Qatar.

  • EXPOSURE: a Passive DNS Analysis Service to Detect and Report Malicious Domains
    L Bilge, S Sen, D Balzarotti, E Kirda, C Kruegel
    ACM Transactions on Information and System Security (TISSEC)

    We present an extended version of Exposure and the experimental results from 17 months of its deployment on real data.

  • On the Effectiveness of Risk Prediction Based on Users Browsing Behavior
    D Canali, L Bilge, D Balzarotti
    Proceedings of the 9th ACM Symposium on Information, Computer and Communications Security

    We present a comprehensive study on the effectiveness of risk prediction based only on the web browsing behavior of users.

  • Malicious BGP Hijacks: Appearances Can Be Deceiving
    Pierre-Antoine Vervier, Quentin Jacquemart, Johann Schlamp, Olivier Thonnard, Georg Carle, Guillaume Urvoy-Keller, Ernst Biersack, Marc Dacier
    IEEE International Conference on Communications: Communications and Information Systems Security Symposium (ICC 2014)

    This paper discusses the challenges of Internet routing anomalies and BGP hijacks investigations. With the help of a real-world potential BGP hijack case study, we describe our investigation process and highlight the challenges and limitations faced.

  • Study of collective user behaviour in Twitter: a fuzzy approach
    Xin Fu, Yun Shen
    Neural Computing and Applications, 2014

    We propose a new approach that, for the first time, applies the mass assignment-based fuzzy association rules mining (MASS-FARM) algorithm to Twitter data analysis to automatically extract useful and meaningful knowledge from large-scale data sets.

  • MR-TRIAGE: Scalable multi-criteria clustering for big data security intelligence applications
    Yun Shen, Olivier Thonnard
    IEEE BigData Conference 2014

    We introduce a new framework called MR-TRIAGE leveraging multi-criteria data clustering (MCDC) to perform scalable data clustering on large security data sets and further implement a set of efficient algorithms in a 3-stage MapReduce paradigm.

2013

  • A Safety-First Approach to Memory Models
    Abhayendra Singh, Satish Narayanasamy, Daniel Marino, Todd Millstein, Madanlal Musuvathi
    IEEE Micro Top Picks, Volume 33, Number 3, May/June 2013

    The concurrency semantics of mainstream programming languages provide "safety" only under the assumption that programmers have implemented proper synchronization to prevent data races.  But since simple programming mistakes can break this assumption and result in unreliable program behavior, we argue instead for providing a safety-first model that assumes an access may participate in a data race unless proven otherwise.

  • Detecting Deadlock in Programs with Data-Centric Synchronization
    Daniel Marino, Christian Hammer, Julian Dolby, Mandana Vaziri, Frank Tip, Jan Vitek
    In Proceedings of the 35th International Conference on Software Engineering (ICSE'13)

    We present an analysis for establishing deadlock-freedom for programs written in AJ, a Java extension in which programmers declaratively specify synchronization constraints on data members, relieving them from writing error-prone synchronization code.

  • Community-based post-editing of machine-translated content: monolingual vs. bilingual
    Linda Mitchell, Johann Roturier and Sharon O’Brien
    MT Summit XIV Workshop on Post-editing Technology and Practice, Nice, France, 2013.

  • DCU-Symantec at the WMT 2013 Quality Estimation Shared Task
    Raphaël Rubino, Johann Roturier, Rasoul Samad Zadeh Kaljahi, Fred Hollowood (Symantec Research Labs), Jennifer Foster and Joachim Wagner.
    Eighth Workshop on Statistical Machine Translation, Sofia, Bulgaria, 2013.

  • Quality Estimation-guided Data Selection for Domain Adaptation of SMT
    Pratyush Banerjee, Raphael Rubino, Johann Roturier and Josef van Genabith
    MT Summit XIV, Nice, France, 2013.

  • The ACCEPT Post-Editing environment: a flexible and customisable online tool to perform and analyse machine translation post-editing
    Johann Roturier, Linda Mitchell and David Silva.
    MT Summit XIV Workshop on Post-editing Technology and Practice, Nice, France, 2013.

  • Cloud Resiliency and Security via Diversified Replica Execution and Monitoring
    Azzedine Benameur, Nathan Evans, Matthew Elder (Symantec Research Labs)
    In Proceedings of the 1st International Symposium on Resilient Cyber Systems

  • MINESTRONE: Testing the SOUP
    Azzedine Benameur, Nathan Evans, Matthew Elder (Symantec Research Labs)
    USENIX Workshop on Cyber Security Experimentation and Test (CSET '13)

  • Server-side code injection attacks: a historical perspective
    Jakob Fritz (SRL), Corrado Leita (SRL), Michalis Polychronakis
    Research in Attacks, Intrusions and Defenses Symposium (RAID)

  • Spatio-Temporal Mining of Software Adoption & Penetration
    Evangelos Papalexakis (CMU), Tudor Dumitraș (Symantec Research Labs), Polo Chau (Georgia Tech), B. Aditya Prakash (Virginia Tech), and Christos Faloutsos (CMU)
    IEEE/ACM International Conference on Social Networks Analysis and Mining (ASONAM 2013)

  • SpamTracer: How stealthy are spammers?
    Pierre-Antoine Vervier and Olivier Thonnard (Symantec Research Labs)
    5th International Traffic Monitoring and Analysis Workshop (TMA 2013)

    In this paper we present SpamTracer, a system designed to collect and analyze the routing behavior of spam networks in order to determine whether they use BGP hijacks to stealthily send spam from stolen networks.

  • MutantX-S: Scalable Malware Clustering Based on Static Features
    X Hu, S Bhatkar, K Griffin, KG Shin
    USENIX ATC 2013

    In this paper, we present an efficient malware clustering technique that uses instruction-based features to provide high accuracy.

2012

  • A Data-Centric Approach to Synchronization
    Julian Dolby, Christian Hammer, Daniel Marino, Frank Tip, Mandana Vaziri, Jan Vitek
    ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 34, Issue 1, April 2012

    Concurrency-related errors, such as data races, are frustratingly difficult to track down and eliminate in large, object-oriented programs. We describe AJ, an extension to Java that uses a declarative, data-centric synchronization paradigm to eliminate a large class of concurrency bugs with low programmer effort.

  • End-to-End Sequential Consistency
    Abhayendra Singh, Satish Narayanasamy, Daniel Marino, Todd Millstein, Madanlal Musuvathi
    In Proceedings of the 39th Annual International Symposium on Computer Architecture (ISCA'12)

    By allowing compiler and hardware to cooperate, we show how strong, safe memory models for concurrent programs can be provided with minimal impact on performance.

  • A Detailed Analysis of Phrase-based and Syntax-based Machine Translation: The Search for Systematic Differences
    Rasoul Samad Zadeh Kaljahi, Raphael Rubino, Johann Roturier (Symantec Research Labs) and Jennifer Foster.
    Conference of the Association for Machine Translation in the Americas (AMTA), San Diego, CA, 2012.

  • DCU-Symantec Submission for the WMT 2012 Quality Estimation Task
    Raphaël Rubino, Johann Roturier, Rasoul Samad Zadeh Kaljahi, Fred Hollowood (Symantec Research Labs), Jennifer Foster and Joachim Wagner
    Seventh Workshop on Statistical Machine Translation, Montréal, Canada, 2012.

  • Domain Adaptation in SMT of User-Generated Forum Content Guided by OOV Word Reduction: Normalization and/or Supplementary Data?
    Pratyush Banerjee, Sudip Kumar Naskar, Andy Way, Josef van Genabith, and Johann Roturier (Symantec Research Labs).
    EAMT 2012, Trento, Italy.

  • Evaluation of Machine-Translated User Generated Content: A pilot study based on User Ratings
    Linda Mitchell and Johann Roturier (Symantec Research Labs)
    EAMT 2012, Trento, Italy.

  • Translation Quality-Based Supplementary Data Selection by Incremental Update of Translation Models
    Pratyush Banerjee, Sudip Kumar Naskar, Andy Way, Josef van Genabith, and Johann Roturier (Symantec Research Labs)
    24th International Conference on Computational Linguistics, Mumbai, India, 2012.

  • Using Automatic Machine Translation Metrics to Analyze the Impact of Source Reformulations
    Johann Roturier, Linda Mitchell (Symantec Research Labs), Robert Grabowski, Melanie Siegel.
    Conference of the Association for Machine Translation in the Americas (AMTA), San Diego, CA, 2012.

  • Before We Knew It: An Empirical Study of Zero-Day Attacks In The Real World
    L Bilge, T Dumitras
    In Proceedings of the 2012 ACM conference on Computer and communications security

    We describe a method for automatically identifying zero-day attacks from field-gathered data that records when benign and malicious binaries are downloaded on 11 million real hosts around the world.

  • DISCLOSURE: Detecting Botnet Command and Control Servers Through Large-Scale NetFlow Analysis
    L Bilge, D Balzarotti, W Robertson, E Kirda, C Kruegel
    In Proceedings of the 28th Annual Computer Security Applications Conference

    We present Disclosure, a large-scale, wide-area botnet detection system that combines several novel techniques for analyzing NetFlow data.

  • Industrial Espionage and Targeted Attacks: Understanding the Characteristics of an Escalating Threat
    O Thonnard, L Bilge, G O’Gorman, S Kiernan, M Lee
    International Workshop on Recent Advances in Intrusion Detection

    We provide an in-depth analysis of a large corpus of targeted attacks identified by Symantec during the year 2011.

  • Declarative Privacy Policy: Finite Models and Attribute-Based Encryption
    Sharada Sundaram, Peifung E. Lam, John C. Mitchell, Andre Scedrov
    In Proceedings of the 2nd ACM International Health Informatics Symposium

  • Security of Power Grids: a European Perspective
    Corrado Leita, Marc Dacier
    NIST Cyber-Physical Systems Workshop

  • The MEERKATS Cloud Security Architecture
    Angelos Keromytis, Roxana Geambasu, Simha Sethumadhavan, Salvatore Stolfo, Junfeng Yang (Columbia University), Azzedine Benameur, Marc Dacier, Matthew Elder, Darrell Kienzle (Symantec Research Labs), Angelos Stavrou (George Mason University)
    In Proceedings of the 32nd International Conference on Distributed Computing Systems Workshops (ICDCSW)

  • Visual Spam Campaigns Analysis Using Abstract Graphs Representation
    Orestis Tsigkas (CERTH-ITI), Olivier Thonnard (Symantec Research Labs), Dimitrios Tzovaras (CERTH-ITI)
    In Proceedings of the Ninth International Symposium on Visualization for Cyber Security (VizSec)

  • File Routing Middleware for Cloud Deduplication
    Petros Efstathopoulos
    In Proceedings of the 2nd International Workshop on Cloud Computing Platforms - CloudCP 2012, Bern, Switzerland, April 2012.

    We propose the idea of performing local deduplication operations within each cloud node, and introduce file similarity metrics to determine which node is the best deduplication host for a particular incoming file. This approach reduces the problem of scalable cloud deduplication to a file routing problem, which we can address using a software layer capable of making the necessary routing decisions.

  • Ask WINE: Are We Safer Today? Evaluating Operating System Security through Big Data Analysis
    Tudor Dumitras, Petros Efstathopoulos (Symantec Research Labs)
    In Proceedings of the 5th USENIX Workshop on Large-Scale Exploits and Emerging Threats (LEET'12), San Jose, CA, USA, April 2012.

    In this position paper, we argue that in order to answer conclusively whether end-users are safer today, we must analyze field data collected on real hosts that are targeted by attacks—e.g., the approximately 50 million records of anti-virus telemetry available through Symantec’s WINE platform.

  • The Provenance of WINE
    Tudor Dumitras, Petros Efstathopoulos (Symantec Research Labs)
    In Proceedings of the 9th European Dependable Computing Conference - EDCC 2012, Sibiu, Romania, May 2012.

    In the WINE benchmark, which provides field data for cyber security experiments, we aim to make the experimental process self-documenting. The data collected includes provenance information—such as when, where and how an attack was first observed or detected—and allows researchers to gauge information quality.

  • VisTracer: A Visual Analytics Tool to Investigate Routing Anomalies in Traceroutes
    Fabian Fischer (Univ. of Konstanz), Johannes Fuchs (Univ. of Konstanz), Pierre-Antoine Vervier (Symantec Research Labs), Florian Mansmann (Univ. of Konstanz), Olivier Thonnard (Symantec Research Labs)
    9th Symposium on Visualisation for Cyber Security (VizSec 2012)

    This paper proposes VisTracer, a visual analytics tool specifically tailored for the analysis of traceroute measurements for the purpose of uncovering routing anomalies potentially resulting from BGP hijacks.

  • Visual Analytics for BGP Monitoring and Prefix Hijacking Identification
    Ernst Biersack, Quentin Jacquemart, Fabian Fischer, Johannes Fuchs, Olivier Thonnard, Georgios Theodoridis, Dimitrios Tzovaras,  and Pierre-Antoine Vervier.
    IEEE Network

    In this article, we give a short survey of visualization methods that have been developed for BGP monitoring, in particular for the identification of prefix hijacks. Our goal is to illustrate how network visualization has the potential to assist an analyst in detecting abnormal routing patterns in massive amounts of BGP data.

  • Spammers operations: a multifaceted strategic analysis
    O Thonnard, P-A Vervier, M Dacier
    Security and Communication Networks (Wiley)

    This paper explores several facets of spammers' operations by studying their strategic behavior over the long term.

2011

  • Ethical Considerations of Sharing Data for Cybersecurity Research
    Darren Shou
    Financial Cryptography Workshops

  • Toward a Standard Benchmark for Computer Security Research: The Worldwide Intelligence Network Environment (WINE)
    Tudor Dumitras and Darren Shou (Symantec Research Labs)
    First EuroSys Workshop on Building Analysis Datasets and Gathering Experience Returns for Security (EuroSys BADGERS)

  • Domain adaptation in statistical machine translation of user-forum data using component-level mixture modeling
    P. Banerjee, S. Kumar Naskar, J. Roturier, A. Way, & J. van Genabith
    MT Summit XIII, Xiamen, China, 2011.

  • Evaluation of MT systems to translate user generated content
    Johann Roturier and Anthony Bensadoun
    MT Summit XIII, Xiamen, China, 2011.

  • Qualitative analysis of post-editing for high quality machine translation
    F. Blain, J. Senellart, H. Schwenk, M. Plitt, & J. Roturier
    MT Summit XIII, Xiamen, China, 2011.

  • A Strategic Analysis of Spam Botnets Operations
    Olivier Thonnard and Marc Dacier (Symantec Research Labs)
    In Proceedings of the 8th Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference (CEAS)

  • Experimental Challenges in Cyber Security: A Story of Provenance and Lineage for Malware
    Tudor Dumitras (Symantec Research Labs), Iulian Neamtiu (CMU)
    USENIX Workshop on Cyber Security Experimentation and Test (CSET)

  • HARMUR: Storing and Analyzing Historic Data on Malicious Domains
    Corrado Leita (Symantec Research Labs) and Marco Cova (University of Birmingham)
    First EuroSys Workshop on Building Analysis Datasets and Gathering Experience Returns for Security (EuroSys BADGERS)

  • PhorceField: A Phish-Proof Password Ceremony
    Michael Hart, Claude Castille, Manoj Harpalani, Jonathan Toohill, and Rob Johnson (Stony Brook University)
    In Proceedings of the 27th Annual Computer Security Applications Conference (ACSAC)

  • The MINESTRONE Architecture Combining Static and Dynamic Analysis Techniques for Software Security
    Angelos Keromytis, Salvatore Stolfo, Junfeng Yang (Columbia University), Angelos Stavrou, Anup Ghosh (George Mason University), Dawson Engler (Stanford University), Marc Dacier, Matthew Elder, Darrell Kienzle (Symantec Research Labs)
    In Proceedings of the First SysSec Workshop (SysSec)

  • Towards SIRF: Self-contained Information Retention Format
    Simona Rabinovici-Cohen (IBM Haifa Labs), Mary G. Baker (HP Labs), Roger Cummings (Symantec Research Labs), Samuel A. Fineberg (HP Software), and John Marberg (IBM Haifa Labs)
    In Proceedings of the 4th Annual International Systems and Storage Conference (SYSTOR)

  • Building a High-performance Deduplication System
    F Guo, P Efstathopoulos
    In Proceedings of the 2011 USENIX Annual Technical Conference, Portland, OR, June 2011. Best Paper Award.

    In this paper we present our high-performance deduplication prototype, designed from the ground up to optimize overall single-node performance by making the best possible use of a node’s resources, and to achieve three important goals: scaling to large capacity, providing good deduplication efficiency, and delivering near-raw-disk throughput.

2010

  • Improving the Post-Editing Experience Using Translation Recommendation: A User Study
    Y. He, Y. Ma, J. Roturier, A. Way, and J. van Genabith
    Ninth Conference of the Association for Machine Translation in the Americas, Denver, Colorado, 2010.

  • Source Text Characteristics and Technical and Temporal Post-Editing Effort: What is Their Relationship?
    M. Tatsumi and J. Roturier
    Second joint EM+/CNGL Workshop, "Bringing MT to the user: Research on integrating MT in the translation industry", AMTA, Denver, Colorado, 2010.

  • TMX Markup: A Challenge When Adapting SMT to the Localisation Environment
    J. Du, J. Roturier, and A. Way
    14th Annual Conference of the European Association for Machine Translation, Saint-Raphaël, France, 2010.

  • An Analysis of Rogue AV Campaigns
    Marco Cova (University of California, Santa Barbara), Corrado Leita (Symantec Research Labs), Olivier Thonnard (Royal Military Academy, Belgium), Angelos Keromytis (Columbia University), and Marc Dacier (Symantec Research Labs)
    Symposium on Recent Advances in Intrusion Detection (RAID 2010)

  • An Attack Surface Metric
    Pratyusa K. Manadhata (Symantec Research Labs) and Jeannette M. Wing (Carnegie Mellon University)
    IEEE Transactions on Software Engineering

  • Exploiting diverse observation perspectives to get insights on the malware landscape
    Corrado Leita (Symantec Research Labs), Ulrich Bayer (Technical University Vienna), and Engin Kirda (Institute Eurecom)
    In Proceedings of the 40th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

  • Measurement and Gender-Specific Analysis of User Publishing Characteristics on MySpace
    William Gauvin (Symantec Research Laboratories and University of Massachusetts Lowell), Bruno Ribeiro, Benyuan Liu, Don Towsley, and Jie Wang (University of Massachusetts Lowell)
    IEEE Network

  • On a multicriteria clustering approach for attack attribution
    Olivier Thonnard (Royal Military Academy), Wim Mees (Royal Military Academy), Marc Dacier (Symantec Research Labs)
    ACM SIGKDD Explorations Newsletter

  • Responsibility for the Harm and Risk of Software Security Flaws
    Cassio Goldschmidt (Symantec Research Labs), Melissa Jane Dark, and Hina Chaudhry (Purdue University)
    Information Assurance and Security Ethics in Complex Systems: Interdisciplinary Perspectives, edited by Melissa Jane Dark (Purdue University)

  • Rethinking Deduplication Scalability
    Petros Efstathopoulos and Fanglu Guo (Symantec Research Labs)
    In Proceedings of the 2nd USENIX Workshop on Hot Topics in Storage and File Systems, Boston, MA, June 2010.

    We advocate a shift towards scalability-centric design principles for deduplication systems, and present some of the mechanisms used in our prototype, aiming at high scalability, good deduplication efficiency, and high throughput.

2009

  • Deploying Novel MT Technology to Raise the Bar for Quality: Key Advantages and Challenges
    J. Roturier
    MT Summit XII, Ottawa, Ontario, Canada, 2009.

  • How to Treat GUI Options in IT Technical Texts for Authoring and Machine Translation
    J. Roturier and S. Lehmann
    Journal of Internationalisation and Localisation, Volume 1, 2009.

  • A Simple, Fast, and Compact Static Dictionary
    Scott Schneider and Michael Spertus (Symantec Research Labs)
    In Proceedings of the 20th International Symposium on Algorithms and Computation (ISAAC)

  • Addressing the Attack Attribution Problem using Knowledge Discovery and Multi-criteria Fuzzy Decision-Making
    Olivier Thonnard, Wim Mees (Royal Military Academy, Belgium); and Marc Dacier (Symantec Research Labs)
    In Proceedings of the 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Workshop on CyberSecurity and Intelligence Informatics, Conference Best Paper Award

  • Advances in Topological Vulnerability Analysis
    Steven Noel (George Mason University); Matthew Elder (Symantec Research Labs); Sushil Jajodia, Pramod Kalapa (George Mason University); Scott O’Hare, and Kenneth Prole (Secure Decisions, Division of Applied Visions Inc.)
    In Proceedings of the Cybersecurity Applications & Technology Conference For Homeland Security (CATCH 2009)

  • An Experimental Study of Diversity with Off-The-Shelf AntiVirus Engines
    Ilir Gashi, Vladimir Stankovic (City University, London); Corrado Leita, and Olivier Thonnard (Royal Military Academy, Belgium)
    In Proceedings of the 8th IEEE Symposium on Network Computing and Applications (NCA)

  • Automatic Generation of String Signatures for Malware Detection
    Kent Griffin, Scott Schneider, Xin Hu, and Tzi-cker Chiueh (Symantec Research Labs)
    In Proceedings of the 12th International Symposium on Recent Advances in Intrusion Detection (RAID)

  • Behavioral Analysis of Zombie Armies
    Olivier Thonnard (Royal Military Academy, Belgium), Wim Mees (Eurecom), and Marc Dacier (Symantec Research Labs)
    Cyber Warfare Conference (CWCon), Cooperative Cyber Defence Centre of Excellence (CCD-COE)

  • Correlation Between Automatic Evaluation Metric Scores, Post-Editing Speed, and Some Other Factors
    M. Tatsumi
    Proceedings of MT Summit XII

  • DAFT: Disk Geometry-Aware File System Traversal
    Fanglu Guo and Tzi-cker Chiueh (Symantec Research Labs)
    In Proceedings of the 17th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS)

  • Fast Memory State Synchronization for Virtualization-based Fault Tolerance
    Maohua Lu (Stony Brook University), Tzi-cker Chiueh (Symantec Research Labs), and Shibiao Lin
    In Proceedings of the 39th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

  • Garbage Collection in the Next C++ Standard
    Mike Spertus (Symantec Research Labs) and Hans-J. Boehm (HP Laboratories)
    In Proceedings of the 2009 International Symposium on Memory Management (ISMM)

  • Guaranteeing Eventual Coherency across Data Copies, in a Highly Available Peer-to-Peer Distributed File System
    Bijayalaxmi Nanda, Anindya Banerjee (Symantec Research Labs), and Navin Kabra (PuneTech.com)
    In Proceedings of the 10th International Conference on Distributed Computing and Networking (ICDCN 2009)

  • Honeypot Traces Forensics: The Observation Viewpoint Matters
    Van-Hau Pham (Institute Eurecom) and Marc Dacier (Symantec Research Labs)
    In Proceedings of the Third International Conference on Network and System Security

  • U Can’t Touch This: Block-Level Protection for Portable Storage
    Kevin R.B. Butler (Pennsylvania State University) and Petros Efstathopoulos (Symantec Research Labs)
    In Proceedings of the 2009 International Workshop on Software Support for Portable Storage, Grenoble, France, October 2009.

    Using secure disks and principles of label persistence from the Asbestos operating system, we propose mechanisms to protect data on portable storage by making the drive responsible for enforcing data isolation at the block level and preventing block sharing between hosts that are not considered equally trusted.

2008

  • A Study of the Packer Problem and Its Solutions
    Fanglu Guo, Peter Ferrie, and Tzi-cker Chiueh (Symantec Research Labs)
    In Proceedings of the 11th International Symposium on Recent Advances in Intrusion Detection (RAID)

  • A System for Generating Static Analyzers for Machine Instructions
    Junghee Lim (University of Wisconsin-Madison) and Thomas Reps (GrammaTech, Inc., NY)
    In Proceedings of the International Conference on Compiler Construction (CC)

  • Accurate and Efficient Inter-Transaction Dependency Tracking
    Tzi-cker Chiueh and Shweta Bajpai (Stony Brook University)
    In Proceedings of the 24th International Conference on Data Engineering (ICDE 2008)

  • Actionable Knowledge Discovery for Threats Intelligence Support Using a Multi-dimensional Data Mining Methodology
    Olivier Thonnard (Royal Military Academy, Belgium) and Marc Dacier (Symantec Research Labs)
    In Proceedings of the IEEE Data Mining Workshops (ICDMW 2008)

  • An Incremental File System Consistency Checker for Block-Level CDP Systems
    Maohua Lu, Tzi-cker Chiueh (Stony Brook University); and Shibiao Lin (Google Inc.)
    In Proceedings of the IEEE 27th International Symposium on Reliable Distributed Systems (SRDS)

  • Applications of Feather-Weight Virtual Machine
    Yang Yu, Hariharan Kolam Govindarajan, Lap Chung-Lam and Tzi-cker Chiueh (Stony Brook University)
    In Proceedings of the 2008 ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE08)

  • Availability and Fairness Support for Storage QoS Guarantee
    Peng Gang and Tzi-cker Chiueh (Stony Brook University)
    In Proceedings of the 28th International Conference on Distributed Computing Systems (ICDCS)

  • Comparison of QoS Guarantee Techniques for VoIP over IEEE802.11 Wireless LAN
    Fanglu Guo and Tzi-cker Chiueh (Stony Brook University)
    In Proceedings of the 15th Annual Multimedia Computing and Networking Conference (MMCN)

  • Detecting Known and New Salting Tricks in Unwanted Emails
    Andre Bergholz, Gerhard Paass, Frank Reichartz, Siehyun Strobel (Fraunhofer IAIS, Germany); Marie-Francine Moens (Katholieke Universiteit, Belgium); and Brian Witten (Symantec Research Labs)
    In Proceedings of the International Conference on Email and Anti-Spam (CEAS)

  • Fast Bounds Checking Using Debug Register
    Tzi-cker Chiueh (Stony Brook University)
    In Proceedings of the 3rd International Conference on High Performance and Embedded Architectures and Compilers (HiPEAC)

  • Feldspar: A System for Finding Information by Association
    Duen Horng Chau, Brad Myers, and Andrew Faulring (Carnegie Mellon University)
    In Proceedings of the CHI 2008 Workshop on Personal Information Management (PIM)

  • Graphics Engine Resource Management
    Mikhail Bautin, Ashok Dwarakinath and Tzi-cker Chiueh (Stony Brook University)
    In Proceedings of the 15th Annual Multimedia Computing and Networking Conference (MMCN)

  • GRAPHITE: A Visual Query System for Large Graphs
    Duen Horng Chau, Christos Faloutsos, Hanghang Tong, Jason I. Hong (Carnegie Mellon University); Brian Gallagher and Tina Eliassi-Rad (Lawrence Livermore National Laboratory)
    In Proceedings of the International Conference on Data Mining (ICDM)

  • Noncirculant Toeplitz Matrices All of Whose Powers are Toeplitz
    Kent Griffin, Jeffrey L. Stuart (Pacific Lutheran University), and Michael J. Tsatsomeros (Washington State University)
    Czechoslovak Mathematical Journal (Mathematics and Statistics)

  • RapidUpdate: Peer-Assisted Distribution of Security Content
    Denis Serenyi and Brian Witten (Symantec Research Labs)
    In Proceedings of the 7th International Workshop on Peer-to-Peer Systems (IPTPS)

  • Reducing E-Discovery Cost by Filtering Included Emails
    Tsuen-Wan “Johnny” Ngan
    In Proceedings of the Fifth Conference on Email and Anti-Spam (CEAS)

  • The eBay graph: How do online auction users interact?
    Yordanos Beyene, Michalis Faloutsos (University of California, Riverside), Duen Horng Chau, and Christos Faloutsos (Carnegie Mellon University)
    IEEE Global Internet Symposium, copublished in Proceedings of the 27th Conference on Computer Communication (INFOCOM)

  • What to Do When Search Fails: Finding Information by Association
    Duen Horng Chau, Brad Myers, and Andrew Faulring (Carnegie Mellon University)
    In Proceedings of the Conference on Human Factors in Computing Systems (CHI)

  • Data Space Randomization
    Sandeep Bhatkar (Symantec Research Labs) and R. Sekar (Stony Brook University)
    DIMVA 2008

    In this paper, we introduce a new randomization-based defense against memory error exploits. More specifically, we show how randomizing the representation of data in memory provides protection not only against code injection attacks but also against non-control data attacks.

2007

  • A Forced Sampled Execution Approach to Kernel Rootkit Identification
    Jeffrey Wilhelm and Tzi-cker Chiueh (Symantec Research Labs)
    In Proceedings of the 10th International Symposium on Recent Advances in Intrusion Detection (RAID)

2006

  • Engineering Sufficiently Secure Computing
    Brian Witten (Symantec Research Labs)
    In Proceedings of the 22nd Annual Computer Security Applications Conference (ACSAC)

  • Malware Evolution: A Snapshot of Threats and Countermeasures in 2005
    Brian Witten and Carey Nachenberg (Symantec Corporation)
    Malware Detection (Advances in Information Security)

1998

  • A Non-Fragmenting Non-Moving, Garbage Collector
    Gustavo Rodriguez-Rivera, Michael Spertus, and Charles Fiterman (Geodesic Systems, now Symantec)
    In Proceedings of the 1st International Symposium on Memory Management (ISMM)