Yufei Han

Dr. Yufei Han is a Senior Principal Researcher at Symantec Research Labs. His research interests include robust learning with imperfect telemetry data, adversarial learning, and privacy-preserving learning, all aimed at providing a trusted machine learning service. Before joining Symantec, he conducted post-doctoral research at the French Institute for Research in Computer Science and Automation (INRIA) in Paris from 2010 to 2014. He received his Ph.D. from the National Laboratory of Pattern Recognition, Chinese Academy of Sciences, Beijing, in 2010.

Selected Academic Papers

  • Collaborative and Privacy-Preserving Machine Teaching via Consensus Optimization
    Yufei Han, Yuzhe Ma, Christopher Gates, Kevin A. Roundy and Yun Shen
    To appear at the 2019 International Joint Conference on Neural Networks (IJCNN 2019)

    In this work, we define a collaborative and privacy-preserving machine teaching paradigm with multiple distributed teachers. The focus is on strategies for organizing distributed agents to jointly select a compact subset of data that can be used to train a global model. The global model should achieve nearly the same performance as if the central learner had access to all the data, even though the central learner sees only the selected subset and each agent sees only its own data. The goal of this research is to find good strategies for training global models while giving some control back to the agents.

  • Multi-label Learning with Highly Incomplete Data via Collaborative Embedding
    Yufei Han, Guolei Sun, Yun Shen, Xiangliang Zhang
    In Proceedings of the 24th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2018)

    We propose a weakly supervised multi-label learning approach based on the idea of collaborative embedding. It provides a flexible framework for efficient multi-label classification in both transductive and inductive modes by coupling the reconstruction of missing features with weak label assignment in a joint optimization framework.

  • RiskTeller: Predicting the Risk of Cyber Incidents
    Leyla Bilge, Yufei Han, Matteo Dell'Amico
    In Proceedings of the 24th ACM Conference on Computer and Communications Security (ACM CCS 2017)

    We present RiskTeller, a system that predicts which machines in an enterprise environment are at risk of infection.

  • Mini-Batch Spectral Clustering
    Yufei Han, Maurizio Filippone
    In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN 2017)

    This paper proposes a practical approach to learning spectral clustering based on adaptive stochastic gradient optimization. Crucially, the proposed approach recovers the exact spectrum of Laplacian matrices in the limit of the iterations, and the cost of each iteration is linear in the number of samples. Extensive experimental validation on data sets with up to half a million samples demonstrates its scalability and its ability to outperform state-of-the-art approximate methods for spectral clustering under a given computational budget.

  • Predicting Cyber Threats with Virtual Security Products
    Shang-Tse Chen, Yufei Han, Duen Horng Chau, Christopher Gates, Michael Hart, Kevin A. Roundy
    In Proceedings of the 33rd Annual Computer Security Applications Conference (ACSAC 2017)

    We set out to predict which security events and incidents a security product would have detected had it been deployed, based on the events produced by other security products that were in place. We discovered that the problem is tractable, and that some security products are much harder to model than others, which makes them more valuable.

  • Marmite: Spreading Malicious File Reputation Through Download Graphs
    Gianluca Stringhini, Yun Shen, Yufei Han, Xiangliang Zhang
    In Proceedings of the 33rd Annual Computer Security Applications Conference (ACSAC 2017)

    We present Marmite, a system that detects malicious files by leveraging a global download graph and label propagation with Bayesian confidence.

  • Partially Supervised Graph Embedding for Positive Unlabelled Feature Selection
    Yufei Han, Yun Shen
    In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI 2016)

    We propose to encode the weakly supervised information in PU learning tasks into pairwise constraints between training instances. Violations of the pairwise constraints are measured and incorporated into a partially supervised graph embedding model.