
Generating Realistic Vulnerabilities via Neural Code Editing: An Empirical Study

Yu Nong, Yuzhe Ou, Michael Pradel, Feng Chen, Haipeng Cai

Proceedings of the ACM Symposium on the Foundations of Software Engineering (FSE), pp. 1–13, 2022.


Abstract

The availability of large-scale, realistic vulnerability datasets is essential both for benchmarking existing techniques and for developing effective new data-driven approaches for software security. Yet such datasets are critically lacking. A promising solution is to generate such datasets by injecting vulnerabilities into real-world programs, which are richly available. Thus, in this paper, we explore the feasibility of vulnerability injection through neural code editing. With a synthetic dataset and a real-world one, we investigate the potential and gaps of three state-of-the-art neural code editors for vulnerability injection. We find that the studied editors have critical limitations on the real-world dataset, where the best accuracy is only 10.03%, versus 79.40% on the synthetic dataset. While the graph-based editors are more effective (successfully injecting vulnerabilities in up to 34.93% of real-world testing samples) than the sequence-based one (0 success), they still suffer from complex code structures and fall short for long edits due to their insufficient designs of the preprocessing and deep learning (DL) models. We reveal the promise of neural code editing for generating realistic vulnerable samples, as they help boost the effectiveness of DL-based vulnerability detectors by up to 49.51% in terms of F1 score. We also provide insights into the gaps in current editors (e.g., they are good at deleting but not at replacing code) and actionable suggestions for addressing them (e.g., designing effective editing primitives).

BibTeX

@inproceedings{nong22_fse,
  title = {Generating Realistic Vulnerabilities via Neural Code Editing: An Empirical Study},
  author = {Nong, Yu and Ou, Yuzhe and Pradel, Michael and Chen, Feng and Cai, Haipeng},
  year = {2022},
  booktitle = {Proceedings of the ACM Symposium on the Foundations of Software Engineering (FSE)},
  pages = {1--13},
  preprint = {https://software-lab.org/publications/fse2022_vuln_inj_study.pdf}
}