Cross-Project Transfer Representation Learning for Vulnerable Function Discovery

Abstract:

Machine learning is now widely used to detect security vulnerabilities in software, even before the software is released. However, its potential is often severely compromised at the early stage of a software project, when high-quality training data is scarce and we have to rely on overly generic hand-crafted features. This paper addresses this cold-start problem of machine learning by learning rich features that generalize across similar projects. To reach an optimal balance between feature richness and generalizability, we devise a data-driven method built on the following ideas. First, the code semantics are revealed through serialized abstract syntax trees (ASTs), with tokens encoded by Continuous Bag-of-Words neural embeddings. Next, the serialized ASTs are fed to a sequential deep learning classifier (Bi-LSTM) to obtain a representation indicative of software vulnerability. Finally, the neural representation obtained from existing software projects is transferred to the new project to enable early vulnerability detection even with a small set of training labels. To validate this vulnerability detection approach, we manually labeled 457 vulnerable functions and collected 30 000+ nonvulnerable functions from six open-source projects. The empirical results confirmed that the trained model is capable of generating representations that are indicative of program vulnerability and is adaptable across multiple projects. Compared with traditional code metrics, our transfer-learned representations are more effective for predicting vulnerable functions, both within a project and across multiple projects.
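
The first step named in the abstract, turning a function's AST into a flat token sequence, can be illustrated with a minimal sketch. It rests on several assumptions that are not from the paper: it uses Python's built-in `ast` module as a stand-in parser, a depth-first pre-order traversal as the serialization order, and the hypothetical helper names `serialize_ast`, `build_vocab`, and `encode`. The Continuous Bag-of-Words embedding and Bi-LSTM stages are not shown; the integer ids produced here are only the kind of input such stages would typically consume.

```python
import ast


def serialize_ast(source: str) -> list[str]:
    """Serialize source code into a token list via a depth-first, pre-order AST walk.

    Assumption: node type names plus identifier names serve as tokens.
    """
    tokens: list[str] = []

    def visit(node: ast.AST) -> None:
        tokens.append(type(node).__name__)
        # Keep identifier text for a few node kinds so names survive serialization.
        if isinstance(node, ast.FunctionDef):
            tokens.append(node.name)
        elif isinstance(node, ast.Name):
            tokens.append(node.id)
        elif isinstance(node, ast.arg):
            tokens.append(node.arg)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


def build_vocab(sequences: list[list[str]]) -> dict[str, int]:
    """Assign an integer id to every token; ids 0 and 1 are reserved."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(seq: list[str], vocab: dict[str, int], max_len: int) -> list[int]:
    """Map tokens to ids, truncate to max_len, and right-pad with the pad id."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in seq][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


if __name__ == "__main__":
    sample = "def f(x):\n    return x + 1\n"
    toks = serialize_ast(sample)
    vocab = build_vocab([toks])
    print(toks)
    print(encode(toks, vocab, max_len=16))
```

Fixed-length integer sequences are a common input format for sequence models, which is why the sketch pads and truncates; the sequence length and vocabulary handling used by the authors are not stated in the abstract.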

Published in: IEEE Transactions on Industrial Informatics (Volume: 14, Issue: 7, July 2018)

Page(s): 3289 - 3297

Date of Publication: 02 April 2018

DOI: 10.1109/TII.2018.2821768
