Unsupervised Labeling and Extraction of Phrase-based Concepts in Vulnerability Descriptions

Abstract:

People usually describe the key characteristics of software vulnerabilities in natural language mixed with domain-specific names and concepts. This textual nature poses a significant challenge for the automatic analysis of vulnerabilities. Automatic extraction of key vulnerability aspects is highly desirable, but training extraction models demands significant manual effort to label data. In this paper, we propose an unsupervised approach to label and extract important vulnerability concepts in textual vulnerability descriptions (TVDs). We focus on three types of phrase-based vulnerability concepts (root cause, attack vector, and impact), as they are much more difficult to label and extract than name- or number-based entities (i.e., vendor, product, and version). Our approach is based on the key observation that phrases of the same type, no matter how they differ in sentence structure and phrase expression, usually share syntactically similar paths in the sentence parse trees. Therefore, we propose two path representations (absolute paths and relative paths) and use an auto-encoder to encode such syntactic similarities. To address the discrete nature of our paths, we enhance the traditional Variational Auto-encoder (VAE) with the Gumbel-Max trick for categorical data distributions, thus creating a Categorical VAE (CaVAE). In the latent space of absolute and relative paths, we further use FIt-TSNE and clustering techniques to generate clusters of same-type concepts. Our evaluation confirms the effectiveness of CaVAE for encoding path representations and the accuracy of the vulnerability concepts in the resulting clusters. In a concept classification task, our unsupervisedly labeled vulnerability concepts outperform the two manually labeled datasets from previous work.
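The Gumbel-Max trick named in the abstract can be illustrated with a minimal sketch. It is not the authors' CaVAE implementation: it only shows, in plain Python, how adding Gumbel(0, 1) noise to unnormalized log-probabilities and taking the argmax yields a sample from the corresponding categorical distribution. The function names (`gumbel_noise`, `gumbel_max_sample`, `softmax`, `empirical_frequencies`) and the example logits are assumptions made for illustration. Training a VAE end to end typically also requires a differentiable relaxation of the argmax, which this sketch does not cover.

```python
import math
import random
from collections import Counter
from typing import List, Sequence


def gumbel_noise(rng: random.Random) -> float:
    """Draw one sample from the standard Gumbel(0, 1) distribution."""
    # Inverse-CDF sampling: G = -log(-log(U)) with U ~ Uniform(0, 1).
    u = rng.random()
    while u == 0.0:  # log(0) is undefined, so redraw in that rare case
        u = rng.random()
    return -math.log(-math.log(u))


def gumbel_max_sample(logits: Sequence[float], rng: random.Random) -> int:
    """Return a category index distributed as softmax(logits)."""
    # Perturb every logit with independent Gumbel noise, then take the argmax.
    perturbed = [logit + gumbel_noise(rng) for logit in logits]
    return max(range(len(perturbed)), key=perturbed.__getitem__)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax, used here only as the reference distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def empirical_frequencies(logits: Sequence[float], draws: int, seed: int) -> List[float]:
    """Estimate category frequencies from repeated Gumbel-Max draws."""
    rng = random.Random(seed)
    counts = Counter(gumbel_max_sample(logits, rng) for _ in range(draws))
    return [counts[i] / draws for i in range(len(logits))]


if __name__ == "__main__":
    # Hypothetical logits over three categories (illustrative values only).
    example_logits = [math.log(0.7), math.log(0.2), math.log(0.1)]
    print("target   :", softmax(example_logits))
    print("empirical:", empirical_frequencies(example_logits, draws=20000, seed=0))
```

With enough draws, the empirical frequencies should approach the softmax of the logits, which is the property that makes the trick usable for sampling discrete symbols.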

Published in: 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE)

Date of Conference: 15-19 November 2021

Date Added to IEEE Xplore: 20 January 2022

DOI: 10.1109/ASE51524.2021.9678638

Conference Location: Melbourne, Australia

I. Introduction

Software vulnerabilities, once disclosed, can be documented in security databases such as NVD [1], IBM X-Force [2], and ExploitDB [3]. People usually describe the key characteristics of a vulnerability in natural language, such as the examples shown in Fig. 1. Key characteristics often include the vulnerable product and its versions, the product vendor, and the root cause, attack vector, and impact of the vulnerability. Although these vulnerability databases provide rich information about known vulnerabilities, security analysts have to manually identify and extract the key information of interest from textual vulnerability descriptions (TVDs). Automatic information extraction is highly desirable to expedite vulnerability analysis and security research, for example, finding all vulnerabilities of a product with a certain impact, establishing traceability links between related vulnerabilities in different databases, or detecting discrepancies between reports of the same vulnerability created by different people [4]–[11].

Cites in Papers - IEEE (3)

Linyi Han, Shidong Pan, Zhenchang Xing, Jiamou Sun, Sofonias Yitagesu, Xiaowang Zhang, Zhiyong Feng, "Do Chase Your Tail! Missing Key Aspects Augmentation in Textual Vulnerability Descriptions of Long-Tail Software Through Feature Inference", IEEE Transactions on Software Engineering, vol. 51, no. 2, pp. 466-483, 2025.

Ziyuan Wang, Xiaoyan Liang, Ruizhong Du, Xin Zhou, "BTVD-BERT: A Bilingual Domain-Adaptation Pre-Trained Model for Textural Vulnerability Descriptions", 2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 3671-3676, 2024.

Jeffy Jahfar Poozhithara, Hazeline U. Asuncion, Brent Lagesse, "Keyword Extraction From Specification Documents for Planning Security Mechanisms", 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), pp. 1661-1673, 2023.

Cites in Papers - Other Publishers (6)

Ziyuan Wang, Xiaoyan Liang, Ruizhong Du, Junfeng Tian, Siyi Zhang, "TVD-BERT: A Domain-Adaptation Pre-trained Model for Textural Vulnerability Descriptions", Advanced Intelligent Computing Technology and Applications, vol. 14875, pp. 197, 2024.

Susheng Wu, Wenyan Song, Kaifeng Huang, Bihuan Chen, Xin Peng, "Identifying Affected Libraries and Their Ecosystems for Open Source Software Vulnerabilities", 2024 IEEE/ACM 46th International Conference on Software Engineering (ICSE), pp. 1996-2007, 2024.

Qindong Li, Wenyi Tang, Xingshu Chen, Song Feng, Lizhi Wang, "Comprehensive vulnerability aspect extraction", Applied Intelligence, 2024.

Shi Li, Yongkang Zhang, "Improving entity linking by combining semantic entity embeddings and cross-attention encoder", Journal of Intelligent & Fuzzy Systems, pp. 1, 2023.

Sofonias Yitagesu, Zhenchang Xing, Xiaowang Zhang, Zhiyong Feng, Xiaohong Li, Linyi Han, "Extraction of Phrase-based Concepts in Vulnerability Descriptions through Unsupervised Labeling", ACM Transactions on Software Engineering and Methodology, vol. 32, no. 5, pp. 1, 2023.

Qing Huang, Yanbang Sun, Zhenchang Xing, Min Yu, Xiwei Xu, Qinghua Lu, "API Entity and Relation Joint Extraction from Text via Dynamic Prompt-tuned Language Model", ACM Transactions on Software Engineering and Methodology, 2023.
