Tracing with Less Data: Active Learning for Classification-Based Traceability Link Recovery | IEEE Conference Publication | IEEE Xplore

Abstract:

Previous work has established both the importance and the difficulty of establishing and maintaining adequate software traceability. While traceability has been shown to support essential maintenance and evolution tasks, recovering traceability links between related software artifacts is a time-consuming and error-prone task. As such, substantial research has sought to reduce this barrier to adoption by at least partially automating traceability link recovery. In particular, recent work has shown that supervised machine learning can be used effectively to automate traceability link recovery, as long as there is sufficient data (i.e., labeled traceability links) to train a classification model. Unfortunately, the amount of data required by these techniques is a serious limitation, given that software systems rarely have traceability information to begin with. In this paper, we address this limitation of previous work and propose an approach based on active learning, which substantially reduces the amount of training data needed by supervised classification approaches for traceability link recovery while maintaining similar performance.
Date of Conference: 29 September 2019 - 04 October 2019
Date Added to IEEE Xplore: 05 December 2019
Conference Location: Cleveland, OH, USA

I. Introduction

Software systems are made up of many different sources of information such as source code, bug reports, requirements, use cases, test cases, etc. These different types of software artifacts contain important information about various aspects of the system. Software traceability is a system property that represents the degree to which the relationships between related software artifacts of different types are known and documented. For example, in systems with a high level of software traceability, it is known which code segments implement which use cases, which bugs are related to which features, which features cover which requirements, and so forth. Previous work has shown that the information traceability provides natively supports various software tasks such as program comprehension, bug localization, impact analysis, ensuring test coverage, etc., and leads to more reliable projects with better code and fewer bugs [1]–[3]. Unfortunately, collecting this information is often of secondary concern to the actual development, maintenance, and evolution of the project. Therefore, traceability is often established and updated post hoc, long after artifacts are created or modified. The process of establishing links in this scenario is called traceability link recovery (TLR), and when performed manually it is extremely costly. Further, even if significant resources are invested to establish traceability, it will rapidly degrade as the software system changes due to evolution and maintenance tasks [4]–[7]. Therefore, maintaining traceability as the system changes over time is just as important as establishing it.
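
To make the idea of classification-based TLR with fewer labels concrete, the following is a minimal sketch of pool-based active learning with uncertainty sampling. It is not the approach evaluated in this paper: the Jaccard token-overlap feature, the one-dimensional threshold classifier, and all names (`jaccard`, `fit_threshold`, `active_learn`, `oracle`) are assumptions chosen for illustration only. Each candidate link is a pair of artifacts with a similarity score, and the analyst (the `oracle`) is asked to label only the pairs the current model is least sure about.

```python
def jaccard(text_a, text_b):
    """Token-overlap similarity between two artifacts (illustrative feature)."""
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def fit_threshold(labeled):
    """Fit a toy classifier: the midpoint between the mean score of
    labeled links and the mean score of labeled non-links.

    labeled: list of (score, is_link) tuples.
    Falls back to 0.5 until both classes have at least one example.
    """
    pos = [score for score, is_link in labeled if is_link]
    neg = [score for score, is_link in labeled if not is_link]
    if not pos or not neg:
        return 0.5
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def active_learn(pool, oracle, seed, budget):
    """Pool-based active learning with uncertainty sampling.

    pool:   dict mapping candidate-link id -> similarity score
    oracle: callable returning the true label for an id (the human analyst)
    seed:   ids labeled up front to bootstrap the classifier
    budget: maximum number of additional labels to request
    """
    labels = {pair_id: oracle(pair_id) for pair_id in seed}
    for _ in range(budget):
        threshold = fit_threshold([(pool[p], y) for p, y in labels.items()])
        unlabeled = [p for p in pool if p not in labels]
        if not unlabeled:
            break
        # The most uncertain candidate is the one closest to the decision boundary.
        query = min(unlabeled, key=lambda p: abs(pool[p] - threshold))
        labels[query] = oracle(query)
    threshold = fit_threshold([(pool[p], y) for p, y in labels.items()])
    return threshold, labels


def predict(pool, threshold):
    """Classify every candidate pair as link (True) or non-link (False)."""
    return {pair_id: score > threshold for pair_id, score in pool.items()}
```

In this sketch the label budget is spent on borderline pairs rather than on pairs the classifier already separates well, which is the intuition behind reducing training data. A realistic setup would use richer features and a stronger classifier.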
