Tracing with Less Data: Active Learning for Classification-Based Traceability Link Recovery

Abstract:

Previous work has established both the importance and the difficulty of establishing and maintaining adequate software traceability. While traceability has been shown to support essential maintenance and evolution tasks, recovering traceability links between related software artifacts is a time-consuming and error-prone task. As such, substantial research has been done to reduce this barrier to adoption by at least partially automating traceability link recovery. In particular, recent work has shown that supervised machine learning can be used effectively to automate traceability link recovery, as long as there is sufficient data (i.e., labeled traceability links) to train a classification model. Unfortunately, the amount of data required by these techniques is a serious limitation, given that most software systems have little or no traceability information to begin with. In this paper, we address this limitation of previous work and propose an approach based on active learning, which substantially reduces the amount of training data needed by supervised classification approaches for traceability link recovery while maintaining similar performance.

Published in: 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME)

Date of Conference: 29 September 2019 - 04 October 2019

Date Added to IEEE Xplore: 05 December 2019

DOI: 10.1109/ICSME.2019.00020

Conference Location: Cleveland, OH, USA

I. Introduction

Software systems are made up of many different sources of information, such as source code, bug reports, requirements, use cases, and test cases. These different types of software artifacts contain important information about various aspects of the system. Software traceability is a system property that represents the degree to which the relationships between related software artifacts of different types are known and documented. For example, in systems with a high level of software traceability, it is known which code segments implement which use cases, which bugs are related to which features, which features cover which requirements, and so forth. Previous work has shown that the information provided by traceability natively supports various software tasks, such as program comprehension, bug localization, impact analysis, and ensuring test coverage, and leads to more reliable projects with better code and fewer bugs [1]–[3]. Unfortunately, collecting this information is often of secondary concern to the actual development, maintenance, and evolution of the project. Therefore, traceability is often established and updated post hoc, long after artifacts are created or modified. The process of establishing links in this scenario is called traceability link recovery (TLR), and when performed manually it is extremely costly. Further, even if significant resources are invested to establish traceability, it will rapidly degrade as the software system changes due to evolution and maintenance tasks [4]–[7]. Therefore, maintaining traceability as the system changes over time is just as important as establishing it in the first place.
