An Improved Approach to Traceability Recovery Based on Word Embeddings | IEEE Conference Publication | IEEE Xplore

An Improved Approach to Traceability Recovery Based on Word Embeddings


Abstract:

Software traceability recovery, which reconstructs links between software artifacts, has become increasingly vital to maintaining the software life cycle as software scale and architectural complexity grow. However, existing approaches rely mainly on information retrieval (IR) techniques. These methods are not very efficient for complex software artifacts that mix multilingual text, code snippets, and proper nouns. Moreover, existing approaches make it hard to predict new traceability links when requirements change or software functions are added, because they do not make full use of the final ranked lists. In this paper, we propose a novel approach, WELR, which recovers traceability links using word embeddings and learning to rank. We use word embeddings to calculate semantic similarities between software artifacts, and we introduce query expansion and a weighting strategy into the calculation. Unlike other work, we leverage learning to rank to build prediction models for traceability links. We conducted experiments on five public datasets and considered traceability links among different kinds of software artifacts. The results show that our method outperforms the state-of-the-art method that works under the same conditions.
Date of Conference: 04-08 December 2017
Date Added to IEEE Xplore: 05 March 2018
ISBN Information:
Conference Location: Nanjing, China
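
The abstract describes computing semantic similarity between artifacts from word embeddings with a weighting strategy, but it does not give WELR's formula. The following is therefore only a generic sketch of one common way to do this: each word in one artifact is matched to its most similar word in the other, and the matches are averaged with optional per-word weights. The toy vectors, the artifact names, and the functions `directed_similarity` and `similarity` are all hypothetical, and query expansion and learning to rank are left out.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration (not trained vectors).
EMBEDDINGS = {
    "login": (0.9, 0.1, 0.0),
    "authenticate": (0.8, 0.2, 0.1),
    "user": (0.5, 0.5, 0.0),
    "password": (0.7, 0.3, 0.1),
    "report": (0.0, 0.2, 0.9),
    "print": (0.1, 0.1, 0.8),
}


def cosine(u, v):
    """Cosine similarity of two dense vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def directed_similarity(source, target, weights=None):
    """Weighted mean, over source words, of the best cosine match in target."""
    weights = weights or {}
    known_targets = [t for t in target if t in EMBEDDINGS]
    total, weight_sum = 0.0, 0.0
    for word in source:
        if word not in EMBEDDINGS:
            continue  # out-of-vocabulary words are skipped in this sketch
        best = max(
            (cosine(EMBEDDINGS[word], EMBEDDINGS[t]) for t in known_targets),
            default=0.0,
        )
        w = weights.get(word, 1.0)  # assumed weighting hook, e.g. IDF values
        total += w * best
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0


def similarity(a, b, weights=None):
    """Symmetric score: average of the two directed similarities."""
    return 0.5 * (directed_similarity(a, b, weights) + directed_similarity(b, a, weights))


# Hypothetical requirement and candidate code artifacts, already tokenized.
requirement = ["user", "login", "password"]
candidates = {
    "AuthService": ["authenticate", "user"],
    "ReportPrinter": ["print", "report"],
}
ranked = sorted(
    candidates,
    key=lambda name: similarity(requirement, candidates[name]),
    reverse=True,
)
print(ranked)
```

Sorting candidates by this score yields a ranked list of the kind the abstract says existing approaches do not fully exploit; a learning-to-rank model would take such scores as input features.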

I. Introduction

Traceability recovery is used to discover relationships among thousands of software artifacts and so facilitates the efficient retrieval of relevant information in large-scale industrial projects [1]. Complete and accurate traceability links ensure that every related element is considered when requirements change and that every requirement is implemented; traceability recovery therefore plays an important role in software maintenance [1], bug localization [11], [35], [36], and other tasks. Traditional methods of recovering traceability include building requirement traceability matrices (RTMs) and requirement traceability graphs. However, these methods are difficult to extend and become error-prone as software evolves [2]. Hence, many researchers have proposed approaches that address this problem with information retrieval (IR) techniques, mainly based on text retrieval, e.g., VSM [3]–[5] and LSA [3]–[5]. As highlighted in [2], text analysis techniques are being used to solve more and more problems in software engineering.
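
To make the IR-based baseline concrete, here is a minimal sketch of VSM-style link recovery: artifacts are represented as TF-IDF vectors and candidate links are ranked by cosine similarity. It is an illustration under assumptions, not the configuration used in [3]–[5]; the artifact names, the tokenized texts, the smoothed IDF formula, and the function `rank_links` are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each artifact name to a sparse TF-IDF vector (dict of term -> weight)."""
    n = len(docs)
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    # Smoothed IDF (an assumed variant) so that no term gets a zero weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    return {
        name: {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for name, tokens in docs.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_links(query_name, docs):
    """Rank every other artifact by similarity to the query artifact."""
    vectors = tfidf_vectors(docs)
    query = vectors[query_name]
    scores = {
        name: cosine(query, vec)
        for name, vec in vectors.items()
        if name != query_name
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical artifacts, already tokenized and lowercased.
docs = {
    "REQ-1": "user shall reset password by email".split(),
    "PasswordReset.java": "reset password send email token".split(),
    "InvoiceExport.java": "export invoice pdf file".split(),
}
for name, score in rank_links("REQ-1", docs):
    print(f"{name}: {score:.3f}")
```

Because the score depends only on shared terms, an artifact that describes the same function in different vocabulary receives a score of zero, which is the limitation that embedding-based similarity aims to address.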

References

[1] O.C.Z. Gotel and C.W. Finkelstein, "An analysis of the requirements traceability problem", Proceedings of IEEE International Conference on Requirements Engineering, Colorado Springs, CO, pp. 94-101, 1994.
[2] D. Port, A. Nikora, J.H. Hayes and L. Huang, "Text Mining Support for Software Requirements: Traceability Assurance", 2011 44th Hawaii International Conference on System Sciences, pp. 1-11, 2011.
[3] D. Falessi, G. Cantone and G. Canfora, "Empirical Principles and an Industrial Case Study in Retrieving Equivalent Requirements via Natural Language Processing Techniques", IEEE Transactions on Software Engineering, vol. 39, no. 1, pp. 18-44, Jan. 2013.
[4] A.D. Lucia, M.D. Penta, R. Oliveto, A. Panichella and S. Panichella, "Improving IR-based Traceability Recovery Using Smoothing Filters", 2011 IEEE 19th International Conference on Program Comprehension, pp. 21-30, 2011.
[5] Giovanni Capobianco et al., "Improving IR-based traceability recovery via noun-based indexing of software artifacts", Journal of Software: Evolution and Process, vol. 25, no. 7, pp. 743-762, 2013.
[6] Jin Guo, Jinghui Cheng and Jane Cleland-Huang, "Semantically enhanced software traceability using deep learning techniques", Proceedings of the 39th International Conference on Software Engineering, 2017.
[7] Jane Cleland-Huang et al., "Software traceability: trends and future directions", Proceedings of the on Future of Software Engineering, ACM, 2014.
[8] Tomas Mikolov et al., "Efficient estimation of word representations in vector space", 2013, [online] Available: .
[9] Tomas Mikolov et al., "Distributed representations of words and phrases and their compositionality", Advances in Neural Information Processing Systems, 2013.
[10] Quoc Le and Tomas Mikolov, "Distributed representations of sentences and documents", Proceedings of the 31st International Conference on Machine Learning (ICML-14), 2014.
[11] X. Ye, H. Shen, X. Ma, R. Bunescu and C. Liu, "From Word Embeddings to Document Similarities for Improved Information Retrieval in Software Engineering", 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE), pp. 404-415, 2016.
[12] S. Haiduc, G. Bavota, A. Marcus, R. Oliveto, A. De Lucia and T. Menzies, "Automatic query reformulations for text retrieval in software engineering", 2013 35th International Conference on Software Engineering (ICSE), pp. 842-851, 2013.
[13] Claudio Carpineto and Giovanni Romano, "A survey of automatic query expansion in information retrieval", ACM Computing Surveys (CSUR), vol. 44, no. 1, pp. 1, 2012.
[14] George W. Furnas et al., "The vocabulary problem in human-system communication", Communications of the ACM, vol. 30, no. 11, pp. 964-971, 1987.
[15] E. Pyshkin and V. Klyuev, "A study of measures for document relatedness evaluation", 2012 Federated Conference on Computer Science and Information Systems (FedCSIS), pp. 249-256, 2012.
[16] X. Dong, X. Chen, Y. Guan, Z. Yu and S. Li, "An Overview of Learning to Rank for Information Retrieval", 2009 WRI World Congress on Computer Science and Information Engineering, pp. 600-606, 2009.
[17] Aliaksei Severyn and Alessandro Moschitti, "Learning to rank short text pairs with convolutional deep neural networks", Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2015.
[18] D. Binkley and D. Lawrie, "Learning to Rank Improves IR in SE", 2014 IEEE International Conference on Software Maintenance and Evolution, pp. 441-445, 2014.
[19] Y. Zou, T. Ye, Y. Lu, J. Mylopoulos and L. Zhang, "Learning to Rank for Question-Oriented Software Text Retrieval (T)", 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 1-11, 2015.
[20] Xin Ye, Razvan Bunescu and Chang Liu, "Learning to rank relevant files for bug reports using domain knowledge", Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, ACM, 2014.
[21] Haoran Niu, Iman Keivanloo and Ying Zou, "Learning to rank code examples for code search engines", Empirical Software Engineering, vol. 22, no. 1, pp. 259-291, 2017.
[22] Sara Alnofaie, Mohammed Dahab and Mahmoud Kamal, "A Novel Information Retrieval Approach using Query Expansion and Spectral-based", Information retrieval, vol. 7, no. 9, 2016.
[23] Tom Kenter and Maarten De Rijke, "Short text similarity with word embeddings", Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, ACM, 2015.
[24] S. Lai, K. Liu, S. He and J. Zhao, "How to Generate a Good Word Embedding?", IEEE Intelligent Systems, vol. PP, no. 99, pp. 1-1.
[25] Dwaipayan Roy et al., "Using word embeddings for automatic query expansion", 2016, [online] Available: .
[26] Thorsten Joachims, "Optimizing search engines using clickthrough data", Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2002.
[27] G. Xun, X. Jia, V. Gopalakrishnan and A. Zhang, "A Survey on Context Learning", IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 1, pp. 38-56, Jan. 2017.
[28] J. Xuan and M. Monperrus, "Learning to Combine Multiple Ranking Metrics for Fault Localization", 2014 IEEE International Conference on Software Maintenance and Evolution, pp. 191-200, 2014.
[29] Hao Zhong et al., "MAPO: Mining and recommending API usage patterns", ECOOP 2009-Object-Oriented Programming, pp. 318-343, 2009.
[30] M. Lormans and A. van Deursen, "Can LSI help reconstructing requirements traceability in design and test?", Conference on Software Maintenance and Reengineering (CSMR'06), pp. 10-56, 2006.