Code Similarity Detection Based on Siamese Network | IEEE Conference Publication | IEEE Xplore

Abstract:

At present, as software continues to grow in scale, problems such as code plagiarism, cloning, and reuse are becoming increasingly prominent, which makes the study of code similarity important. Existing studies suffer from an inaccurate representation of code semantics and insufficient extraction of features from word vectors. To address these problems, this paper proposes a code similarity calculation model based on a deep learning framework. The method first represents the semantics of the source code, then uses a Siamese network to extract semantic feature vectors, and finally computes the similarity of these feature vectors in high-dimensional space using cosine distance. Experiments show that our method outperforms the baseline method in precision, recall, and F1. These results indicate that our method effectively captures code semantic information and improves the performance of code similarity measurement.
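
The three-step pipeline in the abstract (represent the code, encode both snippets with a weight-sharing Siamese network, then compare the resulting feature vectors by cosine distance) can be illustrated with a minimal, dependency-free sketch. It is not the authors' model: the tokenizer, the hashed bag-of-tokens features standing in for word vectors, the names `SharedEncoder` and `code_similarity`, and the dimensions are all hypothetical, and the encoder's weights are random rather than trained. The sketch only shows two structural ideas: both inputs pass through the same encoder, and the outputs are compared by cosine similarity, where cosine distance is 1 minus that value.

```python
import math
import random
import re
import zlib
from typing import List

# Hypothetical dimensions chosen for illustration only.
EMBED_DIM = 32   # size of the token-feature vector
HIDDEN_DIM = 16  # size of the encoder's output feature vector


def tokenize(code: str) -> List[str]:
    """Split source code into identifiers, integer literals, and single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def embed(tokens: List[str]) -> List[float]:
    """Hashed bag-of-tokens counts: a crude, hypothetical stand-in for word vectors."""
    vec = [0.0] * EMBED_DIM
    for tok in tokens:
        vec[zlib.crc32(tok.encode("utf-8")) % EMBED_DIM] += 1.0
    return vec


class SharedEncoder:
    """A single weight matrix applied to every input: the Siamese weight sharing.

    The weights are random here; a real model would learn them from labeled code pairs.
    """

    def __init__(self, seed: int = 0) -> None:
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(EMBED_DIM)
        self.weights = [[rng.gauss(0.0, scale) for _ in range(EMBED_DIM)]
                        for _ in range(HIDDEN_DIM)]

    def __call__(self, x: List[float]) -> List[float]:
        # One linear layer followed by a tanh nonlinearity.
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.weights]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """cos(a, b) = a.b / (|a| |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cosine_distance(a: List[float], b: List[float]) -> float:
    """Cosine distance = 1 - cosine similarity (0.0 for vectors in the same direction)."""
    return 1.0 - cosine_similarity(a, b)


def code_similarity(code_a: str, code_b: str, encoder: SharedEncoder) -> float:
    """Encode both snippets with the *same* encoder, then compare by cosine similarity."""
    feat_a = encoder(embed(tokenize(code_a)))
    feat_b = encoder(embed(tokenize(code_b)))
    return cosine_similarity(feat_a, feat_b)


if __name__ == "__main__":
    enc = SharedEncoder(seed=42)
    snippet = "def add(x, y):\n    return x + y"
    variant = "def plus(a, b):\n    return a + b"
    print(code_similarity(snippet, snippet, enc))  # identical input: approximately 1.0
    print(code_similarity(snippet, variant, enc))  # untrained weights: score not meaningful
```

Because the encoder is untrained, the score for the renamed variant carries no meaning in this sketch. In a trained Siamese model, the shared weights would be fitted on labeled similar/dissimilar pairs (for example with a contrastive loss) so that clones map to nearby feature vectors, and a threshold on the similarity would then decide whether a pair counts as similar.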
Date of Conference: 19-21 March 2021
Date Added to IEEE Xplore: 23 April 2021
Conference Location: Chengdu, China

I. Introduction

The development of the software industry has led to a rapid increase in the number of software products and, in turn, in the amount and scale of existing code, which brings new problems to software maintenance and evolution. Measuring code similarity makes it possible to use existing code effectively and reduces the consumption of manpower and material resources. Measuring the similarity of code within software can detect code cloning [1], plagiarism [2], reuse [3], and other issues. Therefore, measuring the similarity of software code is of great significance.

