
RTCoder: An Approach based on Retrieve-template for Automatic Code Generation



Abstract:

For code generation, researchers have recently proposed a retrieve-template-generation approach, which retrieves similar code snippets with a retriever and feeds them to a generator together with the input description. However, because the retrieved code can be influenced by various data types, the model may attend to unrelated content, producing discrepancies between the generated code and the target code. To mitigate this bias, we introduce RTCoder, a code generation method based on retrieve-template-generation. Specifically, RTCoder generates code in three steps. (1) Retrieve: given a natural language description, the retriever fetches several similar code snippets from a corpus. (2) Template: by comparing these similar snippets, it uses the Rabin-Karp algorithm to extract their common substrings and represents the differing substrings with spaces, forming a code template. (3) Generate: the generator automatically produces the concrete target code from the natural language description and the corresponding code template. We conducted extensive comparative experiments on three datasets with three widely used evaluation metrics. The results demonstrate that: (1) compared with mainstream code generation models, RTCoder improves all three metrics across the datasets; for instance, its exact match (EM) score exceeds the state-of-the-art CodeT5-base by 5.98%, 3.34%, and 1.67% on the three datasets, respectively. (2) Our approach also benefits other models; taking the CodeBLEU score on the Concode dataset as an example, the retrieve-template-generation method improves over direct generation and retrieve-generation by 3.73% and 1.80%, respectively, on an RNN model.
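To make the Template step concrete, the Python sketch below marks the token spans that two similar snippets share, using Rabin-Karp rolling hashes over length-k token windows, and collapses each differing span into a single space, as the abstract describes. This is a minimal illustration under our own assumptions: the function names, the window length k, and the toy snippets are illustrative choices, not the paper's implementation.

def rabin_karp_hashes(tokens, k, base=257, mod=(1 << 61) - 1):
    # Map the rolling hash of every length-k token window to its start positions.
    ids = [hash(t) % mod for t in tokens]  # token ids, stable within one run
    power = pow(base, k - 1, mod)
    windows, h = {}, 0
    for i, x in enumerate(ids):
        if i >= k:
            h = (h - ids[i - k] * power) % mod  # drop the token leaving the window
        h = (h * base + x) % mod                # absorb the incoming token
        if i >= k - 1:
            windows.setdefault(h, []).append(i - k + 1)
    return windows

def make_template(a, b, k=2):
    # Keep spans of `a` covered by a verified length-k match in `b`;
    # collapse every uncovered (differing) span into a single space slot.
    b_windows = rabin_karp_hashes(b, k)
    keep = [False] * len(a)
    for h, starts in rabin_karp_hashes(a, k).items():
        for i in starts:
            # Compare tokens directly to rule out hash collisions.
            if any(a[i:i + k] == b[j:j + k] for j in b_windows.get(h, [])):
                keep[i:i + k] = [True] * k
    template, in_gap = [], False
    for tok, kept in zip(a, keep):
        if kept:
            template.append(tok)
            in_gap = False
        elif not in_gap:
            template.append(" ")  # the space placeholder from the abstract
            in_gap = True
    return template

a = "public int add ( int a , int b ) { return a + b ; }".split()
b = "public int mul ( int a , int b ) { return a * b ; }".split()
print(" ".join(make_template(a, b)))
# -> public int   ( int a , int b ) { return a   b ; }

The direct token comparison after each hash hit guards against Rabin-Karp collisions, so only verified shared windows survive: the printed template keeps the common skeleton and leaves space slots where the two snippets diverge.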
Date of Conference: 17-21 December 2023
Date Added to IEEE Xplore: 26 March 2024
Conference Location: Ocean Flower Island, China

I. Introduction

With the continuous growth in the scale and complexity of software, developers spend substantial time and effort writing source code by hand. This burden challenges the software services industry and calls for better tools to assist code development. Code generation aims to automatically produce source code that matches a given natural language description. Deep learning has been applied successfully to automatic code generation [1]–[3]: such models take natural language (NL) descriptions as input, output the corresponding source code, and are trained on corpora of genuine NL-Code pairs. Once trained, they can generate code for new NL descriptions on their own. The prevailing code generation models fall into three categories: sequence-based models, tree-based models, and pre-trained models. However, this single-step approach, in which a generator produces code directly from the description in one pass, does not always yield code that closely matches the target.
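To ground the contrast between single-step generation and the retrieve-template-generation paradigm, the following sketch implements the opening retrieval step with plain Jaccard token overlap between the input description and the stored NL descriptions. The scorer, the function names, and the toy corpus are illustrative assumptions rather than the paper's prescription; a practical retriever would typically use BM25 or learned embeddings instead.

def jaccard(query, doc):
    # Overlap of two token sets; 0.0 when both are empty.
    q, d = set(query), set(doc)
    return len(q & d) / len(q | d) if q | d else 0.0

def retrieve(nl_description, corpus, top_k=3):
    # Rank (description, code) pairs by similarity to the query and keep top_k.
    query = nl_description.lower().split()
    return sorted(
        corpus,
        key=lambda pair: jaccard(query, pair[0].lower().split()),
        reverse=True,
    )[:top_k]

# A toy corpus of NL-Code pairs, mirroring the data such models are trained on.
corpus = [
    ("return the sum of two integers", "int add(int a, int b) { return a + b; }"),
    ("return the product of two integers", "int mul(int a, int b) { return a * b; }"),
    ("check whether a number is even", "boolean isEven(int n) { return n % 2 == 0; }"),
]

for nl, code in retrieve("add two integers and return their sum", corpus, top_k=2):
    print(nl, "->", code)

Under the retrieve-template-generation paradigm, the retrieved snippets would then feed the Template and Generate steps described in the abstract, rather than being handed to the generator in a single pass.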
