
CoSS: Leveraging Statement Semantics for Code Summarization


Abstract:

Automated code summarization tools generate natural-language descriptions of code snippets, which benefits software development and maintenance. Recent studies demonstrate that the quality of generated summaries can be improved by using additional code representations beyond token sequences. Most contemporary approaches focus on extracting syntactic and structural information from abstract syntax trees (ASTs). However, from the view of macro-structures, it is challenging to identify and capture semantically meaningful features because ASTs consist of fine-grained syntactic nodes. To fill this gap, we investigate how to learn more code semantics and control flow features from the perspective of code statements. Accordingly, we propose a novel model entitled CoSS for code summarization. CoSS adopts a Transformer-based encoder and a graph attention network-based encoder to capture token-level and statement-level semantics from the code token sequence and the control flow graph, respectively. Then, after receiving the two-level embeddings from the encoders, a joint decoder with a multi-head attention mechanism predicts the output sequence word by word. Performance evaluations on Java, Python, and Solidity datasets validate that CoSS outperforms nine state-of-the-art (SOTA) neural code summarization models in effectiveness and is competitive in execution efficiency. Further, the ablation study reveals the contribution of each model component.
Published in: IEEE Transactions on Software Engineering ( Volume: 49, Issue: 6, 01 June 2023)
Page(s): 3472 - 3486
Date of Publication: 13 March 2023
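
To make the statement-level encoder mentioned in the abstract more concrete, the following is a minimal sketch of one single-head graph attention layer applied to a toy statement-level control flow graph. It is not the authors' implementation: the toy graph, the feature vectors, the weights `W` and `a`, the choice to let each statement attend to itself and its control flow predecessors, and the omission of a final nonlinearity are all illustrative assumptions. It follows the standard graph attention formulation, where the score for a pair of nodes is LeakyReLU(a · [W h_i || W h_j]) and scores are normalized with a softmax over each node's neighbours.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def matvec(W, h):
    # W is a list of rows (out_dim x in_dim); h is a vector of length in_dim.
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def gat_layer(features, edges, W, a):
    """One single-head graph attention layer (no output nonlinearity).

    features: dict mapping statement id -> input vector
    edges:    list of (src, dst) control flow edges
    W:        shared linear projection (list of rows)
    a:        attention vector of length 2 * out_dim
    """
    # Assumption: each statement attends to itself and its CFG predecessors.
    neighbors = {n: [n] for n in features}
    for src, dst in edges:
        neighbors[dst].append(src)

    z = {n: matvec(W, h) for n, h in features.items()}
    out = {}
    for i, nbrs in neighbors.items():
        # Unnormalized scores: LeakyReLU(a . [z_i || z_j]).
        scores = [
            leaky_relu(sum(w * x for w, x in zip(a, z[i] + z[j])))
            for j in nbrs
        ]
        # Numerically stable softmax over the neighbourhood.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # Attention-weighted sum of projected neighbour features.
        out[i] = [
            sum(al * z[j][k] for al, j in zip(alphas, nbrs))
            for k in range(len(z[i]))
        ]
    return out

# Toy statement-level CFG for a hypothetical snippet:
#   s0: int i = 0;   s1: while (i < n)   s2: i++;   s3: return i;
edges = [(0, 1), (1, 2), (2, 1), (1, 3)]
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0], 3: [0.5, 0.5]}
W = [[1.0, 0.0], [0.0, 1.0]]   # identity projection, for readability only
a = [0.5, -0.5, 1.0, 0.25]     # arbitrary illustrative attention vector

out = gat_layer(features, edges, W, a)
for node in sorted(out):
    print(node, out[node])
```

In this sketch each statement embedding becomes a convex combination of its own projected features and those of the statements that can transfer control to it, which is one simple way control flow structure can enter a statement-level representation.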


I. Introduction

Code summarization briefly describes the function of a source code snippet in natural language [1], [2]. Concise and readable code summaries can serve as high-level comments that help developers understand, reuse, and maintain source code, and can thus significantly improve development efficiency. Traditionally, code summaries are written by hand, which is labour-intensive, error-prone, and difficult to keep up to date [3], [4]. As a result, automatic code summarization has gained traction and is considered a promising alternative [5].
