
Manual Prompt Engineering is Not Dead: A Case Study on Large Language Models for Code Vulnerability Detection with DSPy


Abstract:

Automated prompt engineering tools have recently emerged as a promising solution to simplify the traditional manual task of crafting prompts for large language models (LLMs). This study investigates whether such tools can fully replace manual prompt engineering for code vulnerability detection. We leverage the DSPy (Declarative Self-improving Python) framework, which uses modular "signatures" rather than prompts to specify tasks. In DSPy, a signature defines the expected input-output behavior of a task and can optionally include a description to guide the model's objective. This signature is then automatically translated into an optimized prompt through DSPy's modules and optimizers. This study compares the performance of GPT-4o-mini on basic and detailed signatures to determine how DSPy optimizations affect model performance in each case. The basic signature prompts the model to classify code as vulnerable or not, while the detailed signature specifies particular vulnerabilities to identify. For each signature, we used DSPy's modules and optimizers to create three prompt configurations: zero-shot (baseline, where the model performs the task without examples), chain-of-thought (where the model shows step-by-step reasoning), and bootstrap few-shot (where the model is guided through a small set of examples). Results show that DSPy's automated optimizations improve performance for both signature types over the zero-shot baseline; however, detection performance increases significantly with detailed signatures. Specifically, the detailed signatures achieved higher F1 scores, with improvements of approximately 23% for zero-shot, 13% for chain-of-thought, and 11% for bootstrap few-shot techniques. These findings indicate that while automated tools enhance prompt efficiency, optimal results are achieved by combining automated techniques with human-crafted signature details, underscoring the ongoing importance of manual refinement in specialized tasks.
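
To make the setup concrete, the following is a minimal DSPy sketch of the pipeline described above, assuming a recent DSPy release (one where dspy.LM and dspy.configure are available). The signature wording, field names, example snippets, and metric are illustrative placeholders, not the paper's exact configuration.

import dspy
from dspy.teleprompt import BootstrapFewShot

# Configure GPT-4o-mini as the underlying model (model identifier assumed).
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class BasicVulnSignature(dspy.Signature):
    """Decide whether the given code snippet is vulnerable."""
    code = dspy.InputField(desc="source code to analyze")
    vulnerable = dspy.OutputField(desc="'yes' or 'no'")

class DetailedVulnSignature(dspy.Signature):
    """Decide whether the given code snippet is vulnerable, checking for
    specific weaknesses such as buffer overflows, injection flaws, and
    missing input validation (example categories, not the paper's list)."""
    code = dspy.InputField(desc="source code to analyze")
    vulnerable = dspy.OutputField(desc="'yes' or 'no'")

# Three prompt configurations built from the same signature:
zero_shot = dspy.Predict(DetailedVulnSignature)      # baseline, no examples
cot = dspy.ChainOfThought(DetailedVulnSignature)     # step-by-step reasoning

def label_match(example, prediction, trace=None):
    # Toy metric: does the predicted label match the gold label?
    return example.vulnerable.strip().lower() == prediction.vulnerable.strip().lower()

# Bootstrap few-shot: the optimizer mines worked demonstrations from a small
# labeled training set (the two examples below are illustrative only).
trainset = [
    dspy.Example(code="strcpy(buf, user_input);", vulnerable="yes").with_inputs("code"),
    dspy.Example(code="strncpy(buf, user_input, sizeof(buf) - 1);", vulnerable="no").with_inputs("code"),
]
few_shot = BootstrapFewShot(metric=label_match).compile(cot, trainset=trainset)

result = few_shot(code="system(user_supplied_command);")
print(result.vulnerable)

Swapping BasicVulnSignature for DetailedVulnSignature in the calls above is the only change needed to contrast the two signature variants; the modules and optimizers stay identical, which is what isolates the effect of the human-written signature detail.
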
Date of Conference: 16-17 February 2025
Date Added to IEEE Xplore: 07 March 2025
Conference Location: Riyadh, Saudi Arabia


I. Introduction

As the capabilities of large language models (LLMs) advance, they are increasingly adopted across diverse fields. A crucial factor in maximizing the performance of these models lies in prompt engineering, the process of crafting inputs that direct LLMs toward desired responses. Traditionally, this has been a manual, iterative process, requiring users to experiment with and fine-tune prompts. However, the growing complexity of tasks and the need for high accuracy have motivated the development of tools that automate aspects of prompt engineering to streamline workflows and reduce the dependency on manual input [1].

