
Manual Prompt Engineering is Not Dead: A Case Study on Large Language Models for Code Vulnerability Detection with DSPy


Abstract:

Automated prompt engineering tools have recently emerged as a promising solution to simplify the traditional manual task of crafting prompts for large language models (LLMs). This study investigates whether such tools can fully replace manual prompt engineering for code vulnerability detection. We leverage the DSPy (Declarative Self-improving Python) framework, which uses modular "signatures" rather than prompts to specify tasks. In DSPy, a signature defines the expected input-output behavior of a task and can optionally include a description to guide the model's objective. This signature is then automatically translated into an optimized prompt through DSPy's modules and optimizers. This study compares the performance of GPT-4o-mini on basic and detailed signatures to determine how DSPy optimizations affect model performance in each case. The basic signature prompts the model to classify code as vulnerable or not, while the detailed signature specifies particular vulnerabilities to identify. For each signature, we used DSPy's modules and optimizers to create three prompt configurations: zero-shot (baseline, where the model performs the task without examples), chain-of-thought (where the model shows step-by-step reasoning), and bootstrap few-shot (where the model is guided through a small set of examples). Results show that DSPy's automated optimizations improve performance for both signature types over the zero-shot baseline; however, detection performance increases significantly with detailed signatures. Specifically, the detailed signatures achieved higher F1 scores, with improvements of approximately 23% for zero-shot, 13% for chain-of-thought, and 11% for bootstrap few-shot techniques. These findings indicate that while automated tools enhance prompt efficiency, optimal results are achieved by combining automated techniques with human-crafted signature details, underscoring the ongoing importance of manual refinement in specialized tasks.
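
To make the setup concrete, the following is a minimal DSPy sketch of the pipeline described above, assuming a recent DSPy release (one where dspy.LM and dspy.configure are available). The signature wording, field names, example snippets, and metric are illustrative placeholders, not the paper's exact configuration.

import dspy
from dspy.teleprompt import BootstrapFewShot

# Configure GPT-4o-mini as the underlying model (model identifier assumed).
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class BasicVulnSignature(dspy.Signature):
    """Decide whether the given code snippet is vulnerable."""
    code = dspy.InputField(desc="source code to analyze")
    vulnerable = dspy.OutputField(desc="'yes' or 'no'")

class DetailedVulnSignature(dspy.Signature):
    """Decide whether the given code snippet is vulnerable, checking for
    specific weaknesses such as buffer overflows, injection flaws, and
    missing input validation (example categories, not the paper's list)."""
    code = dspy.InputField(desc="source code to analyze")
    vulnerable = dspy.OutputField(desc="'yes' or 'no'")

# Three prompt configurations built from the same signature:
zero_shot = dspy.Predict(DetailedVulnSignature)      # baseline, no examples
cot = dspy.ChainOfThought(DetailedVulnSignature)     # step-by-step reasoning

def label_match(example, prediction, trace=None):
    # Toy metric: does the predicted label match the gold label?
    return example.vulnerable.strip().lower() == prediction.vulnerable.strip().lower()

# Bootstrap few-shot: the optimizer mines worked demonstrations from a small
# labeled training set (the two examples below are illustrative only).
trainset = [
    dspy.Example(code="strcpy(buf, user_input);", vulnerable="yes").with_inputs("code"),
    dspy.Example(code="strncpy(buf, user_input, sizeof(buf) - 1);", vulnerable="no").with_inputs("code"),
]
few_shot = BootstrapFewShot(metric=label_match).compile(cot, trainset=trainset)

result = few_shot(code="system(user_supplied_command);")
print(result.vulnerable)

Swapping BasicVulnSignature for DetailedVulnSignature in the calls above is the only change needed to contrast the two signature variants; the modules and optimizers stay identical, which is what isolates the effect of the human-written signature detail.
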
Date of Conference: 16-17 February 2025
Date Added to IEEE Xplore: 07 March 2025
Conference Location: Riyadh, Saudi Arabia


I. Introduction

As the capabilities of large language models (LLMs) advance, they are increasingly adopted across diverse fields. A crucial factor in maximizing the performance of these models lies in prompt engineering, the process of crafting inputs that direct LLMs toward desired responses. Traditionally, this has been a manual, iterative process, requiring users to experiment with and fine-tune prompts. However, the growing complexity of tasks and the need for high accuracy have motivated the development of tools that automate aspects of prompt engineering to streamline workflows and reduce the dependency on manual input [1].

