Data Mining Techniques for Software Effort Estimation: A Comparative Study

Abstract:

A predictive model is required to be accurate and comprehensible in order to inspire confidence in a business setting. Both aspects have been assessed in a software effort estimation setting by previous studies. However, no univocal conclusion as to which technique is the most suited has been reached. This study addresses this issue by reporting on the results of a large-scale benchmarking study. Different types of techniques are under consideration, including techniques inducing tree/rule-based models like M5 and CART, linear models such as various types of linear regression, nonlinear models (MARS, multilayered perceptron neural networks, radial basis function networks, and least squares support vector machines), and estimation techniques that do not explicitly induce a model (e.g., a case-based reasoning approach). Furthermore, the aspect of feature subset selection by using a generic backward input selection wrapper is investigated. The results are subjected to rigorous statistical testing and indicate that ordinary least squares regression in combination with a logarithmic transformation performs best. Another key finding is that by selecting a subset of highly predictive attributes such as project size, development, and environment-related attributes, typically a significant increase in estimation accuracy can be obtained.

Published in: IEEE Transactions on Software Engineering (Volume: 38, Issue: 2, March-April 2012)

Page(s): 375 - 397

Date of Publication: 23 June 2011

DOI: 10.1109/TSE.2011.55

1. Introduction

Resource planning is considered a key issue in a production environment. In the context of a software developing company, the different resources are, among others, computing power and personnel. In recent years, computing power has become a subordinate resource for software developing companies as it doubles approximately every 18 months and now costs only a fraction of what it did in the late 1960s. Personnel costs are, however, still an important expense in the budget of software developing companies. In light of this observation, proper planning of personnel effort is a key aspect for these companies. Due to the intangible nature of the product “software,” software developing companies often face problems estimating the effort needed to complete a software project [1]. There has been strong academic interest in this topic, aimed at helping software developing companies tackle the difficulties of estimating software development effort [2]. In this field of research, the required effort to develop a new project is estimated based on historical data from previous projects. This information can be used by management to improve the planning of personnel, to make more accurate tendering bids, and to evaluate risk factors [3]. Recently, a number of studies evaluating different techniques have been published. The results of these studies are not univocal and are often highly technique- and data-set-dependent. In this paper, an overview of the existing literature is presented. Furthermore, 13 techniques, representing different kinds of models, are investigated.
This selection includes tree/rule-based models (M5 and CART), linear models (ordinary least squares regression with and without various transformations, ridge regression (RiR), and robust regression (RoR)), nonlinear models (MARS, least squares support vector machines, multilayered perceptron neural networks (NN), radial basis function (RBF) networks), and a lazy learning-based approach that does not explicitly construct a prediction model, but instead tries to find the most similar past project. Each technique is applied to nine data sets within the domain of software effort estimation. From a comprehensibility point of view, a more concise model (i.e., a model with fewer inputs) is preferred. Therefore, the impact of a generic backward input selection approach is assessed.
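
As a rough illustration of what ordinary least squares regression with a logarithmic transformation can look like, the sketch below fits log(effort) against log(size) for a handful of projects and back-transforms the prediction. It is a minimal sketch under stated assumptions, not the study's setup: the function names `fit_log_ols` and `predict_effort` are hypothetical, the project data are invented for illustration rather than taken from any of the nine data sets, and a single predictor (project size) stands in for the larger attribute sets the study considers.

```python
import math


def fit_log_ols(sizes, efforts):
    """Fit log(effort) = a + b * log(size) by ordinary least squares.

    Uses the closed-form solution for a single predictor and
    returns the pair (a, b). Illustrative simplification only.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squares of x and cross-products of x and y around their means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_effort(a, b, size):
    """Back-transform a log-scale prediction to the original effort scale."""
    return math.exp(a + b * math.log(size))


# Invented historical projects (assumption: size in function points,
# effort in person-hours). The efforts are generated noise-free from
# effort = 5 * size**0.9, so the fit should recover a = log(5), b = 0.9.
SIZES = [50, 120, 300, 800, 2000]
EFFORTS = [5 * s ** 0.9 for s in SIZES]

if __name__ == "__main__":
    a, b = fit_log_ols(SIZES, EFFORTS)
    print(f"intercept={a:.3f}, slope={b:.3f}")
    print(f"predicted effort for size 500: {predict_effort(a, b, 500):.1f}")
```

Because the model is linear on the log scale, a slope below 1 corresponds to effort growing less than proportionally with size on the original scale. Real project data would be noisy, so the fitted coefficients would only approximate any underlying relationship.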
