Quality-Driven Machine Learning-based Data Science Pipeline Realization: a software engineering approach | IEEE Conference Publication | IEEE Xplore


Abstract:

The recent wide adoption of data science approaches to decision making in several application domains (such as health, business and even education) opens new challenges in the engineering and implementation of these systems. Within the big picture of data science, machine learning is the most widely used technique, and due to its characteristics we believe that better engineering methodologies and tools are needed to realize innovative data-driven systems able to satisfy emerging quality attributes (such as debiasing and fairness, explainability, privacy and ethics, and sustainability). This research project will explore the following three pillars: i) identify key quality attributes, formalize them in the context of data science pipelines and study their relationships; ii) define a new software engineering approach for data science systems development that assures compliance with quality requirements; iii) implement tools that guide IT professionals and researchers in the realization of ML-based data science pipelines, starting from requirements engineering. Moreover, in this paper we also present some details of the project, showing how feature models and model-driven engineering can be leveraged to realize it.
Date of Conference: 22-24 May 2022
Date Added to IEEE Xplore: 13 June 2022
ISBN Information:
Print on Demand(PoD) ISSN: 2574-1926
Conference Location: Pittsburgh, PA, USA

1 INTRODUCTION

Data Science (DS) systems, and in particular Machine Learning (ML) systems, are increasingly widely used instruments, applied across application domains and affecting our everyday lives. Such systems can be defined as a set of one or more pipelines (or workflows), which take raw (unprocessed) data as input and return actionable answers to questions in the form of machine learning models. In this paper, we focus on DS pipelines that leverage ML, which we call ML pipelines.
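
To make the definition above concrete, the following is a minimal sketch of an ML pipeline as an ordered list of data-processing steps followed by a training step that returns a model. It is an illustration only, not the approach proposed in the paper: the names `drop_missing`, `rescale`, `train`, `run_pipeline` and `ThresholdModel` are hypothetical, and the "model" is a deliberately trivial threshold classifier.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical row format: (feature value or None if missing, binary label).
Row = Tuple[Optional[float], int]


@dataclass
class ThresholdModel:
    """Toy stand-in for an ML model: predicts 1 above a learned threshold."""
    threshold: float

    def predict(self, x: float) -> int:
        return 1 if x > self.threshold else 0


def drop_missing(rows: List[Row]) -> List[Row]:
    """Cleaning step: discard rows whose feature value is missing."""
    return [(x, y) for x, y in rows if x is not None]


def rescale(rows: List[Row]) -> List[Row]:
    """Preparation step: min-max scale the feature to the range [0, 1]."""
    xs = [x for x, _ in rows]
    lo, hi = min(xs), max(xs)
    span = (hi - lo) or 1.0  # avoid division by zero for constant features
    return [((x - lo) / span, y) for x, y in rows]


def train(rows: List[Row]) -> ThresholdModel:
    """Training step: threshold is the midpoint between the two class means."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    return ThresholdModel((sum(pos) / len(pos) + sum(neg) / len(neg)) / 2)


def run_pipeline(
    raw: List[Row],
    steps: List[Callable[[List[Row]], List[Row]]],
    trainer: Callable[[List[Row]], ThresholdModel],
) -> ThresholdModel:
    """Apply each step in order to the raw data, then train and return a model."""
    data = raw
    for step in steps:
        data = step(data)
    return trainer(data)


# Raw (unprocessed) input, including one row with a missing feature value.
raw_data: List[Row] = [(1.0, 0), (None, 1), (2.0, 0), (8.0, 1), (9.0, 1)]
model = run_pipeline(raw_data, [drop_missing, rescale], train)
```

In this sketch the pipeline's output is the trained `model`, which can then answer questions about new (already scaled) inputs through `model.predict`. Quality attributes such as fairness or explainability would constrain which steps and trainers are admissible, which this toy example does not attempt to model.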
