Abstract:
High-quality data are widely acknowledged to be instrumental in improving machine learning models. This creates the need for quality-aware data valuation (DV), whose goal is to quantify the contribution of data during model training and thereby distinguish high-quality data from lower-quality data. In this paper, we survey recent research efforts in quality-aware DV, with a specific focus on designs that exhibit fairness, consistency, robustness, and efficiency. These attributes enhance the practical applicability of DV designs in domains such as model training, data marketplaces, and emerging federated learning paradigms. Our survey covers the design goals and mainstream approaches in DV, as well as state-of-the-art optimization techniques. We also identify and discuss challenges and open problems that remain to be addressed. By providing a comprehensive overview of the current state of DV research, we aim to contribute to a deeper understanding of this field.
Published in: IEEE Network (Volume: 38, Issue: 5, September 2024)