I. Introduction
With recent advances in artificial intelligence, the performance of machine learning (ML) models is often tied to the volume and quality of the data they are trained on. However, obtaining large amounts of real-world data is difficult. The collection process can be time-consuming and expensive. In addition, privacy concerns, particularly in sensitive domains such as healthcare and finance, severely limit data accessibility. In this context, regulations such as the General Data Protection Regulation (GDPR) [1] restrict data acquisition and publication. The resulting conflict between the need for data and the requirement to protect privacy has created a pressing need for innovative solutions. Moreover, real-world datasets often suffer from class imbalance and data scarcity in under-represented classes.