1. Introduction
In the modern digital era, data has become a critical asset for organizations. The ability to analyze data and extract insights from it is key to driving business decisions, understanding consumer behavior, and improving operational efficiency. Numerous studies have addressed data management and mining in the centralized setting [1], [30], [33], [24]. However, regulations such as GDPR [38] have significantly changed the landscape of data management and analysis, giving rise to distributed databases that consist of multiple data silos spread across organizations, between which the transfer of raw data is typically restricted. Consequently, how to manage, analyze, and make use of data in distributed databases in a privacy-preserving way, without exchanging local data, has become a pressing topic.
Figure: A typical data distribution in vertical federated learning (VFL). A classic VFL training process uses only the aligned data (highlighted in the red square) and discards the rest.