1 Introduction
A Bayesian Network (BN) is a probabilistic model based on a directed acyclic graph. In order to use a Bayesian network for inference or decision making, it must first be constructed using prior knowledge from experts and/or observed data. Most of the work reported in the literature assumes that all the observed data are available at a single site. However, in many scientific and non-scientific applications the observed data are distributed among different sites. In an increasingly mobile and connected world with a large number of distributed data sources, the cost of data communication between the distributed databases is a significant factor. In this paper, we consider a distributed heterogeneous data scenario, where each site has observations corresponding to a subset of the attributes. We assume that there exists a “key” that can link the observations across sites. A naive approach to learning a BN from distributed heterogeneous data is to transmit all local datasets to a central site and then learn a BN from the resulting merged dataset (centralized learning). However, limited network bandwidth and/or data security concerns might render this approach infeasible.
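To make the heterogeneous data scenario concrete, the following is a minimal Python sketch of the naive centralized approach, not the method proposed in this paper. It assumes that each site stores its observations as a dictionary mapping the shared key to that site's attribute values; the record keys (`p1`, `p2`), the attributes (`A`, `B`, `C`) and the helper names are hypothetical. Every site ships its local table, and the central site joins the tables on the key before any BN learning takes place.

```python
from typing import Dict, List

# Hypothetical layout: one site's table maps a record key to that site's attributes.
Row = Dict[str, int]
SiteTable = Dict[str, Row]


def merge_on_key(sites: List[SiteTable]) -> SiteTable:
    """Join vertically partitioned site tables on the shared key (inner join).

    Only keys observed at every site are kept, so each merged row
    contains the full set of attributes.
    """
    common_keys = set(sites[0])
    for site in sites[1:]:
        common_keys &= set(site)
    merged: SiteTable = {}
    for key in sorted(common_keys):
        row: Row = {}
        for site in sites:
            row.update(site[key])  # each site contributes a disjoint subset of attributes
        merged[key] = row
    return merged


def values_transmitted(sites: List[SiteTable]) -> int:
    """Count attribute values shipped if every site sends its full table (keys excluded)."""
    return sum(len(row) for site in sites for row in site.values())


# Hypothetical example: site A observes attributes A and B, site B observes C.
site_a: SiteTable = {"p1": {"A": 0, "B": 1}, "p2": {"A": 1, "B": 1}}
site_b: SiteTable = {"p1": {"C": 1}, "p2": {"C": 0}}

merged = merge_on_key([site_a, site_b])
# merged == {"p1": {"A": 0, "B": 1, "C": 1}, "p2": {"A": 1, "B": 1, "C": 0}}
```

The cost that centralized learning incurs grows with `values_transmitted`, that is, with the number of records times the number of attributes held at each site. This is why limited bandwidth can make the approach infeasible when the local datasets are large.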