Robust Bayesian Clustering for Replicated Gene Expression Data


Abstract:

Experimental scientific data sets, especially in biology, usually contain replicated measurements. The replicated measurements for the same object are correlated, and this correlation must be handled carefully in scientific analysis. In this paper, we propose a robust Bayesian mixture model for clustering data sets with replicated measurements. The model aims not only to cluster the data points accurately, taking the replicated measurements into consideration, but also to identify outliers (i.e., scattered objects) that may warrant further study. A tree-structured variational Bayes (VB) algorithm is developed to carry out model fitting. Experimental studies show that our model compares favorably with the infinite Gaussian mixture model while maintaining computational simplicity. We demonstrate the benefits of including the replicated measurements in the model, in terms of improved outlier detection rates under varying measurement uncertainty conditions. Finally, we apply the approach to clustering biological transcriptomics mRNA expression data sets with replicated measurements.
Page(s): 1504 - 1514
Date of Publication: 29 May 2012

PubMed ID: 22641714

1 Introduction

Clustering is an important statistical data analysis tool in many fields. In computational biology and bioinformatics in particular, clustering methods have been developed and applied extensively. In high-throughput biological data sets such as those obtained from transcriptomics analysis, the mRNA levels of tens of thousands of genes are sampled simultaneously under particular experimental conditions. The success of coexpression networks in identifying modules of co-regulated genes (see, for example, [32], [47]) indicates that genes which show particular response profiles may well share a common function or be regulated by the same transcription factors. It is therefore of interest to cluster genes on the basis of their response profiles. Doing so gives an overview of the general patterns of gene expression without getting lost in the sheer number of genes. The importance of clustering analysis for gene expression data has been demonstrated in, for example, [27].
