Robust Bayesian Clustering for Replicated Gene Expression Data

Abstract:

Experimental scientific data sets, especially biological data, usually contain replicated measurements. The replicated measurements for the same object are correlated, and this correlation must be carefully dealt with in scientific analysis. In this paper, we propose a robust Bayesian mixture model for clustering data sets with replicated measurements. The model aims not only to accurately cluster the data points taking the replicated measurements into consideration, but also to find the outliers (i.e., scattered objects) that may require further study. A tree-structured variational Bayes (VB) algorithm is developed to carry out model fitting. Experimental studies show that our model compares favorably with the infinite Gaussian mixture model, while maintaining computational simplicity. We demonstrate the benefits of including the replicated measurements in the model, in terms of improved outlier detection rates under varying measurement uncertainty conditions. Finally, we apply the approach to clustering biological transcriptomics mRNA expression data sets with replicated measurements.

Published in: IEEE/ACM Transactions on Computational Biology and Bioinformatics ( Volume: 9, Issue: 5, Sept.-Oct. 2012)

Page(s): 1504 - 1514

Date of Publication: 29 May 2012

PubMed ID: 22641714

DOI: 10.1109/TCBB.2012.85

1 Introduction

Clustering has been an important statistical data analysis tool in many fields. In computational biology and bioinformatics in particular, clustering methods have been developed and applied extensively. In high-throughput biological data sets such as those obtained from transcriptomics analysis, the mRNA levels of tens of thousands of genes are sampled simultaneously under particular experimental conditions. The success of coexpression networks in identifying modules of co-regulated genes (see, for example, [32], [47]) indicates that genes which show particular response profiles may well share a common function, or be regulated by the same transcription factors. It is therefore of interest to cluster genes on the basis of their response profiles. This gives an overview of the general patterns of gene expression, without getting lost in the sheer number of genes. The importance of clustering analysis for gene expression data has been demonstrated in, for example, [27].
