Robust Bayesian Clustering for Replicated Gene Expression Data | IEEE Journals & Magazine | IEEE Xplore

Robust Bayesian Clustering for Replicated Gene Expression Data


Abstract:

Experimental scientific data sets, especially in biology, usually contain replicated measurements. The replicated measurements for the same object are correlated, and this correlation must be handled carefully in scientific analysis. In this paper, we propose a robust Bayesian mixture model for clustering data sets with replicated measurements. The model aims not only to cluster the data points accurately while taking the replicated measurements into account, but also to find outliers (i.e., scattered objects) that may require further study. A tree-structured variational Bayes (VB) algorithm is developed to carry out model fitting. Experimental studies show that our model compares favorably with the infinite Gaussian mixture model while maintaining computational simplicity. We demonstrate the benefits of including the replicated measurements in the model, in terms of improved outlier detection rates under varying measurement uncertainty conditions. Finally, we apply the approach to clustering biological transcriptomics mRNA expression data sets with replicated measurements.
Page(s): 1504 - 1514
Date of Publication: 29 May 2012

PubMed ID: 22641714

1 Introduction

Clustering is an important statistical data analysis tool in many fields. In computational biology and bioinformatics in particular, clustering methods have been developed and applied extensively. In high-throughput biological data sets such as those obtained from transcriptomics analysis, the mRNA levels of tens of thousands of genes are sampled simultaneously under particular experimental conditions. The success of coexpression networks in identifying modules of co-regulated genes (see, for example, [32], [47]) indicates that genes that show particular response profiles may well share a common function or be regulated by the same transcription factors. It is therefore of interest to cluster genes on the basis of their response profiles. This gives an overview of the general patterns of gene expression without getting lost in the sheer number of genes. The importance of clustering analysis for gene expression data has been demonstrated in, for example, [27].

1.
C. Archambeau, N. Delannay and M. Verleysen, "Robust Probabilistic Projections", Proc. 23rd Int’l Conf. Machine Learning, pp. 33-40, June 2006.
2.
C. Archambeau and M. Verleysen, "Robust Bayesian Clustering", Neural Networks, vol. 20, no. 1, pp. 129-138, Jan. 2007.
3.
S. Asur, D. Ucar and S. Parthasarathy, "An Ensemble Framework for Clustering Protein-Protein Interaction Networks", Bioinformatics, vol. 23, pp. 129-140, 2007.
4.
S. Bandyopadhyay, A. Mukhopadhyay and U. Maulik, "An Improved Algorithm for Clustering Gene Expression Data", Bioinformatics, vol. 23, no. 21, pp. 2859-2865, 2007.
5.
A. Bhattacharya and R.K. De, "Bi-correlation Clustering Algorithm for Determining a Set of Co-regulated Genes", Bioinformatics, vol. 25, pp. 2795-2801, Nov. 2009.
6.
C.M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
7.
D.M. Blei and M.I. Jordan, "Variational Inference for Dirichlet Process Mixtures", Bayesian Analysis, vol. 1, no. 1, pp. 121-144, 2006.
8.
I. Blilou, J. Xu, M. Wildwater, I. Paponov, R. Heidstra, M. Aida, et al., "The PIN Auxin Efflux Facilitator Network Controls Growth and Patterning in Arabidopsis Roots", Nature, vol. 433, pp. 39-44, 2005.
9.
D.P. Brown, "Efficient Functional Clustering of Protein Sequence Using the Dirichlet Process", Bioinformatics, vol. 24, no. 16, pp. 1765-1771, 2008.
10.
P. Carbonetto, M. King and F. Hamze, "A Stochastic Approximation Method for Inference in Probabilistic Graphical Models", Proc. Neural Information Processing Systems Foundation (NIPS), pp. 216-224, 2009.
11.
G. Celeux, O. Martin and C. Lavergne, "Mixture of Linear Mixed Models for Clustering Gene Expression Profiles from Repeated Microarray Experiments", Statistical Modelling, vol. 5, pp. 243-267, 2005.
12.
D.L. Davies and D.W. Bouldin, "A Cluster Separation Measure", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. PAMI-1, no. 2, pp. 224-227, Apr. 1979.
13.
D. Dotan-Cohen, S. Kasif and A.A. Melkman, "Seeing the Forest for the Trees: Using the Gene Ontology to Restructure Hierarchical Clustering", Bioinformatics, vol. 25, no. 14, pp. 1789-1795, 2009.
14.
T. Fawcett, "ROC Graphs: Notes and Practical Considerations for Researchers", 2003.
15.
M.A.T. Figueiredo and A.K. Jain, "Unsupervised Learning of Finite Mixture Models", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 3, pp. 381-396, Mar. 2002.
16.
C. Fox and S. Roberts, "A Tutorial on Variational Bayesian Inference" in Artificial Intelligence Rev., published online, June 2011.
17.
C. Fraley and A.E. Raftery, "Model-Based Clustering, Discriminant Analysis, and Density Estimation", J. Am. Statistical Assoc., vol. 97, pp. 611-631, 2002.
18.
T.J. Guilfoyle and G. Hagen, "Auxin Response Factors", Current Opinion in Plant Biology, vol. 10, no. 5, pp. 453-460, Oct. 2007.
19.
W.G. Hopkins, "A New View of Statistics", 2011, [online] Available: www.sportsci.org/resource/stats/.
20.
L. Hubert and P. Arabie, "Comparing Partitions", J. Classification, vol. 2, pp. 193-218, 1985.
21.
T.R. Hughes, M.J. Marton, C.J. Jones, A.R. Roberts, R. Stoughton, C.D. Armour, et al., "Functional Discovery via a Compendium of Expression Profiles", Cell, vol. 102, pp. 109-126, 2000.
22.
M.I. Jordan, Z. Ghahramani, T.S. Jaakkola and L.K. Saul, "An Introduction to Variational Methods for Graphical Models", Machine Learning, vol. 37, no. 2, pp. 183-233, Nov. 1999.
23.
S. Kim, J. Kim and K.H. Cho, "Inferring Gene Regulatory Networks from Temporal Expression Profiles Under Time-Delay and Noise", Computational Biology and Chemistry, vol. 31, no. 4, pp. 239-245, 2007.
24.
E.M. Kramer and M.J. Bennett, "Auxin Transport: A Field in Flux", Trends in Plant Science, vol. 11, no. 8, pp. 382-386, 2006.
25.
S.R. Lipsitz, N.M. Laird and D.P. Harrington, "Weighted Least Squares Analysis of Repeated Categorical Measurements with Outcomes Subject to Nonresponse", Biometrics, vol. 50, pp. 11-24, 1994.
26.
Y. Loewenstein, E. Portugaly, M. Fromer and M. Linial, "Efficient Algorithms for Accurate Hierarchical Clustering of Huge Data Sets: Tackling the Entire Protein Space", Bioinformatics, vol. 24, pp. 141-149, 2010.
27.
M. Medvedovic, K.Y. Yeung and R.E. Bumgarner, "Bayesian Mixture Model Based Clustering of Replicated Expression Data", Bioinformatics, vol. 8, pp. 1222-1232, 2004.
28.
Y. Okushima, H. Fukaki, M. Onoda, A. Theologis and M. Tasaka, "ARF7 and ARF19 Regulate Lateral Root Formation via Direct Activation of LBD/ASL Genes in Arabidopsis", The Plant Cell, vol. 19, pp. 118-130, 2007.
29.
M.K. Pakhira, S. Bandyopadhyay and U. Maulik, "A Study of Some Fuzzy Cluster Validity Indices, Genetic Clustering and Application to Pixel Classification", Fuzzy Sets and Systems, vol. 15, no. 2, pp. 191-214, Oct. 2005.
30.
R.K. Pearson, G.E. Gonye and J.S. Schwaber, "Biomedical and Life Sciences" in Outliers in Microarray Data Analysis, Springer, pp. 41-55, 2004.
