AngClust: Angle Feature-Based Clustering for Short Time Series Gene Expression Profiles

Abstract:

When clustering gene expression, it is expected that correlation coefficients of genes in the same cluster are high, and that gene ontology (GO) enrichment analysis of most clusters will be significant. However, existing clustering algorithms for short time series gene expression have limitations. To address this problem, we propose a novel clustering process based on angular features for short time series gene expression. Our method (named AngClust) uses angular features to indicate the change of trend in gene expression levels between two neighboring time points. The changes of angles across multiple time points reflect the change of trend of the overall expression levels. Such changes are used to measure whether the expression trends of different genes are similar. To obtain functionally significant clusters from the clustering results, we evaluated the number of genes in each cluster, the average correlation coefficient, the fluctuation, and their correlation with GO term enrichment. AngClust outperforms two other measures, Euclidean distance (ED) and dynamic time warping of correlation (DTW), on a dataset of yeast gene expression. The ratios of GO term-enriched and pathway term-enriched clusters for AngClust are higher than or equal to those of STEM and TMixClust on human, mouse, and yeast time series of gene expression.
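The idea of describing a profile by the angles between neighboring time points can be illustrated with a minimal sketch. It is not the AngClust implementation: the abstract does not give the angle formula, so the sketch assumes unit spacing between time points and takes each angle as `atan2` of the expression change over the time step. The function names `angle_features`, `angle_changes`, and `trend_distance`, and the mean-absolute-difference dissimilarity, are hypothetical choices made only for illustration.

```python
import math


def angle_features(profile, dt=1.0):
    """Angle (radians) of each segment between neighboring time points.

    Assumption: angle = atan2(change in expression, time step).
    The definition used in the paper may differ.
    """
    if len(profile) < 2:
        raise ValueError("a profile needs at least two time points")
    return [math.atan2(b - a, dt) for a, b in zip(profile, profile[1:])]


def angle_changes(profile, dt=1.0):
    """Differences between consecutive segment angles (change of trend)."""
    angles = angle_features(profile, dt)
    return [b - a for a, b in zip(angles, angles[1:])]


def trend_distance(p, q, dt=1.0):
    """Hypothetical dissimilarity: mean absolute difference of segment angles."""
    ap, aq = angle_features(p, dt), angle_features(q, dt)
    if len(ap) != len(aq):
        raise ValueError("profiles must have the same number of time points")
    return sum(abs(x - y) for x, y in zip(ap, aq)) / len(ap)


# Toy profiles (made-up values, not data from the paper).
gene_a = [0.0, 1.0, 2.0, 1.0]  # rises, rises, falls
gene_b = [5.0, 6.0, 7.0, 6.0]  # same trend, shifted baseline
gene_c = [2.0, 1.0, 0.0, 1.0]  # opposite trend

print(trend_distance(gene_a, gene_b))  # identical angles, so the distance is zero
print(trend_distance(gene_a, gene_c))  # opposite slopes, roughly pi / 2
print(angle_changes(gene_a))           # no change, then a downward turn
```

Under these assumptions, two genes with the same shape but different baseline levels are treated as identical, whereas Euclidean distance on the raw values would separate them.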

Published in: IEEE/ACM Transactions on Computational Biology and Bioinformatics ( Volume: 20, Issue: 2, 01 March-April 2023)

Page(s): 1574 - 1580

Date of Publication: 19 July 2022

PubMed ID: 35853049

DOI: 10.1109/TCBB.2022.3192306

1 Introduction

Time series of gene expression are extremely useful for investigating various kinds of biological processes, such as cell reproduction, development, and response to external stimuli [1], [2], [3]. Temporal gene expression data can be roughly divided into two categories: (i) short time series containing several time points (generally three to eight time points) and (ii) long time series with more than eight time points [4]. It has been estimated that approximately 80% of the time series data sets of gene expression are short time series [5]. Most algorithms for analyzing time series are based on traditional clustering methods, such as hierarchical clustering, K-means, Bayesian networks, self-organizing maps, and many more [2], [6], [7], [8], [9], [10], [11]. These methods can reveal some biological characteristics but do not consider the temporal nature of time series, as the algorithms generally do not account for temporal autocorrelation between adjacent pairs of time points. Recently, some progress has been reported for clustering time series of gene expression, such as the continuous representation of expression profiles through hidden Markov models [5], but these approaches remain domain specific and restricted to long time series data sets. For short time series data, these state-of-the-art algorithms tend to overfit because of the small number of sampling points.
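The point that traditional distance-based clustering ignores temporal order can be checked with a small sketch. The profiles and the shuffle order below are made-up illustrative values, and the helper names `euclidean` and `permute` are hypothetical. Applying the same shuffle of time points to two profiles leaves their Euclidean distance unchanged, so any method that sees only this distance cannot tell the original ordering from a scrambled one.

```python
import math


def euclidean(p, q):
    """Plain Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def permute(profile, order):
    """Reorder the time points of a profile according to `order`."""
    return [profile[i] for i in order]


# Toy profiles (illustrative values only).
gene_x = [0.0, 1.0, 3.0, 2.0]
gene_y = [1.0, 1.0, 2.0, 4.0]
order = [2, 0, 3, 1]  # arbitrary shuffle applied to both genes

d_original = euclidean(gene_x, gene_y)
d_shuffled = euclidean(permute(gene_x, order), permute(gene_y, order))

# Both distances agree even though the temporal structure was destroyed.
print(d_original, d_shuffled)
```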
