Chaos Game Representations & Deep Learning for Proteome-Wide Protein Prediction


Abstract:

Chaos Game Representation (CGR) is an emerging means of visualising and representing genomic and proteomic sequences. There exist many open questions related to its effective application to various computational tasks. In this work, we begin to address some of these questions by comparing four variants of the Chaos Game to generate CGR imagery as part of a multi-class classification task to identify the source organism for a given protein. We propose a novel nodal configuration for icosagon and 20-flake CGRs. Using two datasets, we performed fine-tuning using seven deep convolutional neural network (CNN) architectures and report modest performance over random among the 56 test conditions, highlighting certain shortcomings in effectively leveraging CGR in conjunction with deep CNN architectures. Many of the insights from this work will serve to orient subsequent protein-related studies involving CGR-based encoding and be generally applicable to disparate domains seeking to leverage CGR for sequence-type data.
Date of Conference: 26-28 October 2020
Date Added to IEEE Xplore: 16 December 2020
Conference Location: Cincinnati, OH, USA

1. Introduction

Classifying proteins into categories is a major challenge in bioinformatics, and effective models contribute to greater insight into the fundamentals of molecular biology. Such tasks may involve predicting whether two proteins are likely to interact [1], predicting protein solubility [2], classifying functional families of proteins [3], or predicting which strain of HIV-1 a given protein belongs to [4]. Common to each of these examples is an increasingly popular means of encoding a protein sequence into a machine-readable format: the Chaos Game Representation (CGR).
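To make the encoding concrete, the following is a minimal sketch of a generic icosagon-style chaos game for a protein sequence. It is not the nodal configuration proposed in this work: the alphabetical assignment of residues to vertices, the contraction ratio of 0.5, the 64-pixel grid, and the function names `icosagon_vertices`, `cgr_points`, and `cgr_image` are all illustrative assumptions.

```python
import math

# Assumption: the 20 standard amino acids, assigned to vertices in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def icosagon_vertices(n=20):
    """Return n vertices evenly spaced on the unit circle, starting at (1, 0)."""
    return [
        (math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


def cgr_points(sequence, ratio=0.5):
    """Play the chaos game: move a fraction `ratio` toward each residue's vertex."""
    vertices = dict(zip(AMINO_ACIDS, icosagon_vertices()))
    x, y = 0.0, 0.0  # start at the centre of the polygon
    points = []
    for residue in sequence:
        if residue not in vertices:
            continue  # skip non-standard or ambiguous symbols such as X
        vx, vy = vertices[residue]
        x += ratio * (vx - x)
        y += ratio * (vy - y)
        points.append((x, y))
    return points


def cgr_image(sequence, size=64, ratio=0.5):
    """Rasterise the visited points into a size x size grid of visit counts."""
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(sequence, ratio):
        # Map coordinates from [-1, 1] to pixel indices, clamping the upper edge.
        col = min(int((x + 1) / 2 * size), size - 1)
        row = min(int((1 - y) / 2 * size), size - 1)
        grid[row][col] += 1
    return grid
```

A grid produced by `cgr_image` could then be scaled and passed to an image classifier. With 20 vertices, a ratio of 0.5 lets the regions belonging to different residues overlap, so the choice of ratio and vertex layout is a design decision in its own right.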
