Graphical Representation for DNA Sequences via Joint Diagonalization of Matrix Pencil


Abstract:

Graphical representations provide a tool for visual inspection of sequences. To visualize and compare different DNA sequences, a novel alignment-free method is proposed in this paper for both graphical representation and similarity analysis of sequences. We introduce a transformation that represents each DNA sequence with a neighboring nucleotide matrix. Then, based on approximate joint diagonalization theory, we transform each DNA primary sequence into a corresponding eigenvalue vector (EVV), which can be considered a numerical characterization of the DNA sequence. We also obtain a graphical representation of each DNA sequence by plotting its EVV in the 2-D plane. Moreover, using k-means, we cluster these feature curves of sequences into several reasonable subclasses. In addition, similarity analyses are performed by computing the distances among the obtained vectors. This approach captures more sequence information, and it analyzes all the involved sequence information jointly rather than separately. A typical dendrogram constructed by this method demonstrates the effectiveness of our approach.
Published in: IEEE Journal of Biomedical and Health Informatics (Volume: 17, Issue: 3, May 2013)
Page(s): 503 - 511
Date of Publication: 23 January 2013

PubMed ID: 24592449

I. Introduction

Graphical representations of DNA offer visual inspection of DNA sequences [1]. In [2], however, the author investigated corrections that reveal some aspects of similarity that cannot be determined through traditional alignment-based methods. The space of similarity for complex objects is multidimensional: complex objects may be similar in one aspect yet very different in another. Recently, many numerical characterizations of DNA or protein sequences have been introduced, most of which are extracted from string representations and graphical representations. The simpler and more important feature from string representations was first used for comparison of genome sequences [3] and later for alignment-free comparison of regulatory sequences [4]. Various frequency-based algorithms have since been introduced for sequence comparison, as indicated in [5] and [6]. Besides representations based on single nucleotides, dinucleotide analysis has also been tried by several authors. Randić [7] proposed a condensed representation of DNA based on pairs of nucleotides. Wu et al. [8] proposed analysis approaches based on neighboring nucleotides of a DNA sequence, which reveal the biological information hidden between dual nucleotides.
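
As a rough illustration of what a dinucleotide (neighboring-nucleotide) characterization can look like, the following Python sketch counts adjacent nucleotide pairs into a 4 x 4 matrix and compares two sequences by the Euclidean distance between their normalized pair frequencies. This is an assumption-laden toy example: the matrix definition, the normalization, and the function names (`dinucleotide_matrix`, `pair_frequencies`, `pair_distance`) are hypothetical and are not taken from this paper or from [7] or [8], and the sketch does not perform the joint diagonalization described in the abstract.

```python
from math import sqrt

NUCLEOTIDES = "ACGT"
INDEX = {base: i for i, base in enumerate(NUCLEOTIDES)}


def dinucleotide_matrix(seq):
    """Count adjacent nucleotide pairs: entry [i][j] is the number of times
    base i is immediately followed by base j (illustrative definition only)."""
    counts = [[0] * 4 for _ in range(4)]
    seq = seq.upper()
    for first, second in zip(seq, seq[1:]):
        # Skip pairs that contain symbols other than A, C, G, T (e.g. N).
        if first in INDEX and second in INDEX:
            counts[INDEX[first]][INDEX[second]] += 1
    return counts


def pair_frequencies(counts):
    """Flatten the 4x4 count matrix into a 16-element frequency vector."""
    total = sum(sum(row) for row in counts)
    if total == 0:
        return [0.0] * 16
    return [c / total for row in counts for c in row]


def pair_distance(seq_a, seq_b):
    """Euclidean distance between the pair-frequency vectors of two sequences."""
    fa = pair_frequencies(dinucleotide_matrix(seq_a))
    fb = pair_frequencies(dinucleotide_matrix(seq_b))
    return sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))


if __name__ == "__main__":
    print(dinucleotide_matrix("ACGTAC"))
    print(pair_distance("ACGTACGT", "AACCGGTT"))
```

Because only pair frequencies are compared, no alignment of the two sequences is needed, which is the general appeal of frequency-based, alignment-free comparisons.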
