Abstract:
Graphical representations provide a tool for visual inspection of sequences. To visualize and compare different DNA sequences, this paper proposes a novel alignment-free method for both graphical representation and similarity analysis. We introduce a transformation that represents each DNA sequence by a neighboring nucleotide matrix. Then, based on approximate joint diagonalization theory, we transform each DNA primary sequence into a corresponding eigenvalue vector (EVV), which can be regarded as a numerical characterization of the sequence. Plotting the EVV in a 2-D plane gives a graphical representation of the DNA sequence. Using k-means, we then cluster these feature curves into several reasonable subclasses. Similarity analyses are performed by computing the distances among the obtained vectors. This approach retains more sequence information, and it analyzes all the involved sequence information jointly rather than separately. A typical dendrogram constructed by this method demonstrates the effectiveness of the approach.
Published in: IEEE Journal of Biomedical and Health Informatics (Volume: 17, Issue: 3, May 2013)
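
The pipeline outlined in the abstract (neighboring nucleotide matrix, joint spectral feature vector, pairwise distances) can be illustrated with a minimal sketch. Everything concrete here is an assumption, since the abstract does not give the definitions. The neighboring nucleotide matrix is taken to be the 4x4 matrix of adjacent-pair frequencies. The shared basis comes from the eigenvectors of the average symmetrized matrix, which is only a crude stand-in for a real approximate joint diagonalization algorithm. Euclidean distance is assumed for the similarity analysis. The function names are hypothetical, and the k-means and dendrogram steps are left out.

```python
import numpy as np

NUCLEOTIDES = "ACGT"
INDEX = {n: i for i, n in enumerate(NUCLEOTIDES)}


def neighboring_matrix(seq):
    """4x4 matrix of adjacent-pair frequencies (assumed definition)."""
    m = np.zeros((4, 4))
    for a, b in zip(seq, seq[1:]):
        if a in INDEX and b in INDEX:  # skip ambiguous symbols such as N
            m[INDEX[a], INDEX[b]] += 1
    total = m.sum()
    return m / total if total else m


def joint_feature_vectors(seqs):
    """One 4-component vector per sequence, computed in a shared basis.

    Assumption: the basis is the eigenvector set of the average
    symmetrized matrix, a simple substitute for approximate joint
    diagonalization, not the method of the paper.
    """
    sym = []
    for s in seqs:
        m = neighboring_matrix(s)
        sym.append((m + m.T) / 2)
    _, basis = np.linalg.eigh(np.mean(sym, axis=0))
    # Diagonal of each matrix after rotation into the shared basis.
    return np.array([np.diag(basis.T @ s @ basis) for s in sym])


def distance_matrix(seqs):
    """Pairwise Euclidean distances between the feature vectors."""
    feats = joint_feature_vectors(seqs)
    diff = feats[:, None, :] - feats[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


if __name__ == "__main__":
    example = ["ACGTACGTAC", "ACGTACGTAC", "AAAAACCCCC"]
    print(distance_matrix(example))
```

Because the basis is computed from all sequences at once, each feature vector depends on the whole set, which mirrors the abstract's point about analyzing the sequences jointly rather than separately. The resulting distance matrix could be passed to any standard clustering routine.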