DNA sequence compression - Based on the normalized maximum likelihood model

Abstract:

Genomic data provide challenging problems that have been studied in a number of fields, such as statistics, signal processing, information theory, and computer science. This article shows that the methodologies and tools recently developed in these fields for modeling signals and processes appear to be most promising for genomic research.

Published in: IEEE Signal Processing Magazine ( Volume: 24, Issue: 1, January 2007)

Page(s): 47 - 53

Date of Publication: 31 January 2007

DOI: 10.1109/MSP.2007.273055

DNA Sequences: Random, Independent, or Dependent?

Genetic data represent the most important source of information about life, where, in an exquisite way, nature encodes information about the proteins forming an organism, about controlling biological pathways, about the variations rendering any two individuals different, and probably about many other things as yet unknown. Deoxyribonucleic acid (DNA) is a polymer formed of two entwined helicoidal chains, also called strands. Each strand is a linked chain of nucleotides: adenine (A), cytosine (C), guanine (G), and thymine (T). Ignoring its inherent nontrivial three-dimensional structure, DNA can be represented as a one-dimensional chain of the four nucleotides, which can be written either in terms of the four symbols {A, C, G, T}, or in terms of the numbers {0, 1, 2, 3}, describing the bases.
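As a rough illustration of the two representations just described, the following Python sketch maps a nucleotide string to integer codes and back, and packs the codes at two bits per base, which is the trivial baseline any DNA compressor has to beat. The particular assignment A=0, C=1, G=2, T=3 and the names `encode`, `decode`, and `pack` are assumptions made for this example; they are not taken from the article.

```python
# Assumed mapping for illustration only: the order A, C, G, T -> 0..3
# is a convention chosen here, not something fixed by the article.
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode(seq):
    """Map a nucleotide string to a list of integer codes in 0..3."""
    return [BASE_TO_CODE[base] for base in seq.upper()]


def decode(codes):
    """Map a list of integer codes in 0..3 back to a nucleotide string."""
    return "".join(CODE_TO_BASE[code] for code in codes)


def pack(codes):
    """Pack four 2-bit codes per byte; the last byte is zero-padded."""
    out = bytearray()
    for i in range(0, len(codes), 4):
        chunk = codes[i:i + 4]
        byte = 0
        for code in chunk:
            byte = (byte << 2) | code
        # Left-align a short final chunk so padding sits in the low bits.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    codes = encode("GATTACA")
    print(codes)
    print(decode(codes))
    print(len(pack(codes)), "bytes for", len(codes), "bases")
```

Because the zero padding is indistinguishable from trailing A's under this mapping, a practical format would also have to store the sequence length alongside the packed bytes.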
