I. Introduction
Deoxyribonucleic acid (DNA), the material of heredity in most living organisms, consists of genic and intergenic regions, as shown in Fig. 1. In eukaryotes, genes are further divided into relatively small protein-coding segments known as exons, interrupted by noncoding spacers known as introns. In eukaryotes such as humans, the intergenic and intronic regions often make up more than 95% of the genome. Codons (i.e., triplets of the four possible DNA nucleotides A, C, G, and T) in exons encode 20 amino acids and 3 terminator signals, known as stop codons (i.e., TAA, TAG, and TGA). The initial exon of a gene begins with the start codon “ATG.” Looking from the 5′ end of DNA (upstream) to its 3′ end (downstream), the exon-to-intron border is known as the donor splice site and consists of the consensus dinucleotide “GT” as the first two nucleotides of the intron, whereas the intron-to-exon border is known as the acceptor splice site and consists of the consensus dinucleotide “AG” as the last two nucleotides of the intron. The accurate identification of genomic protein-coding regions, along with the recognition of other signals and/or regions (shown in Fig. 1), would result in an ideal gene finding and annotation system.
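As a rough illustration of the signals described above, the following Python sketch lists every position in a sequence where a start codon, a stop codon, or a consensus splice-site dinucleotide occurs. It is a minimal sketch, not a gene finder: the function names (`codons`, `find_all`, `candidate_signals`, `splice`) and the toy sequence `TOY` are hypothetical and introduced only for illustration, and the sequence is assumed to be a single strand read from 5′ to 3′.

```python
# Signals named in the text; everything else here is an illustrative assumption.
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}
DONOR = "GT"      # first two nucleotides of an intron
ACCEPTOR = "AG"   # last two nucleotides of an intron

# Hypothetical toy gene: exon "ATGGCC" + intron "GTAAGTCCAG" + exon "TTTTGA".
TOY = "ATGGCCGTAAGTCCAGTTTTGA"


def codons(seq, frame=0):
    """Split seq into non-overlapping triplets, starting at offset `frame`."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def find_all(seq, motif):
    """Return the 0-based start of every (possibly overlapping) motif occurrence."""
    positions = []
    i = seq.find(motif)
    while i != -1:
        positions.append(i)
        i = seq.find(motif, i + 1)
    return positions


def candidate_signals(seq):
    """List candidate signal positions; most candidates are not real signals."""
    seq = seq.upper()
    return {
        "start": find_all(seq, START_CODON),
        "stop": sorted(p for s in STOP_CODONS for p in find_all(seq, s)),
        "donor": find_all(seq, DONOR),
        "acceptor": find_all(seq, ACCEPTOR),
    }


def splice(seq, donor, acceptor):
    """Remove one intron, given the start positions of its GT and its AG."""
    return seq[:donor] + seq[acceptor + 2:]


signals = candidate_signals(TOY)
# The true intron runs from the GT at position 6 to the AG at position 14,
# yet GT also occurs at 10 and 15, and AG also occurs at 9.
mature = splice(TOY, 6, 14)
print(signals)
print(codons(mature))  # ['ATG', 'GCC', 'TTT', 'TGA']
```

Even in this 22-nucleotide toy sequence, the consensus dinucleotides match at positions that are not splice sites, and a TAA occurs inside the intron. This is why a consensus match on its own cannot identify exon boundaries, and why the recognition of coding regions is needed alongside signal detection.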