Self-identification of protein-coding regions in microbial genomes
- Source :
- Proceedings of the National Academy of Sciences. 95:10026-10031
- Publication Year :
- 1998
- Publisher :
- Proceedings of the National Academy of Sciences, 1998.
Abstract
- A new method for predicting protein-coding regions in microbial genomic DNA sequences is presented. It uses an ab initio iterative Markov modeling procedure to automatically partition genomic sequences into three subsets, shown to correspond to coding, coding on the opposite strand, and noncoding segments. In contrast to current methods, such as GeneMark [Borodovsky, M. & McIninch, J. D. (1993) Comput. Chem. 17, 123–133], no training set or prior knowledge of the statistical properties of the studied genome is required. The method tolerates error rates of 1–2% and can process unassembled sequences. It is thus well suited to the analysis of genome survey and/or fragmented sequence data from uncharacterized microorganisms. The method was validated on 10 complete bacterial genomes (from four major phylogenetic lineages). The results show that protein-coding regions can be identified with an accuracy of up to 90% by a fully automated and objective procedure.
- Subjects :
- DNA, Bacterial
Genetics
Base Composition
Multidisciplinary
Bacteria
Models, Genetic
Phylogenetic tree
Markov chain
Reproducibility of Results
Bacterial genome size
Computational biology
Biological Sciences
Biology
Markov model
Genome
Partition (database)
Markov Chains
genomic DNA
Bacterial Proteins
Genetic Techniques
Algorithms
Genome, Bacterial
Coding (social sciences)
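
The iterative partitioning idea summarized in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the random initial assignment, the window size, the Markov order of 2, the add-one pseudocounts, and every function name below (`split_windows`, `train_markov`, `score`, `iterative_partition`) are assumptions chosen for illustration. The sketch only shows the general loop of fitting one Markov model per subset and reassigning each window to its best-scoring model until the assignment stops changing. It does not decide which of the three resulting subsets corresponds to coding, opposite-strand coding, or noncoding sequence.

```python
import math
import random
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: input is uppercase DNA without ambiguity codes


def split_windows(sequence, size):
    """Cut a sequence into consecutive non-overlapping windows of equal size."""
    return [sequence[i:i + size] for i in range(0, len(sequence) - size + 1, size)]


def train_markov(windows, order):
    """Fit a fixed-order Markov chain; return a function giving log P(base | context)."""
    counts = defaultdict(lambda: defaultdict(float))
    for w in windows:
        for i in range(order, len(w)):
            counts[w[i - order:i]][w[i]] += 1.0

    def log_prob(context, base):
        row = counts.get(context, {})
        total = sum(row.values()) + len(ALPHABET)  # add-one pseudocounts
        return math.log((row.get(base, 0.0) + 1.0) / total)

    return log_prob


def score(window, log_prob, order):
    """Log-likelihood of a window under one Markov model."""
    return sum(log_prob(window[i - order:i], window[i])
               for i in range(order, len(window)))


def iterative_partition(windows, n_classes=3, order=2, max_iter=20, seed=0):
    """Alternate between model fitting and window reassignment until stable."""
    rng = random.Random(seed)
    # Hypothetical starting point: assign every window to a random subset.
    labels = [rng.randrange(n_classes) for _ in windows]
    for _ in range(max_iter):
        models = [
            train_markov([w for w, lab in zip(windows, labels) if lab == k], order)
            for k in range(n_classes)
        ]
        new_labels = [
            max(range(n_classes), key=lambda k: score(w, models[k], order))
            for w in windows
        ]
        if new_labels == labels:  # converged: no window changed subset
            break
        labels = new_labels
    return labels
```

A call such as `iterative_partition(split_windows(genome, 96))`, where `genome` is a DNA string, returns one subset index per window. Because no labeled training set is used, the subset indices are arbitrary and would still need to be interpreted afterwards.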
Details
- ISSN :
- 1091-6490 and 0027-8424
- Volume :
- 95
- Database :
- OpenAIRE
- Journal :
- Proceedings of the National Academy of Sciences
- Accession number :
- edsair.doi.dedup.....d268b01b0357505a7b1cec31e27574f2
- Full Text :
- https://doi.org/10.1073/pnas.95.17.10026