Identification of protein-coding genes in the genome of Vibrio cholerae with more than 98% accuracy using occurrence frequencies of single nucleotides.
- Source :
- European Journal of Biochemistry; Aug 2001, Vol. 268, Issue 15, p4261-4268, 8p, 5 Diagrams, 5 Charts
- Publication Year :
- 2001
Abstract
- The published sequence of the Vibrio cholerae genome indicates that, in addition to the genes that encode proteins of known and unknown function, there are 1577 ORFs identified as conserved hypothetical or hypothetical gene candidates. Because the annotation is not 100% accurate, it is not known which of the 1577 ORFs are true protein-coding genes. In this paper, an algorithm based on the Z curve method, with sensitivity, specificity and accuracy greater than 98%, is used to solve this problem. Twenty-fold cross-validation tests show that the accuracy of the algorithm is 98.8%. A detailed discussion of the mechanism of the algorithm is also presented. It was found that 172 of the 1577 ORFs are unlikely to be protein-coding genes. The number of protein-coding genes in the V. cholerae genome was re-estimated and found to be ≈ 3716. This result should be of use in microarray analysis of gene expression in the genome, because the cost of preparing chips may be somewhat decreased. A computer program was written to calculate a coding score called VCZ for gene identification in the genome. Coding/noncoding is simply determined by VCZ > 0/VCZ < 0. The program is freely available on request for academic use. [ABSTRACT FROM AUTHOR]
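The decision rule in the abstract (coding if VCZ > 0, noncoding if VCZ < 0) can be illustrated with a minimal sketch. It assumes the standard Z curve transform applied to single-nucleotide frequencies at each of the three codon positions, giving nine features, followed by a linear score. The function names, the weights and the threshold below are hypothetical placeholders for illustration; they are not the authors' trained parameters or their program.

```python
from collections import Counter


def z_curve_features(orf):
    """Return nine Z curve components for an ORF (assumed feature set).

    For each codon position (phase 0, 1, 2) the frequencies a, c, g, t of the
    four bases are turned into three components:
      x = (a + g) - (c + t)   purine vs. pyrimidine
      y = (a + c) - (g + t)   amino vs. keto
      z = (a + t) - (g + c)   weak vs. strong hydrogen bonding
    """
    orf = orf.upper()
    features = []
    for phase in range(3):
        counts = Counter(orf[phase::3])
        total = sum(counts[b] for b in "ACGT")
        if total == 0:
            raise ValueError("no A/C/G/T bases found at codon position %d" % phase)
        a, c, g, t = (counts[b] / total for b in "ACGT")
        features += [(a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)]
    return features


def coding_score(features, weights, threshold):
    """Linear score in the spirit of VCZ: weighted sum minus a threshold.

    `weights` and `threshold` are hypothetical; real values would come from
    training on known coding and noncoding sequences.
    """
    return sum(w * f for w, f in zip(weights, features)) - threshold


def is_coding(score):
    """Apply the rule stated in the abstract: coding if the score is > 0."""
    return score > 0


if __name__ == "__main__":
    feats = z_curve_features("ATGATGATG")
    placeholder_weights = [1.0] * 9  # illustration only, not trained values
    score = coding_score(feats, placeholder_weights, threshold=0.0)
    print(feats, score, is_coding(score))
```

With trained weights, the same two calls would be applied to each of the 1577 candidate ORFs, and those with a negative score would be flagged as unlikely to be protein-coding.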
- Subjects :
- GENES
- VIBRIO cholerae
- IDENTIFICATION
Details
- Language :
- English
- ISSN :
- 0014-2956
- Volume :
- 268
- Issue :
- 15
- Database :
- Complementary Index
- Journal :
- European Journal of Biochemistry
- Publication Type :
- Academic Journal
- Accession number :
- 4937708
- Full Text :
- https://doi.org/10.1046/j.1432-1327.2001.02341.x