1. A computational and experimental approach to validating annotations and gene predictions in the Drosophila melanogaster genome
- Author
Mark Yandell, Sima Misra, Adina M. Bailey, Martha Evans-Holm, Gerald M. Rubin, ShengQiang Shu, Susan E. Celniker, and Colin Wiel
- Subjects
Genetics; Genome; Multidisciplinary; Models, Genetic; Reverse Transcriptase Polymerase Chain Reaction; Gene number; Molecular Sequence Data; Reproducibility of Results; Computational gene; Gene Annotation; Computational biology; Biological Sciences; Biology; biology.organism_classification; Polymerase Chain Reaction; Drosophila melanogaster; Melanogaster; Animals; Drosophila Proteins; Gene; Drosophila Protein; DNA Primers
- Abstract
Five years after the completion of the sequence of the Drosophila melanogaster genome, the number of protein-coding genes it contains remains a matter of debate; the number of computational gene predictions greatly exceeds the number of validated gene annotations. We have assembled a collection of >10,000 gene predictions that do not overlap existing gene annotations and have developed a process for their validation that allows us to efficiently prioritize and experimentally validate predictions from various sources by sequencing RT-PCR products to confirm gene structures. Our data provide experimental evidence for 122 protein-coding genes. Our analyses suggest that the entire collection of predictions contains only ≈700 additional protein-coding genes. Although we cannot rule out the discovery of genes with unusual features that make them refractory to existing methods, our results suggest that the D. melanogaster genome contains ≈14,000 protein-coding genes.
- Published
2005