Back to Search Start Over

cDNA sequences reveal considerable gene prediction inaccuracy in the Plasmodium falciparum genome

Authors :
Valenzuela Jesus G
Mu Jianbing
Ding Jinhui
Jiang Hongying
Lu Fangli
Ribeiro José MC
Su Xin-zhuan
Source :
BMC Genomics, Vol 8, Iss 1, p 255 (2007)
Publication Year :
2007
Publisher :
BMC, 2007.

Abstract

Abstract Background The completion of the Plasmodium falciparum genome represents a milestone in malaria research. The genome sequence allows for the development of genome-wide approaches such as microarray and proteomics that will greatly facilitate our understanding of the parasite biology and accelerate new drug and vaccine development. Designing and application of these genome-wide assays, however, requires accurate information on gene prediction and genome annotation. Unfortunately, the genes in the parasite genome databases were mostly identified using computer software that could make some erroneous predictions. Results We aimed to obtain cDNA sequences to examine the accuracy of gene prediction in silico. We constructed cDNA libraries from mixed blood stages of P. falciparum parasite using the SMART cDNA library construction technique and generated 17332 high-quality expressed sequence tags (EST), including 2198 from primer-walking experiments. Assembly of our sequence tags produced 2548 contigs and 2671 singletons versus 5220 contigs and 5910 singletons when our EST were assembled with EST in public databases. Comparison of all the assembled EST/contigs with predicted CDS and genomic sequences in the PlasmoDB database identified 356 genes with predicted coding sequences fully covered by EST, including 85 genes (23.6%) with introns incorrectly predicted. Careful automatic software and manual alignments found an additional 308 genes that have introns different from those predicted, with 152 new introns discovered and 182 introns with sizes or locations different from those predicted. Alternative spliced and antisense transcripts were also detected. Matching cDNA to predicted genes also revealed silent chromosomal regions, mostly at subtelomere regions. Conclusion Our data indicated that approximately 24% of the genes in the current databases were predicted incorrectly, although some of these inaccuracies could represent alternatively spliced transcripts, and that more genes than currently predicted have one or more additional introns. It is therefore necessary to annotate the parasite genome with experimental data, although obtaining complete cDNA sequences from this parasite will be a formidable task due to the high AT nature of the genome. This study provides valuable information for genome annotation that will be critical for functional analyses.

Details

Language :
English
ISSN :
14712164
Volume :
8
Issue :
1
Database :
Directory of Open Access Journals
Journal :
BMC Genomics
Publication Type :
Academic Journal
Accession number :
edsdoj.6929038402cd4d4fae6dda9aaa7a8b28
Document Type :
article
Full Text :
https://doi.org/10.1186/1471-2164-8-255