
GENCODE: Creating a Validated Manually Annotated Geneset for the Whole Human Genome

Authors :
Jen Harrow
Felix Kokocinski
Jane E. Loveland
James G. R. Gilbert
Claire Davidson
E. Hart
Adam Frankish
Michael L. Tress
Bronwen Aken
Rachel A. Harte
M. Kay
Michael F. Lin
Alexandra Bignell
Denise Carvalho-Silva
Mark Diekhans
J. Van Baren
Manolis Kellis
Toby Hunt
If H. A. Barnes
Jessica Vamathevan
Catherine E. Snow
Mark Gerstein
R. Kinsella
J. E. Mudge
S. Donaldson
Tim Hubbard
Laurens G. Wilming
David Lloyd
S. Searle
Roderic Guigó
Michael R. Brent
Source :
Nature Precedings.
Publication Year :
2009
Publisher :
Springer Science and Business Media LLC, 2009.

Abstract

The Human and Vertebrate Analysis and Annotation (HAVANA) group at the Wellcome Trust Sanger Institute produced the manually annotated geneset for the Encyclopedia of DNA Elements (ENCODE) pilot project and, as part of the GENCODE subgroup, is reprising this role in the scale-up to cover the whole human genome. Our manual annotation is checked computationally and validated experimentally. Loci and transcripts that are predicted but absent from the initial annotation are identified by comparison with a number of state-of-the-art algorithms for identifying exons, splice sites, transcripts and pseudogenes. Where novel features are confirmed, the annotation is updated. Annotated coding transcripts are analysed to assess their coding potential by investigating patterns of conservation within the coding sequence (CDS) and by comparing the predicted secondary structures of annotated CDSs with those of similar proteins with solved structures. Annotated coding transcripts are also checked against the current set of human Consensus CDSs (CCDS) to confirm agreement with the other participating centres (EBI, NCBI and UCSC). An initial round of annotation and analysis of chromosomes 21 and 22 has shown that while HAVANA annotation is both comprehensive and robust, it has benefited from computational review. Thirteen novel non-coding loci, 27 novel splice variants and 6 extensions to existing variants were identified, many of which were found using supporting EST/mRNA sequences that were not available at the time of initial annotation. Fewer than 10 annotated CDSs required reclassification, no CCDS sequences required updating and 26 novel pseudogenes were added. The annotation of human chromosome 2 is complete, and we are currently annotating chromosomes 3 and 7. Data from all members of GENCODE are distributed via DAS and are now visible in our Zmap annotation interface, allowing computational predictions to be assessed at the same time as first-pass gene annotation.
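To illustrate the kind of consistency check described above, here is a minimal Python sketch of comparing annotated CDS structures against a consensus set such as CCDS. It assumes each CDS is represented by a chromosome, a strand and a set of (start, end) exon coordinates, and it treats "agreement" as an exact match of the exon chain. The `CDS` class, `compare_to_consensus` function and all identifiers are hypothetical illustrations; this is not the HAVANA or CCDS pipeline.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical representation: an exon is a (start, end) pair of genomic coordinates.
Exon = Tuple[int, int]


@dataclass(frozen=True)
class CDS:
    chrom: str
    strand: str
    exons: Tuple[Exon, ...]

    def key(self) -> Tuple[str, str, Tuple[Exon, ...]]:
        # Sort exons so that identical structures compare equal regardless of input order.
        return (self.chrom, self.strand, tuple(sorted(self.exons)))


def compare_to_consensus(
    annotated: Dict[str, CDS], consensus: Dict[str, CDS]
) -> Dict[str, List[str]]:
    """Classify annotated CDSs by exact exon-chain agreement with a consensus set.

    Returns annotated IDs that match a consensus CDS, annotated IDs that do not,
    and consensus IDs with no exactly matching annotated CDS.
    """
    consensus_keys = {cds.key() for cds in consensus.values()}
    annotated_keys = {cds.key() for cds in annotated.values()}

    result: Dict[str, List[str]] = {"match": [], "no_match": [], "unmatched_consensus": []}
    for tid, cds in annotated.items():
        bucket = "match" if cds.key() in consensus_keys else "no_match"
        result[bucket].append(tid)
    for cid, cds in consensus.items():
        if cds.key() not in annotated_keys:
            result["unmatched_consensus"].append(cid)
    return result


if __name__ == "__main__":
    # Toy data (invented for illustration only).
    annotated = {
        "tx1": CDS("chr21", "+", ((100, 200), (300, 400))),
        "tx2": CDS("chr21", "-", ((500, 600),)),
    }
    consensus = {
        "CCDS_A": CDS("chr21", "+", ((300, 400), (100, 200))),
        "CCDS_B": CDS("chr22", "+", ((10, 20),)),
    }
    print(compare_to_consensus(annotated, consensus))
```

With the toy data, `tx1` agrees with `CCDS_A` despite the exons being listed in a different order, `tx2` has no consensus counterpart, and `CCDS_B` has no matching annotated CDS. A real comparison would also need to handle partial overlaps and alternative start or stop codons, which this exact-match sketch deliberately ignores.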

Details

ISSN :
17560357
Database :
OpenAIRE
Journal :
Nature Precedings
Accession number :
edsair.doi...........ef42914c417113daa0f966f5c866b7bb
Full Text :
https://doi.org/10.1038/npre.2009.3155.1