GENCODE: producing a reference annotation for ENCODE
- Source :
- Recercat. Dipósit de la Recerca de Catalunya, instname, Genome Biology, Vol. 7, No Suppl 1 (2006) pp. S4.1-9, Genome Biology, Scopus-Elsevier, Genome Biology, vol. 7 Suppl 1, pp. S4.1-S4.9
- Publisher :
- BioMed Central
Abstract
- Background: The GENCODE consortium was formed to identify and map all protein-coding genes within the ENCODE regions. This was achieved by a combination of initial manual annotation by the HAVANA team, experimental validation by the GENCODE consortium, and a refinement of the annotation based on these experimental results.
  Results: The GENCODE gene features are divided into eight different categories, of which only the first two (known and novel coding sequence) are confidently predicted to be protein-coding genes. 5’ rapid amplification of cDNA ends (RACE) and RT-PCR were used to experimentally verify the initial annotation. Of the 420 coding loci tested, 229 RACE products have been sequenced. They supported 5’ extensions of 30 loci and new splice variants in 50 loci. In addition, 46 loci without evidence for a coding sequence were validated, consisting of 31 novel and 15 putative transcripts. We assessed the comprehensiveness of the GENCODE annotation by attempting to validate all the predicted exon boundaries outside the GENCODE annotation. Out of 1,215 tested in a subset of the ENCODE regions, 14 novel exon pairs were validated, only two of them in intergenic regions.
  Conclusions: In total, 487 loci, of which 434 are coding, have been annotated as part of the GENCODE reference set available from the UCSC browser. Comparison of the GENCODE annotation with RefSeq and ENSEMBL shows that only 40% of GENCODE exons are contained within the two sets, which reflects the high number of alternative splice forms with unique exons annotated. Over 50% of coding loci have been experimentally verified by 5’ RACE for EGASP, and the GENCODE collaboration is continuing to refine its annotation of 1% of the human genome with the aid of experimental validation.
- Subjects :
- Bioinformatics
Humans
Gene Loci
ddc:576.5
RNA, Messenger
ENCODE
Expressed Sequence Tags
GENCODE
RACE
Genome, Human
Sequence Analysis, RNA
Research
RNA, Messenger/analysis
Computational Biology
Proteins
Chromosome Mapping
Genomics
Sequence Analysis, DNA
Reference Standards
Computational Biology/methods/standards
Genes
Computational Biology/methods
Computational Biology/standards
Genomics/methods
Genomics/standards
Proteins/genetics
Pseudogenes
Genomics/methods/standards
Molecular biology -- Technique
cDNA
Details
- ISSN :
- 14656906
- Database :
- OpenAIRE
- Journal :
- Genome Biology, vol. 7, Suppl 1 (2006), pp. S4.1-S4.9
- Accession number :
- edsair.pmid.dedup....3745393b30b90f44281b6aff41954028