Transcript annotation in FANTOM3: mouse gene catalog based on physical cDNAs.

Authors :
Norihiro Maeda
Takeya Kasukawa
Rieko Oyama
Julian Gough
Martin Frith
Pär G Engström
Boris Lenhard
Rajith N Aturaliya
Serge Batalov
Kirk W Beisel
Carol J Bult
Colin F Fletcher
Alistair R R Forrest
Masaaki Furuno
David Hill
Masayoshi Itoh
Mutsumi Kanamori-Katayama
Shintaro Katayama
Masaru Katoh
Tsugumi Kawashima
John Quackenbush
Timothy Ravasi
Brian Z Ring
Kazuhiro Shibata
Koji Sugiura
Yoichi Takenaka
Rohan D Teasdale
Christine A Wells
Yunxia Zhu
Chikatoshi Kai
Jun Kawai
David A Hume
Piero Carninci
Yoshihide Hayashizaki
Source :
PLoS Genetics, Vol 2, Iss 4, p e62 (2006)
Publication Year :
2006
Publisher :
Public Library of Science (PLoS), 2006.

Abstract

The international FANTOM consortium aims to produce a comprehensive picture of the mammalian transcriptome, based upon an extensive cDNA collection and functional annotation of full-length enriched cDNAs. The previous dataset, FANTOM2, comprised 60,770 full-length enriched cDNAs. Functional annotation revealed that this cDNA dataset contained only about half of the estimated number of mouse protein-coding genes, indicating that a number of cDNAs still remained to be collected and identified. To pursue the complete gene catalog that covers all predicted mouse genes, cloning and sequencing of full-length enriched cDNAs has been continued since FANTOM2. In FANTOM3, 42,031 newly isolated cDNAs were subjected to functional annotation, and the annotation of 4,347 FANTOM2 cDNAs was updated. To accomplish accurate functional annotation, we improved our automated annotation pipeline by introducing new coding sequence prediction programs and developed a Web-based annotation interface for simplifying the annotation procedures to reduce manual annotation errors. Automated coding sequence and function prediction was followed with manual curation and review by expert curators. A total of 102,801 full-length enriched mouse cDNAs were annotated. Out of 102,801 transcripts, 56,722 were functionally annotated as protein coding (including partial or truncated transcripts), providing to our knowledge the greatest current coverage of the mouse proteome by full-length cDNAs. The total number of distinct non-protein-coding transcripts increased to 34,030. The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome, and could be applied to the transcriptomes of other species.

Subjects

Subjects :
Genetics
QH426-470

Details

Language :
English
ISSN :
1553-7390 and 1553-7404
Volume :
2
Issue :
4
Database :
Directory of Open Access Journals
Journal :
PLoS Genetics
Publication Type :
Academic Journal
Accession number :
edsdoj.715f2767f50e469fbc189398f0eb0f8f
Document Type :
article
Full Text :
https://doi.org/10.1371/journal.pgen.0020062