Annotation Marathon Validates 21,037 Human Genes

Authors :
Satoshi Oota
Marie-Dominique Devignes
Arek Kasprzyk
Inna Dubchak
Wojciech Makalowski
Anthony J. Brookes
Per Unneberg
Susan Bromberg
Naoki Nagata
Matthew I. Bellgard
Yasuyuki Fujii
Vladimir Kuryshev
Tetsuji Otsuki
Yoonsoo Hahn
Andrew J. G. Simpson
Ryuichi Sakate
Hyang-Sook Yoo
Minoru Kanehisa
Yoshiyuki Sakaki
Toshio Ota
Kaoru Fukami-Kobayashi
Tomohiro Yasuda
Janet Kelso
Ze-Guang Han
Paul J. Kersey
Lukas Wagner
Norikazu Yasuda
Ursula Hinz
Rolf Apweiler
Tadashi Imanishi
Yoshio Tateno
Hideo Matsuda
Ranajit Chakraborty
Danielle Thierry-Mieg
Nobuo Nomura
Toshihisa Okido
Elspeth A. Bruford
Sandrine Imbeaud
Hans-Werner Mewes
Toshinori Endo
Motohiko Tanino
Ingo Schupp
Hideki Hanaoka
Alexander Kanapin
Dominique Piatier-Tonneau
Craig A. Gough
Sangsoo Kim
Zhu Chen
Michael Han
Anne Estreicher
Sandro J. de Souza
Ken Nishikawa
Hideki Nagasaki
Masafumi Ohtsubo
Osamu Ohara
Reiko Kikuno
Roberto A. Barrero
Claude Chelala
Aiko Takahashi
Stefan Wiemann
Hiroaki Sakai
Satoshi Fukuchi
Takao Isogai
Eric Eveno
Nobuyoshi Shimizu
Mitiko Go
Charles A. Steward
Laurens G. Wilming
Hideaki Sugawara
Jennifer L. Ashurst
Maria de Fatima Bonaldo
Peter J. Tonellato
Gen Tamiya
Takuro Tamura
Michio Oishi
Shuang-Xi Ren
Toshihisa Takagi
Régine Mariage-Samson
Makiko Suwa
Phillip Hilton
Youla Karavidopoulou
Shuhei Mano
Rajni Nigam
Kei Yura
Todd D. Taylor
Norihiro Okada
John Quackenbush
Mitsuteru Nakao
Osamu Ogasawara
Kouichi Kimura
Yoshihide Hayashizaki
Marvin Stodolsky
Keiichi Nagai
Sumio Sugano
Joseph D. Terwilliger
Jun Mashima
Florence Servant
Yasushi Okazaki
Yoshiyuki Suzuki
Motonori Ota
Shinsei Minoshima
Momoki Hirai
Nicola Mulder
Esther Graudens
Stephen T. Sherry
Eduardo Eyras
Susumu Tanaka
Kanako O. Koyanagi
Katsunaga Sakai
Piero Carninci
Charles Auffray
Kazuho Ikeo
Hiroshi Tanaka
Hidemasa Bono
Vamsi Veeramachaneni
Mika Hirakawa
Shigetaka Sakamoto
Tetsuo Nishikawa
Takashi Gojobori
Yumi Yamaguchi-Kabata
Claire O'Donovan
Shinya Watanabe
Clara Amid
Mary Shimoyama
Mami Suzuki
Erimi Harada
Rie Shiba
Takeshi Itoh
Kousaku Okubo
Hidetoshi Inoko
Lihua Jin
Ian Hopkinson
Chisato Yamasaki
Teruyoshi Hishiki
Libin Jia
Winston Hide
Yutaka Suzuki
Keiichi Homma
Izabela Makalowska
Michael A. Thomas
Marie-Anne Debily
Annemarie Poustka
Satoru Miyazaki
Katsuyuki Hashimoto
Bento Soares
Robert L. Strausberg
Gopal R. Gopinath
Takeya Kasukawa
Boris Lenhard
Bernhard Korn
Christine Couillault
Jun-ichi Takeda
Jean Thierry-Mieg
Yayoi Kaneko
Takashi Makino
Kousuke Hanada
Kenta Nakai
Naruya Saitou
Source :
PLoS Biology, Vol 2, Iss 6, p e162 (2004); Recercat. Dipósit de la Recerca de Catalunya
Publication Year :
2004

Abstract

The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.

An international team has systematically validated and annotated just over 21,000 human genes using full-length cDNA, thereby providing a valuable new resource for the human genetics community.

Details

Language :
English
Database :
OpenAIRE
Journal :
PLoS Biology, Vol 2, Iss 6, p e162 (2004)
Accession number :
edsair.doi.dedup.....90a49417b7e95359c590802e41306f97