Satoshi Oota, Marie-Dominique Devignes, Arek Kasprzyk, Inna Dubchak, Wojciech Makalowski, Anthony J. Brookes, Per Unneberg, Susan Bromberg, Naoki Nagata, Matthew I. Bellgard, Yasuyuki Fujii, Vladimir Kuryshev, Tetsuji Otsuki, Yoonsoo Hahn, Andrew J. G. Simpson, Ryuichi Sakate, Hyang-Sook Yoo, Minoru Kanehisa, Yoshiyuki Sakaki, Toshio Ota, Kaoru Fukami-Kobayashi, Tomohiro Yasuda, Janet Kelso, Ze-Guang Han, Paul J. Kersey, Lukas Wagner, Norikazu Yasuda, Ursula Hinz, Rolf Apweiler, Tadashi Imanishi, Yoshio Tateno, Hideo Matsuda, Ranajit Chakraborty, Danielle Thierry-Mieg, Nobuo Nomura, Toshihisa Okido, Elspeth A. Bruford, Sandrine Imbeaud, Hans-Werner Mewes, Toshinori Endo, Motohiko Tanino, Ingo Schupp, Hideki Hanaoka, Alexander Kanapin, Dominique Piatier-Tonneau, Craig A. Gough, Sangsoo Kim, Zhu Chen, Michael Han, Anne Estreicher, Sandro J. de Souza, Ken Nishikawa, Hideki Nagasaki, Masafumi Ohtsubo, Osamu Ohara, Reiko Kikuno, Roberto A. Barrero, Claude Chelala, Aiko Takahashi, Stefan Wiemann, Hiroaki Sakai, Satoshi Fukuchi, Takao Isogai, Eric Eveno, Nobuyoshi Shimizu, Mitiko Go, Charles A. Steward, Laurens G. Wilming, Hideaki Sugawara, Jennifer L. Ashurst, Maria de Fatima Bonaldo, Peter J. Tonellato, Gen Tamiya, Takuro Tamura, Michio Oishi, Shuang-Xi Ren, Toshihisa Takagi, Régine Mariage-Samson, Makiko Suwa, Phillip Hilton, Youla Karavidopoulou, Shuhei Mano, Rajni Nigam, Kei Yura, Todd D. Taylor, Norihiro Okada, John Quackenbush, Mitsuteru Nakao, Osamu Ogasawara, Kouichi Kimura, Yoshihide Hayashizaki, Marvin Stodolsky, Keiichi Nagai, Sumio Sugano, Joseph D. Terwilliger, Jun Mashima, Florence Servant, Yasushi Okazaki, Yoshiyuki Suzuki, Motonori Ota, Shinsei Minoshima, Momoki Hirai, Nicola Mulder, Esther Graudens, Stephen T. Sherry, Eduardo Eyras, Susumu Tanaka, Kanako O. 
Koyanagi, Katsunaga Sakai, Piero Carninci, Charles Auffray, Kazuho Ikeo, Hiroshi Tanaka, Hidemasa Bono, Vamsi Veeramachaneni, Mika Hirakawa, Shigetaka Sakamoto, Tetsuo Nishikawa, Takashi Gojobori, Yumi Yamaguchi-Kabata, Claire O'Donovan, Shinya Watanabe, Clara Amid, Mary Shimoyama, Mami Suzuki, Erimi Harada, Rie Shiba, Takeshi Itoh, Kousaku Okubo, Hidetoshi Inoko, Lihua Jin, Ian Hopkinson, Chisato Yamasaki, Teruyoshi Hishiki, Libin Jia, Winston Hide, Yutaka Suzuki, Keiichi Homma, Izabela Makalowska, Michael A. Thomas, Marie-Anne Debily, Annemarie Poustka, Satoru Miyazaki, Katsuyuki Hashimoto, Bento Soares, Robert L. Strausberg, Gopal R. Gopinath, Takeya Kasukawa, Boris Lenhard, Bernhard Korn, Christine Couillault, Jun-ichi Takeda, Jean Thierry-Mieg, Yayoi Kaneko, Takashi Makino, Kousuke Hanada, Kenta Nakai, and Naruya Saitou
The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, because the transcripts of a single gene can vary greatly. We therefore performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also revealed the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame; of these, 296 loci are strong candidates for non-protein-coding RNA genes.
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions (indels) localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.

An international team has systematically validated and annotated just over 21,000 human genes using full-length cDNA, thereby providing a valuable new resource for the human genetics community.