19 results on '"Alexander Kanapin"'
Search Results
2. Arginine methylation expands the regulatory mechanisms and extends the genomic landscape under E2F control
- Author
-
Anastasia Samsonova, Simon M. Carr, Alexander Kanapin, Wojciech Barczak, Alice Poppy Roworth, Geng Liu, Shonagh Munro, Nicholas B. La Thangue, and Rebecca L. Miller
- Subjects
endocrine system; Tudor domain; Computational biology; Biology; Arginine; Methylation; Cell Line; 03 medical and health sciences; 0302 clinical medicine; Transcription (biology); Endopeptidases; Humans; Transcription factor; Research Articles; Cancer; 030304 developmental biology; 0303 health sciences; Multidisciplinary; Protein arginine methyltransferase 5; Alternative splicing; SciAdv r-articles; Cell Biology; Genomics; Chromatin; E2F Transcription Factors; Alternative Splicing; 030220 oncology & carcinogenesis; RNA splicing; RNA; biological phenomena, cell phenomena, and immunity; Small nuclear RNA
- Abstract
Arginine methylation widens the mechanism of control by E2F1 from a transcription factor to a regulator of alternative RNA splicing. E2F is a family of master transcription regulators involved in mediating diverse cell fates. Here, we show that residue-specific arginine methylation (meR) by PRMT5 enables E2F1 to regulate many genes at the level of alternative RNA splicing, rather than through its classical transcription-based mechanism. The p100/TSN tudor domain protein reads the meR mark on chromatin-bound E2F1, allowing snRNA components of the splicing machinery to assemble with E2F1. A large set of RNAs including spliced variants associate with E2F1 by virtue of the methyl mark. By focusing on the deSUMOylase SENP7 gene, which we identified as an E2F target gene, we establish that alternative splicing is functionally important for E2F1 activity. Our results reveal an unexpected consequence of arginine methylation, where reader-writer interplay widens the mechanism of control by E2F1, from transcription factor to regulator of alternative RNA splicing, thereby extending the genomic landscape under E2F1 control.
- Published
- 2019
3. Unfixed Endogenous Retroviral Insertions in the Human Population
- Author
-
Robert Belshaw, Emanuele Marchi, Alexander Kanapin, and Gkikas Magiorkinis
- Subjects
Lineage (genetic); Immunology; Population; Biology; Microbiology; Genome; Virus; 03 medical and health sciences; 0302 clinical medicine; Virology; Genetic variation; Genetic model; Humans; education; 030304 developmental biology; Genetics; 0303 health sciences; education.field_of_study; Genome, Human; Endogenous Retroviruses; Computational Biology; Genetic Variation; High-Throughput Nucleotide Sequencing; DNA; 3. Good health; Genetic Diversity and Evolution; Genetic Loci; 030220 oncology & carcinogenesis; Insect Science; Human genome; Reference genome
- Abstract
One lineage of human endogenous retroviruses (HERVs), HERV-K(HML2), is upregulated in many cancers, some autoimmune/inflammatory diseases, and HIV-infected cells. Despite 3 decades of research, it is not known if these viruses play a causal role in disease, and there has been recent interest in whether they can be used as immunotherapy targets. Resolution of both these questions will be helped by an ability to distinguish between the effects of different integrated copies of the virus (loci). Research so far has concentrated on the 20 or so recently integrated loci that, with one exception, are in the human reference genome sequence. However, this viral lineage has been copying in the human population within the last million years, so some loci will inevitably be present in the human population but absent from the reference sequence. We therefore performed the first detailed search for such loci by mining whole-genome sequences generated by next-generation sequencing. We found a total of 17 loci, and the frequency of their presence ranged from only 2 of the 358 individuals examined to over 95% of them. On average, each individual had six loci that are not in the human reference genome sequence. Comparing the number of loci that we found to an expectation derived from a neutral population genetic model suggests that the lineage was copying until at least ∼250,000 years ago.
Importance: About 5% of the human genome sequence is composed of the remains of retroviruses that over millions of years have integrated into the chromosomes of egg and/or sperm precursor cells. There are indications that protein expression of these viruses is higher in some diseases, and we need to know (i) whether these viruses have a role in causing disease and (ii) whether they can be used as immunotherapy targets in some of them. Answering both questions requires a better understanding of how individuals differ in the viruses that they carry. We carried out the first careful search for new viruses in some of the many human genome sequences that are now available thanks to advances in sequencing technology. We also compared the number that we found to a theoretical expectation to see if it is likely that these viruses are still replicating in the human population today.
- Published
- 2014
4. Choice of transcripts and software has a large effect on variant annotation
- Author
-
Peter Donnelly, Manuel A. Rivas, Peter Humburg, Jean-Baptiste Cazier, Davis J. McCarthy, Kyle J. Gaulton, and Alexander Kanapin
- Subjects
Nonsynonymous substitution; Genetics; 0303 health sciences; Computer science; Systems biology; Research; Clinical Sciences; Computational biology; Human genetics; DNA sequencing; 03 medical and health sciences; Annotation; 0302 clinical medicine; 030220 oncology & carcinogenesis; RefSeq; False positive paradox; Ensembl; Molecular Medicine; Genetics(clinical); Molecular Biology; Genetics (clinical); 030304 developmental biology
- Abstract
Background: Variant annotation is a crucial step in the analysis of genome sequencing data. Functional annotation results can have a strong influence on the ultimate conclusions of disease studies. Incorrect or incomplete annotations can cause researchers both to overlook potentially disease-relevant DNA variants and to dilute interesting variants in a pool of false positives. Researchers are aware of these issues in general, but the extent of the dependency of final results on the choice of transcripts and software used for annotation has not been quantified in detail. Methods: This paper quantifies the extent of differences in annotation of 80 million variants from a whole-genome sequencing study. We compare results using the RefSeq and Ensembl transcript sets as the basis for variant annotation with the software Annovar, and also compare the results from two annotation software packages, Annovar and VEP (Ensembl’s Variant Effect Predictor), when using Ensembl transcripts. Results: We found only 44% agreement in annotations for putative loss-of-function variants when using the RefSeq and Ensembl transcript sets as the basis for annotation with Annovar. The rate of matching annotations for loss-of-function and nonsynonymous variants combined was 79% and for all exonic variants it was 83%. When comparing results from Annovar and VEP using Ensembl transcripts, matching annotations were seen for only 65% of loss-of-function variants and 87% of all exonic variants, with splicing variants revealed as the category with the greatest discrepancy. Using these comparisons, we characterised the types of apparent errors made by Annovar and VEP and discuss their impact on the analysis of DNA variants in genome sequencing studies. Conclusions: Variant annotation is not yet a solved problem. Choice of transcript set can have a large effect on the ultimate variant annotations obtained in a whole-genome sequencing study. Choice of annotation software can also have a substantial effect. The annotation step in the analysis of a genome sequencing study must therefore be considered carefully, and a conscious choice made as to which transcript set and software are used for annotation.
- Published
- 2016
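The agreement percentages quoted in the abstract of entry 4 can be illustrated with a minimal sketch. It assumes that "agreement" means the share of variants annotated by both sources that receive the same label; the paper's exact matching criteria may differ. The function name `agreement_rate`, the variant keys and the labels are all hypothetical and are not taken from Annovar, VEP or the paper's data.

```python
from typing import Dict


def agreement_rate(calls_a: Dict[str, str], calls_b: Dict[str, str]) -> float:
    """Return the fraction of shared variants whose annotation labels match."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        return 0.0
    matching = sum(1 for variant in shared if calls_a[variant] == calls_b[variant])
    return matching / len(shared)


# Hypothetical toy annotations (illustration only, not data from the paper).
refseq_calls = {
    "chr1:100A>G": "nonsynonymous",
    "chr1:200C>T": "stopgain",
    "chr2:50G>A": "splicing",
    "chr3:10T>C": "synonymous",
}
ensembl_calls = {
    "chr1:100A>G": "nonsynonymous",
    "chr1:200C>T": "nonsynonymous",
    "chr2:50G>A": "intronic",
    "chr3:10T>C": "synonymous",
}

rate = agreement_rate(refseq_calls, ensembl_calls)
print(f"{rate:.0%} agreement")
```

Restricting both dictionaries to one variant category (for example, only loss-of-function calls) before calling the function would give a per-category rate comparable in spirit to the figures in the abstract.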
5. Reactome knowledgebase of human biological pathways and processes
- Author
-
David B. Croft, Ewan Birney, Esther Schmidt, Alexander Kanapin, Henning Hermjakob, Lisa Matthews, Guanming Wu, Lincoln Stein, Imre Vastrik, Shahana S. Mahajan, Marc Gillespie, Phani V. Garapati, Peter D'Eustachio, Bijay Jassal, Michael Caudy, Bruce May, Suzanna E. Lewis, Gopal Gopinath, Bernard de Bono, and Jill Hemish
- Subjects
animal structures; Computational biology; Biology; Bioinformatics; Biological pathway; 03 medical and health sciences; 0302 clinical medicine; Prediction methods; Genetics; Animals; Humans; Data content; Databases, Protein; Human proteins; Physiological Phenomena; Physiological Phenomenon; 030304 developmental biology; 0303 health sciences; Extramural; Proteins; Articles; Systems Integration; ComputingMethodologies_PATTERNRECOGNITION; Open source; 030220 oncology & carcinogenesis; Proteins metabolism; Models, Animal; Metabolic Networks and Pathways; Software; Signal Transduction
- Abstract
Reactome (http://www.reactome.org) is an expert-authored, peer-reviewed knowledgebase of human reactions and pathways that functions as a data mining resource and electronic textbook. Its current release includes 2975 human proteins, 2907 reactions and 4455 literature citations. A new entity-level pathway viewer and improved search and data mining tools facilitate searching and visualizing pathway data and the analysis of user-supplied high-throughput data sets. Reactome has increased its utility to the model organism communities with improved orthology prediction methods allowing pathway inference for 22 species and through collaborations to create manually curated Reactome pathway datasets for species including Arabidopsis, Oryza sativa (rice), Drosophila and Gallus gallus (chicken). Reactome's data content and software can all be freely used and redistributed under open source terms.
- Published
- 2009
6. Development and Evaluation of an Automated Annotation Pipeline and cDNA Annotation System
- Author
-
Julian Gough, Yoshihide Hayashizaki, Yasushi Okazaki, John Quackenbush, Carol J. Bult, David A. Hume, Masaaki Furuno, Takeya Kasukawa, Alexander Kanapin, Hidemasa Bono, Lynn M. Schriml, Itoshi Nikaido, David P. Hill, Richard M. Baldarelli, and Hideo Matsuda
- Subjects
Quality Control; DNA, Complementary; Information Management; Biology; Filter (higher-order function); computer.software_genre; Contig Mapping; Mice; User-Computer Interface; Annotation; Software Design; Complementary DNA; Databases, Genetic; Controlled vocabulary; Genetics; Animals; Humans; Genetics (clinical); Electronic Data Processing; business.industry; Automatic Data Processing; Computational Biology; Reference Standards; Pipeline (software); Resources; Gene nomenclature; Genes; Functional annotation; Software design; Artificial intelligence; business; computer; Natural language processing
- Abstract
Manual curation has long been held to be the “gold standard” for functional annotation of DNA sequence. Our experience with the annotation of more than 20,000 full-length cDNA sequences revealed problems with this approach, including inaccurate and inconsistent assignment of gene names, as well as many good assignments that were difficult to reproduce using only computational methods. For the FANTOM2 annotation of more than 60,000 cDNA clones, we developed a number of methods and tools to circumvent some of these problems, including an automated annotation pipeline that provides high-quality preliminary annotation for each sequence by introducing an “uninformative filter” that eliminates uninformative annotations, controlled vocabularies to accurately reflect both the functional assignments and the evidence supporting them, and a highly refined, Web-based manual annotation tool that allows users to view a wide array of sequence analyses and to assign gene names and putative functions using a consistent nomenclature. The ultimate utility of our approach is reflected in the low rate of reassignment of automated assignments by manual curation. Based on these results, we propose a new standard for large-scale annotation, in which the initial automated annotations are manually investigated and then computational methods are iteratively modified and improved based on the results of manual curation.
- Published
- 2003
7. Profiling the malaria genome: a gene survey of three species of malaria parasite with comparison to other apicomplexan species
- Author
-
Simon Cawley, John W. Barnwell, Charles A. Yowell, Esmeralda Vargas-Serrato, Alexander Kanapin, Winston Hide, Jonathan R. Pritt, Nicola Mulder, John B. Dame, Michelle R. Fluegge, Kenneth A. Sturrock, Mary R. Galinski, Ralhston Muller, and Jane M. Carlton
- Subjects
InterPro; Plasmodium; DNA, Complementary; Proteome; Plasmodium berghei; Molecular Sequence Data; Plasmodium falciparum; Plasmodium vivax; Protozoan Proteins; Biology; Genome; parasitic diseases; Animals; Humans; Molecular Biology; Comparative genomics; Genetics; Expressed sequence tag; Computational Biology; Genomics; Sequence Analysis, DNA; biology.organism_classification; Malaria; GenBank; Parasitology; Databases, Nucleic Acid; Apicomplexa; Genome, Protozoan
- Abstract
We have undertaken the first comparative pilot gene discovery analysis of approximately 25,000 random genomic and expressed sequence tags (ESTs) from three species of Plasmodium, the infectious agent that causes malaria. A total of 5482 genome survey sequences (GSSs) and 5582 ESTs were generated from mung bean nuclease (MBN) and cDNA libraries, respectively, of the ANKA line of the rodent malaria parasite Plasmodium berghei, and 10,874 GSSs generated from MBN libraries of the Salvador I and Belem lines of Plasmodium vivax, the most geographically wide-spread human malaria pathogen. These tags, together with 2438 Plasmodium falciparum sequences present in GenBank, were used to perform first-pass assembly and transcript reconstruction, and non-redundant consensus sequence datasets created. The datasets were compared against public protein databases and more than 1000 putative new Plasmodium proteins identified based on sequence similarity. Homologs of previously characterized Plasmodium genes were also identified, increasing the number of P. vivax and P. berghei sequences in public databases at least 10-fold. Comparative studies with other species of Apicomplexa identified interesting homologs of possible therapeutic or diagnostic value. A gene prediction program, Phat, was used to predict probable open reading frames for proteins in all three datasets. Predicted and non-redundant BLAST-matched proteins were submitted to InterPro, an integrated database of protein domains, signatures and families, for functional classification. Thus a partial predicted proteome was created for each species. This first comparative analysis of Plasmodium protein coding sequences represents a valuable resource for further studies on the biology of this important pathogen.
- Published
- 2001
8. [Untitled]
- Author
-
V A Ivanov, O. V. Godukhin, S. V. Den'mukhametova, Yu. V. Il’in, Alexander Kanapin, and F. F. Kokaeva
- Subjects
Text mining; Long-term memory; business.industry; Biophysics; General Chemistry; General Medicine; Computational biology; Line (text file); Biology; Reverse transcriptase gene; business; Biochemistry
- Published
- 2002
9. Projection of gene-protein networks to the functional space of the proteome and its application to analysis of organism complexity
- Author
-
Vladimir A. Kuznetsov, Nicola Mulder, and Alexander Kanapin. Affiliations listed: Institute of Infectious Disease and Molecular Medicine; Faculty of Health Sciences
- Subjects
InterPro; lcsh:QH426-470; Proteome; Transcription, Genetic; lcsh:Biotechnology; Gene regulatory network; Computational biology; Biology; Proteomics; Evolution, Molecular; lcsh:TP248.13-248.65; Genetics; Animals; Humans; Gene Regulatory Networks; Databases, Protein; Organism; Research; Alternative splicing; Exons; lcsh:Genetics; Alternative Splicing; UniProt; DNA microarray; Algorithms; Biotechnology
- Abstract
We consider the problem of biological complexity via a projection of protein-coding genes of complex organisms onto the functional space of the proteome. The latter can be defined as a set of all functions committed by proteins of an organism. Alternative splicing (AS) allows an organism to generate diverse mature RNA transcripts from a single mRNA strand and thus it could be one of the key mechanisms of increasing of functional complexity of the organism's proteome and a driving force of biological evolution. Thus, the projection of transcription units (TU) and alternative splice-variant (SV) forms onto proteome functional space could generate new types of relational networks (e.g. SV-protein function networks, SFN) and lead to discoveries of novel evolutionarily conservative functional modules. Such types of networks might provide new reliable characteristics of organism complexity and a better understanding of the evolutionary integration and plasticity of interconnection of genome-transcriptome-proteome functions. Results: We use the InterPro and UniProt databases to attribute descriptive features (keywords) to protein sequences. UniProt database includes a controlled and curated vocabulary of specific descriptors or keywords. The keywords have been assigned to a protein sequence via conserved domains or via similarity with annotated sequences. Then we consider the unique combinations of keywords as the protein functional labels (FL), which characterize the biological functions of the given protein and construct the contingency tables and graphs providing the projections of transcription units (TU) and alternative splice-variants (SV) onto all FL of the proteome of a given organism. We constructed SFNs for organisms with different evolutionary history and levels of complexity, and performed detailed statistical parameterization of the networks. Conclusions: The application of the algorithm to organisms with different evolutionary history and level of biological complexity (nematode, fruit fly, vertebrata) reveals that the parameters describing SFN correlate with the complexity of a given organism. Using statistical analysis of the links of the functional networks, we propose new features of evolution of protein function acquisition. We reveal a group of genes and corresponding functions, which could be attributed to an early conservative part of the cellular machinery essential for cell viability and survival. We identify and provide characteristics of functional switches in the polyform group of TUs in different organisms. Based on comparison of mouse and human SFNs, a role of alternative splicing as a necessary source of evolution towards more complex organisms is demonstrated. The entire set of FL across many organisms could be used as a draft of the catalogue of the functional space of the proteome world.
- Published
- 2010
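The projection described in the abstract of entry 9 treats each unique combination of keywords as a functional label (FL) and maps transcription units (TU) and their splice variants (SV) onto those labels. The sketch below illustrates that idea under simplifying assumptions: a label is modelled as an order-independent set of keywords, and the input is a hypothetical list of (TU, SV, keywords) records. The function name `project_to_labels` and the toy records are invented for illustration and do not reproduce the authors' algorithm or data.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Record = Tuple[str, str, List[str]]  # (transcription unit, splice variant, keywords)


def project_to_labels(records: Iterable[Record]) -> Dict[str, Set[FrozenSet[str]]]:
    """Map each transcription unit to the set of functional labels of its variants."""
    tu_to_labels: Dict[str, Set[FrozenSet[str]]] = defaultdict(set)
    for tu_id, _sv_id, keywords in records:
        # A functional label is the unique combination of keywords, ignoring order.
        tu_to_labels[tu_id].add(frozenset(keywords))
    return dict(tu_to_labels)


# Hypothetical toy records (illustration only).
records = [
    ("TU1", "SV1", ["Kinase", "ATP-binding"]),
    ("TU1", "SV2", ["ATP-binding", "Kinase"]),  # same label as SV1
    ("TU1", "SV3", ["Kinase"]),                 # a second, distinct label
    ("TU2", "SV1", ["Transport"]),
]

labels = project_to_labels(records)
for tu_id, tu_labels in sorted(labels.items()):
    print(tu_id, len(tu_labels))
```

Counting the labels per TU, as the loop does, gives one row of the kind of contingency table the abstract mentions; the statistical parameterization of the resulting networks is outside the scope of this sketch.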
10. Proteome Complexity Measures Based on Counting of Domain-to-Protein Links for Replicative and Non-Replicative Domains
- Author
-
Alexander Kanapin, Vladimir A. Kuznetsov, and V. V. Pickalov
- Subjects
Genetics; InterPro; Set (abstract data type); Range (mathematics); Protein domain; Proteome; Probability distribution; Computational biology; Biology; Genome; Function (biology)
- Abstract
The entire protein domain set of the proteome of an organism we call the domainome. We define the list of domains in domainome, together with the numbers of their occurrences (links to proteins) found in the proteome to be the domain-to-protein linkage profile (DPLP). We estimated the DPLP of the proteomes of the 156 complete genomes represented in the InterPro database. This work presents several quantitative measures of the complexity of a proteome based on the DPLP. For each of the 156 studied genomes, we found two large sets of domains: D1, the domains that are not replicated within any protein of the proteome and D2, the domains that occur two or more times in at least one protein of the proteome. Statistics of the observed domain-to-protein links (DPLs) for set D1 and set D2 do not exhibit simple ‘scale-free network’ properties: for D1, the distribution of DPLs in proteome follows the Generalized Discrete Pareto function and for D2, the distribution of DPLs in proteome follows the inversed gamma probability function. Dynamical range of DPLs for D1 domains is larger than for D2 domains, and this range correlates with biological complexity of organism. D1 and D2 sets exhibit significant differences of molecular functions of the corresponding proteins, biological processes, and cellular components. The statistical distributions of the number of DPLs in the proteome and the estimates of the differences between the DPLPs for pairs of organisms are used as measures of relative biological complexity of the organisms. In particular, we show quantitatively the greater domain composition complexity of the human proteins relative to that of a mouse or a rat.
- Published
- 2006
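The definitions in the abstract of entry 10 (the domain-to-protein linkage profile, and the sets D1 and D2) can be made concrete with a short sketch. It assumes that a proteome is given as a mapping from protein identifier to the ordered list of its domains, and that a "link" counts once per protein containing the domain; the abstract does not spell out that counting rule, so it is an assumption here. The function names and the toy proteome are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Proteome = Dict[str, List[str]]  # protein id -> domains in order of occurrence


def linkage_profile(proteome: Proteome) -> Counter:
    """Count, for each domain, how many proteins contain it at least once."""
    profile: Counter = Counter()
    for domains in proteome.values():
        profile.update(set(domains))  # assumption: one link per protein
    return profile


def split_d1_d2(proteome: Proteome) -> Tuple[Set[str], Set[str]]:
    """Return (D1, D2): domains never repeated within a protein, and the rest."""
    all_domains = {d for domains in proteome.values() for d in domains}
    d2 = {
        domain
        for domains in proteome.values()
        for domain, count in Counter(domains).items()
        if count >= 2  # occurs two or more times in at least one protein
    }
    return all_domains - d2, d2


# Hypothetical toy proteome (illustration only).
toy_proteome = {
    "P1": ["kinase", "SH2"],
    "P2": ["WD40", "WD40", "kinase"],
    "P3": ["SH2"],
}

profile = linkage_profile(toy_proteome)
d1, d2 = split_d1_d2(toy_proteome)
print(dict(profile), sorted(d1), sorted(d2))
```

Fitting the Generalized Discrete Pareto or inverse gamma distributions named in the abstract to the link counts would be a separate step and is not attempted here.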
11. Integr8 and Genome Reviews: integrated views of complete genomes and proteomes
- Author
-
Paul J. Kersey, Nadeem Faruque, Lorna Morris, Karine Michoud, Tamara Kulikova, Peter McLaren, Alexandre Gattiker, Lawrence Bower, Alexander Kanapin, Carola Kanz, Ingmar Reuter, Robert Petryszak, Alan Horne, Laurent Duret, Isabelle Phan, Rolf Apweiler, Britt Reimholz, Ujjwal Das, Karyn Duggan, and Simon Penel. Affiliations listed: Bioinformatique, phylogénie et génomique évolutive (BPGE); Département PEGASE [LBBE] (PEGASE); Laboratoire de Biométrie et Biologie Evolutive - UMR 5558 (LBBE); Université Claude Bernard Lyon 1 (UCBL); Université de Lyon; Institut National de Recherche en Informatique et en Automatique (Inria); VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS); Centre National de la Recherche Scientifique (CNRS); CCIN2P3
- Subjects
DNA, Bacterial; Proteomics; InterPro; [SDV.OT]Life Sciences [q-bio]/Other [q-bio.OT]; Genomics; Computational biology; Biology; Genome; DNA sequencing; User-Computer Interface; 03 medical and health sciences; Annotation; Databases, Genetic; Genetics; Organism; MESH: Databases, Genetic; 030304 developmental biology; MESH: User-Computer Interface; Internet; 0303 health sciences; MESH: DNA, Archaeal; MESH: Genomics; MESH: Proteomics; 030302 biochemistry & molecular biology; Articles; MESH: DNA, Bacterial; Systems Integration; DNA, Archaeal; MESH: Internet; Proteome; MESH: Systems Integration
- Abstract
Integr8 is a new web portal for exploring the biology of organisms with completely deciphered genomes. For over 190 species, Integr8 provides access to general information, recent publications, and a detailed statistical overview of the genome and proteome of the organism. The preparation of this analysis is supported through Genome Reviews, a new database of bacterial and archaeal DNA sequences in which annotation has been upgraded (compared to the original submission) through the integration of data from many sources, including the EMBL Nucleotide Sequence Database, the UniProt Knowledgebase, InterPro, CluSTr, GOA and HOGENOM. Integr8 also allows the users to customize their own interactive analysis, and to download both customized and prepared datasets for their own use. Integr8 is available at http://www.ebi.ac.uk/integr8.
- Published
- 2005
12. InterPro, progress and status in 2005
- Author
-
Ivica Letunic, Jeremy D. Selengut, Paul Bradley, Alex L. Mitchell, Ujjwal Das, David Binns, Julian Gough, John Maslen, Teresa K. Attwood, Robert M. Vaughan, Cathy H. Wu, Christian J. A. Sigrist, David J. Studholme, Anastasia N. Nikolskaya, Rodrigo Lopez, Martin Madera, Emmanuel Courcelle, Daniel H. Haft, Nicola Harte, Alexander Kanapin, Marco Pagni, Maria Krestyaninova, Rolf Apweiler, Nicolas Hulo, Richard R. Copley, Sandra Orchard, David M. Lonsdale, Chris P. Ponting, Alex Bateman, Lorenzo Cerutti, Amos Marc Bairoch, Daniel Kahn, Richard Durbin, Phillip Bucher, Peer Bork, Jennifer McDowall, Nicola Mulder, Wolfgang Fleischmann, Emmanuel Quevillon, and Ville Silventoinen. Affiliations listed: Laboratoire de Biométrie et Biologie Evolutive - UMR 5558 (LBBE); Université Claude Bernard Lyon 1 (UCBL); Université de Lyon; Institut National de Recherche en Informatique et en Automatique (Inria); VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS); Centre National de la Recherche Scientifique (CNRS); Bioinformatique, phylogénie et génomique évolutive (BPGE); Département PEGASE [LBBE] (PEGASE); MDC Library
- Subjects
InterPro; [SDV.OT]Life Sciences [q-bio]/Other [q-bio.OT]; Architecture domain; Protein family; Simple Modular Architecture Research Tool; Protein Sequence Analysis; Databases, Protein/trends; 570 Life Sciences; Computational biology; PROSITE; Biology; Bioinformatics; 610 Medical Sciences, Medicine; 03 medical and health sciences; Annotation; TIGRFAMs; Sequence Analysis, Protein; Genetics; Humans; Protein Databases; ddc:576; Tertiary Protein Structure; Databases, Protein; 030304 developmental biology; Proteins/chemistry/classification; 0303 health sciences; 030302 biochemistry & molecular biology; Proteins; Articles; Protein Structure, Tertiary; Systems Integration; Cardiovascular and Metabolic Diseases; UniProt; Sequence Alignment; Autre (Sciences du Vivant)
- Abstract
InterPro, an integrated documentation resource of protein families, domains and functional sites, was created to integrate the major protein signature databases. Currently, it includes PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMs, PIRSF and SUPERFAMILY. Signatures are manually integrated into InterPro entries that are curated to provide biological and functional information. Annotation is provided in an abstract, Gene Ontology mapping and links to specialized databases. New features of InterPro include extended protein match views, taxonomic range information and protein 3D structure data. One of the new match views is the InterPro Domain Architecture view, which shows the domain composition of protein matches. Two new entry types were introduced to better describe InterPro entries: these are active site and binding site. PIRSF and the structure-based SUPERFAMILY are the latest member databases to join InterPro, and CATH and PANTHER are soon to be integrated. InterPro release 8.0 contains 11 007 entries, representing 2573 domains, 8166 families, 201 repeats, 26 active sites, 21 binding sites and 20 post-translational modification sites. InterPro covers over 78% of all proteins in the Swiss-Prot and TrEMBL components of UniProt. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro).
- Published
- 2005
13. Annotation Marathon Validates 21,037 Human Genes
- Author
-
Satoshi Oota, Marie-Dominique Devignes, Arek Kasprzyk, Inna Dubchak, Wojciech Makalowski, Anthony J. Brookes, Per Unneberg, Susan Bromberg, Naoki Nagata, Matthew I. Bellgard, Yasuyuki Fujii, Vladimir Kuryshev, Tetsuji Otsuki, Yoonsoo Hahn, Andrew J. G. Simpson, Ryuichi Sakate, Hyang-Sook Yoo, Minoru Kanehisa, Yoshiyuki Sakaki, Toshio Ota, Kaoru Fukami-Kobayashi, Tomohiro Yasuda, Janet Kelso, Ze-Guang Han, Paul J. Kersey, Lukas Wagner, Norikazu Yasuda, Ursula Hinz, Rolf Apweiler, Tadashi Imanishi, Yoshio Tateno, Hideo Matsuda, Ranajit Chakraborty, Danielle Thierry-Mieg, Nobuo Nomura, Toshihisa Okido, Elspeth A. Bruford, Sandrine Imbeaud, Hans-Werner Mewes, Toshinori Endo, Motohiko Tanino, Ingo Schupp, Hideki Hanaoka, Alexander Kanapin, Dominique Piatier-Tonneau, Craig A. Gough, Sangsoo Kim, Zhu Chen, Michael Han, Anne Estreicher, Sandro J. de Souza, Ken Nishikawa, Hideki Nagasaki, Masafumi Ohtsubo, Osamu Ohara, Reiko Kikuno, Roberto A. Barrero, Claude Chelala, Aiko Takahashi, Stefan Wiemann, Hiroaki Sakai, Satoshi Fukuchi, Takao Isogai, Eric Eveno, Nobuyoshi Shimizu, Mitiko Go, Charles A. Steward, Laurens G. Wilming, Hideaki Sugawara, Jennifer L. Ashurst, Maria de Fatima Bonaldo, Peter J. Tonellato, Gen Tamiya, Takuro Tamura, Michio Oishi, Shuang-Xi Ren, Toshihisa Takagi, Régine Mariage-Samson, Makiko Suwa, Phillip Hilton, Youla Karavidopoulou, Shuhei Mano, Rajni Nigam, Kei Yura, Todd D. Taylor, Norihiro Okada, John Quackenbush, Mitsuteru Nakao, Osamu Ogasawara, Kouichi Kimura, Yoshihide Hayashizaki, Marvin Stodolsky, Keiichi Nagai, Sumio Sugano, Joseph D. Terwilliger, Jun Mashima, Florence Servant, Yasushi Okazaki, Yoshiyuki Suzuki, Motonori Ota, Shinsei Minoshima, Momoki Hirai, Nicola Mulder, Esther Graudens, Stephen T. Sherry, Eduardo Eyras, Susumu Tanaka, Kanako O. 
Koyanagi, Katsunaga Sakai, Piero Carninci, Charles Auffray, Kazuho Ikeo, Hiroshi Tanaka, Hidemasa Bono, Vamsi Veeramachaneni, Mika Hirakawa, Shigetaka Sakamoto, Tetsuo Nishikawa, Takashi Gojobori, Yumi Yamaguchi-Kabata, Claire O'Donovan, Shinya Watanabe, Clara Amid, Mary Shimoyama, Mami Suzuki, Erimi Harada, Rie Shiba, Takeshi Itoh, Kousaku Okubo, Hidetoshi Inoko, Lihua Jin, Ian Hopkinson, Chisato Yamasaki, Teruyoshi Hishiki, Libin Jia, Winston Hide, Yutaka Suzuki, Keiichi Homma, Izabela Makalowska, Michael A. Thomas, Marie-Anne Debily, Annemarie Poustka, Satoru Miyazaki, Katsuyuki Hashimoto, Bento Soares, Robert L. Strausberg, Gopal R. Gopinath, Takeya Kasukawa, Boris Lenhard, Bernhard Korn, Christine Couillault, Jun-ichi Takeda, Jean Thierry-Mieg, Yayoi Kaneko, Takashi Makino, Kousuke Hanada, Kenta Nakai, and Naruya Saitou
- Subjects
DNA, Complementary; Sequence analysis; QH301-705.5; Gene prediction; ADN; Biology; Genetics/Genomics/Gene Therapy; Polymorphism, Single Nucleotide; Genome; General Biochemistry, Genetics and Molecular Biology; Open Reading Frames; 03 medical and health sciences; 0302 clinical medicine; Gene mapping; Homo (Human); Databases, Genetic; Gene cluster; Humans; Bioinformatics/Computational Biology; Biology (General); Gene; 030304 developmental biology; Genetics; Internet; 0303 health sciences; Polymorphism, Genetic; General Immunology and Microbiology; Genome, Human; General Neuroscience; Alternative splicing; Computational Biology; Protein Structure, Tertiary; Alternative Splicing; Genes; 030220 oncology & carcinogenesis; Human genome; General Agricultural and Biological Sciences; Microsatellite Repeats; Research Article; Gens
- Abstract
The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology., An international team has systematically validated and annotated just over 21,000 human genes using full-length cDNA, thereby providing a valuable new resource for the human genetics community
- Published
- 2004
14. Mouse Proteome Analysis
- Author
-
Rohan D. Teasdale, Melissa J. Davis, Julian Gough, Gsl Members, Serge Batalov, Alexander Kanapin, Zheng Yuan, Michele Magrane, Sean M. Grimmond, Hideo Matsuda, Christian Schönbach, and Hideya Kawaji
- Subjects
InterPro ,Saccharomyces cerevisiae Proteins ,Proteome ,Protein domain ,Genomics ,Computational biology ,Biology ,Protein Sorting Signals ,Endoplasmic Reticulum ,Transcriptome ,Evolution, Molecular ,Mice ,Protein sequencing ,Protein structure ,Predictive Value of Tests ,Databases, Genetic ,Genetics ,Animals ,Drosophila Proteins ,Humans ,Letters ,Caenorhabditis elegans Proteins ,Genetics (clinical) ,International Protein Index ,Arabidopsis Proteins ,Membrane Proteins ,Protein Structure, Tertiary - Abstract
A general overview of the protein sequence set for the mouse transcriptome produced during the FANTOM2 sequencing project is presented here. We applied different algorithms to characterize protein sequences derived from a nonredundant representative protein set (RPS) and a variant protein set (VPS) of the mouse transcriptome. The functional characterization and assignment of Gene Ontology terms was done by analysis of the proteome using InterPro. The Superfamily database analyses gave a detailed structural classification according to SCOP and provide additional evidence for the functional characterization of the proteome data. The MDS database analysis revealed new domains which are not presented in existing protein domain databases. Thus the transcriptome gives us a unique source of data for the detection of new functional groups. The data obtained for the RPS and VPS sets facilitated the comparison of different patterns of protein expression. A comparison of other existing mouse and human protein sequence sets (e.g., the International Protein Index) demonstrates the common patterns in mammalian proteomes. The analysis of the membrane organization within the transcriptome of multiple eukaryotes provides valuable statistics about the distribution of secretory and transmembrane proteins
- Published
- 2003
15. Applications of InterPro in protein annotation and genome analysis
- Author
-
Margaret Biswas, Paul J. Kersey, Gillian M. Fraser, Virginie Mittard, Joseph F. O'Rourke, Isabelle Phan, Youla Karavidopoulou, Alexander Kanapin, Nicola Mulder, Florence Servant, Evgenia V. Kriventseva, Evelyn Camon, and Rolf Apweiler
- Subjects
InterPro ,Proteome ,Sequence analysis ,Simple Modular Architecture Research Tool ,Protein Conformation ,Computational biology ,Biology ,computer.software_genre ,Genome ,Annotation ,Protein Annotation ,Sequence Analysis, Protein ,Humans ,Amino Acid Sequence ,Databases, Protein ,Molecular Biology ,Internet ,Sequence database ,Genome, Human ,Computational Biology ,Proteins ,ComputingMethodologies_PATTERNRECOGNITION ,Data mining ,computer ,Software ,Information Systems - Abstract
The applications of InterPro span a range of biologically important areas that includes automatic annotation of protein sequences and genome analysis. In automatic annotation of protein sequences InterPro has been utilised to provide reliable characterisation of sequences, identifying them as candidates for functional annotation. Rules based on the InterPro characterisation are stored and operated through a database called RuleBase. RuleBase is used as the main tool in the sequence database group at the EBI to apply automatic annotation to unknown sequences. The annotated sequences are stored and distributed in the TrEMBL protein sequence database. InterPro also provides a means to carry out statistical and comparative analyses of whole genomes. In the Proteome Analysis Database, InterPro analyses have been combined with other analyses based on CluSTr, the Gene Ontology (GO) and structural information on the proteins.
- Published
- 2002
16. Systematic functional analysis of the Caenorhabditis elegans genome using RNAi
- Author
-
Andrew G. Fraser, Marc Sohrmann, Monica Gotta, Peder Zipperlen, Gino B. Poulin, Nathalie Le Bot, Ravi S. Kamath, Alexander Kanapin, Yan Dong, Sergio Moreno, Richard Durbin, Julie Ahringer, and David P. Welchman
- Subjects
X Chromosome ,Transcription, Genetic ,Sequence analysis ,education ,Genomics ,Genome ,Evolution, Molecular ,RNA interference ,Gene density ,Gene cluster ,Animals ,Humans ,Caenorhabditis elegans ,Gene ,Genes, Helminth ,Genetics ,Multidisciplinary ,biology ,Computational Biology ,Helminth Proteins ,biology.organism_classification ,Protein Structure, Tertiary ,Phenotype ,Multigene Family ,RNA Interference ,RNA, Helminth - Abstract
A principal challenge currently facing biologists is how to connect the complete DNA sequence of an organism to its development and behaviour. Large-scale targeted-deletions have been successful in defining gene functions in the single-celled yeast Saccharomyces cerevisiae, but comparable analyses have yet to be performed in an animal. Here we describe the use of RNA interference to inhibit the function of ∼86% of the 19,427 predicted genes of C. elegans. We identified mutant phenotypes for 1,722 genes, about two-thirds of which were not previously associated with a phenotype. We find that genes of similar functions are clustered in distinct, multi-megabase regions of individual chromosomes; genes in these regions tend to share transcriptional profiles. Our resulting data set and reusable RNAi library of 16,757 bacterial clones will facilitate systematic analyses of the connections among gene sequence, chromosomal location and gene function in C. elegans., R.S.K. was supported by a Howard Hughes Medical Institute Predoctoral Fellowship; A.G.F. by a US Army Breast Cancer Research Fellowship; Y.D., R.D., M.G., D.P.W. and P.Z. by the Wellcome Trust; G.P. by the Canadian Institute of Health Research and the Wellcome Trust; A.K. by the European Molecular Biology Laboratory; N.L.B. by the European Molecular Biology Organization; S.M. by the Centro de Investigacion del Cancer; M.S. by a Swiss National Science Foundation fellowship and J.A. by a Wellcome Trust Senior Research Fellowship.
- Published
- 2002
17. Application of InterPro for the functional classification of the proteins of fish origin in SWISS-PROT and TrEMBL
- Author
-
Alexander Kanapin, Rolf Apweiler, and Margaret Biswas
- Subjects
InterPro ,Genetics ,Internet ,Protein family ,Databases, Factual ,Cytochrome b ,Information Management ,Fishes ,Proteins ,General Medicine ,Structural Classification of Proteins database ,Computational biology ,PROSITE ,Biology ,Fish Proteins ,Genome ,General Biochemistry, Genetics and Molecular Biology ,Sequence Analysis, Protein ,Animals ,UniProt ,General Agricultural and Biological Sciences - Abstract
InterPro (http://www.ebi.ac.uk/interpro/) is an integrated documentation resource for protein families, domains and sites, developed initially as a means of rationalizing the complementary efforts of the PROSITE, PRINTS, Pfam and ProDom database projects. It is a useful resource that aids the functional classification of proteins. Almost 90% of the Actinopterygii protein sequences from SWISS-PROT and TrEMBL can be classified using InterPro. Over 30% of the Actinopterygii protein sequences currently in SWISS-PROT and TrEMBL are of mitochondrial origin, the majority of which belong to the cytochrome b/b6 family. InterPro also gives insights into the domain composition of the classified proteins and has applications in the functional classification of newly determined sequences lacking biochemical characterization, and in comparative genome analysis. A comparison of the Actinopterygii protein sequences against the sequences of other eukaryotes confirms the high representation of eukaryotic protein kinase in the organisms studied. The comparisons also show that, based on InterPro families, the trans-species evolution of MHC class I and II molecules in mammals and teleost fish can be recognized.
- Published
- 2001
18. Interactive InterPro-based comparisons of proteins in whole genomes
- Author
-
Evgeni M. Zdobnov, Paul J. Kersey, Tom Oinn, Virginie Mittard, Rolf Apweiler, Evgenia V. Kriventseva, Isabelle Phan, Youla Karavidopoulou, Nicola Mulder, Wolfgang Fleischmann, Margaret Biswas, Florence Servant, and Alexander Kanapin
- Subjects
Statistics and Probability ,Protein coding ,InterPro ,Genome ,Proteome ,Computational Biology ,Computational biology ,Biology ,computer.software_genre ,Biochemistry ,Computer Science Applications ,Computational Mathematics ,Computational Theory and Mathematics ,natural sciences ,Data mining ,Databases, Protein ,Molecular Biology ,computer ,Software - Abstract
Motivation: The SWISS-PROT group at the EBI has developed the Proteome Analysis Database utilizing existing resources and providing comprehensive and integrated comparative analysis of the predicted protein coding sequences of the complete genomes of bacteria, archaea and eukaryotes. The Proteome Analysis Database is accompanied by a program that has been designed to carry out interactive InterPro proteome comparisons for any one proteome against any other one or more of the proteomes in the database. Availability: http://www.ebi.ac.uk/proteome/comparisons.html Contact: alex@ebi.ac.uk; proteome@ebi.ac.uk
- Published
- 2002
19. Regions Of 3D Similarity in Potential ORF1 Products of Mobile Genetics Classes
- Author
-
L. Brodsky, Yu. Ilyin, and Alexander Kanapin
- Subjects
Similarity (network science) ,Bioengineering ,Computational biology ,Biology ,Molecular Biology ,Biochemistry ,Biotechnology - Published
- 1993