50 results on '"Redaschi N"'
Search Results
2. FastAlert-an automatic search system to alert about new entries in biological sequence databanks
- Author
-
Eggenberger, F., Redaschi, N., and Doelz, R.
- Abstract
This paper describes a new tool enabling awareness of new sequence databank entries of interest. The FastAlert system relieves the researcher from the burden of repeating FASTA searches in order to keep up with the rapidly growing amount of information found in biological sequence databanks. The query sequence can be submitted from any computer connected to the Internet. Upon registration, the databank, including the updates, is scanned at periodic intervals with the sequence provided. The results, so-called FastAlert reports, are delivered via electronic mail. The reports contain the FASTA best-scores list and the similarity statistics for each entry listed.
- Published
- 2017
3. Updates in Rhea-a manually curated resource of biochemical reactions
- Author
-
Morgat, A., Axelsen, K.B., Lombardot, T., Alcántara, R., Aimo, L., Zerara, M., Niknejad, A., Belda, E., Hyka-Nouspikel, N., Coudert, E., Redaschi, N., Bougueleret, L., Steinbeck, C., Xenarios, I., and Bridge, A.
- Abstract
Rhea (http://www.ebi.ac.uk/rhea) is a comprehensive and non-redundant resource of expert-curated biochemical reactions described using species from the ChEBI (Chemical Entities of Biological Interest) ontology of small molecules. Rhea has been designed for the functional annotation of enzymes and the description of genome-scale metabolic networks, providing stoichiometrically balanced enzyme-catalyzed reactions (covering the IUBMB Enzyme Nomenclature list and additional reactions), transport reactions and spontaneously occurring reactions. Rhea reactions are extensively curated with links to source literature and are mapped to other publicly available enzyme and pathway databases such as Reactome, BioCyc, KEGG and UniPathway, through manual curation and computational methods. Here we describe developments in Rhea since our last report in the 2012 database issue of Nucleic Acids Research. These include significant growth in the number of Rhea reactions and the inclusion of reactions involving complex macromolecules such as proteins, nucleic acids and other polymers that lie outside the scope of ChEBI. Together these developments will significantly increase the utility of Rhea as a tool for the description, analysis and reconciliation of genome-scale metabolic models.
- Published
- 2015
4. The SIB Swiss Institute of Bioinformatics' resources: focus on curated databases
- Author
-
Bultet, LA, Aguilar-Rodriguez, J, Ahrens, CH, Ahrne, EL, Ai, N, Aimo, L, Akalin, A, Aleksiev, T, Alocci, D, Altenhoff, A, Alves, I, Ambrosini, G, Pedone, PA, Angelina, P, Anisimova, M, Appel, R, Argoud-Puy, G, Arnold, K, Arpat, B, Artimo, P, Ascencao, K, Auchincloss, A, Axelsen, K, Gerritsen, VB, Bairoch, A, Barisal, P, Baratin, D, Barbato, A, Barbie, V, Barras, D, Barreiro, M, Barret, S, Bastian, F, Batista Neto, TM, Baudis, M, Beaudoing, E, Beckmann, JS, Bekkar, AK, Cammoun, LBH, Benmohammed, S, Bernard, M, Bertelli, C, Bertoni, M, Bienert, S, Bignucolo, O, Bilbao, A, Bilican, A, Blank, D, Blatter, M-C, Blum, L, Bocquet, J, Boeckmann, B, Bolleman, JT, Bordoli, L, Bosshard, L, Boucher, G, Bougueleret, L, Boutet, E, Bovigny, C, Bratulic, S, Breuza, L, Bridge, AJ, Britan, A, Brito, F, Frazao, JB, Bruggmann, R, Bucher, P, Burdet, F, Burger, L, Cabello, EM, Gomez, RMC, Calderon, S, Cannarozzi, G, Carl, S, Casas, CC, Catherinet, S, Perier, RC, Charpilloz, C, Chaskar, PD, Chen, W, Pepe, AC, Chopard, B, Chu, HY, Civic, N, Claassen, M, Clottu, S, Colombo, M, Cosandier, I, Coudert, E, Crespo, I, Creus, M, Cuche, B, Cuendet, MA, Cusin, I, Daga, N, Daina, A, Dauvillier, J, David, F, Davydov, I, Ferreira, MDSRM, de Beer, T, de Castro, E, de Santana, C, Delafontaine, J, Delorenzi, M, Delucinge-Vivier, C, Demirel, O, Derham, R, Dermitzakis, EM, Dib, L, Diene, S, Dilek, N, Dilmi, J, Domagalski, MJ, Dorier, J, Dornevil, D, Dousse, A, Dreos, R, Duchen, P, Roggli, PD, Duperret, ID, Durinx, C, Duvaud, S, Engler, R, Frkek, S, Lopez, PE, Fstreicher, A, Excoffier, L, Fabbretti, R, Falcone, J-L, Falquet, L, Famiglietti, ML, Ferreira, A-M, Feuermann, M, Filliettaz, M, Hegel, V, Foucal, A, Franceschini, A, Fucile, G, Gaidatzis, D, Garcia, V, Gasteiger, E, Gateau, A, Gatti, L, Gaudet, P, Gaudinat, A, Gehant, S, Gfeller, D, Gharib, WH, Ghraichy, M, Gidoin, C, Gil, M, Gleizes, A, Gobeill, J, Gonnet, G, Gos, A, Gotz, L, Gouy, A, Grbic, D, Groux, R, Gruaz-Gumowski, N, Grun, D, Gschwind, A, 
Guex, N, Gupta, S, Getaz, M, Haake, D, Haas, J, Hatzimanikatis, V, Heckel, G, Gardiol, DFH, Hinard, V, Hinz, U, Homicsko, K, Horlacher, O, Hosseini, S-R, Hotz, H-R, Hulo, C, Hundsrucker, C, Ibberson, M, Ilmjarv, S, Ioannidis, V, Ioannidis, P, Iseli, C, Ivanek, R, Iwaszkiewicz, J, Jacquet, P, Jacquot, M, Jagannathan, V, Jan, M, Jensen, J, Johansson, MU, Johner, N, Jungo, F, Junier, T, Kahraman, A, Katsantoni, M, Keller, G, Kerhornou, A, Khalid, F, Klingbiel, D, Kimljenovic, A, Kriventseva, E, Kryuchkova, N, Kumar, S, Kutalik, Z, Kuznetsov, D, Kuzyakiv, R, Lane, L, Lara, V, Ledesma, L, Leleu, M, Lemercier, P, Lew, D, Lieberherr, D, Liechti, R, Lisacek, F, Fischer, H, Litsios, G, Liu, J, Lombardot, T, Mace, A, Maffioletti, S, Mahi, M-A, Maiolo, M, Majjigapu, SR, Malmstrom, L, Mangold, V, Marek, D, Mariethoz, J, Marin, R, Martin, O, Martin, X, Martin-Campos, T, Mary, C, Masclaux, F, Masson, P, Meier, C, Messina, A, Lenoir, MM, Meyer, X, Michel, P-A, Michielin, O, Milanese, A, Missiaglia, E, Perez, JM, Caria, VM, Moret, P, Moretti, S, Morgat, A, Mottaz, A, Mottin, L, Mouscaz, Y, Mueller, M, Murri, R, Mylonas, R, Neuenschwander, S, Nikitin, F, Niknejad, A, Nouspikel, N, Nso, LN, Okoniewski, M, Omasits, U, Paccaud, B, Pachkov, M, Paesano, SG, Pagni, M, Palagi, PM, Pasche, E, Payne, JL, Pedruzzi, I, Peischl, S, Peitsch, M, Perlini, S, Pilbout, S, Podvinec, M, Pohlmann, R, Polizzi, D, Potter, D, Poux, S, Pozzato, M, Pradervand, S, Praz, V, Pruess, M, Pujadas, E, Racle, J, Raschi, M, Ratib, O, Rausell, A, de Laval, VR, Redaschi, N, Rempfer, C, Ren, G, Vandati, RAR, Rib, L, Grognuz, OR, Altimiras, ER, Rivoire, C, Robin, T, Robinson-Rechavi, M, Rodrigues, J, Roechert, B, Roelli, P, Romano, V, Rossier, G, Roth, A, Rougemont, J, Roux, J, Royo, H, Ruch, P, Ruinelli, M, Rustom, M, Sates, A, Roehrig, UF, Rueeger, S, Salamin, N, Sankar, M, Sarkar, N, Saxenhofer, M, Schaeffer, M, Schaerli, Y, Schaper, E, Schmid, A, Schmid, E, Schmid, C, Schmid, M, Schmidt, S, Schmocker, D, Schneider, 
M, Schuepbach, T, Schwede, T, Schuetz, F, Sengstag, T, Serrano, M, Sethi, A, Shahmirzadi, O, Sigrist, C, Silvestro, D, Simao Neto, FA, Simillion, C, Simonovic, M, Skunca, N, Sluzek, K, Soneson, C, Sprouffske, K, Stadler, M, Staehli, S, Stevenson, B, Stockinger, H, Straszewski, J, Stricker, T, Studer, G, Stutz, A, Suffiotti, M, Sundaram, S, Szklarczyk, D, Szovenyi, P, Tegenfeldt, F, Teixeira, D, Tellenbach, S, Smith, AAT, Tognolli, M, Topolsky, I, Thuong, VDT, Tsantoulis, P, Tzika, AC, Agote, AU, van Nimwegen, E, von Mering, C, Varadarajan, A, Veranneman, M, Verbregue, L, Veuthey, A-L, Vishnyakova, D, Vyas, R, Wagner, A, Walther, D, Wan, HW, Wang, M, Waterhouse, R, Waterhouse, A, Wicki, A, Wigger, L, Wirapati, P, Witschi, U, Wyder, S, Wyler, K, Wuethrich, D, Xenarios, I, Yamada, K, Yan, Z, Yasrebi, H, Zahn, M, Zangger, N, Zdobnov, E, Zerzion, D, Zoete, V, and Zoller, S
- Abstract
The SIB Swiss Institute of Bioinformatics (www.isb-sib.ch) provides world-class bioinformatics databases, software tools, services and training to the international life science community in academia and industry. These solutions allow life scientists to turn the exponentially growing amount of data into knowledge. Here, we provide an overview of SIB's resources and competence areas, with a strong focus on curated databases and SIB's most popular and widely used resources. In particular, SIB's Bioinformatics resource portal ExPASy features over 150 resources, including UniProtKB/Swiss-Prot, ENZYME, PROSITE, neXtProt, STRING, UniCarbKB, SugarBindDB, SwissRegulon, EPD, arrayMap, Bgee, SWISS-MODEL Repository, OMA, OrthoDB and other databases, which are briefly described in this article.
- Published
- 2016
5. The EMBL nucleotide sequence database
- Author
-
Stoesser, G., Baker, W., Den Broek, A., Camon, E., Garcia-Pastor, M., Kanz, C., Kulikova, T., Lombard, V., Lopez, R., Parkinson, H., Redaschi, N., Sterk, P., Stoehr, P., and Tuli, M. A.
- Subjects
Europe ,Internet ,[SDV.OT]Life Sciences [q-bio]/Other [q-bio.OT] ,Databases, Factual ,Genetics ,Computational Biology ,Information Storage and Retrieval ,DNA ,Article
- Abstract
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank at the NCBI (USA). Data is exchanged amongst the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via ftp, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many specialized databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT.
- Published
- 2001
6. ExPASy: SIB bioinformatics resource portal
- Author
-
Artimo, P., Jonnalagedda, M., Arnold, K., Baratin, D., Csardi, G., de Castro, E., Duvaud, S., Flegel, V., Fortier, A., Gasteiger, E., Grosdidier, A., Hernandez, C., Ioannidis, V., Kuznetsov, D., Liechti, R., Moretti, S., Mostaguir, K., Redaschi, N., Rossier, G., Xenarios, I., and Stockinger, H.
- Published
- 2012
7. FastAlert-an automatic search system to alert about new entries in biological sequence databanks
- Author
-
Eggenberger, F., Redaschi, N., and Doelz, R.
- Published
- 1996
8. FastAlert-an automatic search system to alert about new entries in biological sequence databanks.
- Author
-
Eggenberger, F., Redaschi, N., and Doelz, R.
- Published
- 1996
9. An enhanced workflow for variant interpretation in UniProtKB/Swiss-Prot improves consistency and reuse in ClinVar.
- Author
-
Famiglietti, M L, Estreicher, A, Breuza, L, Poux, S, Redaschi, N, Xenarios, I, and Bridge, A
- Subjects
AMINO acid sequence ,MEDICAL genetics ,WORKFLOW ,INDIVIDUALIZED medicine ,MEDICAL genomics
- Abstract
Personalized genomic medicine depends on integrated analyses that combine genetic and phenotypic data from individual patients with reference knowledge of the functional and clinical significance of sequence variants. Sources of this reference knowledge include the ClinVar repository of human genetic variants, a community resource that accepts submissions from external groups, and UniProtKB/Swiss-Prot, an expert-curated resource of protein sequences and functional annotation. UniProtKB/Swiss-Prot provides knowledge on the functional impact and clinical significance of over 30 000 human protein-coding sequence variants, curated from peer-reviewed literature reports. Here we present a pilot study that lays the groundwork for the integration of curated knowledge of protein sequence variation from UniProtKB/Swiss-Prot with ClinVar. We show that existing interpretations of variant pathogenicity in UniProtKB/Swiss-Prot and ClinVar are highly concordant, with 88% of variants that are common to the two resources having interpretations of clinical significance that agree. Re-curation of a subset of UniProtKB/Swiss-Prot variants according to American College of Medical Genetics and Genomics (ACMG) guidelines using ClinGen tools further increases this level of agreement, mainly due to the reclassification of supposedly pathogenic variants as benign, based on newly available population frequency data. We have now incorporated ACMG guidelines and ClinGen tools into the UniProt Knowledgebase (UniProtKB) curation workflow and routinely submit variant data from UniProtKB/Swiss-Prot to ClinVar. These efforts will increase the usability and utilization of UniProtKB variant data and will facilitate the continuing (re-)evaluation of clinical variant interpretations as data sets and knowledge evolve.
- Published
- 2019
10. The EBI RDF Platform: Linked Open Data for the Life Sciences
- Author
-
Jupp S, Malone J, Bolleman J, Brandizi M, Davies M, Garcia L, Gaulton A, Gehant S, Laibe C, Redaschi N, Wimalaratne SM, Martin M, Le Novère N, Parkinson H, Birney E, and Jenkinson AM
11. Identifying ELIXIR Core Data Resources
- Author
-
Durinx C, McEntyre J, Appel R, Apweiler R, Barlow M, Blomberg N, Cook C, Gasteiger E, Kim JH, Lopez R, Redaschi N, Stockinger H, Teixeira D, and Valencia A
12. FastAlert-an automatic search system to alert about new entries in biological sequence databanks
- Author
-
Eggenberger, F., Redaschi, N., and Doelz, R.
- Abstract
This paper describes a new tool enabling awareness of new sequence databank entries of interest. The FastAlert system relieves the researcher from the burden of repeating FASTA searches in order to keep up with the rapidly growing amount of information found in biological sequence databanks. The query sequence can be submitted from any computer connected to the Internet. Upon registration, the databank, including the updates, is scanned at periodic intervals with the sequence provided. The results, so-called FastAlert reports, are delivered via electronic mail. The reports contain the FASTA best-scores list and the similarity statistics for each entry listed.
13. Infrastructure for the life sciences: design and implementation of the UniProt website
- Author
-
Suzek Baris E, Redaschi Nicole, Phan Isabelle, Duvaud Severine, Bairoch Amos, Jain Eric, Martin Maria J, McGarvey Peter, and Gasteiger Elisabeth
- Subjects
Computer applications to medicine. Medical informatics ,R858-859.7 ,Biology (General) ,QH301-705.5
- Abstract
Background: The UniProt consortium was formed in 2002 by groups from the Swiss Institute of Bioinformatics (SIB), the European Bioinformatics Institute (EBI) and the Protein Information Resource (PIR) at Georgetown University, and soon afterwards the website http://www.uniprot.org was set up as a central entry point to UniProt resources. Requests to this address were redirected to one of the three organisations' websites. While these sites shared a set of static pages with general information about UniProt, their pages for searching and viewing data were different. To provide users with a consistent view and to cut the cost of maintaining three separate sites, the consortium decided to develop a common website for UniProt. Following several years of intense development and a year of public beta testing, the http://www.uniprot.org domain was switched to the newly developed site described in this paper in July 2008.
Description: The UniProt consortium is the main provider of protein sequence and annotation data for much of the life sciences community. The http://www.uniprot.org website is the primary access point to this data and to documentation and basic tools for the data. These tools include full text and field-based text search, similarity search, multiple sequence alignment, batch retrieval and database identifier mapping. This paper discusses the design and implementation of the new website, which was released in July 2008, and shows how it improves data access for users with different levels of experience, as well as to machines for programmatic access. http://www.uniprot.org/ is open for both academic and commercial use. The site was built with open source tools and libraries. Feedback is very welcome and should be sent to help@uniprot.org.
Conclusion: The new UniProt website makes accessing and understanding UniProt easier than ever. The two main lessons learned are that getting the basics right for such a data provider website has huge benefits, but is not trivial and easy to underestimate, and that there is no substitute for using empirical data throughout the development process to decide on what is and what is not working for your users.
- Published
- 2009
14. The Universal Protein Resource (UniProt) in 2010
- Author
-
Elisabeth Gasteiger, Amos Bairoch, John Garavelli, Julius Jacobsen, Lionel Breuza, Rachael Huntley, Rolf Apweiler, Christian J. A. SIGRIST, Rebecca Foulger, Jerven Bolleman, Raja Mazumder, Ivo Pedruzzi, Florence Jungo, Anaïs Mottaz, Michael Tognolli, Emmanuel Boutet, Claire O'Donovan, Edward Turner, Sandra Orchard, Patrick Masson, Peter McGarvey, Nicole Redaschi, Sébastien Géhant, Michele Magrane, Anne Estreicher, Alan Bridge, Michel Schneider, Daniel Barrell, Benoit Bely, Anne Morgat, Sylvain Poux, Petra Langendijk-Genevaux, Maria-Jesus Martin, Catherine Rivoire, Elisabeth Coudert, Rasko Leinonen, Cecilia Arighi, UniProt Consortium, Apweiler, R., Martin, MJ., O'Donovan, C., Magrane, M., Alam-Faruque, Y., Antunes, R., Barrell, D., Bely, B., Bingley, M., Binns, D., Bower, L., Browne, P., Chan, WM., Dimmer, E., Eberhardt, R., Fedotov, A., Foulger, R., Garavelli, J., Huntley, R., Jacobsen, J., Kleen, M., Laiho, K., Leinonen, R., Legge, D., Lin, Q., Liu, W., Luo, J., Orchard, S., Patient, S., Poggioli, D., Pruess, M., Corbett, M., di Martino, G., Donnelly, M., van Rensburg, P., Bairoch, A., Bougueleret, L., Xenarios, I., Altairac, S., Auchincloss, A., Argoud-Puy, G., Axelsen, K., Baratin, D., Blatter, MC., Boeckmann, B., Bolleman, J., Bollondi, L., Boutet, E., Quintaje, SB., Breuza, L., Bridge, A., deCastro, E., Ciapina, L., Coral, D., Coudert, E., Cusin, I., Delbard, G., Doche, M., Dornevil, D., Roggli, PD., Duvaud, S., Estreicher, A., Famiglietti, L., Feuermann, M., Gehant, S., Farriol-Mathis, N., Ferro, S., Gasteiger, E., Gateau, A., Gerritsen, V., Gos, A., Gruaz-Gumowski, N., Hinz, U., Hulo, C., Hulo, N., James, J., Jimenez, S., Jungo, F., Kappler, T., Keller, G., Lachaize, C., Lane-Guermonprez, L., Langendijk-Genevaux, P., Lara, V., Lemercier, P., Lieberherr, D., de Oliveira Lima, T., Mangold, V., Martin, X., Masson, P., Moinat, M., Morgat, A., Mottaz, A., Paesano, S., Pedruzzi, I., Pilbout, S., Pillet, V., Poux, S., Pozzato, M., Redaschi, N., Rivoire, C., 
Roechert, B., Schneider, M., Sigrist, C., Sonesson, K., Staehli, S., Stanley, E., Stutz, A., Sundaram, S., Tognolli, M., Verbregue, L., Veuthey, AL., Yip, L., Zuletta, L., Wu, C., Arighi, C., Arminski, L., Barker, W., Chen, C., Chen, Y., Hu, ZZ., Huang, H., Mazumder, R., McGarvey, P., Natale, DA., Nchoutmboube, J., Petrova, N., Subramanian, N., Suzek, BE., Ugochukwu, U., Vasudevan, S., Vinayaka, CR., Yeh, LS., and Zhang, J.
- Subjects
Proteomics ,Internet ,0303 health sciences ,Proteome ,030302 biochemistry & molecular biology ,Computational Biology ,Information Storage and Retrieval ,Genome, Viral ,Articles ,Europe ,03 medical and health sciences ,Algorithms ,Animals ,Computational Biology/methods ,Computational Biology/trends ,Databases, Nucleic Acid ,Databases, Protein ,Genome, Fungal ,Humans ,Information Storage and Retrieval/methods ,Protein Isoforms ,Software ,Genetics ,030304 developmental biology
- Abstract
The primary mission of UniProt is to support biological research by maintaining a stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces freely accessible to the scientific community. UniProt is produced by the UniProt Consortium which consists of groups from the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). UniProt is comprised of four major components, each optimized for different uses: the UniProt Archive, the UniProt Knowledgebase, the UniProt Reference Clusters and the UniProt Metagenomic and Environmental Sequence Database. UniProt is updated and distributed every 3 weeks and can be accessed online for searches or download at http://www.uniprot.org.
- Published
- 2009
15. Ongoing and future developments at the Universal Protein Resource
- Author
-
Vicente Lara, Arnaud Gos, Nikolas Pontikos, L. Bollondi, L Famiglietti, Barker Wc, Chuming Chen, Michael Tognolli, Patrick Masson, Salvo Paesano, Tony Sawford, Sandrine Pilbout, Jerven Bolleman, Jian Zhang, S Staehli, Elisabeth Gasteiger, David Binns, Ursula Hinz, Nadine Gruaz-Gumowski, Chantal Hulo, Maria Jesus Martin, M. Corbett, Veuthey Al, Paul Browne, Guillaume Keller, S. Jimenez, Roberts Nv, Brigitte Boeckmann, Natale Da, Suzek Be, Edward Turner, T. Kappler, Steven Rosanoff, Leslie Arminski, Florence Jungo, M Donnelly, P. Dubey, Ruth Y. Eberhardt, Lydie Bougueleret, Vivienne Baillie Gerritsen, Maria Victoria Schneider, Lionel Breuza, Isabelle Cusin, Delphine Baratin, E. Dimmer, Rachael P. Huntley, Julius O.B. Jacobsen, Arighi Cn, Quintaje Sb, Alan Bridge, Nicole Redaschi, Shyamala Sundaram, Peter B. McGarvey, Rebecca E. Foulger, Duncan Legge, Andre Stutz, Monica Pozzato, Raja Mazumder, Anne Morgat, M Doche, K Sonesson, Anne Estreicher, Blatter Mc, Manuela Pruess, Serenella Ferro, Séverine Duvaud, F. Fazzini, Rolf Apweiler, Natarajan Tg, E. Stanley, M Feuermann, Claire O'Donovan, J. James, Ivo Pedruzzi, Castro Lg, Wu Ch, M Bingley, W Liu, He Huang, Bernd Roechert, Laure Verbregue, Elisabeth Coudert, Ioannis Xenarios, M. Moinat, Sandra Orchard, P Lemercier, Damien Lieberherr, Ricardo Antunes, Ghislaine Argoud-Puy, Sebastien Gehant, Qiang Wang, Klemens Pichler, S. Patient, Nicolas Hulo, Diego Poggioli, J. Nchoutmboube, Emmanuel Boutet, H. Sehra, Chan Wm, Yasmin Alam-Faruque, M. Kleen, Van Rensburg P, Kati Laiho, Kristian B. Axelsen, John S. Garavelli, Christian J. A. Sigrist, Amos Marc Bairoch, Yongxing Chen, A. Fedotov, Yeh Ls, Sylvain Poux, Lynette Bower, Quan Lin, Daniel Barrell, Benoit Bely, U. Ugochukwu, Xavier D. Martin, Catherine Rivoire, E. Decastro, Jie Luo, Alain Gateau, Michele Magrane, Dolnide Dornevil, Vinayaka Cr, Andrea H. 
Auchincloss, Yuqi Wang, UniProt Consortium, Apweiler, R., Martin, MJ., O'Donovan, C., Magrane, M., Alam-Faruque, Y., Antunes, R., Barrell, D., Bely, B., Bingley, M., Binns, D., Bower, L., Browne, P., Chan, WM., Dimmer, E., Eberhardt, R., Fazzini, F., Fedotov, A., Foulger, R., Garavelli, J., Castro, LG., Huntley, R., Jacobsen, J., Kleen, M., Laiho, K., Legge, D., Lin, Q., Liu, W., Luo, J., Orchard, S., Patient, S., Pichler, K., Poggioli, D., Pontikos, N., Pruess, M., Rosanoff, S., Sawford, T., Sehra, H., Turner, E., Corbett, M., Donnelly, M., van Rensburg, P., Xenarios, I., Bougueleret, L., Auchincloss, A., Argoud-Puy, G., Axelsen, K., Bairoch, A., Baratin, D., Blatter, MC., Boeckmann, B., Bolleman, J., Bollondi, L., Boutet, E., Quintaje, SB., Breuza, L., Bridge, A., deCastro, E., Coudert, E., Cusin, I., Doche, M., Dornevil, D., Duvaud, S., Estreicher, A., Famiglietti, L., Feuermann, M., Gehant, S., Ferro, S., Gasteiger, E., Gateau, A., Gerritsen, V., Gos, A., Gruaz-Gumowski, N., Hinz, U., Hulo, C., Hulo, N., James, J., Jimenez, S., Jungo, F., Kappler, T., Keller, G., Lara, V., Lemercier, P., Lieberherr, D., Martin, X., Masson, P., Moinat, M., Morgat, A., Paesano, S., Pedruzzi, I., Pilbout, S., Poux, S., Pozzato, M., Redaschi, N., Rivoire, C., Roechert, B., Schneider, M., Sigrist, C., Sonesson, K., Staehli, S., Stanley, E., Stutz, A., Sundaram, S., Tognolli, M., Verbregue, L., Veuthey, AL., Wu, CH., Arighi, CN., Arminski, L., Barker, WC., Chen, C., Chen, Y., Dubey, P., Huang, H., Mazumder, R., McGarvey, P., Natale, DA., 
Natarajan, TG., Nchoutmboube, J., Roberts, NV., Suzek, BE., Ugochukwu, U., Vinayaka, CR., Wang, Q., Wang, Y., Yeh, LS., and Zhang, J.
- Subjects
Proteomics, 0303 health sciences, Sequence database, 030302 biochemistry & molecular biology, Proteins, Articles, Computational biology, Biology, Bioinformatics, [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM], Systems Integration, Universal Protein Resource, 03 medical and health sciences, Information resource, Protein sequencing, Sequence Analysis, Protein, Metagenomics, UniProt Knowledgebase, Genetics, Databases, Protein/trends, Proteins/chemistry, Proteins/genetics, UniProt, [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM], Databases, Protein, 030304 developmental biology
- Abstract
The primary mission of Universal Protein Resource (UniProt) is to support biological research by maintaining a stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces freely accessible to the scientific community. UniProt is produced by the UniProt Consortium which consists of groups from the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). UniProt is comprised of four major components, each optimized for different uses: the UniProt Archive, the UniProt Knowledgebase, the UniProt Reference Clusters and the UniProt Metagenomic and Environmental Sequence Database. UniProt is updated and distributed every 4 weeks and can be accessed online for searches or download at http://www.uniprot.org.
- Published
- 2011
16. From protein sequences to 3D-structures and beyond: the example of the UniProt knowledgebase
- Author
-
Elisabeth Gasteiger, Amos Bairoch, John Garavelli, Julius Jacobsen, Lionel Breuza, Rachael Huntley, Rolf Apweiler, Christian J. A. SIGRIST, Rebecca Foulger, Raja Mazumder, Ivo Pedruzzi, Florence Jungo, Anaïs Mottaz, Michael Tognolli, Emmanuel Boutet, Claire O'Donovan, Edward Turner, Sandra Orchard, Patrick Masson, Peter McGarvey, Nicole Redaschi, Sébastien Géhant, Michele Magrane, Anne Estreicher, Alan Bridge, Daniel Barrell, Benoit Bely, Anne Morgat, Sylvain Poux, Petra Langendijk-Genevaux, Maria-Jesus Martin, Catherine Rivoire, Elisabeth Coudert, Rasko Leinonen, Cecilia Arighi, UniProt Consortium, Apweiler, R., Martin, MJ., O'Donovan, C., Magrane, M., Alam-Faruque, Y., Antunes, R., Barrell, D., Bely, B., Bingley, M., Binns, D., Bower, L., Browne, P., Chan, WM., Dimmer, E., Eberhardt, R., Fedotov, A., Foulger, R., Garavelli, J., Huntley, R., Jacobsen, J., Kleen, M., Laiho, K., Leinonen, R., Legge, D., Lin, Q., Liu, W., Luo, J., Orchard, S., Patient, S., Poggioli, D., Pruess, M., Corbett, M., di Martino, G., Donnelly, M., van Rensburg, P., Bairoch, A., Bougueleret, L., Xenarios, I., Altairac, S., Auchincloss, A., Argoud-Puy, G., Axelsen, K., Baratin, D., Blatter, MC., Boeckmann, B., Bolleman, J., Bollondi, L., Boutet, E., Quintaje, SB., Breuza, L., Bridge, A., de Castro, E., Ciapina, L., Coral, D., Coudert, E., Cusin, I., David, F., Delbard, G., Doche, M., Dornevil, D., Roggli, PD., Duvaud, S., Estreicher, A., Famiglietti, L., Feuermann, M., Gehant, S., Farriol-Mathis, N., Ferro, S., Gasteiger, E., Gateau, A., Gerritsen, V., Gos, A., Gruaz-Gumowski, N., Hinz, U., Hulo, C., Hulo, N., James, J., Jimenez, S., Jungo, F., Kappler, T., Keller, G., Lachaize, C., Lane-Guermonprez, L., Langendijk-Genevaux, P., Lara, V., Lemercier, P., Lieberherr, D., Lima Tde, O., Mangold, V., Martin, X., Masson, P., Moinat, M., Morgat, A., Mottaz, A., Paesano, S., Pedruzzi, I., Pilbout, S., Pillet, V., Poux, S., Pozzato, M., Redaschi, N., Rivoire, C., Roechert, B., Schneider, M., Sigrist, 
C., Sonesson, K., Staehli, S., Stanley, E., Stutz, A., Sundaram, S., Tognolli, M., Verbregue, L., Veuthey, AL., Yip, L., Zuletta, L., Wu, C., Arighi, C., Arminski, L., Barker, W., Chen, C., Chen, Y., Hu, ZZ., Huang, H., Mazumder, R., McGarvey, P., Natale, DA., Nchoutmboube, J., Petrova, N., Subramanian, N., Suzek, BE., Ugochukwu, U., Vasudevan, S., Vinayaka, CR., Yeh, LS., and Zhang, J.
- Subjects
Proteomics, Binding Sites, Catalytic Domain, Databases, Protein, Knowledge Bases, Protein Conformation, Proteins/chemistry, Proteins/genetics, Sequence Analysis, Protein, Sequence analysis, Computer science, Annotation, Structural genomics, Context (language use), Review, Computational biology, Bioinformatics, 03 medical and health sciences, Cellular and Molecular Neuroscience, Protein 3D-structure, Swiss-Prot, Molecular Biology, UniProtKB, 030304 developmental biology, Pharmacology, 0303 health sciences, 030302 biochemistry & molecular biology, Proteins, Experimental data, Data flood, Cell Biology, Knowledgebase, Data access, Molecular Medicine, UniProt
- Abstract
With the dramatic increase in the volume of experimental results in every domain of life sciences, assembling pertinent data and combining information from different fields has become a challenge. Information is dispersed over numerous specialized databases and is presented in many different formats. Rapid access to experiment-based information about well-characterized proteins helps predict the function of uncharacterized proteins identified by large-scale sequencing. In this context, universal knowledgebases play essential roles in providing access to data from complementary types of experiments and serving as hubs with cross-references to many specialized databases. This review outlines how the value of experimental data is optimized by combining high-quality protein sequences with complementary experimental results, including information derived from protein 3D-structures, using as an example the UniProt knowledgebase (UniProtKB) and the tools and links provided on its website (http://www.uniprot.org/). It also evokes precautions that are necessary for successful predictions and extrapolations.
- Published
- 2010
17. EnzChemRED, a rich enzyme chemistry relation extraction dataset.
- Author
-
Lai PT, Coudert E, Aimo L, Axelsen K, Breuza L, de Castro E, Feuermann M, Morgat A, Pourcel L, Pedruzzi I, Poux S, Redaschi N, Rivoire C, Sveshnikova A, Wei CH, Leaman R, Luo L, Lu Z, and Bridge A
- Subjects
- PubMed, Databases, Protein, Knowledge Bases, Enzymes chemistry, Natural Language Processing
- Abstract
Expert curation is essential to capture knowledge of enzyme functions from the scientific literature in FAIR open knowledgebases but cannot keep pace with the rate of new discoveries and new publications. In this work we present EnzChemRED, for Enzyme Chemistry Relation Extraction Dataset, a new training and benchmarking dataset to support the development of Natural Language Processing (NLP) methods such as (large) language models that can assist enzyme curation. EnzChemRED consists of 1,210 expert curated PubMed abstracts where enzymes and the chemical reactions they catalyze are annotated using identifiers from the protein knowledgebase UniProtKB and the chemical ontology ChEBI. We show that fine-tuning language models with EnzChemRED significantly boosts their ability to identify proteins and chemicals in text (86.30% F1 score) and to extract the chemical conversions (86.66% F1 score) and the enzymes that catalyze those conversions (83.79% F1 score). We apply our methods to abstracts at PubMed scale to create a draft map of enzyme functions in literature to guide curation efforts in UniProtKB and the reaction knowledgebase Rhea., (© 2024. This is a U.S. Government work and not under copyright protection in the US; foreign copyright protection may apply.)
- Published
- 2024
18. Annotation of biologically relevant ligands in UniProtKB using ChEBI.
- Author
-
Coudert E, Gehant S, de Castro E, Pozzato M, Baratin D, Neto T, Sigrist CJA, Redaschi N, and Bridge A
- Subjects
- Databases, Protein, Ligands, Amino Acid Sequence, Binding Sites, Molecular Sequence Annotation, Knowledge Bases
- Abstract
Motivation: To provide high quality, computationally tractable annotation of binding sites for biologically relevant (cognate) ligands in UniProtKB using the chemical ontology ChEBI (Chemical Entities of Biological Interest), to better support efforts to study and predict functionally relevant interactions between protein sequences and structures and small molecule ligands., Results: We structured the data model for cognate ligand binding site annotations in UniProtKB and performed a complete reannotation of all cognate ligand binding sites using stable unique identifiers from ChEBI, which we now use as the reference vocabulary for all such annotations. We developed improved search and query facilities for cognate ligands in the UniProt website, REST API and SPARQL endpoint that leverage the chemical structure data, nomenclature and classification that ChEBI provides., Availability and Implementation: Binding site annotations for cognate ligands described using ChEBI are available for UniProtKB protein sequence records in several formats (text, XML and RDF) and are freely available to query and download through the UniProt website (www.uniprot.org), REST API (www.uniprot.org/help/api), SPARQL endpoint (sparql.uniprot.org/) and FTP site (https://ftp.uniprot.org/pub/databases/uniprot/)., Supplementary Information: Supplementary data are available at Bioinformatics online., (© The Author(s) 2022. Published by Oxford University Press.)
- Published
- 2023
19. SwissBioPics-an interactive library of cell images for the visualization of subcellular location data.
- Author
-
Le Mercier P, Bolleman J, de Castro E, Gasteiger E, Bansal P, Auchincloss AH, Boutet E, Breuza L, Casals-Casas C, Estreicher A, Feuermann M, Lieberherr D, Rivoire C, Pedruzzi I, Redaschi N, and Bridge A
- Subjects
- Animals, Proteins, Vocabulary, Controlled
- Abstract
SwissBioPics (www.swissbiopics.org) is a freely available resource of interactive, high-resolution cell images designed for the visualization of subcellular location data. SwissBioPics provides images describing cell types from all kingdoms of life, from the specialized muscle, neuronal and epithelial cells of animals, to the rods, cocci, clubs and spirals of prokaryotes. All cell images in SwissBioPics are drawn in Scalable Vector Graphics (SVG), with each subcellular location tagged with a unique identifier from the controlled vocabulary of subcellular locations and organelles of UniProt (https://www.uniprot.org/locations/). Users can search and explore SwissBioPics cell images through our website, which provides a platform for users to learn more about how cells are organized. A web component allows developers to embed SwissBioPics images in their own websites, using the associated JavaScript and a styling template, and to highlight subcellular locations and organelles by simply providing the web component with the appropriate identifier(s) from the UniProt-controlled vocabulary or the 'Cellular Component' branch of the Gene Ontology (www.geneontology.org), as well as an organism identifier from the National Center for Biotechnology Information taxonomy (https://www.ncbi.nlm.nih.gov/taxonomy). The UniProt website now uses SwissBioPics to visualize the subcellular locations and organelles where proteins function. SwissBioPics is freely available for anyone to use under a Creative Commons Attribution 4.0 International (CC BY 4.0) license., Database URL: www.swissbiopics.org., (© The Author(s) 2022. Published by Oxford University Press.)
- Published
- 2022
20. Rhea, the reaction knowledgebase in 2022.
- Author
-
Bansal P, Morgat A, Axelsen KB, Muthukrishnan V, Coudert E, Aimo L, Hyka-Nouspikel N, Gasteiger E, Kerhornou A, Neto TB, Pozzato M, Blatter MC, Ignatchenko A, Redaschi N, and Bridge A
- Subjects
- Animals, Humans, Internet, Knowledge Bases, Chemical Phenomena, Databases, Factual, Software
- Abstract
Rhea (https://www.rhea-db.org) is an expert-curated knowledgebase of biochemical reactions based on the chemical ontology ChEBI (Chemical Entities of Biological Interest) (https://www.ebi.ac.uk/chebi). In this paper, we describe a number of key developments in Rhea since our last report in the database issue of Nucleic Acids Research in 2019. These include improved reaction coverage in Rhea, the adoption of Rhea as the reference vocabulary for enzyme annotation in the UniProt knowledgebase UniProtKB (https://www.uniprot.org), the development of a new Rhea website, and the designation of Rhea as an ELIXIR Core Data Resource. We hope that these and other developments will enhance the utility of Rhea as a reference resource to study and engineer enzymes and the metabolic systems in which they function., (© The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.)
- Published
- 2022
21. Computational strategies to combat COVID-19: useful tools to accelerate SARS-CoV-2 and coronavirus research.
- Author
-
Hufsky F, Lamkiewicz K, Almeida A, Aouacheria A, Arighi C, Bateman A, Baumbach J, Beerenwinkel N, Brandt C, Cacciabue M, Chuguransky S, Drechsel O, Finn RD, Fritz A, Fuchs S, Hattab G, Hauschild AC, Heider D, Hoffmann M, Hölzer M, Hoops S, Kaderali L, Kalvari I, von Kleist M, Kmiecinski R, Kühnert D, Lasso G, Libin P, List M, Löchel HF, Martin MJ, Martin R, Matschinske J, McHardy AC, Mendes P, Mistry J, Navratil V, Nawrocki EP, O'Toole ÁN, Ontiveros-Palacios N, Petrov AI, Rangel-Pineros G, Redaschi N, Reimering S, Reinert K, Reyes A, Richardson L, Robertson DL, Sadegh S, Singer JB, Theys K, Upton C, Welzel M, Williams L, and Marz M
- Subjects
- Biomedical Research, COVID-19 epidemiology, COVID-19 virology, Genome, Viral, Humans, Pandemics, SARS-CoV-2 genetics, COVID-19 prevention & control, Computational Biology, SARS-CoV-2 isolation & purification
- Abstract
SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is a novel virus of the family Coronaviridae. The virus causes the infectious disease COVID-19. The biology of coronaviruses has been studied for many years. However, bioinformatics tools designed explicitly for SARS-CoV-2 have only recently been developed as a rapid reaction to the need for fast detection, understanding and treatment of COVID-19. To control the ongoing COVID-19 pandemic, it is of utmost importance to get insight into the evolution and pathogenesis of the virus. In this review, we cover bioinformatics workflows and tools for the routine detection of SARS-CoV-2 infection, the reliable analysis of sequencing data, the tracking of the COVID-19 pandemic and evaluation of containment measures, the study of coronavirus evolution, the discovery of potential drug targets and development of therapeutic strategies. For each tool, we briefly describe its use case and how it advances research specifically for SARS-CoV-2. All tools are free to use and available online, either through web applications or public code repositories. Contact:evbc@unj-jena.de., (© The Author(s) 2020. Published by Oxford University Press.)
- Published
- 2021
22. Diverse Taxonomies for Diverse Chemistries: Enhanced Representation of Natural Product Metabolism in UniProtKB.
- Author
-
Feuermann M, Boutet E, Morgat A, Axelsen KB, Bansal P, Bolleman J, de Castro E, Coudert E, Gasteiger E, Géhant S, Lieberherr D, Lombardot T, Neto TB, Pedruzzi I, Poux S, Pozzato M, Redaschi N, Bridge A, and On Behalf Of The UniProt Consortium
- Abstract
The UniProt Knowledgebase UniProtKB is a comprehensive, high-quality, and freely accessible resource of protein sequences and functional annotation that covers genomes and proteomes from tens of thousands of taxa, including a broad range of plants and microorganisms producing natural products of medical, nutritional, and agronomical interest. Here we describe work that enhances the utility of UniProtKB as a support for both the study of natural products and for their discovery. The foundation of this work is an improved representation of natural product metabolism in UniProtKB using Rhea, an expert-curated knowledgebase of biochemical reactions, that is built on the ChEBI (Chemical Entities of Biological Interest) ontology of small molecules. Knowledge of natural products and precursors is captured in ChEBI, enzyme-catalyzed reactions in Rhea, and enzymes in UniProtKB/Swiss-Prot, thereby linking chemical structure data directly to protein knowledge. We provide a practical demonstration of how users can search UniProtKB for protein knowledge relevant to natural products through interactive or programmatic queries using metabolite names and synonyms, chemical identifiers, chemical classes, and chemical structures and show how to federate UniProtKB with other data and knowledge resources and tools using semantic web technologies such as RDF and SPARQL. All UniProtKB data are freely available for download in a broad range of formats for users to further mine or exploit as an annotation source, to enrich other natural product datasets and databases.
- Published
- 2021
23. The ELIXIR Core Data Resources: fundamental infrastructure for the life sciences.
- Author
-
Drysdale R, Cook CE, Petryszak R, Baillie-Gerritsen V, Barlow M, Gasteiger E, Gruhl F, Haas J, Lanfear J, Lopez R, Redaschi N, Stockinger H, Teixeira D, Venkatesan A, Blomberg N, Durinx C, and McEntyre J
- Subjects
- Biological Science Disciplines, Computational Biology
- Abstract
Supplementary Information: Supplementary data are available at Bioinformatics online., (© The Author(s) 2020. Published by Oxford University Press.)
- Published
- 2020
24. Enzyme annotation in UniProtKB using Rhea.
- Author
-
Morgat A, Lombardot T, Coudert E, Axelsen K, Neto TB, Gehant S, Bansal P, Bolleman J, Gasteiger E, de Castro E, Baratin D, Pozzato M, Xenarios I, Poux S, Redaschi N, and Bridge A
- Subjects
- Animals, Databases, Protein, Knowledge Bases, Rheiformes
- Abstract
Motivation: To provide high quality computationally tractable enzyme annotation in UniProtKB using Rhea, a comprehensive expert-curated knowledgebase of biochemical reactions which describes reaction participants using the ChEBI (Chemical Entities of Biological Interest) ontology., Results: We replaced existing textual descriptions of biochemical reactions in UniProtKB with their equivalents from Rhea, which is now the standard for annotation of enzymatic reactions in UniProtKB. We developed improved search and query facilities for the UniProt website, REST API and SPARQL endpoint that leverage the chemical structure data, nomenclature and classification that Rhea and ChEBI provide., Availability and Implementation: UniProtKB at https://www.uniprot.org; UniProt REST API at https://www.uniprot.org/help/api; UniProt SPARQL endpoint at https://sparql.uniprot.org/; Rhea at https://www.rhea-db.org., (© The Author(s) 2019. Published by Oxford University Press.)
- Published
- 2020
25. HAMAP as SPARQL rules-A portable annotation pipeline for genomes and proteomes.
- Author
-
Bolleman J, de Castro E, Baratin D, Gehant S, Cuche BA, Auchincloss AH, Coudert E, Hulo C, Masson P, Pedruzzi I, Rivoire C, Xenarios I, Redaschi N, and Bridge A
- Subjects
- Animals, Genomics standards, Humans, Molecular Sequence Annotation standards, Sequence Analysis, DNA standards, Sequence Analysis, Protein standards, Genomics methods, Molecular Sequence Annotation methods, Sequence Analysis, DNA methods, Sequence Analysis, Protein methods, Software standards
- Abstract
Background: Genome and proteome annotation pipelines are generally custom built and not easily reusable by other groups. This leads to duplication of effort, increased costs, and suboptimal annotation quality. One way to address these issues is to encourage the adoption of annotation standards and technological solutions that enable the sharing of biological knowledge and tools for genome and proteome annotation., Results: Here we demonstrate one approach to generate portable genome and proteome annotation pipelines that users can run without recourse to custom software. This proof of concept uses our own rule-based annotation pipeline HAMAP, which provides functional annotation for protein sequences to the same depth and quality as UniProtKB/Swiss-Prot, and the World Wide Web Consortium (W3C) standards Resource Description Framework (RDF) and SPARQL (a recursive acronym for the SPARQL Protocol and RDF Query Language). We translate complex HAMAP rules into the W3C standard SPARQL 1.1 syntax, and then apply them to protein sequences in RDF format using freely available SPARQL engines. This approach supports the generation of annotation that is identical to that generated by our own in-house pipeline, using standard, off-the-shelf solutions, and is applicable to any genome or proteome annotation pipeline., Conclusions: HAMAP SPARQL rules are freely available for download from the HAMAP FTP site, ftp://ftp.expasy.org/databases/hamap/sparql/, under the CC-BY-ND 4.0 license. The annotations generated by the rules are under the CC-BY 4.0 license. A tutorial and supplementary code to use HAMAP as SPARQL are available on GitHub at https://github.com/sib-swiss/HAMAP-SPARQL, and general documentation about HAMAP can be found on the HAMAP website at https://hamap.expasy.org., (© The Author(s) 2020. Published by Oxford University Press.)
- Published
- 2020
26. FAIR adoption, assessment and challenges at UniProt.
- Author
-
Garcia L, Bolleman J, Gehant S, Redaschi N, and Martin M
- Published
- 2019
27. Updates in Rhea: SPARQLing biochemical reaction data.
- Author
-
Lombardot T, Morgat A, Axelsen KB, Aimo L, Hyka-Nouspikel N, Niknejad A, Ignatchenko A, Xenarios I, Coudert E, Redaschi N, and Bridge A
- Subjects
- Humans, Knowledge Bases, Systems Biology methods, Databases, Chemical, Databases, Protein, Metabolomics methods, Software standards
- Abstract
Rhea (http://www.rhea-db.org) is a comprehensive and non-redundant resource of over 11 000 expert-curated biochemical reactions that uses chemical entities from the ChEBI ontology to represent reaction participants. Originally designed as an annotation vocabulary for the UniProt Knowledgebase (UniProtKB), Rhea also provides reaction data for a range of other core knowledgebases and data repositories including ChEBI and MetaboLights. Here we describe recent developments in Rhea, focusing on a new resource description framework representation of Rhea reaction data and an SPARQL endpoint (https://sparql.rhea-db.org/sparql) that provides access to it. We demonstrate how federated queries that combine the Rhea SPARQL endpoint and other SPARQL endpoints such as that of UniProt can provide improved metabolite annotation and support integrative analyses that link the metabolome through the proteome to the transcriptome and genome. These developments will significantly boost the utility of Rhea as a means to link chemistry and biology for a more holistic understanding of biological systems and their function in health and disease.
- Published
- 2019
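Note on entry 27: the abstract describes a SPARQL endpoint for Rhea reaction data. As a rough illustration only, the Python sketch below prepares (but does not send) a plain, non-federated query for that endpoint using the standard SPARQL protocol over HTTP GET. The endpoint URL comes from the abstract; the `rh:` prefix, the `rh:Reaction` class and the `rh:equation` predicate are assumptions about the Rhea RDF vocabulary and should be checked against the Rhea documentation. The function name `build_sparql_request` is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint URL as given in the abstract of entry 27.
RHEA_SPARQL_ENDPOINT = "https://sparql.rhea-db.org/sparql"

# ASSUMPTION: the rh: namespace, rh:Reaction and rh:equation are illustrative
# guesses at the Rhea RDF vocabulary; verify them before relying on this query.
QUERY = """
PREFIX rh: <http://rdf.rhea-db.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?reaction ?equation
WHERE {
  ?reaction rdfs:subClassOf rh:Reaction ;
            rh:equation ?equation .
}
LIMIT 5
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Prepare a SPARQL-protocol GET request asking for JSON results."""
    # The SPARQL protocol passes the query text in the 'query' URL parameter.
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Only builds the request object; nothing is sent over the network here.
request = build_sparql_request(RHEA_SPARQL_ENDPOINT, QUERY)
```

Passing `request` to `urllib.request.urlopen` would submit the query. A federated query of the kind the abstract mentions would additionally use a SPARQL 1.1 `SERVICE` clause naming the second endpoint.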
28. InterPro in 2019: improving coverage, classification and access to protein sequence annotations.
- Author
-
Mitchell AL, Attwood TK, Babbitt PC, Blum M, Bork P, Bridge A, Brown SD, Chang HY, El-Gebali S, Fraser MI, Gough J, Haft DR, Huang H, Letunic I, Lopez R, Luciani A, Madeira F, Marchler-Bauer A, Mi H, Natale DA, Necci M, Nuka G, Orengo C, Pandurangan AP, Paysan-Lafosse T, Pesseat S, Potter SC, Qureshi MA, Rawlings ND, Redaschi N, Richardson LJ, Rivoire C, Salazar GA, Sangrador-Vegas A, Sigrist CJA, Sillitoe I, Sutton GG, Thanki N, Thomas PD, Tosatto SCE, Yong SY, and Finn RD
- Subjects
- Animals, Databases, Genetic, Gene Ontology, Humans, Internet, Multigene Family, Protein Domains genetics, Sequence Homology, Amino Acid, Software, User-Computer Interface, Databases, Protein, Molecular Sequence Annotation
- Abstract
The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.
- Published
- 2019
29. An enhanced workflow for variant interpretation in UniProtKB/Swiss-Prot improves consistency and reuse in ClinVar.
- Author
-
Famiglietti ML, Estreicher A, Breuza L, Poux S, Redaschi N, Xenarios I, and Bridge A
- Subjects
- Copper-Transporting ATPases genetics, Nerve Tissue Proteins genetics, Zinc Finger Protein Gli3 genetics, Databases, Protein, Genetic Variation, Knowledge Bases, Workflow
- Abstract
Personalized genomic medicine depends on integrated analyses that combine genetic and phenotypic data from individual patients with reference knowledge of the functional and clinical significance of sequence variants. Sources of this reference knowledge include the ClinVar repository of human genetic variants, a community resource that accepts submissions from external groups, and UniProtKB/Swiss-Prot, an expert-curated resource of protein sequences and functional annotation. UniProtKB/Swiss-Prot provides knowledge on the functional impact and clinical significance of over 30 000 human protein-coding sequence variants, curated from peer-reviewed literature reports. Here we present a pilot study that lays the groundwork for the integration of curated knowledge of protein sequence variation from UniProtKB/Swiss-Prot with ClinVar. We show that existing interpretations of variant pathogenicity in UniProtKB/Swiss-Prot and ClinVar are highly concordant, with 88% of variants that are common to the two resources having interpretations of clinical significance that agree. Re-curation of a subset of UniProtKB/Swiss-Prot variants according to American College of Medical Genetics and Genomics (ACMG) guidelines using ClinGen tools further increases this level of agreement, mainly due to the reclassification of supposedly pathogenic variants as benign, based on newly available population frequency data. We have now incorporated ACMG guidelines and ClinGen tools into the UniProt Knowledgebase (UniProtKB) curation workflow and routinely submit variant data from UniProtKB/Swiss-Prot to ClinVar. These efforts will increase the usability and utilization of UniProtKB variant data and will facilitate the continuing (re-)evaluation of clinical variant interpretations as data sets and knowledge evolve., (© The Author(s) 2019. Published by Oxford University Press.)
- Published
- 2019
30. Updates in Rhea - an expert curated resource of biochemical reactions.
- Author
-
Morgat A, Lombardot T, Axelsen KB, Aimo L, Niknejad A, Hyka-Nouspikel N, Coudert E, Pozzato M, Pagni M, Moretti S, Rosanoff S, Onwubiko J, Bougueleret L, Xenarios I, Redaschi N, and Bridge A
- Published
- 2017
31. InterPro in 2017-beyond protein family and domain annotations.
- Author
-
Finn RD, Attwood TK, Babbitt PC, Bateman A, Bork P, Bridge AJ, Chang HY, Dosztányi Z, El-Gebali S, Fraser M, Gough J, Haft D, Holliday GL, Huang H, Huang X, Letunic I, Lopez R, Lu S, Marchler-Bauer A, Mi H, Mistry J, Natale DA, Necci M, Nuka G, Orengo CA, Park Y, Pesseat S, Piovesan D, Potter SC, Rawlings ND, Redaschi N, Richardson L, Rivoire C, Sangrador-Vegas A, Sigrist C, Sillitoe I, Smithers B, Squizzato S, Sutton G, Thanki N, Thomas PD, Tosatto SC, Wu CH, Xenarios I, Yeh LS, Young SY, and Mitchell AL
- Subjects
- Humans, Molecular Sequence Annotation, Phylogeny, Computational Biology methods, Databases, Protein, Protein Interaction Domains and Motifs, Software
- Abstract
InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences., (© The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.)
- Published
- 2017
32. Minimizing proteome redundancy in the UniProt Knowledgebase.
- Author
-
Bursteinas B, Britto R, Bely B, Auchincloss A, Rivoire C, Redaschi N, O'Donovan C, and Martin MJ
- Subjects
- Bacteria metabolism, Bacterial Proteins metabolism, Proteome metabolism, Bacteria genetics, Bacterial Proteins genetics, Databases, Protein, Molecular Sequence Annotation methods, Proteome genetics, Sequence Analysis, Protein methods
- Abstract
Advances in high-throughput sequencing have led to an unprecedented growth in genome sequences being submitted to biological databases. In particular, the sequencing of large numbers of nearly identical bacterial genomes during infection outbreaks and for other large-scale studies has resulted in a high level of redundancy in nucleotide databases and consequently in the UniProt Knowledgebase (UniProtKB). Redundancy negatively impacts on database searches by causing slower searches, an increase in statistical bias and cumbersome result analysis. The redundancy combined with the large data volume increases the computational costs for most reuses of UniProtKB data. All of this poses challenges for effective discovery in this wealth of data. With the continuing development of sequencing technologies, it is clear that finding ways to minimize redundancy is crucial to maintaining UniProt's essential contribution to data interpretation by our users. We have developed a methodology to identify and remove highly redundant proteomes from UniProtKB. The procedure identifies redundant proteomes by performing pairwise alignments of sets of sequences for pairs of proteomes and subsequently applies graph theory to find dominating sets that provide a set of non-redundant proteomes with a minimal loss of information. This method was implemented for bacteria in mid-2015, resulting in a removal of 50 million proteins in UniProtKB. With every new release, this procedure is used to filter new incoming proteomes, resulting in a more scalable and scientifically valuable growth of UniProtKB. Database URL: http://www.uniprot.org/proteomes/., (© The Author(s) 2016. Published by Oxford University Press.)
- Published
- 2016
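Note on entry 32: the abstract says that dominating sets in a graph of redundant proteomes are used to pick non-redundant representatives, but it does not spell out the algorithm. The Python sketch below is therefore not the authors' method. It shows a generic greedy heuristic for a dominating set, assuming a hypothetical undirected graph in which an edge joins two proteomes already judged redundant to each other. The proteome labels and the function name are made up for illustration.

```python
from typing import Dict, Set


def greedy_dominating_set(graph: Dict[str, Set[str]]) -> Set[str]:
    """Greedy heuristic: every node ends up chosen or adjacent to a chosen node.

    `graph` maps each node to its neighbours and is assumed to be symmetric.
    The result is a valid dominating set but not necessarily a minimum one.
    """
    undominated = set(graph)
    chosen: Set[str] = set()
    while undominated:
        # Pick the node that covers the most still-undominated nodes;
        # iterating in sorted order makes ties resolve to the smallest name.
        best = max(sorted(graph), key=lambda n: len(({n} | graph[n]) & undominated))
        chosen.add(best)
        undominated -= {best} | graph[best]
    return chosen


# Hypothetical redundancy graph: P1 is redundant with P2 and P3; P4 is unique.
similar = {
    "P1": {"P2", "P3"},
    "P2": {"P1"},
    "P3": {"P1"},
    "P4": set(),
}
representatives = greedy_dominating_set(similar)
```

In this toy graph the retained representatives are P1 and P4, so P2 and P3 would be the candidates for removal. Finding a minimum dominating set is NP-hard in general, which is why a heuristic is used in this sketch.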
33. Identifying ELIXIR Core Data Resources.
- Author
-
Durinx C, McEntyre J, Appel R, Apweiler R, Barlow M, Blomberg N, Cook C, Gasteiger E, Kim JH, Lopez R, Redaschi N, Stockinger H, Teixeira D, and Valencia A
- Abstract
The core mission of ELIXIR is to build a stable and sustainable infrastructure for biological information across Europe. At the heart of this are the data resources, tools and services that ELIXIR offers to the life-sciences community, providing stable and sustainable access to biological data. ELIXIR aims to ensure that these resources are available long-term and that the life-cycles of these resources are managed such that they support the scientific needs of the life-sciences, including biological research. ELIXIR Core Data Resources are defined as a set of European data resources that are of fundamental importance to the wider life-science community and the long-term preservation of biological data. They are complete collections of generic value to life-science, are considered an authority in their field with respect to one or more characteristics, and show high levels of scientific quality and service. Thus, ELIXIR Core Data Resources are of wide applicability and usage. This paper describes the structures, governance and processes that support the identification and evaluation of ELIXIR Core Data Resources. It identifies key indicators which reflect the essence of the definition of an ELIXIR Core Data Resource and support the promotion of excellence in resource development and operation. It describes the specific indicators in more detail and explains their application within ELIXIR's sustainability strategy and science policy actions, and in capacity building, life-cycle management and technical actions. The identification process is currently being implemented and tested for the first time. The findings and outcome will be evaluated by the ELIXIR Scientific Advisory Board in March 2017. Establishing the portfolio of ELIXIR Core Data Resources and ELIXIR Services is a key priority for ELIXIR and publicly marks the transition towards a cohesive infrastructure., Competing Interests: Competing interests: No competing interests were disclosed.
- Published
- 2016
34. The UniProtKB guide to the human proteome.
- Author
-
Breuza L, Poux S, Estreicher A, Famiglietti ML, Magrane M, Tognolli M, Bridge A, Baratin D, and Redaschi N
- Subjects
- Automation, Genome, Humans, Knowledge Bases, Phenotype, Protein Processing, Post-Translational, Proteins chemistry, RNA Editing, Software, Databases, Protein, Proteome genetics, Proteomics methods
- Abstract
Advances in high-throughput and advanced technologies allow researchers to routinely perform whole genome and proteome analysis. For this purpose, they need high-quality resources providing comprehensive gene and protein sets for their organisms of interest. Using the example of the human proteome, we will describe the content of a complete proteome in the UniProt Knowledgebase (UniProtKB). We will show how manual expert curation of UniProtKB/Swiss-Prot is complemented by expert-driven automatic annotation to build a comprehensive, high-quality and traceable resource. We will also illustrate how the complexity of the human proteome is captured and structured in UniProtKB. Database URL: www.uniprot.org. © The Author(s) 2016. Published by Oxford University Press.
- Published
- 2016
35. SPARQL-enabled identifier conversion with Identifiers.org.
- Author
-
Wimalaratne SM, Bolleman J, Juty N, Katayama T, Dumontier M, Redaschi N, Le Novère N, Hermjakob H, and Laibe C
- Subjects
- Biological Science Disciplines, Internet, Semantics, Systems Integration, Databases, Factual
- Abstract
Motivation: On the semantic web, in life sciences in particular, data is often distributed via multiple resources. Each of these sources is likely to use their own International Resource Identifier for conceptually the same resource or database record. The lack of correspondence between identifiers introduces a barrier when executing federated SPARQL queries across life science data. Results: We introduce a novel SPARQL-based service to enable on-the-fly integration of life science data. This service uses the identifier patterns defined in the Identifiers.org Registry to generate a plurality of identifier variants, which can then be used to match source identifiers with target identifiers. We demonstrate the utility of this identifier integration approach by answering queries across major producers of life science Linked Data. Availability and Implementation: The SPARQL-based identifier conversion service is available without restriction at http://identifiers.org/services/sparql. © The Author 2015. Published by Oxford University Press.
- Published
- 2015
36. The InterPro protein families database: the classification resource after 15 years.
- Author
-
Mitchell A, Chang HY, Daugherty L, Fraser M, Hunter S, Lopez R, McAnulla C, McMenamin C, Nuka G, Pesseat S, Sangrador-Vegas A, Scheremetjew M, Rato C, Yong SY, Bateman A, Punta M, Attwood TK, Sigrist CJ, Redaschi N, Rivoire C, Xenarios I, Kahn D, Guyot D, Bork P, Letunic I, Gough J, Oates M, Haft D, Huang H, Natale DA, Wu CH, Orengo C, Sillitoe I, Mi H, Thomas PD, and Finn RD
- Subjects
- Bacteria metabolism, Gene Ontology, Protein Structure, Tertiary, Proteins genetics, Sequence Analysis, Protein, Software, Databases, Protein, Proteins classification
- Abstract
The InterPro database (http://www.ebi.ac.uk/interpro/) is a freely available resource that can be used to classify sequences into protein families and to predict the presence of important domains and sites. Central to the InterPro database are predictive models, known as signatures, from a range of different protein family databases that have different biological focuses and use different methodological approaches to classify protein families and domains. InterPro integrates these signatures, capitalizing on the respective strengths of the individual databases, to produce a powerful protein classification resource. Here, we report on the status of InterPro as it enters its 15th year of operation, and give an overview of new developments with the database and its associated Web interfaces and software. In particular, the new domain architecture search tool is described and the process of mapping of Gene Ontology terms to InterPro is outlined. We also discuss the challenges faced by the resource given the explosive growth in sequence data in recent years. InterPro (version 48.0) contains 36,766 member database signatures integrated into 26,238 InterPro entries, an increase of over 3993 entries (5081 signatures), since 2012. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
- Published
- 2015
37. HAMAP in 2015: updates to the protein family classification and annotation system.
- Author
-
Pedruzzi I, Rivoire C, Auchincloss AH, Coudert E, Keller G, de Castro E, Baratin D, Cuche BA, Bougueleret L, Poux S, Redaschi N, Xenarios I, and Bridge A
- Subjects
- Humans, Internet, Proteins classification, Databases, Protein, Molecular Sequence Annotation, Sequence Homology, Amino Acid
- Abstract
HAMAP (High-quality Automated and Manual Annotation of Proteins, available at http://hamap.expasy.org/) is a system for the automatic classification and annotation of protein sequences. HAMAP provides annotation of the same quality and detail as UniProtKB/Swiss-Prot, using manually curated profiles for protein sequence family classification and expert curated rules for functional annotation of family members. HAMAP data and tools are made available through our website and as part of the UniRule pipeline of UniProt, providing annotation for millions of unreviewed sequences of UniProtKB/TrEMBL. Here we report on the growth of HAMAP and updates to the HAMAP system since our last report in the NAR Database Issue of 2013. We continue to augment HAMAP with new family profiles and annotation rules as new protein families are characterized and annotated in UniProtKB/Swiss-Prot; the latest version of HAMAP (as of 3 September 2014) contains 1983 family classification profiles and 1998 annotation rules (up from 1780 and 1720). We demonstrate how the complex logic of HAMAP rules allows for precise annotation of individual functional variants within large homologous protein families. We also describe improvements to our web-based tool HAMAP-Scan which simplify the classification and annotation of sequences, and the incorporation of an improved sequence-profile search algorithm. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
- Published
- 2015
38. Updates in Rhea--a manually curated resource of biochemical reactions.
- Author
-
Morgat A, Axelsen KB, Lombardot T, Alcántara R, Aimo L, Zerara M, Niknejad A, Belda E, Hyka-Nouspikel N, Coudert E, Redaschi N, Bougueleret L, Steinbeck C, Xenarios I, and Bridge A
- Subjects
- Biochemical Phenomena, Biopolymers metabolism, Genomics, Internet, Databases, Chemical, Enzymes metabolism, Metabolic Networks and Pathways genetics
- Abstract
Rhea (http://www.ebi.ac.uk/rhea) is a comprehensive and non-redundant resource of expert-curated biochemical reactions described using species from the ChEBI (Chemical Entities of Biological Interest) ontology of small molecules. Rhea has been designed for the functional annotation of enzymes and the description of genome-scale metabolic networks, providing stoichiometrically balanced enzyme-catalyzed reactions (covering the IUBMB Enzyme Nomenclature list and additional reactions), transport reactions and spontaneously occurring reactions. Rhea reactions are extensively curated with links to source literature and are mapped to other publicly available enzyme and pathway databases such as Reactome, BioCyc, KEGG and UniPathway, through manual curation and computational methods. Here we describe developments in Rhea since our last report in the 2012 database issue of Nucleic Acids Research. These include significant growth in the number of Rhea reactions and the inclusion of reactions involving complex macromolecules such as proteins, nucleic acids and other polymers that lie outside the scope of ChEBI. Together these developments will significantly increase the utility of Rhea as a tool for the description, analysis and reconciliation of genome-scale metabolic models. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
- Published
- 2015
39. Genetic variations and diseases in UniProtKB/Swiss-Prot: the ins and outs of expert manual curation.
- Author
-
Famiglietti ML, Estreicher A, Gos A, Bolleman J, Géhant S, Breuza L, Bridge A, Poux S, Redaschi N, Bougueleret L, and Xenarios I
- Subjects
- Amino Acid Sequence, Genetic Variation, Genome, Human, High-Throughput Nucleotide Sequencing, Humans, Internet, Molecular Sequence Annotation, Molecular Sequence Data, Terminology as Topic, Databases, Protein statistics & numerical data, Genetic Association Studies, Genetics, Medical, Knowledge Bases, Proteome, Software
- Abstract
During the last few years, next-generation sequencing (NGS) technologies have accelerated the detection of genetic variants resulting in the rapid discovery of new disease-associated genes. However, the wealth of variation data made available by NGS alone is not sufficient to understand the mechanisms underlying disease pathogenesis and manifestation. Multidisciplinary approaches combining sequence and clinical data with prior biological knowledge are needed to unravel the role of genetic variants in human health and disease. In this context, it is crucial that these data are linked, organized, and made readily available through reliable online resources. The Swiss-Prot section of the Universal Protein Knowledgebase (UniProtKB/Swiss-Prot) provides the scientific community with a collection of information on protein functions, interactions, biological pathways, as well as human genetic diseases and variants, all manually reviewed by experts. In this article, we present an overview of the information content of UniProtKB/Swiss-Prot to show how this knowledgebase can support researchers in the elucidation of the mechanisms leading from a molecular defect to a disease phenotype. © 2014 The Authors. Human Mutation published by Wiley Periodicals, Inc.
- Published
- 2014
40. The EBI RDF platform: linked open data for the life sciences.
- Author
-
Jupp S, Malone J, Bolleman J, Brandizi M, Davies M, Garcia L, Gaulton A, Gehant S, Laibe C, Redaschi N, Wimalaratne SM, Martin M, Le Novère N, Parkinson H, Birney E, and Jenkinson AM
- Subjects
- Academies and Institutes, Biomedical Research, Internet, Computational Biology methods, Databases, Genetic
- Abstract
Motivation: Resource description framework (RDF) is an emerging technology for describing, publishing and linking life science data. As a major provider of bioinformatics data and services, the European Bioinformatics Institute (EBI) is committed to making data readily accessible to the community in ways that meet existing demand. The EBI RDF platform has been developed to meet an increasing demand to coordinate RDF activities across the institute and provides a new entry point to querying and exploring integrated resources available at the EBI.
- Published
- 2014
41. HAMAP in 2013, new developments in the protein family classification and annotation system.
- Author
-
Pedruzzi I, Rivoire C, Auchincloss AH, Coudert E, Keller G, de Castro E, Baratin D, Cuche BA, Bougueleret L, Poux S, Redaschi N, Xenarios I, and Bridge A
- Subjects
- Eukaryota genetics, Internet, Databases, Protein, Molecular Sequence Annotation, Proteins classification
- Abstract
HAMAP (High-quality Automated and Manual Annotation of Proteins-available at http://hamap.expasy.org/) is a system for the classification and annotation of protein sequences. It consists of a collection of manually curated family profiles for protein classification, and associated annotation rules that specify annotations that apply to family members. HAMAP was originally developed to support the manual curation of UniProtKB/Swiss-Prot records describing microbial proteins. Here we describe new developments in HAMAP, including the extension of HAMAP to eukaryotic proteins, the use of HAMAP in the automated annotation of UniProtKB/TrEMBL, providing high-quality annotation for millions of protein sequences, and the future integration of HAMAP into a unified system for UniProtKB annotation, UniRule. HAMAP is continuously updated by expert curators with new family profiles and annotation rules as new protein families are characterized. The collection of HAMAP family classification profiles and annotation rules can be browsed and viewed on the HAMAP website, which also provides an interface to scan user sequences against HAMAP profiles.
- Published
- 2013
42. Infrastructure for the life sciences: design and implementation of the UniProt website.
- Author
-
Jain E, Bairoch A, Duvaud S, Phan I, Redaschi N, Suzek BE, Martin MJ, McGarvey P, and Gasteiger E
- Subjects
- Information Storage and Retrieval methods, Internet, Proteins chemistry, User-Computer Interface, Databases, Protein, Sequence Analysis, Protein
- Abstract
Background: The UniProt consortium was formed in 2002 by groups from the Swiss Institute of Bioinformatics (SIB), the European Bioinformatics Institute (EBI) and the Protein Information Resource (PIR) at Georgetown University, and soon afterwards the website http://www.uniprot.org was set up as a central entry point to UniProt resources. Requests to this address were redirected to one of the three organisations' websites. While these sites shared a set of static pages with general information about UniProt, their pages for searching and viewing data were different. To provide users with a consistent view and to cut the cost of maintaining three separate sites, the consortium decided to develop a common website for UniProt. Following several years of intense development and a year of public beta testing, the http://www.uniprot.org domain was switched to the newly developed site described in this paper in July 2008. Description: The UniProt consortium is the main provider of protein sequence and annotation data for much of the life sciences community. The http://www.uniprot.org website is the primary access point to this data and to documentation and basic tools for the data. These tools include full text and field-based text search, similarity search, multiple sequence alignment, batch retrieval and database identifier mapping. This paper discusses the design and implementation of the new website, which was released in July 2008, and shows how it improves data access for users with different levels of experience, as well as to machines for programmatic access. http://www.uniprot.org/ is open for both academic and commercial use. The site was built with open source tools and libraries. Feedback is very welcome and should be sent to help@uniprot.org. Conclusion: The new UniProt website makes accessing and understanding UniProt easier than ever. The two main lessons learned are that getting the basics right for such a data provider website has huge benefits, but is not trivial and easy to underestimate, and that there is no substitute for using empirical data throughout the development process to decide on what is and what is not working for your users.
- Published
- 2009
43. The Universal Protein Resource (UniProt): an expanding universe of protein information.
- Author
-
Wu CH, Apweiler R, Bairoch A, Natale DA, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Mazumder R, O'Donovan C, Redaschi N, and Suzek B
- Subjects
- Internet, Proteins chemistry, Proteins classification, Proteins physiology, Proteome chemistry, Sequence Analysis, Protein, Systems Integration, User-Computer Interface, Databases, Protein
- Abstract
The Universal Protein Resource (UniProt) provides a central resource on protein sequences and functional annotation with three database components, each addressing a key need in protein bioinformatics. The UniProt Knowledgebase (UniProtKB), comprising the manually annotated UniProtKB/Swiss-Prot section and the automatically annotated UniProtKB/TrEMBL section, is the preeminent storehouse of protein annotation. The extensive cross-references, functional and feature annotations and literature-based evidence attribution enable scientists to analyse proteins and query across databases. The UniProt Reference Clusters (UniRef) speed similarity searches via sequence space compression by merging sequences that are 100% (UniRef100), 90% (UniRef90) or 50% (UniRef50) identical. Finally, the UniProt Archive (UniParc) stores all publicly available protein sequences, containing the history of sequence data with links to the source databases. UniProt databases continue to grow in size and in availability of information. Recent and upcoming changes to database contents, formats, controlled vocabularies and services are described. New download availability includes all major releases of UniProtKB, sequence collections by taxonomic division and complete proteomes. A bibliography mapping service has been added, and an ID mapping service will be available soon. UniProt databases can be accessed online at http://www.uniprot.org or downloaded at ftp://ftp.uniprot.org/pub/databases/.
- Published
- 2006
44. The Universal Protein Resource (UniProt).
- Author
-
Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O'Donovan C, Redaschi N, and Yeh LS
- Subjects
- Amino Acid Sequence, Proteins physiology, Systems Integration, User-Computer Interface, Databases, Protein, Proteins chemistry
- Abstract
The Universal Protein Resource (UniProt) provides the scientific community with a single, centralized, authoritative resource for protein sequences and functional information. Formed by uniting the Swiss-Prot, TrEMBL and PIR protein database activities, the UniProt consortium produces three layers of protein sequence databases: the UniProt Archive (UniParc), the UniProt Knowledgebase (UniProt) and the UniProt Reference (UniRef) databases. The UniProt Knowledgebase is a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase with extensive cross-references. This centrepiece consists of two sections: UniProt/Swiss-Prot, with fully, manually curated entries; and UniProt/TrEMBL, enriched with automated classification and annotation. During 2004, tens of thousands of Knowledgebase records got manually annotated or updated; we introduced a new comment line topic: TOXIC DOSE to store information on the acute toxicity of a toxin; the UniProt keyword list got augmented by additional keywords; we improved the documentation of the keywords and are continuously overhauling and standardizing the annotation of post-translational modifications. Furthermore, we introduced a new documentation file of the strains and their synonyms. Many new database cross-references were introduced and we started to make use of Digital Object Identifiers. We also achieved in collaboration with the Macromolecular Structure Database group at EBI an improved integration with structural databases by residue level mapping of sequences from the Protein Data Bank entries onto corresponding UniProt entries. For convenient sequence searches we provide the UniRef non-redundant sequence databases. The comprehensive UniParc database stores the complete body of publicly available protein sequence data. The UniProt databases can be accessed online (http://www.uniprot.org) or downloaded in several formats (ftp://ftp.uniprot.org/pub). New releases are published every two weeks.
- Published
- 2005
45. UniProt: the Universal Protein knowledgebase.
- Author
-
Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O'Donovan C, Redaschi N, and Yeh LS
- Subjects
- Animals, Humans, Internet, Protein Conformation, Proteins classification, Proteome, Proteomics, Terminology as Topic, Computational Biology, Databases, Protein, Proteins chemistry, Proteins metabolism
- Abstract
To provide the scientific community with a single, centralized, authoritative resource for protein sequences and functional information, the Swiss-Prot, TrEMBL and PIR protein database activities have united to form the Universal Protein Knowledgebase (UniProt) consortium. Our mission is to provide a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and query interfaces. The central database will have two sections, corresponding to the familiar Swiss-Prot (fully manually curated entries) and TrEMBL (enriched with automated classification, annotation and extensive cross-references). For convenient sequence searches, UniProt also provides several non-redundant sequence databases. The UniProt NREF (UniRef) databases provide representative subsets of the knowledgebase suitable for efficient searching. The comprehensive UniProt Archive (UniParc) is updated daily from many public source databases. The UniProt databases can be accessed online (http://www.uniprot.org) or downloaded in several formats (ftp://ftp.uniprot.org/pub). The scientific community is encouraged to submit data for inclusion in UniProt.
- Published
- 2004
46. The EMBL sequence version archive.
- Author
-
Leinonen R, Nardone F, Oyewole O, Redaschi N, and Stoehr P
- Subjects
- Base Sequence, Documentation, Europe, Internet, Sequence Alignment, Sequence Analysis, DNA standards, Archives, Database Management Systems, Databases, Nucleic Acid, Information Storage and Retrieval methods, Online Systems, Sequence Analysis, DNA methods
- Abstract
Summary: The EMBL Nucleotide Sequence Database, maintained at the European Bioinformatics Institute, is Europe's primary nucleotide sequence database. Its entries are subject to changes, but only the most recent versions are preserved in the database. The EMBL Sequence Version Archive is a new publicly available database that also retains the earlier versions of these entries. Availability: http://www.ebi.ac.uk/embl/sva/
- Published
- 2003
47. The EMBL Nucleotide Sequence Database.
- Author
-
Stoesser G, Baker W, van den Broek A, Camon E, Garcia-Pastor M, Kanz C, Kulikova T, Leinonen R, Lin Q, Lombard V, Lopez R, Redaschi N, Stoehr P, Tuli MA, Tzouvara K, and Vaughan R
- Subjects
- Animals, Base Sequence, Confidentiality, Data Collection, Database Management Systems, Databases, Protein, Europe, Expressed Sequence Tags, Genome, Genome, Human, Humans, Information Storage and Retrieval, Internet, Patents as Topic, Sequence Alignment, Sequence Analysis, Systems Integration, Databases, Nucleic Acid
- Abstract
The EMBL Nucleotide Sequence Database (aka EMBL-Bank; http://www.ebi.ac.uk/embl/) incorporates, organises and distributes nucleotide sequences from all available public sources. EMBL-Bank is located and maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK. In an international collaboration with DDBJ (Japan) and GenBank (USA), data are exchanged amongst the collaborating databases on a daily basis. Major contributors to the EMBL database are individual scientists and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via FTP, email and World Wide Web interfaces. EBI's Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many other specialized databases. For sequence similarity searching, a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. All resources can be accessed via the EBI home page at http://www.ebi.ac.uk.
- Published
- 2002
48. The EMBL nucleotide sequence database.
- Author
-
Stoesser G, Baker W, van den Broek A, Camon E, Garcia-Pastor M, Kanz C, Kulikova T, Lombard V, Lopez R, Parkinson H, Redaschi N, Sterk P, Stoehr P, and Tuli MA
- Subjects
- DNA genetics, Europe, Information Storage and Retrieval, Internet, Computational Biology, Databases, Factual
- Abstract
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank at the NCBI (USA). Data is exchanged amongst the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via ftp, email and World Wide Web interfaces. EBI's Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many specialized databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT.
- Published
- 2001
49. Accessing and distributing EMBL data using CORBA (common object request broker architecture).
- Author
-
Wang L, Rodriguez-Tomé P, Redaschi N, McNeil P, Robinson A, and Lijnzaad P
- Subjects
- Sequence Analysis, DNA, Computational Biology methods, Databases, Factual, Software
- Abstract
Background: The EMBL Nucleotide Sequence Database is a comprehensive database of DNA and RNA sequences and related information traditionally made available in flat-file format. Queries through tools such as SRS (Sequence Retrieval System) also return data in flat-file format. Flat files have a number of shortcomings, however, and the resources therefore currently lack a flexible environment to meet individual researchers' needs. The Object Management Group's common object request broker architecture (CORBA) is an industry standard that provides platform-independent programming interfaces and models for portable distributed object-oriented computing applications. Its independence from programming languages, computing platforms and network protocols makes it attractive for developing new applications for querying and distributing biological data. Results: A CORBA infrastructure developed by EMBL-EBI provides an efficient means of accessing and distributing EMBL data. The EMBL object model is defined such that it provides a basis for specifying interfaces in interface definition language (IDL) and thus for developing the CORBA servers. The mapping from the object model to the relational schema in the underlying Oracle database uses the facilities provided by Persistence™, an object/relational tool. The techniques of developing loaders and 'live object caching' with persistent objects achieve a smart live object cache where objects are created on demand. The objects are managed by an evictor pattern mechanism. Conclusions: The CORBA interfaces to the EMBL database address some of the problems of traditional flat-file formats and provide an efficient means for accessing and distributing EMBL data. CORBA also provides a flexible environment for users to develop their applications by building clients to our CORBA servers, which can be integrated into existing systems.
- Published
- 2000
50. Posttranscriptional regulation of EcoP1I and EcoP15I restriction activity.
- Author
-
Redaschi N and Bickle TA
- Subjects
- Anti-Bacterial Agents pharmacology, Bacteriophage P1 genetics, Base Sequence, Escherichia coli virology, Escherichia coli Proteins, Gene Expression Regulation, Bacterial, Genes, Bacterial, Genes, Viral, Molecular Sequence Data, Ribosomal Protein S9, Ribosomal Proteins genetics, Streptomycin pharmacology, Suppression, Genetic, Bacteriophage P1 enzymology, Deoxyribonucleases, Type III Site-Specific genetics, Escherichia coli genetics, Gene Expression Regulation, Enzymologic, Methyltransferases genetics, Protein Biosynthesis
- Abstract
Efficient establishment of a DNA restriction-modification (R-M) system in a non-modified cell requires a tight control of the potentially lethal activity of the restriction enzyme. The type III R-M systems EcoP1I and EcoP15I can be transferred to non-modified Escherichia coli cells by transfection, conjugation or transformation and become established without difficulty. Modification activity is expressed immediately after the R-M genes enter the cell, whereas the expression of restriction activity is delayed until complete protection of the cellular DNA is achieved by methylation. We have shown by Western blot analysis that the expression of the modification polypeptide subunit positively regulates the amount of restriction subunit present in the cell. The finding that ribosomal alterations affected the expression of restriction activity pointed to additional control at the translational level. The analysis of EcoP1I expression in E. coli strains mutated in either of the ribosomal proteins S12 (rpsL) or S4 (rpsD) suggests that the level of in vivo restriction activity can be modulated both by a decrease in the efficiency of translation and by varying ribosomal accuracy conditions. In addition, we have preliminary evidence from in vivo gene fusion studies that the res gene may code for more than one gene product.
- Published
- 1996