24 results for "Profiti G"
Search Results
2. ELIXIR_ITA: a growing support to national and international research in life sciences
- Author
Via, A, Zambelli, F, Carnevali, A, Castrignanò, T, Cattani, A, Cuccuru, G, Della Vedova, G, Donvito, G, Facchiano, A, Fondi, M, Galeazzi, F, Licata, L, Marabotti, A, Milanesi, L, Picardi, E, Profiti, G, Tomassini, S, Tosatto, S, and Pesole, G.
- Published
- 2016
3. Tools and data services registry: a community effort to document bioinformatics resources
- Author
Ison, J., Rapacki, K., Menager, H., Kalas, M., Rydza, E., Chmura, P., Anthon, C., Beard, N., Berka, K., Bolser, D., Booth, T., Bretaudeau, A., Brezovsky, J., Casadio, R., Cesareni, G., Coppens, F., Cornell, M., Cuccuru, G., Davidsen, K., Della Vedova, G., Dogan, T., Doppelt-Azeroual, O., Emery, L., Gasteiger, E., Gatter, T., Goldberg, T., Grosjean, M., Gruning, B., Helmer-Citterich, M., Ienasescu, H., Ioannidis, V., Jespersen, M.C., Jimenez, R., Juty, N., Juvan, P., Koch, M., Laibe, C., Li, J.W., Licata, L., Mareuil, F., Micetic, I., Friborg, R.M., Moretti, S., Morris, C., Moller, S., Nenadic, A., Peterson, H., Profiti, G., Rice, P., Romano, P., Roncaglia, P., Saidi, R., Schafferhans, A., Schwammle, V., Smith, C., Sperotto, M.M., Stockinger, H., Varekova, R.S., Tosatto, S.C., de la Torre, V., Uva, P., Via, A., Yachdav, G., Zambelli, F., Vriend, G., Rost, B., Parkinson, H., Løngreen, P., and Brunak, S.
- Abstract
Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvement in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation of these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information. The outcome is that scientists must often struggle to find, understand, compare and use the best resources for the task at hand. Here we present a community-driven curation effort, supported by ELIXIR (the European infrastructure for biological information), that aspires to a comprehensive and consistent registry of information about bioinformatics resources. The sustainable upkeep of this Tools and Data Services Registry is assured by a curation effort driven by and tailored to local needs, and shared amongst a network of engaged partners. As of November 2015, the registry includes 1785 resources, with depositions from 126 individual registrations including 52 institutional providers and 74 individuals. With community support, the registry can become a standard for dissemination of information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools.
- Published
- 2016
4. An expanded evaluation of protein function prediction methods shows an improvement in accuracy
- Author
Jiang, Y, Oron, TR, Clark, WT, Bankapur, AR, D'Andrea, D, Lepore, R, Funk, CS, Kahanda, I, Verspoor, KM, Ben-Hur, A, Koo, DCE, Penfold-Brown, D, Shasha, D, Youngs, N, Bonneau, R, Lin, A, Sahraeian, SME, Martelli, PL, Profiti, G, Casadio, R, Cao, R, Zhong, Z, Cheng, J, Altenhoff, A, Skunca, N, Dessimoz, C, Dogan, T, Hakala, K, Kaewphan, S, Mehryary, F, Salakoski, T, Ginter, F, Fang, H, Smithers, B, Oates, M, Gough, J, Toronen, P, Koskinen, P, Holm, L, Chen, C-T, Hsu, W-L, Bryson, K, Cozzetto, D, Minneci, F, Jones, DT, Chapman, S, Dukka, BKC, Khan, IK, Kihara, D, Ofer, D, Rappoport, N, Stern, A, Cibrian-Uhalte, E, Denny, P, Foulger, RE, Hieta, R, Legge, D, Lovering, RC, Magrane, M, Melidoni, AN, Mutowo-Meullenet, P, Pichler, K, Shypitsyna, A, Li, B, Zakeri, P, ElShal, S, Tranchevent, L-C, Das, S, Dawson, NL, Lee, D, Lees, JG, Sillitoe, I, Bhat, P, Nepusz, T, Romero, AE, Sasidharan, R, Yang, H, Paccanaro, A, Gillis, J, Sedeno-Cortes, AE, Pavlidis, P, Feng, S, Cejuela, JM, Goldberg, T, Hamp, T, Richter, L, Salamov, A, Gabaldon, T, Marcet-Houben, M, Supek, F, Gong, Q, Ning, W, Zhou, Y, Tian, W, Falda, M, Fontana, P, Lavezzo, E, Toppo, S, Ferrari, C, Giollo, M, Piovesan, D, Tosatto, SCE, del Pozo, A, Fernandez, JM, Maietta, P, Valencia, A, Tress, ML, Benso, A, Di Carlo, S, Politano, G, Savino, A, Rehman, HU, Re, M, Mesiti, M, Valentini, G, Bargsten, JW, van Dijk, ADJ, Gemovic, B, Glisic, S, Perovic, V, Veljkovic, V, Veljkovic, N, Almeida-e-Silva, DC, Vencio, RZN, Sharan, M, Vogel, J, Kansakar, L, Zhang, S, Vucetic, S, Wang, Z, Sternberg, MJE, Wass, MN, Huntley, RP, Martin, MJ, O'Donovan, C, Robinson, PN, Moreau, Y, Tramontano, A, Babbitt, PC, Brenner, SE, Linial, M, Orengo, CA, Rost, B, Greene, CS, Mooney, SD, Friedberg, I, and Radivojac, P
- Abstract
BACKGROUND: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. RESULTS: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. CONCLUSIONS: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology-specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.
- Published
- 2016
5. Tools and data services registry: A community effort to document bioinformatics resources
- Author
Ison, J, Rapacki, K, Ménager, H, Kalaš, M, Rydza, E, Chmura, P, Anthon, C, Beard, N, Berka, K, Bolser, D, Booth, T, Bretaudeau, A, Brezovsky, J, Casadio, R, Cesareni, G, Coppens, F, Cornell, M, Cuccuru, G, Davidsen, K, Della Vedova, G, Dogan, T, Doppelt-Azeroual, O, Emery, L, Gasteiger, E, Gatter, T, Goldberg, T, Grosjean, M, Grüning, B, Helmer-Citterich, M, Ienasescu, H, Ioannidis, V, Jespersen, M, Jimenez, R, Juty, N, Juvan, P, Koch, M, Laibe, C, Li, J, Licata, L, Mareuil, F, Mičetić, I, Friborg, R, Moretti, S, Morris, C, Möller, S, Nenadic, A, Peterson, H, Profiti, G, Rice, P, Romano, P, Roncaglia, P, Saidi, R, Schafferhans, A, Schwämmle, V, Smith, C, Sperotto, M, Stockinger, H, Vařeková, R, Tosatto, S, de la Torre, V, Uva, P, Via, A, Yachdav, G, Zambelli, F, Vriend, G, Rost, B, Parkinson, H, Løngreen, P, and Brunak, S.
- Abstract
Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvement in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation of these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information. The outcome is that scientists must often struggle to find, understand, compare and use the best resources for the task at hand. Here we present a community-driven curation effort, supported by ELIXIR (the European infrastructure for biological information), that aspires to a comprehensive and consistent registry of information about bioinformatics resources. The sustainable upkeep of this Tools and Data Services Registry is assured by a curation effort driven by and tailored to local needs, and shared amongst a network of engaged partners. As of November 2015, the registry includes 1785 resources, with depositions from 126 individual registrations including 52 institutional providers and 74 individuals. With community support, the registry can become a standard for dissemination of information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools.
- Published
- 2016
6. SUS-BAR: a database of pig proteins with statistically validated structural and functional annotation
- Author
Piovesan, D., Profiti, G., Martelli, P. L., Fariselli, P., Fontanesi, L., and Casadio, R.
- Published
- 2013
7. Community detection within clusters helps large scale protein annotation: Preliminary results of modularity maximization for the BAR+ database
- Author
Profiti, G., Piovesan, D., Martelli, P. L., Fariselli, P., Casadio, R., Solé-Casals, J., Fred, A., Gamboa, H., and Fernandes, P.
- Subjects
Protein functional annotation, Community detection algorithms
- Abstract
Given the exponentially increasing amount of available data, electronic annotation procedures for protein sequences are a core topic in bioinformatics. In this paper we present a refinement of a previously published procedure that allows a fine-grained level of detail in the annotation results. This enhancement is based on a graph representation of the similarity relationships between sequences within a cluster, followed by the application of community detection algorithms, which identify groups of highly connected nodes inside a larger graph. The core idea is that sequences belonging to the same community share more features with one another than with the other sequences in the same graph.
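The abstract describes the refinement only at a high level. As a generic illustration, and not the authors' implementation, the sketch below represents one hypothetical cluster as a weighted similarity graph and applies brute-force greedy agglomerative modularity maximization, one simple member of the modularity-maximization family named in the title. The node names, unit edge weights, and the "stop when no merge raises Q" rule are assumptions chosen for illustration.

```python
from itertools import combinations


def modularity(edges, communities):
    """Newman-Girvan modularity Q of a partition of an undirected, weighted graph.

    edges: dict mapping frozenset({u, v}) -> weight (e.g. a similarity score).
    communities: list of disjoint node sets that together cover every node.
    """
    m = sum(edges.values())  # total edge weight
    if m == 0:
        return 0.0
    degree = {}
    for pair, w in edges.items():
        for node in pair:
            degree[node] = degree.get(node, 0.0) + w
    q = 0.0
    for comm in communities:
        internal = sum(w for pair, w in edges.items() if pair <= comm)  # weight inside comm
        total_degree = sum(degree.get(node, 0.0) for node in comm)
        q += internal / m - (total_degree / (2 * m)) ** 2
    return q


def greedy_modularity(nodes, edges):
    """Start from singleton communities and repeatedly merge the pair of linked
    communities that most increases Q; stop when no merge improves Q.
    Brute force (recomputes Q for every candidate merge), so only for small graphs."""
    communities = [{node} for node in nodes]
    current = modularity(edges, communities)
    while True:
        best_gain, best_pair = 0.0, None
        for i, j in combinations(range(len(communities)), 2):
            ci, cj = communities[i], communities[j]
            # Only consider merging communities joined by at least one edge.
            if not any(pair & ci and pair & cj for pair in edges):
                continue
            rest = [c for k, c in enumerate(communities) if k not in (i, j)]
            gain = modularity(edges, rest + [ci | cj]) - current
            if gain > best_gain:
                best_gain, best_pair = gain, (i, j)
        if best_pair is None:
            return communities, current
        i, j = best_pair
        rest = [c for k, c in enumerate(communities) if k not in (i, j)]
        communities = rest + [communities[i] | communities[j]]
        current = modularity(edges, communities)


if __name__ == "__main__":
    # Hypothetical cluster: two tightly connected triads of sequences joined by a single link.
    links = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "e"), ("d", "f"), ("e", "f"), ("c", "d")]
    edges = {frozenset(p): 1.0 for p in links}
    comms, q = greedy_modularity("abcdef", edges)
    print(sorted(sorted(c) for c in comms), round(q, 3))
    # [['a', 'b', 'c'], ['d', 'e', 'f']] 0.357
```

Recomputing Q from scratch for every candidate keeps the code short and easy to check; production tools instead update Q incrementally (as in the Clauset-Newman-Moore algorithm) so that they scale to large clusters.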
8. Whole Genome Sequence Analysis of Brucella abortus Isolates from Various Regions of South Africa
- Author
Ledwaba, M. B., Glover, B. A., Matle, I., Profiti, G., Martelli, P. L., Casadio, R., Zilli, K., Janowicz, A., Marotta, F., Garofolo, G., and van Heerden, H.
- Subjects
Microbiology (medical), Veterinary medicine, Comparative analysis, Animal diseases, Brucella abortus, Brucella, Virulence, Single-nucleotide polymorphism, Bovine brucellosis, Microbiology, Genome, Virology, Genotype, Whole genome sequence, Brucellosis, Vaccination, Herd, Biology (General)
- Abstract
The availability of whole genome sequences in public databases permits genome-wide comparative studies of various bacterial species. Whole genome sequence-single nucleotide polymorphisms (WGS-SNP) analysis has been used in recent studies and allows the discrimination of various Brucella species and strains. In the present study, 13 Brucella spp. strains from cattle of various locations in provinces of South Africa were typed and discriminated. WGS-SNP analysis indicated a maximum pairwise distance ranging from 4 to 77 single nucleotide polymorphisms (SNPs) between the South African Brucella abortus virulent field strains. Moreover, it was shown that the South African B. abortus strains grouped closely to B. abortus strains from Mozambique and Zimbabwe, as well as to strains from Eurasian countries such as Portugal and India. WGS-SNP analysis of South African B. abortus strains demonstrated that the same genotype circulated within one farm (Farm 1), whereas another farm (Farm 2) in the same province had two different genotypes. This indicated that brucellosis in South Africa spreads within the herd on some farms, whereas the introduction of infected animals is the mode of transmission on other farms. Three B. abortus vaccine S19 strains isolated from tissue and aborted material were identical, even though they originated from different herds and regions of South Africa. This might be due to the incorrect vaccination of animals older than the recommended age of 4–8 months or might be a problem associated with vaccine production.
- Published
- 2021
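The pairwise SNP distances reported in the abstract count the positions at which two isolates carry different alleles. A minimal sketch, assuming each isolate has already been reduced to a string of alleles at the same aligned core SNP positions and that `N` or `-` marks an uncalled position; the isolate names and profiles are made up, and this is not the variant-calling pipeline used in the study.

```python
from itertools import combinations

# Assumed placeholders for positions with no reliable base call.
MISSING = {"N", "-"}


def snp_distance(a, b):
    """Count aligned core positions where two isolates carry different alleles,
    skipping positions where either isolate has a missing call."""
    if len(a) != len(b):
        raise ValueError("SNP profiles must cover the same aligned positions")
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x not in MISSING and y not in MISSING
    )


def pairwise_distances(profiles):
    """Return {(name1, name2): SNP distance} for every unordered pair of isolates."""
    return {
        (n1, n2): snp_distance(profiles[n1], profiles[n2])
        for n1, n2 in combinations(sorted(profiles), 2)
    }


if __name__ == "__main__":
    # Hypothetical isolates reduced to six core SNP positions.
    profiles = {"iso1": "ACGTAC", "iso2": "ACGTTC", "iso3": "TCGAAN"}
    dists = pairwise_distances(profiles)
    print(dists)
    # {('iso1', 'iso2'): 1, ('iso1', 'iso3'): 2, ('iso2', 'iso3'): 3}
    print("max pairwise distance:", max(dists.values()))
    # max pairwise distance: 3
```

Skipping missing calls pair by pair, rather than dropping a position from every isolate, keeps more informative sites, but it means different pairs may be compared over different numbers of positions.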
9. The CAFA Challenge Reports Improved Protein Function Prediction and New Functional Annotations for Hundreds of Genes through Experimental Screens
- Author
Zhou N., Jiang Y., Bergquist T.R., Lee A.J., Kacsoh B.Z., Crocker A.W., Lewis K.A., Georghiou G., Nguyen H.N., Hamid M.N., Davis L., Dogan T., Atalay V., Rifaioglu A.S., Dalkiran A., Cetin Atalay R., Zhang C., Hurto R.L., Freddolino P.L., Zhang Y., Bhat P., Supek F., Fernandez J.M., Gemovic B., Perovic V.R., Davidovic R.S., Sumonja N., Veljkovic N., Asgari E., Mofrad M.R.K., Profiti G., Savojardo C., Martelli P.L., Casadio R., Boecker F., Schoof H., Kahanda I., Thurlby N., McHardy A.C., Renaux A., Saidi R., Gough J., Freitas A.A., Antczak M., Fabris F., Wass M.N., Hou J., Cheng J., Wang Z., Romero A.E., Paccanaro A., Yang H., Goldberg T., Zhao C., Holm L., Toronen P., Medlar A.J., Zosa E., Borukhov I., Novikov I., Wilkins A., Lichtarge O., Chi P.-H., Tseng W.-C., Linial M., Rose P.W., Dessimoz C., Vidulin V., Dzeroski S., Sillitoe I., Das S., Lees J.G., Jones D.T., Wan C., Cozzetto D., Fa R., Torres M., Warwick Vesztrocy A., Rodriguez J.M., Tress M.L., Frasca M., Notaro M., Grossi G., Petrini A., Re M., Valentini G., Mesiti M., Roche D.B., Reeb J., Ritchie D.W., Aridhi S., Alborzi S.Z., Devignes M.-D., Koo D.C.E., Bonneau R., Gligorijevic V., Barot M., Fang H., Toppo S., Lavezzo E., Falda M., Berselli M., Tosatto S.C.E., Carraro M., Piovesan D., Ur Rehman H., Mao Q., Zhang S., Vucetic S., Black G.S., Jo D., Suh E., Dayton J.B., Larsen D.J., Omdahl A.R., McGuffin L.J., Brackenridge D.A., Babbitt P.C., Yunes J.M., Fontana P., Zhang F., Zhu S., You R., Zhang Z., Dai S., Yao S., Tian W., Cao R., Chandler C., Amezola M., Johnson D., Chang J.-M., Liao W.-H., Liu Y.-W., Pascarelli S., Frank Y., Hoehndorf R., Kulmanov M., Boudellioua I., Politano G., Di Carlo S., Benso A., Hakala K., Ginter F., Mehryary F., Kaewphan S., Bjorne J., Moen H., Tolvanen M.E.E., Salakoski T., Kihara D., Jain A., Smuc T., Altenhoff A., Ben-Hur A., Rost B., Brenner S.E., Orengo C.A., Jeffery C.J., Bosco G., Hogan D.A., Martin M.J., O'Donovan C., Mooney S.D., Greene C.S., Radivojac P., and Friedberg I.
- Subjects
Protein function prediction; Critical assessment; Community challenge; Annotation; Molecular sequence annotation; Ontology; Biological ontology; Function (biology); Candida albicans; Pseudomonas aeruginosa; Drosophila melanogaster; Biofilms; Locomotion; Long-term memory; Fungal genome; Bacterial genome; Gene; Experimental data; Big data; Bioinformatics; Computational biology; Molecular biology; Molecular genetics; Genetics; Human genetics; Cell biology; Biotechnology & applied microbiology; Ecology, evolution, behavior and systematics; Artificial intelligence; Machine learning; Databases; Distributed, parallel, and cluster computing
- Abstract
Tosatto, Silvio/0000-0003-4525-7793; Zhang, Feng/0000-0003-3447-897X; Gonzalez, Jose Maria Fernandez/0000-0002-4806-5140; Devignes, Marie-Dominique/0000-0002-0399-8713; Wass, Mark/0000-0001-5428-6479; Falda, Marco/0000-0003-2642-519X; Thurlby, Natalie/0000-0002-1007-0286; Zosa, Elaine/0000-0003-2482-0663; Dessimoz, Christophe/0000-0002-2170-853X; Yunes, Jeffrey/0000-0003-1869-3231; Hamid, Md Nafiz/0000-0001-8681-6526; Hoehndorf, Robert/0000-0001-8149-5890; Dogan, Tunca/0000-0002-1298-9763; NOTARO, MARCO/0000-0003-4309-2200; Cozzetto, Domenico/0000-0001-6752-5432; Lewis, Kimberley/0000-0003-3010-8453; Roche, Daniel/0000-0002-9204-1840; Martin, Maria-Jesus/0000-0001-5454-2815; Tress, Michael/0000-0001-9046-6370; Tolvanen, Martti/0000-0003-3434-7646; Cheng, Jianlin/0000-0003-0305-2853; Rose, Peter/0000-0001-9981-9750; Renaux, Alexandre/0000-0002-4339-2791; Kacsoh, Balint/0000-0001-9171-0611; O'Donovan, Claire/0000-0001-8051-7429; Kulmanov, Maxat/0000-0003-1710-1820; Friedberg, Iddo/0000-0002-1789-8000; Zhou, Naihui/0000-0001-6268-6149, WOS: 000498615000001, PubMed ID: 31744546, Background The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a novel and major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aureginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. 
We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. Conclusion: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.
National Science Foundation (NSF) [DBI1564756, DBI-1458359, DBI-1458390, DMS1614777, CMMI1825941, NSF 1458390]; Gordon and Betty Moore Foundation [GBMF 4552]; National Institutes of Health NIGMS [P20 GM113132]; Cystic Fibrosis Foundation [CFRDP STANTO19R0]; Biotechnology and Biological Sciences Research Council (BBSRC) [BB/K004131/1, BB/F00964X/1, BB/M025047/1, BB/M015009/1]; Consejo Nacional de Ciencia y Tecnologia Paraguay (CONACyT) [14-INV-088, PINV15-315]; NSF [1660648, DBI 1759934, IIS1763246, DBI-1458477, 0965768, DMR-1420073, DBI-1458443]; NIH [R01GM093123, DP1MH110234, UL1 TR002319, U24 TR002306]; Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy EXC 2155 "RESIST" [39087428]; National Institutes of Health [R01GM123055, R01GM60595, R15GM120650, GM083107, GM116960, AI134678, NIH R35-GM128637, R00-GM097033]; European Research Council (ERC) [StG 757700]; Spanish Ministry of Science, Innovation and Universities [BFU2017-89833-P]; Severo Ochoa award; Centre of Excellence project "BioProspecting of Adriatic Sea"; Croatian Government; European Regional Development Fund [KK.01.1.1.01.0002]; ATT Tieto kayttoon grant; Academy of Finland; University of Turku; CSC-IT Center for Science Ltd.; University of Miami; National Cancer Institute of the National Institutes of Health [U01CA198942]; Helsinki Institute for Life Sciences; Academy of Finland [292589]; National Natural Science Foundation of China [31671367, 31471245, 91631301, 61872094, 61572139]; National Key Research and Development Program of China [2016YFC1000505, 2017YFC0908402]; Italian Ministry of Education, University and Research (MIUR) PRIN 2017 project [2017483NH8]; Shanghai Municipal Science and Technology Major Project [2017SHZDZX01, 2018SHZDZX01]; UK Biotechnology and Biological Sciences Research Council (BBSRC) [BB/N019431/1, BB/L020505/1, BB/L002817/1]; Elsevier; Extreme Science and Engineering Discovery Environment (XSEDE) awards [MCB160101, MCB160124]; Ministry of Education, Science and Technological Development of the Republic of Serbia [173001]; Taiwan Ministry of Science and Technology [106-2221-E-004-011-MY2]; Montana State University; Bavarian Ministry for Education; Simons Foundation; NIH NINDS [1R21NS103831-01]; University of Illinois at Chicago (UIC) Cancer Center award; UIC College of Liberal Arts and Sciences Faculty Award; UIC International Development Award; Yad Hanadiv [9660/2019]; National Institute of General Medical Sciences of the National Institutes of Health [GM066099, GM079656]; Research Supporting Plan (PSR) of University of Milan [PSR2018-DIP-010-MFRAS]; Swiss National Science Foundation (SNSF) [150654]; EMBL-European Bioinformatics Institute core funds; CAFA BBSRC [BB/N004876/1]; European Union's Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant [778247]; European Cooperation in Science and Technology (COST) Action [BM1405]; NIH/NIGMS [R01 GM071749]; National Human Genome Research Institute of the National Institutes of Health [U41 HG007234]; INB Grant (ISCIII-SGEFI/ERDF) [PT17/0009/0001]; Turkiye Bilimsel ve Teknolojik Arastirma Kurumu (TUBITAK) [EEEAG-116E930]; KanSil [2016K121540]; Università degli Studi di Milano; 111 Project (Ministry of Education, China) [B18015]; key project of Shanghai Science & Technology [16JC1420402]; ZJLab; project Ribes Network POR-FESR 3S4H [TOPP-ALFREVE18-01]; PRID/SID of University of Padova [TOPP-SID19-01]; NIGMS [R15GM120650]; King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) [URF/1/3454-01-01, URF/1/3790-01-01]; "the Human Project from Mind, Brain and Learning" of the NCCU Higher Education Sprout Project by the Taiwan Ministry of Education; National Center for High-performance Computing; Istanbul Technical University.
The work of IF was funded, in part, by the National Science Foundation award DBI-1458359. The work of CSG and AJL was funded, in part, by the National Science Foundation award DBI-1458390 and GBMF 4552 from the Gordon and Betty Moore Foundation. The work of DAH and KAL was funded, in part, by the National Science Foundation award DBI-1458390, National Institutes of Health NIGMS P20 GM113132, and the Cystic Fibrosis Foundation CFRDP STANTO19R0. The work of AP, HY, AR, and MT was funded by BBSRC grants BB/K004131/1, BB/F00964X/1 and BB/M025047/1, Consejo Nacional de Ciencia y Tecnologia Paraguay (CONACyT) grants 14-INV-088 and PINV15-315, and NSF Advances in Biological Informatics grant 1660648. The work of JC was partially supported by an NIH grant (R01GM093123) and two NSF grants (DBI 1759934 and IIS1763246). ACM acknowledges support from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy (EXC 2155 "RESIST", Project ID 39087428). DK acknowledges support from the National Institutes of Health (R01GM123055) and the National Science Foundation (DMS1614777, CMMI1825941). PB acknowledges support from the National Institutes of Health (R01GM60595). GB and BZK acknowledge support from the National Science Foundation (NSF 1458390) and NIH DP1MH110234. FS was funded by the ERC StG 757700 "HYPER-INSIGHT" and by the Spanish Ministry of Science, Innovation and Universities grant BFU2017-89833-P. FS further acknowledges funding from the Severo Ochoa award to the IRB Barcelona. TS was funded by the Centre of Excellence project "BioProspecting of Adriatic Sea", co-financed by the Croatian Government and the European Regional Development Fund (KK.01.1.1.01.0002). The work of SK was funded by an ATT Tieto kayttoon grant and the Academy of Finland.
JB and HM acknowledge the support of the University of Turku, the Academy of Finland and CSC-IT Center for Science Ltd. TB and SM were funded by the NIH awards UL1 TR002319 and U24 TR002306. The work of CZ and ZW was funded by the National Institutes of Health grant R15GM120650 to ZW and start-up funding from the University of Miami to ZW. The work of PWR was supported by the National Cancer Institute of the National Institutes of Health under Award Number U01CA198942. PR acknowledges NSF grant DBI-1458477. PT acknowledges support from the Helsinki Institute for Life Sciences. The work of AJM was funded by the Academy of Finland (No. 292589). The work of FZ and WT was funded by the National Natural Science Foundation of China (31671367, 31471245, 91631301) and the National Key Research and Development Program of China (2016YFC1000505, 2017YFC0908402). CS acknowledges support from the Italian Ministry of Education, University and Research (MIUR) PRIN 2017 project 2017483NH8. SZ is supported by the National Natural Science Foundation of China (No. 61872094 and No. 61572139) and the Shanghai Municipal Science and Technology Major Project (No. 2017SHZDZX01). PLF and RLH were supported by the National Institutes of Health grants R35-GM128637 and R00-GM097033. JG, DTJ, CW, DC, and RF were supported by the UK Biotechnology and Biological Sciences Research Council (BB/N019431/1, BB/L020505/1, and BB/L002817/1) and Elsevier. The work of YZ and CZ was funded in part by the National Institutes of Health awards GM083107, GM116960, and AI134678; the National Science Foundation award DBI1564756; and the Extreme Science and Engineering Discovery Environment (XSEDE) awards MCB160101 and MCB160124. The work of BG, VP, RD, NS, and NV was funded by the Ministry of Education, Science and Technological Development of the Republic of Serbia, Project No. 173001. The work of YWL, WHL, and JMC was funded by the Taiwan Ministry of Science and Technology (106-2221-E-004-011-MY2).
YWL, WHL, and JMC further acknowledge support from "the Human Project from Mind, Brain and Learning" of the NCCU Higher Education Sprout Project by the Taiwan Ministry of Education, and from the National Center for High-performance Computing for computer time and facilities. The work of IK and AB was funded by Montana State University and the NSF Advances in Biological Informatics program through grant number 0965768. BR, TG, and JR are supported by the Bavarian Ministry for Education through funding to the TUM. The work of RB, VG, MB, and DCEK was supported by the Simons Foundation, NIH NINDS grant number 1R21NS103831-01 and NSF award number DMR-1420073. CJJ acknowledges funding from a University of Illinois at Chicago (UIC) Cancer Center award, a UIC College of Liberal Arts and Sciences Faculty Award, and a UIC International Development Award. The work of ML was funded by Yad Hanadiv (grant number 9660/2019). The work of OL and IN was funded by the National Institute of General Medical Sciences of the National Institutes of Health through grants GM066099 and GM079656. Support was also provided by the Research Supporting Plan (PSR) of the University of Milan, number PSR2018-DIP-010-MFRAS. AWV acknowledges funding from the BBSRC (CASE studentship BB/M015009/1). CD acknowledges support from the Swiss National Science Foundation (150654). CO and MJM are supported by the EMBL-European Bioinformatics Institute core funds and the CAFA BBSRC grant BB/N004876/1. GG is supported by the CAFA BBSRC grant BB/N004876/1. SCET acknowledges funding from the European Union's Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement No 778247 (IDPfun) and from COST Action BM1405 (NGP-net). SEB was supported by NIH/NIGMS grant R01 GM071749. The work of MLT, JMR, and JMF was supported by the National Human Genome Research Institute of the National Institutes of Health, grant number U41 HG007234. The work of JMF and JMR was also supported by INB Grant (PT17/0009/0001 - ISCIII-SGEFI/ERDF).
VA acknowledges funding from TUBITAK EEEAG-116E930. RCA acknowledges funding from KanSil 2016K121540. GV acknowledges funding from Università degli Studi di Milano - Project "Discovering Patterns in Multi-Dimensional Data" and Project "Machine Learning and Big Data Analysis for Bioinformatics". RY and SY are supported by the 111 Project (No. B18015), the key project of Shanghai Science & Technology (No. 16JC1420402), the Shanghai Municipal Science and Technology Major Project (No. 2018SHZDZX01), and ZJLab. ST was supported by project Ribes Network POR-FESR 3S4H (No. TOPP-ALFREVE18-01) and PRID/SID of University of Padova (No. TOPP-SID19-01). CZ and ZW were supported by the NIGMS grant R15GM120650 to ZW and start-up funding from the University of Miami to ZW. The work of MK and RH was supported by funding from the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. URF/1/3454-01-01 and URF/1/3790-01-01. The work of SDM is funded, in part, by NSF award DBI-1458443.
- Published
- 2019
10. How to inherit statistically validated annotation within BAR+ protein clusters
- Author
-
Damiano Piovesan, Pier Luigi Martelli, Piero Fariselli, Giuseppe Profiti, Andrea Zauli, Ivan Rossi, and Rita Casadio
- Subjects
Cluster Analysis ,Genomics ,Proteins ,Molecular Sequence Annotation ,Biochemistry ,Molecular Biology ,Computer Science Applications ,1707 Computer Vision and Pattern Recognition ,Applied Mathematics ,Structural Biology ,PROTEIN FUNCTIONAL ANNOTATION ,Data Interpretation, Statistical ,Databases, Protein ,Vocabulary, Controlled ,Sequence Analysis, Protein ,Computational biology ,Biology ,03 medical and health sciences ,Annotation ,Protein Annotation ,Controlled vocabulary ,Critical Assessment of Function Annotation ,030304 developmental biology ,0303 health sciences ,Information retrieval ,030302 biochemistry & molecular biology ,Proceedings ,remote homologs ,Reference database ,UniProt ,DNA microarray
- Abstract
Background: In the genomic era, a key issue is protein annotation, namely how to endow protein sequences, upon translation from the corresponding genes, with structural and functional features. This is routinely done electronically by deriving and integrating information from previous knowledge. The reference database for protein sequences is UniProtKB, which is divided into two sections: UniProtKB/TrEMBL, which is automatically annotated and not reviewed, and UniProtKB/Swiss-Prot, which is manually annotated and reviewed. The annotation process is essentially based on sequence similarity search. The question therefore arises as to what extent annotation based on transfer by inheritance is valuable, and specifically whether it is possible to statistically validate inherited features when little homology exists between the target sequence and its template(s). Results: In this paper we address the problem of annotating protein sequences in a statistically validated manner, using UniProtKB as the reference annotation resource. The test case is the set of 48,298 proteins recently released by the Critical Assessment of Functional Annotation (CAFA) organization. We show that, after validation, we can transfer Gene Ontology (GO) terms of the three main categories and Pfam domains to about 68% and 72% of the sequences, respectively. This is possible after alignment of the CAFA sequences towards BAR+, our annotation resource, which allows discriminating between statistically validated and non-validated annotation. By comparison with a direct UniProtKB annotation, we find that besides validating the annotation of some 78% of the CAFA set, we assign new and statistically validated annotation to 14.8% of the sequences and find new structural templates for about 25% of the chains, half of which share less than 30% sequence identity with the corresponding template(s).
Conclusion: Inheritance of annotation by transfer generally requires a careful selection of the identity value between the target and the template in order to transfer structural and/or functional features. Here we prove that even remote homologs can be safely endowed with structural templates and GO and/or Pfam terms, provided that annotation is done within clusters collecting cluster-related protein sequences, where a statistical validation of the shared structural and functional features is possible.
- Published
- 2013
11. Tools and data services registry: a community effort to document bioinformatics resources
- Author
-
Jon Ison, Kristoffer Rapacki, Hervé Ménager, Matúš Kalaš, Emil Karol Rydza, Piotr Jaroslaw Chmura, Christian Anthon, Niall Beard, Karel Berka, Dan Bolser, Timothy F. Booth, Anthony Bretaudeau, Jan Brezovsky, Rita Casadio, Gianni Cesareni, Frederik Coppens, Michael Cornell, Gianmauro Cuccuru, Kristian Davidsen, Gianluca Della Vedova, Tunca Doğan, Olivia Doppelt-Azeroual, Laura Emery, Elisabeth Gasteiger, Thomas Gatter, Tatyana Goldberg, Marie Grosjean, Björn Grüning, Manuela Helmer-Citterich, Hans Ienasescu, Vassilios Ioannidis, Martin Closter Jespersen, Rafael C. Jimenez, Nick Juty, Peter Juvan, Maximilian Koch, Camille Laibe, Jing-Woei Li, Luana Licata, Fabien Mareuil, Ivan Mičetić, Rune Møllegaard Friborg, Sébastien Moretti, Chris Morris, Steffen Möller, Aleksandra Nenadic, Hedi Peterson, Giuseppe Profiti, Peter M. Rice, Paolo Romano, Paola Roncaglia, Rabie Saidi, Andrea Schafferhans, Veit Schwämmle, Callum Smith, Maria Maddalena Sperotto, Heinz Stockinger, Radka Svobodová Vařeková, Silvio C. E. Tosatto, Victor de la Torre, Paolo Uva, Allegra Via, Guy Yachdav, Federico Zambelli, Gert Vriend, Burkhard Rost, Helen Parkinson, Peter Løngreen, and Søren Brunak
Technical University of Denmark [Lyngby] (DTU), Institut Pasteur de Madagascar, Réseau International des Instituts Pasteur (RIIP), University of Bergen (UiB), University of Copenhagen = Københavns Universitet (KU), University of Manchester, Palacky University Olomouc, European Bioinformatics Institute, NEBC Wallingford, Institut de Génétique, Environnement et Protection des Plantes (IGEPP), Institut National de la Recherche Agronomique (INRA)-Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES)-AGROCAMPUS OUEST, Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro), Masaryk University [Brno] (MUNI), University of Bologna, Università degli Studi di Roma Tor Vergata [Roma], Ghent University [Belgium] (UGENT), Flanders Institute for Biotechnology, CRS4 Bioinformat, Università degli Studi di Milano-Bicocca (UNIMIB), Swiss Institute of Bioinformatics [Lausanne] (SIB), Universität Bielefeld = Bielefeld University, Tumor Biology Center, Centre National de la Recherche Scientifique (CNRS), University of Freiburg, University of Ljubljana, The Chinese University of Hong Kong [Hong Kong], Università degli Studi di Padova = University of Padua (Unipd), Bioinformatics Research Centre, Université de Lausanne (UNIL), CCLRC Daresbury Laboratory, Universität zu Lübeck = University of Lübeck, Universität Rostock, University of Tartu, Imperial College London, IRCCS Azienda Ospedaliera Universitaria Integrata San Martino (IRCCS AOU San Martino), University of Southern Denmark (SDU), WTCHG, Central European Institute of Technology [Brno] (CEITEC), Brno University of Technology [Brno] (BUT), Instituto Nacional de Bioinformática, Sapienza University of Rome (DIAG), Consiglio Nazionale delle Ricerche, University of Milan (UNIMI), Radboud University Nijmegen
- Subjects
0301 basic medicine ,[SDV]Life Sciences [q-bio] ,registry ,Bioinformatics ,computer.software_genre ,Task (project management) ,Documentation ,Data and Information ,Database Issue ,Registries ,Data Curation ,database ,Settore BIO/11 ,data management ,tool ,SOFTWARE-DEVELOPMENT ,bioinformatics ,ddc ,computing tool ,Tools and data services registry ,SEQANSWERS ,Web service ,MOLECULAR-BIOLOGY ,Biology ,Ecology and Environment ,03 medical and health sciences ,SDG 3 - Good Health and Well-being ,Genetics ,Implementation ,Dissemination ,Data curation ,bioinformatic ,business.industry ,Computational Biology ,Software ,Software development ,bioinformatics, tools, registry, elixir ,Biology and Life Sciences ,Mathematics and natural sciences: 400::Information and communication science: 420::System development and design: 426 [VDP] ,FRAMEWORK ,ELIXIR ,Settore BIO/18 - Genetica ,030104 developmental biology ,tools ,Data as a service ,COMPILATION ,business ,COLLECTION ,Nanomedicine Radboud Institute for Molecular Life Sciences [Radboudumc 19] ,computer ,WEB SERVICES ,LIFE SCIENCES
- Abstract
Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvements in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation of these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information. The outcome is that scientists must often struggle to find, understand, compare and use the best resources for the task at hand. Here we present a community-driven curation effort, supported by ELIXIR, the European infrastructure for biological information, that aspires to a comprehensive and consistent registry of information about bioinformatics resources. The sustainable upkeep of this Tools and Data Services Registry is assured by a curation effort driven by and tailored to local needs, and shared amongst a network of engaged partners. As of November 2015, the registry includes 1785 resources, with depositions from 126 individual registrations, including 52 institutional providers and 74 individuals. With community support, the registry can become a standard for dissemination of information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools.
12. Whole Genome Sequence Analysis of Brucella spp. from Human, Livestock, and Wildlife in South Africa.
- Author
-
Mazwi KD, Lekota KE, Glover BA, Kolo FB, Hassim A, Rossouw J, Jonker A, Wojno JM, Profiti G, Martelli PL, Casadio R, Zilli K, Janowicz A, Marotta F, Garofolo G, and van Heerden H
- Subjects
- Animals, Humans, South Africa epidemiology, Cattle, Brucella melitensis genetics, Brucella melitensis isolation & purification, Brucella melitensis classification, Polymorphism, Single Nucleotide, Brucella genetics, Brucella classification, Brucella isolation & purification, Goats microbiology, Brucellosis microbiology, Brucellosis veterinary, Brucellosis epidemiology, Livestock microbiology, Genome, Bacterial, Whole Genome Sequencing, Animals, Wild microbiology, Brucella abortus genetics, Brucella abortus isolation & purification, Brucella abortus classification, Phylogeny
- Abstract
Brucellosis is an economically important zoonotic disease affecting human, livestock, and wildlife health globally, especially in Africa. Brucella abortus and B. melitensis have been isolated from humans, livestock (cattle and goats), and wildlife (sable) in South Africa (SA), but little is known about the population genomic structure of this pathogen in SA. As whole genome sequencing can help differentiate Brucella spp. strains and trace the origin of outbreaks, the whole genomes of retrospective isolates (n = 19) from previous studies were sequenced. Sequences were analysed using average nucleotide identity (ANI), pangenomics, and whole genome single nucleotide polymorphism (wgSNP) analysis to trace the geographical origin of cases of brucellosis circulating in humans, cattle, goats, and sable from different provinces in SA. Pangenomic analysis of B. melitensis (n = 69) and B. abortus (n = 56) genomes was conducted with the 19 sequenced strains, which included B. abortus from cattle (n = 3) and B. melitensis from a human (n = 1), cattle (n = 1), a goat (n = 1), the Rev1 vaccine strain (n = 1), and sable (n = 12). Pangenomic analysis of B. melitensis genomes highlighted genes shared amongst the sable genomes, including 10 hypothetical proteins and genes encoding acetyl-coenzyme A synthetase (acs) and acylamidase (aam). The wgSNP analysis confirmed that the B. melitensis isolated from the human was more closely related to the goat isolate from the same outbreak in the Western Cape Province than to the B. melitensis cattle sample from different cases in the Gauteng Province. The B. melitensis sable strains could be distinguished from the African lineage, constituting their own African sub-clade. The sequenced B. abortus strains clustered in the C2 lineage, which is closely related to isolates from Mozambique and Zimbabwe. This study identified genetically diverse Brucella spp. among various hosts in SA. It expands the limited knowledge of the presence of B. melitensis in livestock and humans in SA, building a foundation for future research on the worldwide distribution of Brucella spp. and their evolutionary background. (© 2024. The Author(s).)
- Published
- 2024
13. Whole Genome Sequence Analysis of Brucella abortus Isolates from Various Regions of South Africa.
- Author
-
Ledwaba MB, Glover BA, Matle I, Profiti G, Martelli PL, Casadio R, Zilli K, Janowicz A, Marotta F, Garofolo G, and van Heerden H
- Abstract
The availability of whole genome sequences in public databases permits genome-wide comparative studies of various bacterial species. Whole genome sequence-single nucleotide polymorphism (WGS-SNP) analysis has been used in recent studies and allows the discrimination of various Brucella species and strains. In the present study, 13 Brucella spp. strains from cattle at various locations in South African provinces were typed and discriminated. WGS-SNP analysis indicated a maximum pairwise distance ranging from 4 to 77 single nucleotide polymorphisms (SNPs) between the South African Brucella abortus virulent field strains. Moreover, the South African B. abortus strains grouped closely with B. abortus strains from Mozambique and Zimbabwe, as well as with strains from Eurasian countries such as Portugal and India. WGS-SNP analysis of South African B. abortus strains demonstrated that the same genotype circulated on one farm (Farm 1), whereas another farm (Farm 2) in the same province had two different genotypes. This indicated that brucellosis in South Africa spreads within the herd on some farms, whereas the introduction of infected animals is the mode of transmission on other farms. Three B. abortus vaccine S19 strains isolated from tissue and aborted material were identical, even though they originated from different herds and regions of South Africa. This might be due to incorrect vaccination of animals older than the recommended age of 4-8 months, or to a problem associated with vaccine production.
- Published
- 2021
14. The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens.
- Author
-
Zhou N, Jiang Y, Bergquist TR, Lee AJ, Kacsoh BZ, Crocker AW, Lewis KA, Georghiou G, Nguyen HN, Hamid MN, Davis L, Dogan T, Atalay V, Rifaioglu AS, Dalkıran A, Cetin Atalay R, Zhang C, Hurto RL, Freddolino PL, Zhang Y, Bhat P, Supek F, Fernández JM, Gemovic B, Perovic VR, Davidović RS, Sumonja N, Veljkovic N, Asgari E, Mofrad MRK, Profiti G, Savojardo C, Martelli PL, Casadio R, Boecker F, Schoof H, Kahanda I, Thurlby N, McHardy AC, Renaux A, Saidi R, Gough J, Freitas AA, Antczak M, Fabris F, Wass MN, Hou J, Cheng J, Wang Z, Romero AE, Paccanaro A, Yang H, Goldberg T, Zhao C, Holm L, Törönen P, Medlar AJ, Zosa E, Borukhov I, Novikov I, Wilkins A, Lichtarge O, Chi PH, Tseng WC, Linial M, Rose PW, Dessimoz C, Vidulin V, Dzeroski S, Sillitoe I, Das S, Lees JG, Jones DT, Wan C, Cozzetto D, Fa R, Torres M, Warwick Vesztrocy A, Rodriguez JM, Tress ML, Frasca M, Notaro M, Grossi G, Petrini A, Re M, Valentini G, Mesiti M, Roche DB, Reeb J, Ritchie DW, Aridhi S, Alborzi SZ, Devignes MD, Koo DCE, Bonneau R, Gligorijević V, Barot M, Fang H, Toppo S, Lavezzo E, Falda M, Berselli M, Tosatto SCE, Carraro M, Piovesan D, Ur Rehman H, Mao Q, Zhang S, Vucetic S, Black GS, Jo D, Suh E, Dayton JB, Larsen DJ, Omdahl AR, McGuffin LJ, Brackenridge DA, Babbitt PC, Yunes JM, Fontana P, Zhang F, Zhu S, You R, Zhang Z, Dai S, Yao S, Tian W, Cao R, Chandler C, Amezola M, Johnson D, Chang JM, Liao WH, Liu YW, Pascarelli S, Frank Y, Hoehndorf R, Kulmanov M, Boudellioua I, Politano G, Di Carlo S, Benso A, Hakala K, Ginter F, Mehryary F, Kaewphan S, Björne J, Moen H, Tolvanen MEE, Salakoski T, Kihara D, Jain A, Šmuc T, Altenhoff A, Ben-Hur A, Rost B, Brenner SE, Orengo CA, Jeffery CJ, Bosco G, Hogan DA, Martin MJ, O'Donovan C, Mooney SD, Greene CS, Radivojac P, and Friedberg I
- Subjects
- Animals, Biofilms, Candida albicans genetics, Drosophila melanogaster genetics, Genome, Bacterial, Genome, Fungal, Humans, Locomotion, Memory, Long-Term, Molecular Sequence Annotation methods, Pseudomonas aeruginosa genetics, Molecular Sequence Annotation trends
- Abstract
Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results: Here, we report on the results of the third CAFA challenge, CAFA3, which featured an expanded analysis over the previous CAFA rounds, both in terms of the volume of data analyzed and the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in the Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. Conclusion: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.
- Published
- 2019
15. Fido-SNP: the first webserver for scoring the impact of single nucleotide variants in the dog genome.
- Author
Capriotti E, Montanucci L, Profiti G, Rossi I, Giannuzzi D, Aresu L, and Fariselli P
- Subjects
- Algorithms, Animals, Dogs, Genetic Variation, Genome-Wide Association Study, Genotype, Internet, Genome genetics, Genomics, Polymorphism, Single Nucleotide genetics, Software
- Abstract
As the amount of genomic variation data increases, tools able to score the functional impact of single nucleotide variants become increasingly necessary. While several prediction servers are available for interpreting the effects of variants in the human genome, only a few have been developed for other species, and none were specifically designed for species of veterinary interest such as the dog. Here, we present Fido-SNP, the first predictor able to discriminate between Pathogenic and Benign single-nucleotide variants in the dog genome. Fido-SNP is a binary classifier based on the Gradient Boosting algorithm. It classifies and scores the impact of variants in both coding and non-coding regions from sequence features within seconds. When validated on a previously unseen set of annotated variants from the OMIA database, Fido-SNP reaches 88% overall accuracy, a Matthews correlation coefficient of 0.77 and an Area Under the ROC Curve of 0.91. (© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.)
- Published
- 2019
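The three evaluation figures reported for Fido-SNP (accuracy, Matthews correlation coefficient, area under the ROC curve) can be illustrated with a minimal Python sketch. It is not Fido-SNP's own evaluation code: treating Pathogenic as the positive class is an assumption, and the confusion-matrix counts in the usage lines are hypothetical.

```python
import math


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Accuracy and Matthews correlation coefficient from a 2x2 confusion matrix."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is 0 when any row or column of the matrix sums to zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "mcc": mcc}


def roc_auc(scores: list[float], labels: list[int]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. Quadratic in the number of examples, fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical counts, not Fido-SNP's actual results.
    m = binary_metrics(tp=45, tn=43, fp=7, fn=5)
    print(m["accuracy"], round(m["mcc"], 2))
    print(roc_auc([0.9, 0.7, 0.4, 0.2], [1, 0, 1, 0]))
```

MCC is included alongside accuracy because it stays informative when Pathogenic and Benign variants are unbalanced in the test set.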
16. BUSCA: an integrative web server to predict subcellular localization of proteins.
- Author
Savojardo C, Martelli PL, Fariselli P, Profiti G, and Casadio R
- Subjects
- Bacteria chemistry, Bacteria ultrastructure, Benchmarking, Cell Membrane chemistry, Cell Membrane ultrastructure, Cell Nucleus chemistry, Cell Nucleus ultrastructure, Chloroplasts chemistry, Chloroplasts ultrastructure, Eukaryota chemistry, Eukaryota ultrastructure, Eukaryotic Cells ultrastructure, Gene Expression, Gene Ontology, Internet, Membrane Proteins metabolism, Mitochondria chemistry, Mitochondria ultrastructure, Mitochondrial Proteins metabolism, Molecular Sequence Annotation, Prokaryotic Cells ultrastructure, Protein Sorting Signals genetics, Eukaryotic Cells chemistry, Membrane Proteins genetics, Mitochondrial Proteins genetics, Prokaryotic Cells chemistry, Software
- Abstract
Here, we present BUSCA (http://busca.biocomp.unibo.it), a novel web server that integrates different computational tools for predicting protein subcellular localization. BUSCA combines methods for identifying signal and transit peptides (DeepSig and TPpred3), GPI-anchors (PredGPI) and transmembrane domains (ENSEMBLE3.0 and BetAware) with tools for discriminating the subcellular localization of both globular and membrane proteins (BaCelLo, MemLoci and SChloro). Outputs from the different tools are processed and integrated to annotate the subcellular localization of both eukaryotic and bacterial protein sequences. We benchmark BUSCA on protein targets derived from recent CAFA experiments and other specific data sets, reporting state-of-the-art performance. On 2732 targets from CAFA2, BUSCA scores better than all other evaluated methods, with an F1 value of 0.49, and it ranks among the best methods on targets from CAFA3. We propose BUSCA as an integrated and accurate resource for the annotation of protein subcellular localization.
- Published
- 2018
17. eDGAR: a database of Disease-Gene Associations with annotated Relationships among genes.
- Author
Babbi G, Martelli PL, Profiti G, Bovo S, Savojardo C, and Casadio R
- Subjects
- Genetic Diseases, Inborn metabolism, Humans, Metabolic Networks and Pathways, Molecular Sequence Annotation, Databases, Genetic, Genetic Diseases, Inborn genetics, Genomics methods, Protein Interaction Maps
- Abstract
Background: Genetic investigations, boosted by modern sequencing techniques, allow dissecting the genetic component of different phenotypic traits. These efforts result in lists of genes related to diseases and show that an increasing number of diseases are associated with multiple genes. Investigating functional relations among genes associated with the same disease helps highlight the molecular mechanisms of pathogenesis. Results: We present eDGAR, a database collecting and organizing data on gene/disease associations derived from OMIM, Humsavar and ClinVar. For each disease-associated gene, eDGAR collects its annotation. Specifically, for lists of genes, eDGAR provides information on: i) interactions retrieved from PDB, BIOGRID and STRING; ii) co-occurrence in stable and functional structural complexes; iii) shared Gene Ontology annotations; iv) shared KEGG and REACTOME pathways; v) enriched functional annotations computed with NET-GE; vi) regulatory interactions derived from TRRUST; vii) localization on chromosomes and/or co-localisation in neighboring loci. The present release of eDGAR includes 2672 diseases, related to 3658 different genes, for a total of 5729 gene-disease associations. 71% of the genes are linked to 621 multigenic diseases, and eDGAR highlights their common GO terms, KEGG/REACTOME pathways, and physical and regulatory interactions. eDGAR includes a network-based enrichment method for detecting statistically significant functional terms associated with groups of genes. Conclusions: eDGAR offers a resource to analyze disease-gene associations. In multigenic diseases, genes can share physical interactions and/or co-occur in the same functional processes. eDGAR is freely available at: edgar.biocomp.unibo.it.
- Published
- 2017
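eDGAR computes enrichment with NET-GE, a network-based method whose details are not given here. As a plainer illustration of what testing "statistically significant functional terms associated with groups of genes" involves, the sketch below swaps in the classical one-sided hypergeometric test, which is not NET-GE. Given a background of N genes, K of which carry a term, it returns the probability of observing at least k term-carrying genes in a disease gene list of size n. The function name and the numbers in the usage line are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """One-sided p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background genes, K: background genes annotated with the term,
    n: genes in the query list, k: query genes annotated with the term.
    """
    upper = min(K, n)  # X cannot exceed the annotated genes or the list size
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Hypothetical: 20,000 background genes, 200 with the term, 30 disease genes, 5 annotated.
    print(hypergeom_enrichment_p(20_000, 200, 30, 5))
```

Exact integer arithmetic via `math.comb` avoids underflow in the individual terms; in practice the p-values for many terms would then be corrected for multiple testing.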
18. The Bologna Annotation Resource (BAR 3.0): improving protein functional annotation.
- Author
Profiti G, Martelli PL, and Casadio R
- Subjects
- Cluster Analysis, Internet, Proteins chemistry, Proteins physiology, Molecular Sequence Annotation, Sequence Analysis, Protein, Software
- Abstract
BAR 3.0 updates our server BAR (Bologna Annotation Resource) for predicting protein structural and functional features from sequence. We increase the data volume, query capabilities and information conveyed to the user. The core of BAR 3.0 is a graph-based clustering procedure of UniProtKB sequences following strict pairwise similarity criteria (sequence identity ≥40% with alignment coverage ≥90%). Each cluster contains the available annotation downloaded from UniProtKB, GO, PFAM and PDB. After statistical validation, GO terms and PFAM domains become cluster-specific and annotate new sequences that enter the cluster by satisfying the similarity constraints. BAR 3.0 includes 28 869 663 sequences in 1 361 773 clusters, of which 22.2% (22 241 661 sequences) and 47.4% (24 555 055 sequences) have at least one validated GO term and one PFAM domain, respectively. 1.4% of the clusters (36% of all sequences) include PDB structures; each such cluster is associated with a hidden Markov model that allows building template-target alignments suitable for structural modeling. A further 3 399 026 sequences are singletons. BAR 3.0 offers an improved search interface, allowing queries by UniProtKB accession, FASTA sequence, GO term, PFAM domain, organism, PDB and ligand(s). When evaluated on the CAFA2 targets, BAR 3.0 largely outperforms our previous version and scores among state-of-the-art methods. BAR 3.0 is publicly available at http://bar.biocomp.unibo.it/bar3. (© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.)
- Published
- 2017
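The BAR 3.0 abstract specifies the clustering thresholds (identity ≥40%, coverage ≥90%) but not the graph procedure itself. A minimal sketch, assuming that clusters are the connected components of a graph whose edges are precomputed pairwise alignments passing both thresholds, found here with union-find. The `Hit` record, its coverage definition and the function names are hypothetical, and BAR's actual procedure may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical pairwise alignment record."""
    query: str
    target: str
    identity: float   # percent identity over the aligned region
    coverage: float   # percent of each sequence covered (assumed: the minimum over the pair)


def cluster(ids: list[str], hits: list[Hit],
            min_identity: float = 40.0, min_coverage: float = 90.0) -> list[list[str]]:
    """Connected components of the graph of hits passing both thresholds.

    Assumes every identifier mentioned in `hits` also appears in `ids`.
    Unlinked sequences come out as singleton clusters.
    """
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for h in hits:
        if h.identity >= min_identity and h.coverage >= min_coverage:
            ra, rb = find(h.query), find(h.target)
            if ra != rb:
                parent[ra] = rb  # merge the two components

    groups: dict[str, list[str]] = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    hits = [Hit("A", "B", 50, 95), Hit("B", "C", 45, 92), Hit("C", "D", 30, 99)]
    print(cluster(["A", "B", "C", "D"], hits))
```

Because clusters are built transitively, A and C end up together even without a direct A-C hit; this is what lets a new sequence join a cluster by satisfying the constraints against any one member.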
19. An expanded evaluation of protein function prediction methods shows an improvement in accuracy.
- Author
Jiang Y, Oron TR, Clark WT, Bankapur AR, D'Andrea D, Lepore R, Funk CS, Kahanda I, Verspoor KM, Ben-Hur A, Koo da CE, Penfold-Brown D, Shasha D, Youngs N, Bonneau R, Lin A, Sahraeian SM, Martelli PL, Profiti G, Casadio R, Cao R, Zhong Z, Cheng J, Altenhoff A, Skunca N, Dessimoz C, Dogan T, Hakala K, Kaewphan S, Mehryary F, Salakoski T, Ginter F, Fang H, Smithers B, Oates M, Gough J, Törönen P, Koskinen P, Holm L, Chen CT, Hsu WL, Bryson K, Cozzetto D, Minneci F, Jones DT, Chapman S, Bkc D, Khan IK, Kihara D, Ofer D, Rappoport N, Stern A, Cibrian-Uhalte E, Denny P, Foulger RE, Hieta R, Legge D, Lovering RC, Magrane M, Melidoni AN, Mutowo-Meullenet P, Pichler K, Shypitsyna A, Li B, Zakeri P, ElShal S, Tranchevent LC, Das S, Dawson NL, Lee D, Lees JG, Sillitoe I, Bhat P, Nepusz T, Romero AE, Sasidharan R, Yang H, Paccanaro A, Gillis J, Sedeño-Cortés AE, Pavlidis P, Feng S, Cejuela JM, Goldberg T, Hamp T, Richter L, Salamov A, Gabaldon T, Marcet-Houben M, Supek F, Gong Q, Ning W, Zhou Y, Tian W, Falda M, Fontana P, Lavezzo E, Toppo S, Ferrari C, Giollo M, Piovesan D, Tosatto SC, Del Pozo A, Fernández JM, Maietta P, Valencia A, Tress ML, Benso A, Di Carlo S, Politano G, Savino A, Rehman HU, Re M, Mesiti M, Valentini G, Bargsten JW, van Dijk AD, Gemovic B, Glisic S, Perovic V, Veljkovic V, Veljkovic N, Almeida-E-Silva DC, Vencio RZ, Sharan M, Vogel J, Kansakar L, Zhang S, Vucetic S, Wang Z, Sternberg MJ, Wass MN, Huntley RP, Martin MJ, O'Donovan C, Robinson PN, Moreau Y, Tramontano A, Babbitt PC, Brenner SE, Linial M, Orengo CA, Rost B, Greene CS, Mooney SD, Friedberg I, and Radivojac P
- Subjects
- Algorithms, Databases, Protein, Gene Ontology, Humans, Molecular Sequence Annotation, Proteins genetics, Computational Biology, Proteins chemistry, Software, Structure-Activity Relationship
- Abstract
Background: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. Results: We conducted the second Critical Assessment of Functional Annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using the Gene Ontology and gene-disease associations using the Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured an expanded analysis compared with CAFA1 with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. Conclusions: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and the usefulness of individual methods remain context-dependent.
- Published
- 2016
20. Ancient pathogen-driven adaptation triggers increased susceptibility to non-celiac wheat sensitivity in present-day European populations.
- Author
Sazzini M, De Fanti S, Cherubini A, Quagliariello A, Profiti G, Martelli PL, Casadio R, Ricci C, Campieri M, Lanzini A, Volta U, Caio G, Franceschi C, Spisni E, and Luiselli D
- Abstract
Background: Non-celiac wheat sensitivity is an emerging wheat-related syndrome showing peak prevalence in Western populations. Recent studies hypothesize that new gliadin alleles, introduced into the human diet by the replacement of ancient wheat with modern varieties, can prompt immune responses mediated by the CXCR3-chemokine axis, potentially underlying such pathogenic inflammation. This cultural shift may also explain the disease epidemiology, having turned European-specific adaptive alleles previously targeted by natural selection into disadvantageous ones. Methods: To explore this evolutionary scenario, we performed ultra-deep sequencing of genes pivotal in the CXCR3-inflammatory pathway in individuals diagnosed with non-celiac wheat sensitivity, and we applied anthropological evolutionary genetics methods to sequence data from worldwide populations to investigate the genetic legacy of natural selection on these loci. Results: Our results indicate that balancing selection has maintained two divergent CXCL10/CXCL11 haplotypes in Europeans, one responsible for boosting inflammatory reactions and another encoding moderate chemokine expression. Conclusions: This led to a considerably higher occurrence of the former haplotype in Western people than in Africans and East Asians, suggesting that they might be more prone to side effects related to the consumption of modern wheat varieties. Accordingly, this study sheds new light on some of the mechanisms potentially involved in the disease etiology and on the evolutionary bases of its present-day epidemiological patterns. Moreover, the overrepresentation of disease homozygotes for the dis-adaptive haplotype plausibly accounts for their even more enhanced CXCR3-axis expression and for their further increase in disease risk, a promising finding to be validated by larger follow-up studies.
- Published
- 2016
21. Tools and data services registry: a community effort to document bioinformatics resources.
- Author
Ison J, Rapacki K, Ménager H, Kalaš M, Rydza E, Chmura P, Anthon C, Beard N, Berka K, Bolser D, Booth T, Bretaudeau A, Brezovsky J, Casadio R, Cesareni G, Coppens F, Cornell M, Cuccuru G, Davidsen K, Vedova GD, Dogan T, Doppelt-Azeroual O, Emery L, Gasteiger E, Gatter T, Goldberg T, Grosjean M, Grüning B, Helmer-Citterich M, Ienasescu H, Ioannidis V, Jespersen MC, Jimenez R, Juty N, Juvan P, Koch M, Laibe C, Li JW, Licata L, Mareuil F, Mičetić I, Friborg RM, Moretti S, Morris C, Möller S, Nenadic A, Peterson H, Profiti G, Rice P, Romano P, Roncaglia P, Saidi R, Schafferhans A, Schwämmle V, Smith C, Sperotto MM, Stockinger H, Vařeková RS, Tosatto SC, de la Torre V, Uva P, Via A, Yachdav G, Zambelli F, Vriend G, Rost B, Parkinson H, Løngreen P, and Brunak S
- Subjects
- Data Curation, Software, Computational Biology, Registries
- Abstract
Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvement in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation of these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information. As a result, scientists must often struggle to find, understand, compare and use the best resources for the task at hand. Here we present a community-driven curation effort, supported by ELIXIR (the European infrastructure for biological information), that aspires to a comprehensive and consistent registry of information about bioinformatics resources. The sustainable upkeep of this Tools and Data Services Registry is assured by a curation effort driven by and tailored to local needs, and shared amongst a network of engaged partners. As of November 2015, the registry includes 1785 resources, with depositions from 126 individual registrations including 52 institutional providers and 74 individuals. With community support, the registry can become a standard for disseminating information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools. (© The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.)
- Published
- 2016
22. AlignBucket: a tool to speed up 'all-against-all' protein sequence alignments optimizing length constraints.
- Author
Profiti G, Fariselli P, and Casadio R
- Subjects
- Humans, Algorithms, Computational Biology methods, Databases, Protein, Proteins chemistry, Sequence Alignment methods, Software
- Abstract
Motivation: The next-generation sequencing era requires reliable, fast and efficient approaches for the accurate annotation of the ever-increasing number of biological sequences and their variations. Transfer of annotation upon similarity search is a standard approach. All-against-all protein comparison is a preliminary step of several available methods that annotate sequences based on information already present in databases. Given the current volume of sequences, methods are needed to pre-process the data and reduce the time of sequence comparison. Results: We present an algorithm that optimizes the partition of a large volume of sequences (the whole database) into sets where sequence length values (in residues) are constrained depending on a bounded minimal and expected alignment coverage. The idea is to optimally group protein sequences according to their length and then compute the all-against-all sequence alignments among sequences that fall in a selected length range. We describe a mathematically optimal solution and show that our method leads to a 5-fold speed-up in real-world cases. Availability and Implementation: The software is available for download at http://www.biocomp.unibo.it/~giuseppe/partitioning.html. Contact: giuseppe.profiti2@unibo.it. Supplementary Information: Supplementary data are available at Bioinformatics online. (© The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.)
- Published
- 2015
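The length constraint that AlignBucket exploits follows from the coverage requirement: if an alignment must cover at least a fraction c of both sequences, and the aligned region of the longer sequence cannot exceed the length of the shorter one (an assumption that ignores gaps), then the ratio shorter/longer must be at least c. The sketch below applies only that necessary condition as a sorted sliding window. It is a simpler swapped-in filter, not AlignBucket's mathematically optimal partition into buckets, and `candidate_pairs` is a hypothetical name.

```python
from bisect import bisect_right
from collections.abc import Iterator


def candidate_pairs(lengths: dict[str, int], min_coverage: float) -> Iterator[tuple[str, str]]:
    """Yield pairs whose lengths are compatible with >= min_coverage of both sequences.

    Assumes 0 < min_coverage <= 1. Pairs failing the length-ratio test are never
    yielded, so they never reach the (expensive) alignment step.
    """
    items = sorted(lengths.items(), key=lambda kv: kv[1])  # ascending by length
    lens = [length for _, length in items]
    for i, (name, length) in enumerate(items):
        # A longer partner L2 must satisfy length / L2 >= min_coverage, i.e. L2 <= length / min_coverage.
        j_end = bisect_right(lens, length / min_coverage)
        for j in range(i + 1, j_end):
            yield name, items[j][0]


if __name__ == "__main__":
    seqs = {"a": 100, "b": 105, "c": 200, "d": 210}  # hypothetical lengths in residues
    pairs = list(candidate_pairs(seqs, 0.9))
    n = len(seqs)
    print(pairs, f"{len(pairs)} of {n * (n - 1) // 2} pairs kept")
```

Sorting once and using binary search keeps the filter itself cheap; the saving comes from the pairs that are never aligned.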
23. How to inherit statistically validated annotation within BAR+ protein clusters.
- Author
Piovesan D, Martelli PL, Fariselli P, Profiti G, Zauli A, Rossi I, and Casadio R
- Subjects
- Cluster Analysis, Data Interpretation, Statistical, Databases, Protein, Genomics, Proteins genetics, Proteins physiology, Vocabulary, Controlled, Molecular Sequence Annotation, Sequence Analysis, Protein
- Abstract
Background: In the genomic era, a key issue is protein annotation, namely how to endow protein sequences, upon translation from the corresponding genes, with structural and functional features. Routinely, this operation is done electronically by deriving and integrating information from previous knowledge. The reference database for protein sequences is UniProtKB, divided into two sections: UniProtKB/TrEMBL, which is automatically annotated and not reviewed, and UniProtKB/Swiss-Prot, which is manually annotated and reviewed. The annotation process is essentially based on sequence similarity search. The question therefore arises as to what extent annotation based on transfer by inheritance is valuable, and specifically whether it is possible to statistically validate inherited features when little homology exists between the target sequence and its template(s). Results: In this paper we address the problem of annotating protein sequences in a statistically validated manner, considering UniProtKB as the reference annotation resource. The test case is the set of 48,298 proteins recently released by the Critical Assessment of Functional Annotation (CAFA) organization. We show that, after validation, we can transfer Gene Ontology (GO) terms of the three main categories and Pfam domains to about 68% and 72% of the sequences, respectively. This is possible after aligning the CAFA sequences to BAR+, our annotation resource that discriminates between statistically validated and non-validated annotation. By comparing with a direct UniProtKB annotation, we find that besides validating the annotation of some 78% of the CAFA set, we assign new and statistically validated annotation to 14.8% of the sequences and find new structural templates for about 25% of the chains, half of which share less than 30% sequence identity with the corresponding template(s). Conclusion: Inheritance of annotation by transfer generally requires a careful selection of the identity value between the target and the template in order to transfer structural and/or functional features. Here we prove that even remote homologs can be safely endowed with structural templates and GO and/or Pfam terms, provided that annotation is done within clusters of related protein sequences where a statistical validation of the shared structural and functional features is possible.
- Published
- 2013
24. The human "magnesome": detecting magnesium binding sites on human proteins.
- Author
Piovesan D, Profiti G, Martelli PL, and Casadio R
- Subjects
- Binding Sites, Humans, Models, Molecular, Molecular Sequence Annotation, Nuclear Receptor Subfamily 4, Group A, Member 2, Proteins metabolism, Cluster Analysis, Magnesium, Proteins chemistry, Proteome analysis
- Abstract
Background: Magnesium research is increasing in molecular medicine due to the relevance of this ion in several important biological processes and associated molecular pathogeneses. It is still difficult to predict from the protein covalent structure whether or not a human chain is involved in magnesium binding. This is mainly due to the scarcity of information on the structural characteristics of magnesium binding sites in proteins and protein complexes. Magnesium binding features, unlike those of other divalent cations such as calcium and zinc, are elusive. Here we address a question that is relevant in protein annotation: how many human proteins can bind Mg2+? Our analysis takes advantage of the recently implemented Bologna Annotation Resource (BAR-PLUS), a non-hierarchical clustering method that relies on the pairwise sequence comparison of about 14 million proteins from over 300,000 species and their grouping into clusters where annotation can safely be inherited after statistical validation. Results: After cluster assignment of the latest version of the human proteome, the total number of human proteins for which we can assign putative Mg binding sites is 3,751. Among these proteins, 2,688 inherit annotation directly from human templates and 1,063 inherit annotation from templates of other organisms. Protein structures are highly conserved inside a given cluster. Structural properties can be transferred after aligning a given sequence with the protein structures that characterise its cluster, using a Hidden Markov Model (HMM) based procedure. Interestingly, a set of 370 human sequences inherit Mg2+ binding sites from templates sharing less than 30% sequence identity with them. Conclusion: We describe and deliver the "human magnesome", a set of proteins of the human proteome that inherit putative binding of magnesium ions. With our BAR-hMG, 251 clusters including 1,341 magnesium binding protein structures corresponding to 387 sequences are sufficient to annotate some 13,689 residues in 3,751 human sequences as "magnesium binding". Protein structures therefore act as three-dimensional seeds for the structural and functional annotation of human sequences. The database specifically collects all the human proteins that can be annotated according to our procedure as "magnesium binding", the corresponding structures and the BAR+ clusters from which they derive the annotation (http://bar.biocomp.unibo.it/mg).
- Published
- 2012