101 results on '"Casadio, R."'
Search Results
2. Computational Resources for Molecular Biology 2023.
- Author
- Mathews DH, Casadio R, and Sternberg MJE
- Subjects
- Molecular Biology, Computational Biology
- Published
- 2023
3. Computational Resources for Molecular Biology 2021.
- Author
- Casadio R, Lenhard B, and Sternberg MJE
- Subjects
- Databases, Genetic, Genome, Human, Humans, Software, Computational Biology, Molecular Biology
- Published
- 2021
4. Huntingtin: A Protein with a Peculiar Solvent Accessible Surface.
- Author
- Babbi G, Savojardo C, Martelli PL, and Casadio R
- Subjects
- Binding Sites genetics, Humans, Huntingtin Protein genetics, Huntingtin Protein ultrastructure, Hydrophobic and Hydrophilic Interactions, Models, Molecular, Protein Binding genetics, Solvents chemistry, Surface Properties, Calcium metabolism, Computational Biology, Huntingtin Protein chemistry, Proteins genetics
- Abstract
Taking advantage of the last cryogenic electron microscopy structure of human huntingtin, we explored with computational methods its physicochemical properties, focusing on the solvent accessible surface of the protein and highlighting a quite interesting mix of hydrophobic and hydrophilic patterns, with the prevalence of the latter ones. We then evaluated the probability of exposed residues to be in contact with other proteins, discovering that they tend to cluster in specific regions of the protein. We then found that the remaining portions of the protein surface can contain calcium-binding sites that we propose here as putative mediators for the protein to interact with membranes. Our findings are justified in relation to the present knowledge of huntingtin functional annotation.
- Published
- 2021
5. Computer-Aided Prediction of Protein Mitochondrial Localization.
- Author
- Martelli PL, Savojardo C, Fariselli P, Tartari G, and Casadio R
- Subjects
- Deep Learning, Humans, Mitochondrial Proteins chemistry, Protein Transport, Web Browser, Computational Biology methods, Mitochondria metabolism, Mitochondrial Proteins metabolism
- Abstract
Protein sequences, directly translated from genomic data, need functional and structural annotation. Together with molecular function and biological process, subcellular localization is an important feature necessary for understanding the protein role and the compartment where the mature protein is active. In the case of mitochondrial proteins, their precursor sequences translated by the ribosome machinery include specific patterns from which it is possible not only to recognize their final destination within the organelle but also which of the mitochondrial subcompartments the protein is intended for. Four compartments are routinely discriminated, including the inner and the outer membranes, the intermembrane space, and the matrix. Here we discuss to what extent it is feasible to develop computational methods for detecting mitochondrial targeting peptides in the precursor sequence and to discriminate their final destination in the organelle. We benchmark two of our methods on the general task of recognizing human mitochondrial proteins endowed with an experimentally characterized targeting peptide (TPpred3) and predicting which submitochondrial compartment is the final destination (DeepMito). We describe how to adopt our web servers in order to discriminate which human proteins are endowed with mitochondrial targeting peptides, the position of cleavage sites, and which submitochondrial compartment they are intended for. By this, we add another 1788 human proteins to the 450 already manually annotated in UniProt with a mitochondrial targeting peptide, providing for each of them also the characterization of the suborganellar localization.
- Published
- 2021
6. Large-scale prediction and analysis of protein sub-mitochondrial localization with DeepMito.
- Author
- Savojardo C, Martelli PL, Tartari G, and Casadio R
- Subjects
- Animals, Humans, Computational Biology methods, Mitochondrial Proteins genetics, Protein Transport genetics
- Abstract
Background: The prediction of protein subcellular localization is a key step of the big effort towards protein functional annotation. Many computational methods exist to identify high-level protein subcellular compartments such as nucleus, cytoplasm or organelles. However, many organelles, like mitochondria, have their own internal compartmentalization. Knowing the precise location of a protein inside mitochondria is crucial for its accurate functional characterization. We recently developed DeepMito, a new method based on a 1-Dimensional Convolutional Neural Network (1D-CNN) architecture outperforming other similar approaches available in literature. Results: Here, we explore the adoption of DeepMito for the large-scale annotation of four sub-mitochondrial localizations on mitochondrial proteomes of five different species, including human, mouse, fly, yeast and Arabidopsis thaliana. A significant fraction of the proteins from these organisms lacked experimental information about sub-mitochondrial localization. We adopted DeepMito to fill the gap, providing complete characterization of protein localization at sub-mitochondrial level for each protein of the five proteomes. Moreover, we identified novel mitochondrial proteins fishing on the set of proteins lacking any subcellular localization annotation using available state-of-the-art subcellular localization predictors. We finally performed additional functional characterization of proteins predicted by DeepMito as localized into the four different sub-mitochondrial compartments using both available experimental and predicted GO terms. All data generated in this study were collected into a database called DeepMitoDB (available at http://busca.biocomp.unibo.it/deepmitodb), providing complete functional characterization of 4307 mitochondrial proteins from the five species. Conclusions: DeepMitoDB offers a comprehensive view of mitochondrial proteins, including experimental and predicted fine-grain sub-cellular localization and annotated and predicted functional annotations. The database complements other similar resources providing characterization of new proteins. Furthermore, it is also unique in including localization information at the sub-mitochondrial level. For this reason, we believe that DeepMitoDB can be a valuable resource for mitochondrial research.
- Published
- 2020
7. DeepMito: accurate prediction of protein sub-mitochondrial localization using convolutional neural networks.
- Author
- Savojardo C, Bruciaferri N, Tartari G, Martelli PL, and Casadio R
- Subjects
- Humans, Protein Transport, Computational Biology methods, Mitochondrial Proteins genetics, Mitochondrial Proteins metabolism, Neural Networks, Computer, Software
- Abstract
Motivation: The correct localization of proteins in cell compartments is a key issue for their function. Particularly, mitochondrial proteins are physiologically active in different compartments and their aberrant localization contributes to the pathogenesis of human mitochondrial pathologies. Many computational methods exist to assign protein sequences to subcellular compartments such as nucleus, cytoplasm and organelles. However, a substantial lack of experimental evidence in public sequence databases hampered so far a finer grain discrimination, including also intra-organelle compartments. Results: We describe DeepMito, a novel method for predicting protein sub-mitochondrial cellular localization. Taking advantage of powerful deep-learning approaches, such as convolutional neural networks, our method is able to achieve very high prediction performances when discriminating among four different mitochondrial compartments (matrix, outer, inner and intermembrane regions). The method is trained and tested in cross-validation on a newly generated, high-quality dataset comprising 424 mitochondrial proteins with experimental evidence for sub-organelle localizations. We benchmark DeepMito towards the only one recent approach developed for the same task. Results indicate that DeepMito performances are superior. Finally, genomic-scale prediction on a highly-curated dataset of human mitochondrial proteins further confirms the effectiveness of our approach and suggests that DeepMito is a good candidate for genome-scale annotation of mitochondrial protein subcellular localization. Availability and Implementation: The DeepMito web server as well as all datasets used in this study are available at http://busca.biocomp.unibo.it/deepmito. A standalone version of DeepMito is available on DockerHub at https://hub.docker.com/r/bolognabiocomp/deepmito. DeepMito source code is available on GitHub at https://github.com/BolognaBiocomp/deepmito. Supplementary Information: Supplementary data are available at Bioinformatics online. (© The Author(s) 2019. Published by Oxford University Press.)
- Published
- 2020
8. Assessing predictions on fitness effects of missense variants in calmodulin.
- Author
- Zhang J, Kinch LN, Cong Q, Katsonis P, Lichtarge O, Savojardo C, Babbi G, Martelli PL, Capriotti E, Casadio R, Garg A, Pal D, Weile J, Sun S, Verby M, Roth FP, and Grishin NV
- Subjects
- Algorithms, Binding Sites, Calcium metabolism, Calmodulin metabolism, Evolution, Molecular, Fungal Proteins chemistry, Fungal Proteins genetics, Fungal Proteins metabolism, Genetic Fitness, Humans, Models, Genetic, Models, Molecular, Protein Conformation, Protein Engineering, Yeasts genetics, Calmodulin chemistry, Calmodulin genetics, Computational Biology methods, Mutation, Missense, Yeasts growth & development
- Abstract
This paper reports the evaluation of predictions for the "CALM1" challenge in the fifth round of the Critical Assessment of Genome Interpretation held in 2018. In the challenge, the participants were asked to predict effects on yeast growth caused by missense variants of human calmodulin, a highly conserved protein in eukaryotic cells sensing calcium concentration. The performance of predictors implementing different algorithms and methods is similar. Most predictors are able to identify the deleterious or tolerated variants with modest accuracy, with a baseline predictor based purely on sequence conservation slightly outperforming the submitted predictions. Nevertheless, we think that the accuracy of predictions remains far from satisfactory, and the field awaits substantial improvements. The most poorly predicted variants in this round surround functional CALM1 sites that bind calcium or peptide, which suggests that better incorporation of structural analysis may help improve predictions. (© 2019 Wiley Periodicals, Inc.)
- Published
- 2019
9. Are machine learning based methods suited to address complex biological problems? Lessons from CAGI-5 challenges.
- Author
- Savojardo C, Babbi G, Bovo S, Capriotti E, Martelli PL, and Casadio R
- Subjects
- Algorithms, Computer Simulation, Databases, Genetic, Genetic Predisposition to Disease, Humans, Machine Learning, Phenotype, Protein Stability, Computational Biology methods, Genetic Variation, Proteins chemistry, Proteins genetics
- Abstract
In silico approaches are routinely adopted to predict the effects of genetic variants and their relation to diseases. The critical assessment of genome interpretation (CAGI) has established a common framework for the assessment of available predictors of variant effects on specific problems and our group has been an active participant of CAGI since its first edition. In this paper, we summarize our experience and lessons learned from the last edition of the experiment (CAGI-5). In particular, we analyze prediction performances of our tools on five CAGI-5 selected challenges grouped into three different categories: prediction of variant effects on protein stability, prediction of variant pathogenicity, and prediction of complex functional effects. For each challenge, we analyze in detail the performance of our tools, highlighting their potentialities and drawbacks. The aim is to better define the application boundaries of each tool. (© 2019 Wiley Periodicals, Inc.)
- Published
- 2019
10. Assessing predictions of the impact of variants on splicing in CAGI5.
- Author
- Mount SM, Avsec Ž, Carmel L, Casadio R, Çelik MH, Chen K, Cheng J, Cohen NE, Fairbrother WG, Fenesh T, Gagneur J, Gotea V, Holzer T, Lin CF, Martelli PL, Naito T, Nguyen TYD, Savojardo C, Unger R, Wang R, Yang Y, and Zhao H
- Subjects
- Animals, Congresses as Topic, Genetic Fitness, Humans, Models, Genetic, Sequence Homology, Nucleic Acid, Alternative Splicing, Computational Biology methods, Mutation, Proteins genetics
- Abstract
Precision medicine and sequence-based clinical diagnostics seek to predict disease risk or to identify causative variants from sequencing data. The Critical Assessment of Genome Interpretation (CAGI) is a community experiment consisting of genotype-phenotype prediction challenges; participants build models, undergo assessment, and share key findings. In the past, few CAGI challenges have addressed the impact of sequence variants on splicing. In CAGI5, two challenges (Vex-seq and MaPSY) involved prediction of the effect of variants, primarily single-nucleotide changes, on splicing. Although there are significant differences between these two challenges, both involved prediction of results from high-throughput exon inclusion assays. Here, we discuss the methods used to predict the impact of these variants on splicing, their performance, strengths, and weaknesses, and prospects for predicting the impact of sequence variation on splicing and disease phenotypes. (© 2019 Wiley Periodicals, Inc.)
- Published
- 2019
11. Performance of computational methods for the evaluation of pericentriolar material 1 missense variants in CAGI-5.
- Author
- Monzon AM, Carraro M, Chiricosta L, Reggiani F, Han J, Ozturk K, Wang Y, Miller M, Bromberg Y, Capriotti E, Savojardo C, Babbi G, Martelli PL, Casadio R, Katsonis P, Lichtarge O, Carter H, Kousi M, Katsanis N, Andreoletti G, Moult J, Brenner SE, Ferrari C, Leonardi E, and Tosatto SCE
- Subjects
- Databases, Genetic, Genetic Predisposition to Disease, Humans, Neural Networks, Computer, Phenotype, Polymorphism, Single Nucleotide, Autoantigens genetics, Cell Cycle Proteins genetics, Computational Biology methods, Mutation, Missense, Schizophrenia genetics
- Abstract
The CAGI-5 pericentriolar material 1 (PCM1) challenge aimed to predict the effect of 38 transgenic human missense mutations in the PCM1 protein implicated in schizophrenia. Participants were provided with 16 benign variants (negative controls), 10 hypomorphic, and 12 loss of function variants. Six groups participated and were asked to predict the probability of effect and standard deviation associated to each mutation. Here, we present the challenge assessment. Prediction performance was evaluated using different measures to conclude in a final ranking which highlights the strengths and weaknesses of each group. The results show a great variety of predictions where some methods performed significantly better than others. Benign variants played an important role as negative controls, highlighting predictors biased to identify disease phenotypes. The best predictor, Bromberg lab, used a neural-network-based method able to discriminate between neutral and non-neutral single nucleotide polymorphisms. The CAGI-5 PCM1 challenge allowed us to evaluate the state of the art techniques for interpreting the effect of novel variants for a difficult target protein. (© 2019 Wiley Periodicals, Inc.)
- Published
- 2019
12. Assessment of predicted enzymatic activity of α-N-acetylglucosaminidase variants of unknown significance for CAGI 2016.
- Author
- Clark WT, Kasak L, Bakolitsa C, Hu Z, Andreoletti G, Babbi G, Bromberg Y, Casadio R, Dunbrack R, Folkman L, Ford CT, Jones D, Katsonis P, Kundu K, Lichtarge O, Martelli PL, Mooney SD, Nodzak C, Pal LR, Radivojac P, Savojardo C, Shi X, Zhou Y, Uppal A, Xu Q, Yin Y, Pejaver V, Wang M, Wei L, Moult J, Yu GK, Brenner SE, and LeBowitz JH
- Subjects
- Acetylglucosaminidase genetics, Humans, Models, Genetic, Regression Analysis, Acetylglucosaminidase metabolism, Computational Biology methods, Mutation, Missense
- Abstract
The NAGLU challenge of the fourth edition of the Critical Assessment of Genome Interpretation experiment (CAGI4) in 2016 invited participants to predict the impact of variants of unknown significance (VUS) on the enzymatic activity of the lysosomal hydrolase α-N-acetylglucosaminidase (NAGLU). Deficiencies in NAGLU activity lead to a rare, monogenic, recessive lysosomal storage disorder, Sanfilippo syndrome type B (MPS type IIIB). This challenge attracted 17 submissions from 10 groups. We observed that top models were able to predict the impact of missense mutations on enzymatic activity with Pearson's correlation coefficients of up to 0.61. We also observed that top methods were significantly more correlated with each other than they were with observed enzymatic activity values, which we believe speaks to the importance of sequence conservation across the different methods. Improved functional predictions on the VUS will help population-scale analysis of disease epidemiology and rare variant association analysis. (© 2019 Wiley Periodicals, Inc.)
- Published
- 2019
13. CAGI SickKids challenges: Assessment of phenotype and variant predictions derived from clinical and genomic data of children with undiagnosed diseases.
- Author
- Kasak L, Hunter JM, Udani R, Bakolitsa C, Hu Z, Adhikari AN, Babbi G, Casadio R, Gough J, Guerrero RF, Jiang Y, Joseph T, Katsonis P, Kotte S, Kundu K, Lichtarge O, Martelli PL, Mooney SD, Moult J, Pal LR, Poitras J, Radivojac P, Rao A, Sivadasan N, Sunderam U, Saipradeep VG, Yin Y, Zaucha J, Brenner SE, and Meyn MS
- Subjects
- Adolescent, Child, Child, Preschool, Computer Simulation, Databases, Genetic, Female, Genetic Predisposition to Disease, Humans, Male, Phenotype, Undiagnosed Diseases genetics, Whole Genome Sequencing, Computational Biology methods, Genetic Variation, Undiagnosed Diseases diagnosis
- Abstract
Whole-genome sequencing (WGS) holds great potential as a diagnostic test. However, the majority of patients currently undergoing WGS lack a molecular diagnosis, largely due to the vast number of undiscovered disease genes and our inability to assess the pathogenicity of most genomic variants. The CAGI SickKids challenges attempted to address this knowledge gap by assessing state-of-the-art methods for clinical phenotype prediction from genomes. CAGI4 and CAGI5 participants were provided with WGS data and clinical descriptions of 25 and 24 undiagnosed patients from the SickKids Genome Clinic Project, respectively. Predictors were asked to identify primary and secondary causal variants. In addition, for CAGI5, groups had to match each genome to one of three disorder categories (neurologic, ophthalmologic, and connective), and separately to each patient. The performance of matching genomes to categories was no better than random but two groups performed significantly better than chance in matching genomes to patients. Two of the ten variants proposed by two groups in CAGI4 were deemed to be diagnostic, and several proposed pathogenic variants in CAGI5 are good candidates for phenotype expansion. We discuss implications for improving in silico assessment of genomic variants and identifying new disease genes. (© 2019 The Authors. Human Mutation published by Wiley Periodicals, Inc.)
- Published
- 2019
14. Assessment of blind predictions of the clinical significance of BRCA1 and BRCA2 variants.
- Author
- Cline MS, Babbi G, Bonache S, Cao Y, Casadio R, de la Cruz X, Díez O, Gutiérrez-Enríquez S, Katsonis P, Lai C, Lichtarge O, Martelli PL, Mishne G, Moles-Fernández A, Montalban G, Mooney SD, O'Conner R, Ootes L, Özkan S, Padilla N, Pagel KA, Pejaver V, Radivojac P, Riera C, Savojardo C, Shen Y, Sun Y, Topper S, Parsons MT, Spurdle AB, and Goldgar DE
- Subjects
- Breast Neoplasms genetics, Early Detection of Cancer, Female, Genetic Predisposition to Disease, Genetic Testing, Genetic Variation, Humans, Models, Genetic, Ovarian Neoplasms genetics, BRCA1 Protein genetics, BRCA2 Protein genetics, Breast Neoplasms diagnosis, Computational Biology methods, Ovarian Neoplasms diagnosis
- Abstract
Testing for variation in BRCA1 and BRCA2 (commonly referred to as BRCA1/2) has emerged as a standard clinical practice and is helping countless women better understand and manage their heritable risk of breast and ovarian cancer. Yet the increased rate of BRCA1/2 testing has led to an increasing number of Variants of Uncertain Significance (VUS), and the rate of VUS discovery currently outpaces the rate of clinical variant interpretation. Computational prediction is a key component of the variant interpretation pipeline. In the CAGI5 ENIGMA Challenge, six prediction teams submitted predictions on 326 newly-interpreted variants from the ENIGMA Consortium. By evaluating these predictions against the new interpretations, we have gained a number of insights on the state of the art of variant prediction and specific steps to further advance this state of the art. (© 2019 Wiley Periodicals, Inc.)
- Published
- 2019
15. Assessing computational predictions of the phenotypic effect of cystathionine-beta-synthase variants.
- Author
- Kasak L, Bakolitsa C, Hu Z, Yu C, Rine J, Dimster-Denk DF, Pandey G, De Baets G, Bromberg Y, Cao C, Capriotti E, Casadio R, Van Durme J, Giollo M, Karchin R, Katsonis P, Leonardi E, Lichtarge O, Martelli PL, Masica D, Mooney SD, Olatubosun A, Radivojac P, Rousseau F, Pal LR, Savojardo C, Schymkowitz J, Thusberg J, Tosatto SCE, Vihinen M, Väliaho J, Repo S, Moult J, Brenner SE, and Friedberg I
- Subjects
- Cystathionine metabolism, Cystathionine beta-Synthase metabolism, Homocysteine metabolism, Humans, Phenotype, Precision Medicine, Amino Acid Substitution, Computational Biology methods, Cystathionine beta-Synthase genetics
- Abstract
Accurate prediction of the impact of genomic variation on phenotype is a major goal of computational biology and an important contributor to personalized medicine. Computational predictions can lead to a better understanding of the mechanisms underlying genetic diseases, including cancer, but their adoption requires thorough and unbiased assessment. Cystathionine-beta-synthase (CBS) is an enzyme that catalyzes the first step of the transsulfuration pathway, from homocysteine to cystathionine, and in which variations are associated with human hyperhomocysteinemia and homocystinuria. We have created a computational challenge under the CAGI framework to evaluate how well different methods can predict the phenotypic effect(s) of CBS single amino acid substitutions using a blinded experimental data set. CAGI participants were asked to predict yeast growth based on the identity of the mutations. The performance of the methods was evaluated using several metrics. The CBS challenge highlighted the difficulty of predicting the phenotype of an ex vivo system in a model organism when classification models were trained on human disease data. We also discuss the variations in difficulty of prediction for known benign and deleterious variants, as well as identify methodological and experimental constraints with lessons to be learned for future challenges. (© 2019 Wiley Periodicals, Inc.)
- Published
- 2019
16. Assessing the performance of in silico methods for predicting the pathogenicity of variants in the gene CHEK2, among Hispanic females with breast cancer.
- Author
- Voskanian A, Katsonis P, Lichtarge O, Pejaver V, Radivojac P, Mooney SD, Capriotti E, Bromberg Y, Wang Y, Miller M, Martelli PL, Savojardo C, Babbi G, Casadio R, Cao Y, Sun Y, Shen Y, Garg A, Pal D, Yu Y, Huff CD, Tavtigian SV, Young E, Neuhausen SL, Ziv E, Pal LR, Andreoletti G, Brenner SE, and Kann MG
- Subjects
- Adult, Aged, Breast Neoplasms ethnology, Case-Control Studies, Computer Simulation, Female, Genetic Predisposition to Disease, Humans, Linear Models, Middle Aged, United States ethnology, Exome Sequencing, Breast Neoplasms genetics, Checkpoint Kinase 2 genetics, Computational Biology methods, Hispanic or Latino genetics, Polymorphism, Single Nucleotide
- Abstract
The availability of disease-specific genomic data is critical for developing new computational methods that predict the pathogenicity of human variants and advance the field of precision medicine. However, the lack of gold standards to properly train and benchmark such methods is one of the greatest challenges in the field. In response to this challenge, the scientific community is invited to participate in the Critical Assessment for Genome Interpretation (CAGI), where unpublished disease variants are available for classification by in silico methods. As part of the CAGI-5 challenge, we evaluated the performance of 18 submissions and three additional methods in predicting the pathogenicity of single nucleotide variants (SNVs) in checkpoint kinase 2 (CHEK2) for cases of breast cancer in Hispanic females. As part of the assessment, the efficacy of the analysis method and the setup of the challenge were also considered. The results indicated that though the challenge could benefit from additional participant data, the combined generalized linear model analysis and odds of pathogenicity analysis provided a framework to evaluate the methods submitted for SNV pathogenicity identification and for comparison to other available methods. The outcome of this challenge and the approaches used can help guide further advancements in identifying SNV-disease relationships. (© 2019 Wiley Periodicals, Inc.)
- Published
- 2019
17. Assessment of methods for predicting the effects of PTEN and TPMT protein variants.
- Author
- Pejaver V, Babbi G, Casadio R, Folkman L, Katsonis P, Kundu K, Lichtarge O, Martelli PL, Miller M, Moult J, Pal LR, Savojardo C, Yin Y, Zhou Y, Radivojac P, and Bromberg Y
- Subjects
- High-Throughput Nucleotide Sequencing, Humans, Methyltransferases genetics, PTEN Phosphohydrolase genetics, Protein Stability, Computational Biology methods, Methyltransferases chemistry, Mutation, PTEN Phosphohydrolase chemistry
- Abstract
Thermodynamic stability is a fundamental property shared by all proteins. Changes in stability due to mutation are a widespread molecular mechanism in genetic diseases. Methods for the prediction of mutation-induced stability change have typically been developed and evaluated on incomplete and/or biased data sets. As part of the Critical Assessment of Genome Interpretation, we explored the utility of high-throughput variant stability profiling (VSP) assay data as an alternative for the assessment of computational methods and evaluated state-of-the-art predictors against over 7,000 nonsynonymous variants from two proteins. We found that predictions were modestly correlated with actual experimental values. Predictors fared better when evaluated as classifiers of extreme stability effects. While different methods emerged as top performers depending on the metric, it is nontrivial to draw conclusions on their adoption or improvement. Our analyses revealed that only 16% of all variants in VSP assays could be confidently defined as stability-affecting. Furthermore, it is unclear as to what extent VSP abundance scores were reasonable proxies for the stability-related quantities that participating methods were designed to predict. Overall, our observations underscore the need for clearly defined objectives when developing and using both computational and experimental methods in the context of measuring variant impact. (© 2019 Wiley Periodicals, Inc.)
- Published
- 2019
18. PhenPath: a tool for characterizing biological functions underlying different phenotypes.
- Author
- Babbi G, Martelli PL, and Casadio R
- Subjects
- Disease genetics, Humans, Biological Ontologies, Computational Biology methods, Databases, Genetic, Phenotype
- Abstract
Background: Many diseases are associated with complex patterns of symptoms and phenotypic manifestations. Parsimonious explanations aim at reconciling the multiplicity of phenotypic traits with the perturbation of one or few biological functions. For this, it is necessary to characterize human phenotypes at the molecular and functional levels, by exploiting gene annotations and known relations among genes, diseases and phenotypes. This characterization makes it possible to implement tools for retrieving functions shared among phenotypes, co-occurring in the same patient and facilitating the formulation of hypotheses about the molecular causes of the disease. Results: We introduce PhenPath, a new resource consisting of two parts: PhenPathDB and PhenPathTOOL. The former is a database collecting the human genes associated with the phenotypes described in Human Phenotype Ontology (HPO) and OMIM Clinical Synopses. Phenotypes are then associated with biological functions and pathways by means of NET-GE, a network-based method for functional enrichment of sets of genes. The present version considers only phenotypes related to diseases. PhenPathDB collects information for 18 OMIM Clinical synopses and 7137 HPO phenotypes, related to 4292 diseases and 3446 genes. Enrichment of Gene Ontology annotations endows some 87.7, 86.9 and 73.6% of HPO phenotypes with Biological Process, Molecular Function and Cellular Component terms, respectively. Furthermore, 58.8 and 77.8% of HPO phenotypes are also enriched for KEGG and Reactome pathways, respectively. Based on PhenPathDB, PhenPathTOOL analyzes user-defined sets of phenotypes retrieving diseases, genes and functional terms which they share. This information can provide clues for interpreting the co-occurrence of phenotypes in a patient. Conclusions: The resource allows finding molecular features useful to investigate diseases characterized by multiple phenotypes, and by this, it can help researchers and physicians in identifying molecular mechanisms and biological functions underlying the concomitant manifestation of phenotypes. The resource is freely available at http://phenpath.biocomp.unibo.it.
- Published
- 2019
19. Performance of in silico tools for the evaluation of p16INK4a (CDKN2A) variants in CAGI.
- Author
- Carraro M, Minervini G, Giollo M, Bromberg Y, Capriotti E, Casadio R, Dunbrack R, Elefanti L, Fariselli P, Ferrari C, Gough J, Katsonis P, Leonardi E, Lichtarge O, Menin C, Martelli PL, Niroula A, Pal LR, Repo S, Scaini MC, Vihinen M, Wei Q, Xu Q, Yang Y, Yin Y, Zaucha J, Zhao H, Zhou Y, Brenner SE, Moult J, and Tosatto SCE
- Subjects
- Cell Line, Tumor, Cell Proliferation, Computer Simulation, Cyclin-Dependent Kinase Inhibitor p16, Cyclin-Dependent Kinase Inhibitor p18 chemistry, Databases, Genetic, Genetic Predisposition to Disease, Humans, Machine Learning, Protein Stability, Computational Biology methods, Cyclin-Dependent Kinase Inhibitor p18 genetics, Genetic Variation
- Abstract
Correct phenotypic interpretation of variants of unknown significance for cancer-associated genes is a diagnostic challenge as genetic screenings gain in popularity in the next-generation sequencing era. The Critical Assessment of Genome Interpretation (CAGI) experiment aims to test and define the state of the art of genotype-phenotype interpretation. Here, we present the assessment of the CAGI p16INK4a challenge. Participants were asked to predict the effect on cellular proliferation of 10 variants for the p16INK4a tumor suppressor, a cyclin-dependent kinase inhibitor encoded by the CDKN2A gene. Twenty-two pathogenicity predictors were assessed with a variety of accuracy measures for reliability in a medical context. Different assessment measures were combined in an overall ranking to provide more robust results. The R scripts used for assessment are publicly available from a GitHub repository for future use in similar assessment exercises. Despite a limited test-set size, our findings show a variety of results, with some methods performing significantly better. Methods combining different strategies frequently outperform simpler approaches. The best predictor, Yang&Zhou lab, uses a machine learning method combining an empirical energy function measuring protein stability with an evolutionary conservation term. The p16INK4a challenge highlights how subtle structural effects can neutralize otherwise deleterious variants. (© 2017 Wiley Periodicals, Inc.)
- Published
- 2017
- Full Text
- View/download PDF
20. Blind prediction of deleterious amino acid variations with SNPs&GO.
- Author
-
Capriotti E, Martelli PL, Fariselli P, and Casadio R
- Subjects
- Acid Anhydride Hydrolases, Algorithms, Gene Ontology, Genetic Predisposition to Disease, Humans, Molecular Sequence Annotation, ROC Curve, Support Vector Machine, Amino Acid Substitution, Checkpoint Kinase 2 genetics, Computational Biology methods, Cyclin-Dependent Kinase Inhibitor p16 genetics, DNA Repair Enzymes genetics, DNA-Binding Proteins genetics, alpha-N-Acetylgalactosaminidase genetics
- Abstract
SNPs&GO is a machine learning method for predicting the association of single amino acid variations (SAVs) to disease, considering protein functional annotation. The method is a binary classifier that implements a support vector machine algorithm to discriminate between disease-related and neutral SAVs. SNPs&GO combines information from protein sequence with functional annotation encoded by gene ontology (GO) terms. Tested in sequence mode on more than 38,000 SAVs from the SwissVar dataset, our method reached 81% overall accuracy and an area under the receiver operating characteristic curve of 0.88 with low false-positive rate. In almost all the editions of the Critical Assessment of Genome Interpretation (CAGI) experiments, SNPs&GO ranked among the most accurate algorithms for predicting the effect of SAVs. In this paper, we summarize the best results obtained by SNPs&GO on disease-related variations of four CAGI challenges relative to the following genes: CHEK2 (CAGI 2010), RAD50 (CAGI 2011), p16-INK (CAGI 2013), and NAGLU (CAGI 2016). Result evaluation provides insights about the accuracy of our algorithm and the relevance of GO terms in annotating the effect of the variants. It also helps to define good practices for the detection of deleterious SAVs. (© 2017 Wiley Periodicals, Inc.)
- Published
- 2017
- Full Text
- View/download PDF
21. ISPRED4: interaction sites PREDiction in protein structures with a refining grammar model.
- Author
-
Savojardo C, Fariselli P, Martelli PL, and Casadio R
- Subjects
- Sequence Analysis, Protein methods, Computational Biology methods, Machine Learning, Protein Conformation, Protein Interaction Domains and Motifs, Software
- Abstract
Motivation: The identification of protein-protein interaction (PPI) sites is an important step towards the characterization of protein functional integration in the cell complexity. Experimental methods are costly and time-consuming and computational tools for predicting PPI sites can fill the gaps of PPI present knowledge. Results: We present ISPRED4, an improved structure-based predictor of PPI sites on unbound monomer surfaces. ISPRED4 relies on machine-learning methods and it incorporates features extracted from protein sequence and structure. Cross-validation experiments are carried out on a new dataset that includes 151 high-resolution protein complexes and indicate that ISPRED4 achieves a per-residue Matthews correlation coefficient of 0.48 and an overall accuracy of 0.85. Benchmarking results show that ISPRED4 is one of the top-performing PPI site predictors developed so far. Contact: gigi@biocomp.unibo.it. Availability and Implementation: ISPRED4 and datasets used in this study are available at http://ispred4.biocomp.unibo.it. (© The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com)
- Published
- 2017
- Full Text
- View/download PDF
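The two figures quoted in the ISPRED4 abstract (overall accuracy and Matthews correlation coefficient) are both derived from a binary confusion matrix. The sketch below shows the standard formulas only; the counts are hypothetical and are not taken from the paper. They are chosen to illustrate that, on imbalanced per-residue data, accuracy can be high while the MCC remains modest.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero (the usual convention
    for an undefined coefficient).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts: few interface residues, many non-interface ones.
    tp, tn, fp, fn = 40, 810, 60, 90
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.2f}")
    print(f"MCC      = {mcc(tp, tn, fp, fn):.2f}")
```

Because the MCC uses all four cells of the confusion matrix, it is less flattered by a large negative class than accuracy is, which is presumably why abstracts in this listing tend to report both.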
22. SChloro: directing Viridiplantae proteins to six chloroplastic sub-compartments.
- Author
-
Savojardo C, Martelli PL, Fariselli P, and Casadio R
- Subjects
- Membrane Proteins metabolism, Protein Transport, Chloroplasts metabolism, Computational Biology methods, Machine Learning, Plant Proteins metabolism, Sequence Analysis, Protein methods, Software, Viridiplantae metabolism
- Abstract
Motivation: Chloroplasts are organelles found in plants and involved in several important cell processes. Similarly to other compartments in the cell, chloroplasts have an internal structure comprising several sub-compartments, where different proteins are targeted to perform their functions. Given the relation between protein function and localization, the availability of effective computational tools to predict protein sub-organelle localizations is crucial for large-scale functional studies. Results: In this paper we present SChloro, a novel machine-learning approach to predict protein sub-chloroplastic localization, based on targeting signal detection and membrane protein information. The proposed approach performs multi-label predictions discriminating six chloroplastic sub-compartments that include inner membrane, outer membrane, stroma, thylakoid lumen, plastoglobule and thylakoid membrane. In comparative benchmarks, the proposed method outperforms current state-of-the-art methods in both single- and multi-compartment predictions, with an overall multi-label accuracy of 74%. The results demonstrate the relevance of the approach that is eligible as a good candidate for integration into more general large-scale annotation pipelines of protein subcellular localization. Availability and Implementation: The method is available as a web server at http://schloro.biocomp.unibo.it. Contact: gigi@biocomp.unibo.it.
- Published
- 2017
- Full Text
- View/download PDF
23. An expanded evaluation of protein function prediction methods shows an improvement in accuracy.
- Author
-
Jiang Y, Oron TR, Clark WT, Bankapur AR, D'Andrea D, Lepore R, Funk CS, Kahanda I, Verspoor KM, Ben-Hur A, Koo da CE, Penfold-Brown D, Shasha D, Youngs N, Bonneau R, Lin A, Sahraeian SM, Martelli PL, Profiti G, Casadio R, Cao R, Zhong Z, Cheng J, Altenhoff A, Skunca N, Dessimoz C, Dogan T, Hakala K, Kaewphan S, Mehryary F, Salakoski T, Ginter F, Fang H, Smithers B, Oates M, Gough J, Törönen P, Koskinen P, Holm L, Chen CT, Hsu WL, Bryson K, Cozzetto D, Minneci F, Jones DT, Chapman S, Bkc D, Khan IK, Kihara D, Ofer D, Rappoport N, Stern A, Cibrian-Uhalte E, Denny P, Foulger RE, Hieta R, Legge D, Lovering RC, Magrane M, Melidoni AN, Mutowo-Meullenet P, Pichler K, Shypitsyna A, Li B, Zakeri P, ElShal S, Tranchevent LC, Das S, Dawson NL, Lee D, Lees JG, Sillitoe I, Bhat P, Nepusz T, Romero AE, Sasidharan R, Yang H, Paccanaro A, Gillis J, Sedeño-Cortés AE, Pavlidis P, Feng S, Cejuela JM, Goldberg T, Hamp T, Richter L, Salamov A, Gabaldon T, Marcet-Houben M, Supek F, Gong Q, Ning W, Zhou Y, Tian W, Falda M, Fontana P, Lavezzo E, Toppo S, Ferrari C, Giollo M, Piovesan D, Tosatto SC, Del Pozo A, Fernández JM, Maietta P, Valencia A, Tress ML, Benso A, Di Carlo S, Politano G, Savino A, Rehman HU, Re M, Mesiti M, Valentini G, Bargsten JW, van Dijk AD, Gemovic B, Glisic S, Perovic V, Veljkovic V, Veljkovic N, Almeida-E-Silva DC, Vencio RZ, Sharan M, Vogel J, Kansakar L, Zhang S, Vucetic S, Wang Z, Sternberg MJ, Wass MN, Huntley RP, Martin MJ, O'Donovan C, Robinson PN, Moreau Y, Tramontano A, Babbitt PC, Brenner SE, Linial M, Orengo CA, Rost B, Greene CS, Mooney SD, Friedberg I, and Radivojac P
- Subjects
- Algorithms, Databases, Protein, Gene Ontology, Humans, Molecular Sequence Annotation, Proteins genetics, Computational Biology, Proteins chemistry, Software, Structure-Activity Relationship
- Abstract
Background: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. Results: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. Conclusions: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.
- Published
- 2016
- Full Text
- View/download PDF
24. Tools and data services registry: a community effort to document bioinformatics resources.
- Author
-
Ison J, Rapacki K, Ménager H, Kalaš M, Rydza E, Chmura P, Anthon C, Beard N, Berka K, Bolser D, Booth T, Bretaudeau A, Brezovsky J, Casadio R, Cesareni G, Coppens F, Cornell M, Cuccuru G, Davidsen K, Vedova GD, Dogan T, Doppelt-Azeroual O, Emery L, Gasteiger E, Gatter T, Goldberg T, Grosjean M, Grüning B, Helmer-Citterich M, Ienasescu H, Ioannidis V, Jespersen MC, Jimenez R, Juty N, Juvan P, Koch M, Laibe C, Li JW, Licata L, Mareuil F, Mičetić I, Friborg RM, Moretti S, Morris C, Möller S, Nenadic A, Peterson H, Profiti G, Rice P, Romano P, Roncaglia P, Saidi R, Schafferhans A, Schwämmle V, Smith C, Sperotto MM, Stockinger H, Vařeková RS, Tosatto SC, de la Torre V, Uva P, Via A, Yachdav G, Zambelli F, Vriend G, Rost B, Parkinson H, Løngreen P, and Brunak S
- Subjects
- Data Curation, Software, Computational Biology, Registries
- Abstract
Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvement in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation of these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information. The outcome is that scientists must often struggle to find, understand, compare and use the best resources for the task at hand. Here we present a community-driven curation effort, supported by ELIXIR (the European infrastructure for biological information), that aspires to a comprehensive and consistent registry of information about bioinformatics resources. The sustainable upkeep of this Tools and Data Services Registry is assured by a curation effort driven by and tailored to local needs, and shared amongst a network of engaged partners. As of November 2015, the registry includes 1785 resources, with depositions from 126 individual registrations including 52 institutional providers and 74 individuals. With community support, the registry can become a standard for dissemination of information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools. (© The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.)
- Published
- 2016
- Full Text
- View/download PDF
25. AlignBucket: a tool to speed up 'all-against-all' protein sequence alignments optimizing length constraints.
- Author
-
Profiti G, Fariselli P, and Casadio R
- Subjects
- Humans, Algorithms, Computational Biology methods, Databases, Protein, Proteins chemistry, Sequence Alignment methods, Software
- Abstract
Motivation: The next-generation sequencing era requires reliable, fast and efficient approaches for the accurate annotation of the ever-increasing number of biological sequences and their variations. Transfer of annotation upon similarity search is a standard approach. The procedure of all-against-all protein comparison is a preliminary step of different available methods that annotate sequences based on information already present in databases. Given the current volume of sequences, methods are necessary to pre-process data to reduce the time of sequence comparison. Results: We present an algorithm that optimizes the partition of a large volume of sequences (the whole database) into sets where sequence length values (in residues) are constrained depending on a bounded minimal and expected alignment coverage. The idea is to optimally group protein sequences according to their length, and then to compute the all-against-all sequence alignments among sequences that fall in a selected length range. We describe a mathematically optimal solution and we show that our method leads to a 5-fold speed-up in real world cases. Availability and Implementation: The software is available for downloading at http://www.biocomp.unibo.it/~giuseppe/partitioning.html. Contact: giuseppe.profiti2@unibo.it. Supplementary Information: Supplementary data are available at Bioinformatics online. (© The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.)
- Published
- 2015
- Full Text
- View/download PDF
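The length-constraint idea in the AlignBucket abstract can be illustrated with a much simpler filter. This is not the published optimal partitioning algorithm. It assumes, for illustration only, that coverage is measured on the longer sequence, so a pair with lengths a ≤ b can reach a coverage of at most a/b. Pairs whose length ratio falls below the required minimum can then be skipped before any alignment is run. The function name `candidate_pairs` and the example lengths are hypothetical.

```python
from bisect import bisect_right


def candidate_pairs(lengths, min_coverage):
    """Count sequence pairs whose length ratio (short / long) can still
    satisfy `min_coverage`; all other pairs need not be aligned.

    Assumption: coverage is computed on the longer sequence, so the best
    achievable coverage for lengths a <= b is a / b.
    """
    ordered = sorted(lengths)
    count = 0
    for i, short in enumerate(ordered):
        # Longest partner that is still compatible with `short`.
        limit = short / min_coverage
        # Only look at partners to the right of position i (length >= short).
        j = bisect_right(ordered, limit, lo=i + 1)
        count += j - (i + 1)
    return count


if __name__ == "__main__":
    lengths = [100, 150, 200, 400, 1000]  # hypothetical sequence lengths
    total = len(lengths) * (len(lengths) - 1) // 2
    kept = candidate_pairs(lengths, min_coverage=0.5)
    print(f"{kept} of {total} pairs need to be aligned")
```

Sorting once and using binary search keeps the filter at O(n log n), so its cost is negligible next to the alignments it avoids. The published method goes further by choosing the length ranges optimally.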
26. INPS: predicting the impact of non-synonymous variations on protein stability from sequence.
- Author
-
Fariselli P, Martelli PL, Savojardo C, and Casadio R
- Subjects
- Algorithms, Humans, Machine Learning, Protein Engineering, Proteins genetics, Thermodynamics, Tumor Suppressor Protein p53 genetics, Computational Biology methods, High-Throughput Nucleotide Sequencing methods, Mutation genetics, Protein Stability, Proteins chemistry, Software, Tumor Suppressor Protein p53 chemistry
- Abstract
Motivation: A tool for reliably predicting the impact of variations on protein stability is extremely important for both protein engineering and for understanding the effects of Mendelian and somatic mutations in the genome. Next Generation Sequencing studies are constantly increasing the number of protein sequences. Given the huge disproportion between protein sequences and structures, there is a need for tools suited to annotate the effect of mutations starting from protein sequence without relying on the structure. Here, we describe INPS, a novel approach for annotating the effect of non-synonymous mutations on the protein stability from its sequence. INPS is based on SVM regression and it is trained to predict the thermodynamic free energy change upon single-point variations in protein sequences. Results: We show that INPS performs similarly to the state-of-the-art methods based on protein structure when tested in cross-validation on a non-redundant dataset. INPS performs very well also on a newly generated dataset consisting of a number of variations occurring in the tumor suppressor protein p53. Our results suggest that INPS is a tool suited for computing the effect of non-synonymous polymorphisms on protein stability when the protein structure is not available. We also show that INPS predictions are complementary to those of the state-of-the-art, structure-based method mCSM. When the two methods are combined, the overall prediction on the p53 set scores significantly higher than those of the single methods. Availability and Implementation: The presented method is available as a web server at http://inps.biocomp.unibo.it. Contact: piero.fariselli@unibo.it. Supplementary Information: Supplementary Materials are available at Bioinformatics online. (© The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.)
- Published
- 2015
- Full Text
- View/download PDF
27. Computer-based prediction of mitochondria-targeting peptides.
- Author
-
Martelli PL, Savojardo C, Fariselli P, Tasco G, and Casadio R
- Subjects
- Datasets as Topic, Genomics methods, Humans, Internet, Protein Transport, Artificial Intelligence, Computational Biology methods, Mitochondria metabolism, Mitochondrial Proteins chemistry, Mitochondrial Proteins metabolism, Models, Biological, Peptides chemistry, Peptides metabolism
- Abstract
Computational methods are invaluable when protein sequences, directly derived from genomic data, need functional and structural annotation. Subcellular localization is a feature necessary for understanding the protein role and the compartment where the mature protein is active and very difficult to characterize experimentally. Mitochondrial proteins encoded on the cytosolic ribosomes carry specific patterns in the precursor sequence from where it is possible to recognize a peptide targeting the protein to its final destination. Here we discuss to which extent it is feasible to develop computational methods for detecting mitochondrial targeting peptides in the precursor sequences and benchmark our and other methods on the human mitochondrial proteins endowed with experimentally characterized targeting peptides. Furthermore, we illustrate our newly implemented web server and its usage on the whole human proteome in order to infer mitochondrial targeting peptides, their cleavage sites, and whether the targeting peptide regions contain or not arginine-rich recurrent motifs. By this, we add some other 2,800 human proteins to the 124 ones already experimentally annotated with a mitochondrial targeting peptide.
- Published
- 2015
- Full Text
- View/download PDF
28. TPpred2: improving the prediction of mitochondrial targeting peptide cleavage sites by exploiting sequence motifs.
- Author
-
Savojardo C, Martelli PL, Fariselli P, and Casadio R
- Subjects
- Amino Acid Motifs, Artificial Intelligence, Binding Sites, Internet, Software, User-Computer Interface, Computational Biology methods, Mitochondrial Proteins chemistry, Mitochondrial Proteins metabolism, Protein Sorting Signals, Proteolysis
- Abstract
Summary: Targeting peptides are N-terminal sorting signals in proteins that promote their translocation to mitochondria through the interaction with different protein machineries. We recently developed TPpred, a machine learning-based method scoring among the best ones available to predict the presence of a targeting peptide in a protein sequence and its cleavage site. Here we introduce TPpred2, which improves on TPpred's performance in the task of identifying the cleavage site of the targeting peptides. TPpred2 is now available as a web interface and as a stand-alone version for users who can freely download and adopt it for processing large volumes of sequences. Availability and implementation: TPpred2 is available both as a web server and as a stand-alone version at http://tppred2.biocomp.unibo.it. Contact: gigi@biocomp.unibo.it. Supplementary Information: Supplementary data are available at Bioinformatics online. (© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.)
- Published
- 2014
- Full Text
- View/download PDF
29. A large-scale evaluation of computational protein function prediction.
- Author
-
Radivojac P, Clark WT, Oron TR, Schnoes AM, Wittkop T, Sokolov A, Graim K, Funk C, Verspoor K, Ben-Hur A, Pandey G, Yunes JM, Talwalkar AS, Repo S, Souza ML, Piovesan D, Casadio R, Wang Z, Cheng J, Fang H, Gough J, Koskinen P, Törönen P, Nokso-Koivisto J, Holm L, Cozzetto D, Buchan DW, Bryson K, Jones DT, Limaye B, Inamdar H, Datta A, Manjari SK, Joshi R, Chitale M, Kihara D, Lisewski AM, Erdin S, Venner E, Lichtarge O, Rentzsch R, Yang H, Romero AE, Bhat P, Paccanaro A, Hamp T, Kaßner R, Seemayer S, Vicedo E, Schaefer C, Achten D, Auer F, Boehm A, Braun T, Hecht M, Heron M, Hönigschmid P, Hopf TA, Kaufmann S, Kiening M, Krompass D, Landerer C, Mahlich Y, Roos M, Björne J, Salakoski T, Wong A, Shatkay H, Gatzmann F, Sommer I, Wass MN, Sternberg MJ, Škunca N, Supek F, Bošnjak M, Panov P, Džeroski S, Šmuc T, Kourmpetis YA, van Dijk AD, ter Braak CJ, Zhou Y, Gong Q, Dong X, Tian W, Falda M, Fontana P, Lavezzo E, Di Camillo B, Toppo S, Lan L, Djuric N, Guo Y, Vucetic S, Bairoch A, Linial M, Babbitt PC, Brenner SE, Orengo C, Rost B, Mooney SD, and Friedberg I
- Subjects
- Algorithms, Animals, Databases, Protein, Exoribonucleases classification, Exoribonucleases genetics, Exoribonucleases physiology, Forecasting, Humans, Proteins chemistry, Proteins classification, Proteins genetics, Species Specificity, Computational Biology methods, Molecular Biology methods, Molecular Sequence Annotation, Proteins physiology
- Abstract
Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.
- Published
- 2013
- Full Text
- View/download PDF
30. WS-SNPs&GO: a web server for predicting the deleterious effect of human protein variants using functional annotation.
- Author
-
Capriotti E, Calabrese R, Fariselli P, Martelli PL, Altman RB, and Casadio R
- Subjects
- Humans, Internet, Proteins chemistry, Algorithms, Amino Acid Substitution genetics, Computational Biology methods, Genetic Variation, Molecular Sequence Annotation methods, Proteins genetics, Software
- Abstract
Background: SNPs&GO is a method for the prediction of deleterious Single Amino acid Polymorphisms (SAPs) using protein functional annotation. In this work, we present the web server implementation of SNPs&GO (WS-SNPs&GO). The server is based on Support Vector Machines (SVM) and for a given protein, its input comprises: the sequence and/or its three-dimensional structure (when available), a set of target variations and its functional Gene Ontology (GO) terms. The output of the server provides, for each protein variation, the probabilities to be associated to human diseases. Results: The server consists of two main components, including updated versions of the sequence-based SNPs&GO (recently scored as one of the best algorithms for predicting deleterious SAPs) and of the structure-based SNPs&GO(3d) programs. Sequence and structure based algorithms are extensively tested on a large set of annotated variations extracted from the SwissVar database. Selecting a balanced dataset with more than 38,000 SAPs, the sequence-based approach achieves 81% overall accuracy, 0.61 correlation coefficient and an Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve of 0.88. For the subset of ~6,600 variations mapped on protein structures available at the Protein Data Bank (PDB), the structure-based method scores with 84% overall accuracy, 0.68 correlation coefficient, and 0.91 AUC. When tested on a new blind set of variations, the results of the server are 79% and 83% overall accuracy for the sequence-based and structure-based inputs, respectively. Conclusions: WS-SNPs&GO is a valuable tool that includes in a unique framework information derived from protein sequence, structure, evolutionary profile, and protein function. WS-SNPs&GO is freely available at http://snps.biofold.org/snps-and-go.
- Published
- 2013
- Full Text
- View/download PDF
31. Predicting cancer-associated germline variations in proteins.
- Author
-
Martelli PL, Fariselli P, Balzani E, and Casadio R
- Subjects
- Algorithms, Genetic Predisposition to Disease, Germ-Line Mutation genetics, Humans, Support Vector Machine, Computational Biology methods, Neoplasms genetics, Proteins genetics
- Abstract
Background: Various computational methods are presently available to classify whether a protein variation is disease-associated or not. However data derived from recent technological advancements make it feasible to extend the annotation of disease-associated variations in order to include specific phenotypes. Here we tackle the problem of distinguishing between genetic variations associated to cancer and variations associated to other genetic diseases. Results: We implement a new method based on Support Vector Machines that takes as input the protein variant and the protein function, as described by its associated Gene Ontology terms. Our approach succeeds in discriminating between germline variants that are likely to be cancer-associated from those that are related to other genetic disorders. The method performs with values of 90% accuracy and 0.61 Matthews correlation coefficient on a set comprising 6478 germline variations (16% are cancer-associated) in 592 proteins. The sensitivity and the specificity on the cancer class are 69% and 66%, respectively. Furthermore the method is capable of correctly excluding some 96% of 3392 somatic cancer-associated variations in 1983 proteins not included in the training/testing set. Conclusions: Here we prove feasible that a large set of cancer associated germline protein variations can be successfully discriminated from those associated to other genetic disorders. This is a step further in the process of protein variant annotation. Scoring largely improves when protein function as encoded by Gene Ontology terms is considered, corroborating the role of protein function as a key feature for a correct annotation of its variations.
- Published
- 2012
- Full Text
- View/download PDF
32. Is there an optimal substitution matrix for contact prediction with correlated mutations?
- Author
-
Di Lena P, Fariselli P, Margara L, Vassura M, and Casadio R
- Subjects
- Algorithms, Databases, Protein, Mutation, Computational Biology methods, Models, Statistical, Protein Interaction Domains and Motifs, Protein Interaction Mapping methods, Proteins chemistry
- Abstract
Correlated mutations in proteins are believed to occur in order to preserve the protein functional folding through evolution. Their values can be deduced from sequence and/or structural alignments and are indicative of residue contacts in the protein three-dimensional structure. A correlation among pairs of residues is routinely evaluated with the Pearson correlation coefficient and the MCLACHLAN similarity matrix. In literature, there is no justification for the adoption of the MCLACHLAN instead of other substitution matrices. In this paper, we approach the problem of computing the optimal similarity matrix for contact prediction with correlated mutations, i.e., the similarity matrix that maximizes the accuracy of contact prediction with correlated mutations. We describe an optimization procedure, based on the gradient descent method, for computing the optimal similarity matrix and perform an extensive number of experimental tests. Our tests show that there is a large number of optimal matrices that perform similarly to MCLACHLAN. We also obtain that the upper limit to the accuracy achievable in protein contact prediction is independent of the optimized similarity matrix. This suggests that the poor scoring of the correlated mutations approach may be due to the choice of the linear correlation function in evaluating correlated mutations.
- Published
- 2011
- Full Text
- View/download PDF
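The correlated-mutation score mentioned in the abstract above (a Pearson correlation computed over residue similarities) can be sketched as follows. For each alignment column, every pair of sequences gets a similarity value from a substitution matrix; the score for two columns is the Pearson correlation between their two lists of pairwise similarities. This is a minimal, unweighted illustration: `identity_similarity` is a hypothetical stand-in for the McLachlan matrix, and the toy alignment is invented.

```python
import math
from itertools import combinations


def identity_similarity(a, b):
    """Toy stand-in for a substitution matrix: 1 if residues match, else 0."""
    return 1.0 if a == b else 0.0


def column_similarities(column, sim):
    """Similarity value for every pair of sequences in one alignment column."""
    return [sim(a, b) for a, b in combinations(column, 2)]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either list has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    if sx == 0 or sy == 0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)


def correlated_mutation(alignment, i, j, sim=identity_similarity):
    """Correlation between the substitution patterns of columns i and j."""
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]
    return pearson(column_similarities(col_i, sim),
                   column_similarities(col_j, sim))


if __name__ == "__main__":
    # Invented alignment: columns 0 and 1 change together, column 2 does not.
    alignment = ["AKL", "AKL", "GEL", "GEV"]
    print(correlated_mutation(alignment, 0, 1))
    print(correlated_mutation(alignment, 0, 2))
```

Passing the similarity function as a parameter mirrors the question the paper asks: the same correlation can be evaluated with any substitution matrix, so different matrices can be compared by swapping `sim`.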
33. MemLoci: predicting subcellular localization of membrane proteins in eukaryotes.
- Author
-
Pierleoni A, Martelli PL, and Casadio R
- Subjects
- Databases, Protein, Eukaryotic Cells chemistry, Organelles chemistry, Protein Sorting Signals, Protein Transport, Computational Biology methods, Membrane Proteins chemistry, Support Vector Machine
- Abstract
Motivation: Subcellular localization is a key feature in the process of functional annotation of both globular and membrane proteins. In the absence of experimental data, protein localization is inferred on the basis of annotation transfer upon sequence similarity search. However, predictive tools are necessary when the localization of homologs is not known. This is so particularly for membrane proteins. Furthermore, most of the available predictors of subcellular localization are specifically trained on globular proteins and perform poorly on membrane proteins. Results: Here we develop MemLoci, a new support vector machine-based tool that discriminates three membrane protein localizations: plasma, internal and organelle membrane. When tested on an independent set, MemLoci outperforms existing methods, reaching an overall accuracy of 70% on predicting the location in the three membrane types, with a generalized correlation coefficient as high as 0.50. Availability: The MemLoci server is freely available on the web at: http://mu2py.biocomp.unibo.it/memloci. Datasets described in the article can be downloaded at the same site.
- Published
- 2011
- Full Text
- View/download PDF
34. CCHMM_PROF: a HMM-based coiled-coil predictor with evolutionary information.
- Author
-
Bartoli L, Fariselli P, Krogh A, and Casadio R
- Subjects
- Databases, Protein, Protein Conformation, Protein Interaction Mapping, Structure-Activity Relationship, Computational Biology methods, Proteins chemistry, Software
- Abstract
Motivation: The widespread coiled-coil structural motif in proteins is known to mediate a variety of biological interactions. Recognizing a coiled-coil containing sequence and locating its coiled-coil domains are key steps towards the determination of the protein structure and function. Different tools are available for predicting coiled-coil domains in protein sequences, including those based on position-specific score matrices and machine learning methods. Results: In this article, we introduce a hidden Markov model (CCHMM_PROF) that exploits the information contained in multiple sequence alignments (profiles) to predict coiled-coil regions. The new method discriminates coiled-coil sequences with an accuracy of 97% and achieves a true positive rate of 79% with only 1% of false positives. Furthermore, when predicting the location of coiled-coil segments in protein sequences, the method reaches an accuracy of 80% at the residue level and a best per-segment and per-protein efficiency of 81% and 80%, respectively. The results indicate that CCHMM_PROF outperforms all the existing tools and can be adopted for large-scale genome annotation. Availability: The dataset is available at http://www.biocomp.unibo.it/~lisa/coiled-coils. The predictor is freely available at http://gpcr.biocomp.unibo.it/cgi/predictors/cchmmprof/pred_cchmmprof.cgi. Contact: piero@biocomp.unibo.it.
- Published
- 2009
- Full Text
- View/download PDF
35. The bologna annotation resource: a non hierarchical method for the functional and structural annotation of protein sequences relying on a comparative large-scale genome analysis.
- Author
-
Bartoli L, Montanucci L, Fronza R, Martelli PL, Fariselli P, Carota L, Donvito G, Maggi GP, and Casadio R
- Subjects
- Animals, Cluster Analysis, Databases, Genetic, Pongo pygmaeus genetics, Protein Interaction Mapping, Proteins genetics, Reproducibility of Results, Sequence Alignment, Terminology as Topic, Computational Biology methods, Genomics methods, Proteins analysis, Sequence Analysis, Protein methods
- Abstract
Protein sequence annotation is a major challenge in the postgenomic era. Thanks to the availability of complete genomes and proteomes, protein annotation has recently benefited greatly from cross-genome comparisons. In this work, we describe a new non-hierarchical clustering procedure characterized by a stringent metric which ensures a reliable transfer of function between related proteins even in the case of multidomain and distantly related proteins. The method takes advantage of the comparative analysis of 599 completely sequenced genomes, both from prokaryotes and eukaryotes, and of a GO and PDB/SCOP mapping over the clusters. A statistical validation of our method demonstrates that our clustering technique captures the essential information shared between homologous and distantly related protein sequences. In this way, uncharacterized proteins can be safely annotated by inheriting the annotation of the cluster. We validate our method by blindly annotating a further 201 genomes and finally we develop BAR (the Bologna Annotation Resource), a prediction server for protein functional annotation based on a total of 800 genomes (publicly available at http://microserf.biocomp.unibo.it/bar/).
- Published
- 2009
- Full Text
- View/download PDF
36. Sequence-based feature prediction and annotation of proteins.
- Author
-
Juncker AS, Jensen LJ, Pierleoni A, Bernsel A, Tress ML, Bork P, von Heijne G, Valencia A, Ouzounis CA, Casadio R, and Brunak S
- Subjects
- Amino Acid Motifs, Binding Sites, Humans, Protein Kinases metabolism, Proteins metabolism, Proteome, Computational Biology methods, Proteins physiology
- Abstract
A recent trend in computational methods for annotation of protein function is that many prediction tools are combined in complex workflows and pipelines to facilitate the analysis of feature combinations, for example, the entire repertoire of kinase-binding motifs in the human proteome.
- Published
- 2009
- Full Text
- View/download PDF
37. Use of estimated evolutionary strength at the codon level improves the prediction of disease-related protein mutations in humans.
- Author
-
Capriotti E, Arbiza L, Casadio R, Dopazo J, Dopazo H, and Marti-Renom MA
- Subjects
- Algorithms, Codon genetics, Databases, Protein, Genetic Variation, Genome, Human, Humans, Iduronic Acid analogs & derivatives, Iduronic Acid metabolism, Polymorphism, Single Nucleotide, Proteins chemistry, Tumor Suppressor Protein p53 genetics, Computational Biology methods, DNA Mutational Analysis, Evolution, Molecular, Genetic Predisposition to Disease, Point Mutation, Proteins genetics
- Abstract
Predicting the functional impact of protein variation is one of the most challenging problems in bioinformatics. A rapidly growing number of genome-scale studies provide large amounts of experimental data, allowing the application of rigorous statistical approaches for predicting whether a given single point mutation has an impact on human health. Up until now, existing methods have limited their source data to either protein or gene information. Novel in this work, we take advantage of both and focus on protein evolutionary information by using estimated selective pressures at the codon level. Here we introduce a new method (SeqProfCod) to predict the likelihood that a given protein variant is associated with human disease or not. Our method relies on a support vector machine (SVM) classifier trained using three sources of information: protein sequence, multiple protein sequence alignments, and the estimation of selective pressure at the codon level. SeqProfCod has been benchmarked with a large dataset of 8,987 single point mutations from 1,434 human proteins from SWISS-PROT. It achieves 82% overall accuracy and a correlation coefficient of 0.59, indicating that the estimation of the selective pressure helps in predicting the functional impact of single-point mutations. Moreover, this study demonstrates the synergic effect of combining two sources of information for predicting the functional effects of protein variants: protein sequence/profile-based information and the evolutionary estimation of the selective pressures at the codon level. The results of large-scale application of SeqProfCod over all annotated point mutations in SWISS-PROT (available for download at http://sgu.bioinfo.cipf.es/services/Omidios/; last accessed: 24 August 2007) could be used to support clinical studies. (© 2007 Wiley-Liss, Inc.)
- Published
- 2008
- Full Text
- View/download PDF
38. A protein structure prediction service in the ProGenGrid system.
- Author
-
Mirto M, Ferramosca A, Tartarini D, Romano S, Negro A, Tasco G, Fiore S, Zara V, Casadio R, and Aloisio G
- Subjects
- Computer Simulation, Databases as Topic, Dicarboxylic Acid Transporters genetics, Humans, Italy, Program Development, Saccharomyces cerevisiae genetics, Software, Computational Biology, Computer Systems, Databases, Protein, Protein Structure, Tertiary genetics
- Abstract
This paper describes a protein tertiary structure prediction service implemented in a Grid Environment. The service has been used for predicting the dicarboxylate carrier (DIC) of Saccharomyces cerevisiae by using the homology modelling approach. The visualization of the predicted model is made possible by using an interactive virtual reality environment based on X3D and Ajax3d technologies.
- Published
- 2008
39. Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information.
- Author
-
Capriotti E, Calabrese R, and Casadio R
- Subjects
- Algorithms, Databases, Protein, Genetic Variation, Humans, Mutation, Phenotype, Polymorphism, Genetic, Probability, Proteins chemistry, Computational Biology methods, Evolution, Molecular, Genetic Diseases, Inborn genetics, Genetic Predisposition to Disease, Point Mutation, Polymorphism, Single Nucleotide, Proteins genetics
- Abstract
Motivation: Human single nucleotide polymorphisms (SNPs) are the most frequent type of genetic variation in the human population. One of the most important goals of SNP projects is to understand which human genotype variations are related to Mendelian and complex diseases. Great interest is focused on non-synonymous coding SNPs (nsSNPs) that are responsible for protein single point mutations. nsSNPs can be neutral or disease associated. It is known that the mutation of only one residue in a protein sequence can be related to a number of pathological conditions of dramatic social impact such as Alzheimer's, Parkinson's and Creutzfeldt-Jakob's diseases. The quality and completeness of presently available SNP databases allow the application of machine learning techniques to predict the insurgence of human diseases due to single point protein mutation starting from the protein sequence. Results: In this paper, we develop a method based on support vector machines (SVMs) that, starting from the protein sequence information, can predict whether a new phenotype derived from an nsSNP can be related to a genetic disease in humans. Using a dataset of 21 185 single point mutations, 61% of which are disease-related, out of 3587 proteins, we show that our predictor can reach more than 74% accuracy in the specific task of predicting whether a single point mutation can be disease related or not. Our method, although based on less information, outperforms other web-available predictors implementing different approaches. Availability: A beta version of the web tool is available at http://gpcr.biocomp.unibo.it/cgi/predictors/PhD-SNP/PhD-SNP.cgi
- Published
- 2006
- Full Text
- View/download PDF
40. New Escherichia coli outer membrane proteins identified through prediction and experimental verification.
- Author
-
Marani P, Wagner S, Baars L, Genevaux P, de Gier JW, Nilsson I, Casadio R, and von Heijne G
- Subjects
- Bacterial Outer Membrane Proteins genetics, Bacterial Outer Membrane Proteins isolation & purification, Bacterial Proteins chemistry, Bacterial Proteins genetics, Bacterial Proteins metabolism, Cell Fractionation, Cloning, Molecular, Escherichia coli chemistry, Escherichia coli genetics, Escherichia coli metabolism, Escherichia coli Proteins genetics, Escherichia coli Proteins isolation & purification, Membrane Transport Proteins genetics, Membrane Transport Proteins isolation & purification, Proteome chemistry, Proteome genetics, Proteome metabolism, Bacterial Outer Membrane Proteins metabolism, Computational Biology methods, Escherichia coli enzymology, Escherichia coli Proteins metabolism, Membrane Transport Proteins metabolism
- Abstract
Many new Escherichia coli outer membrane proteins have recently been identified by proteomics techniques. However, poorly expressed proteins and proteins expressed only under certain conditions may escape detection when wild-type cells are grown under standard conditions. Here, we have taken a complementary approach where candidate outer membrane proteins have been identified by bioinformatics prediction, cloned and overexpressed, and finally localized by cell fractionation experiments. Out of eight predicted outer membrane proteins, we have confirmed the outer membrane localization for five (YftM, YaiO, YfaZ, CsgF, and YliI) and also provide preliminary data indicating that a sixth, YfaL, may be an outer membrane autotransporter.
- Published
- 2006
- Full Text
- View/download PDF
41. Systematic analysis of human kinase genes: a large number of genes and alternative splicing events result in functional and structural diversity.
- Author
-
Milanesi L, Petrillo M, Sepe L, Boccia A, D'Agostino N, Passamano M, Di Nardo S, Tasco G, Casadio R, and Paolella G
- Subjects
- Algorithms, Amino Acid Motifs, Animals, Apoptosis, Cell Proliferation, Chromosome Mapping, Databases, Genetic, Databases, Protein, Disulfides, Exons, Genetic Variation, Genome, Humans, Internet, Mice, Models, Genetic, Molecular Sequence Data, Multigene Family, Protein Structure, Tertiary, Proteome, Sequence Analysis, Software, Structure-Activity Relationship, Alternative Splicing, Computational Biology methods, Gene Expression Regulation, Enzymologic, Phosphotransferases genetics
- Abstract
Background: Protein kinases are a well defined family of proteins, characterized by the presence of a common kinase catalytic domain and playing a significant role in many important cellular processes, such as proliferation, maintenance of cell shape, and apoptosis. In many members of the family, additional non-kinase domains contribute further specialization, resulting in subcellular localization, protein binding and regulation of activity, among others. About 500 genes encode members of the kinase family in the human genome, and although many of them represent well known genes, a larger number of genes code for proteins of more recent identification, or for unknown proteins identified as kinases only after computational studies. Results: A systematic in silico study performed on the human genome led to the identification of 5 genes, on chromosomes 1, 11, 13, 15 and 16 respectively, and 1 pseudogene on chromosome X; some of these genes are reported as kinases by NCBI but are absent from other databases, such as KinBase. Comparative analysis of 483 gene regions and subsequent computational analysis, aimed at identifying unannotated exons, indicate that a large number of kinases may code for alternatively spliced forms or be incorrectly annotated. An InterProScan automated analysis was performed to study domain distribution and combination in the various families. At the same time, other structural features were also added to the annotation process, including the putative presence of transmembrane alpha helices and the propensity of cysteines to participate in disulfide bridges. Conclusion: The predicted human kinome was extended by identifying both additional genes and potential splice variants, resulting in a varied panorama where functionality may be searched at the gene and protein level. Structural analysis of kinase protein domains as defined in multiple sources, together with transmembrane alpha helix and signal peptide prediction, provides hints for function assignment. The results of the human kinome analysis are collected in the KinWeb database, available for browsing and searching over the internet, where all results from the comparative analysis and the gene structure annotation are made available, alongside the domain information. Kinases may be searched by domain combinations and the relative genes may be viewed in a graphic browser at various levels of magnification up to gene organization on the full chromosome set.
- Published
- 2005
- Full Text
- View/download PDF
42. A new decoding algorithm for hidden Markov models improves the prediction of the topology of all-beta membrane proteins.
- Author
-
Fariselli P, Martelli PL, and Casadio R
- Subjects
- Algorithms, Bayes Theorem, Computer Graphics, Databases, Protein, Internet, Markov Chains, Membrane Proteins, Models, Biological, Models, Chemical, Models, Molecular, Models, Statistical, Probability, Protein Structure, Secondary, Sequence Analysis, DNA, Sequence Analysis, Protein, Software, Cell Membrane metabolism, Computational Biology methods
- Abstract
Background: Structure prediction of membrane proteins is still a challenging computational problem. Hidden Markov models (HMM) have been successfully applied to the problem of predicting membrane protein topology. In a predictive task, the HMM is endowed with a decoding algorithm in order to assign the most probable state path, and in turn the labels, to an unknown sequence. The Viterbi and the posterior decoding algorithms are the most common. The former is very efficient when one path dominates, while the latter, even though it does not guarantee to preserve the HMM grammar, is more effective when several concurring paths have similar probabilities. A third good alternative is 1-best, which was shown to perform as well as or better than Viterbi. Results: In this paper we introduce posterior-Viterbi (PV), a new decoding algorithm that combines the posterior and Viterbi algorithms. PV is a two-step process: first the posterior probability of each state is computed, and then the best posterior allowed path through the model is evaluated by a Viterbi algorithm. Conclusion: We show that PV decoding performs better than other algorithms when tested on the problem of the prediction of the topology of beta-barrel membrane proteins.
- Published
- 2005
- Full Text
- View/download PDF
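The two-step decoding described in the abstract of record 42 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy model, and the choice to treat a transition as "allowed" whenever its probability is greater than zero are all assumptions, and the code works with raw probabilities (no scaling or log-space), so it is only suitable for short sequences.

```python
def forward_backward(obs, start, trans, emit):
    """Step 1: posterior[t][i] = P(state i at position t | obs)."""
    n, T = len(start), len(obs)
    fwd = [[0.0] * n for _ in range(T)]
    bwd = [[1.0] * n for _ in range(T)]
    for i in range(n):
        fwd[0][i] = start[i] * emit[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            fwd[t][j] = emit[j][obs[t]] * sum(
                fwd[t - 1][i] * trans[i][j] for i in range(n))
    for t in range(T - 2, -1, -1):
        for i in range(n):
            bwd[t][i] = sum(
                trans[i][j] * emit[j][obs[t + 1]] * bwd[t + 1][j]
                for j in range(n))
    total = sum(fwd[T - 1])  # probability of the whole observation sequence
    return [[fwd[t][i] * bwd[t][i] / total for i in range(n)]
            for t in range(T)]


def posterior_viterbi(obs, start, trans, emit):
    """Step 2: Viterbi-style search for the allowed path (transitions with
    non-zero probability, an assumption of this sketch) that maximizes the
    product of the per-position posteriors."""
    post = forward_backward(obs, start, trans, emit)
    n, T = len(start), len(obs)
    score = [[0.0] * n for _ in range(T)]
    back = [[0] * n for _ in range(T)]
    for i in range(n):
        score[0][i] = post[0][i] if start[i] > 0 else 0.0
    for t in range(1, T):
        for j in range(n):
            best_i, best = 0, 0.0
            for i in range(n):
                if trans[i][j] > 0 and score[t - 1][i] > best:
                    best_i, best = i, score[t - 1][i]
            score[t][j] = best * post[t][j]
            back[t][j] = best_i
    # Trace back from the best-scoring final state.
    path = [max(range(n), key=lambda i: score[T - 1][i])]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical two-state toy model: state 1 can never return to state 0.
start = [1.0, 0.0]
trans = [[0.7, 0.3], [0.0, 1.0]]
emit = [[0.9, 0.1], [0.2, 0.8]]
path = posterior_viterbi([0, 0, 1, 0, 1], start, trans, emit)
```

Unlike plain posterior decoding, the returned path never uses a transition the model forbids, which is the property the abstract refers to as preserving the HMM grammar. A practical version would work in log-space to avoid underflow.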
43. A Shannon entropy-based filter detects high-quality profile-profile alignments in searches for remote homologues.
- Author
-
Capriotti E, Fariselli P, Rossi I, and Casadio R
- Subjects
- Algorithms, Conserved Sequence, Databases, Protein, Models, Molecular, Protein Folding, Sensitivity and Specificity, Software, Computational Biology methods, Entropy, Proteins chemistry, Sequence Alignment methods, Sequence Homology, Amino Acid
- Abstract
Detection of homologous proteins with low-sequence identity to a given target (remote homologues) is routinely performed with alignment algorithms that take advantage of sequence profile. In this article, we investigate the efficacy of different alignment procedures for the task at hand on a set of 185 protein pairs with similar structures but low-sequence similarity. Criteria based on the SCOP label detection and MaxSub scores are adopted to score the results. We investigate the efficacy of alignments based on sequence-sequence, sequence-profile, and profile-profile information. We confirm that with profile-profile alignments the results are better than with other procedures. In addition, we report, and this is novel, that the selection of the results of the profile-profile alignments can be improved by using Shannon entropy, indicating that this parameter is important to recognize good profile-profile alignments among a plethora of meaningless pairs. By this, we enhance the global search accuracy without losing sensitivity and filter out most of the erroneous alignments. We also show that when the entropy filtering is adopted, the quality of the resulting alignments is comparable to that computed for the target and template structures with CE, a structural alignment program. (Copyright 2003 Wiley-Liss, Inc.)
- Published
- 2004
- Full Text
- View/download PDF
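The abstract of record 43 does not say how the Shannon entropy is computed or thresholded, so the following is only a sketch of the underlying quantity: the entropy of each column of a sequence profile and its mean over the profile. The function names and the idea of averaging over columns are assumptions made for illustration.

```python
import math


def column_entropy(freqs):
    """Shannon entropy, in bits, of one profile column.

    `freqs` holds non-negative residue frequencies or counts; they are
    normalized here, and zero entries contribute nothing to the sum.
    """
    total = sum(freqs)
    return -sum((f / total) * math.log2(f / total) for f in freqs if f > 0)


def mean_profile_entropy(profile):
    """Average column entropy of a profile given as a list of columns."""
    return sum(column_entropy(col) for col in profile) / len(profile)


# A fully conserved column carries 0 bits; a uniform 4-letter column 2 bits.
conserved = [1.0, 0.0, 0.0, 0.0]
uniform = [0.25, 0.25, 0.25, 0.25]
average = mean_profile_entropy([conserved, uniform])
```

Low entropy marks conserved, informative columns; a filter in the spirit of the abstract would keep or discard alignments according to some entropy criterion, whose exact form is not given here.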
44. In silico prediction of the structure of membrane proteins: is it feasible?
- Author
-
Casadio R, Fariselli P, and Martelli PL
- Subjects
- Models, Molecular, Computational Biology, Membrane Proteins chemistry, Protein Structure, Tertiary, Sequence Analysis, Protein
- Abstract
In the 'omic' era, hundreds of genomes are available for protein sequence analysis, and some 30 per cent of all sequences are of membrane proteins. Unlike globular proteins, a 3D model for membrane proteins can hardly be computed starting from the sequence. Why is this so? What can we really compute and with what reliability? These and other matters are outlined.
- Published
- 2003
- Full Text
- View/download PDF
45. Improved prediction of the number of residue contacts in proteins by recurrent neural networks.
- Author
-
Pollastri G, Baldi P, Fariselli P, and Casadio R
- Subjects
- Amino Acid Sequence, Databases, Protein, Molecular Structure, Computational Biology, Neural Networks, Computer, Proteins chemistry
- Abstract
Knowing the number of residue contacts in a protein is crucial for deriving constraints useful in modeling protein folding, protein structure, and/or scoring remote homology searches. Here we use an ensemble of bi-directional recurrent neural network architectures and evolutionary information to improve the state-of-the-art in contact prediction using a large corpus of curated data. The ensemble is used to discriminate between two different states of residue contacts, characterized by a contact number higher or lower than the average value of the residue distribution. The ensemble achieves performances ranging from 70.1% to 73.1% depending on the radius adopted to discriminate contacts (6 Å to 12 Å). These performances represent gains of 15% to 20% over the baseline statistical predictors always assigning an amino acid to the most numerous state, 3% to 7% better than any previous method. Combination of different radius predictors further improves the performance. SERVER: http://promoter.ics.uci.edu/BRNN-PRED/.
- Published
- 2001
- Full Text
- View/download PDF
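The two-state target described in the abstract of record 45 can be illustrated with a small sketch. It assumes one coordinate per residue (for example a C-alpha position), counts every other residue within the chosen radius as a contact, and takes the average over the residues supplied; the paper may define the atoms, the exclusion of sequence neighbours, and the reference average differently.

```python
import math


def contact_numbers(coords, radius):
    """Number of other residues within `radius` of each residue."""
    return [
        sum(1 for j, b in enumerate(coords)
            if j != i and math.dist(a, b) <= radius)
        for i, a in enumerate(coords)
    ]


def two_state_labels(counts):
    """Label 1 if a residue's contact number is above the mean, else 0."""
    mean = sum(counts) / len(counts)
    return [1 if c > mean else 0 for c in counts]


# Hypothetical coordinates in angstroms: three close residues, one far away.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
counts = contact_numbers(coords, radius=6.0)
labels = two_state_labels(counts)
```

Repeating the labelling for several radii between 6 Å and 12 Å gives one binary classification problem per radius, which matches the range of settings mentioned in the abstract.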
46. DOME: recommendations for supervised machine learning validation in biology
- Author
-
Walsh, I., Fishman, D., Garcia-Gasulla, D., Titma, T., Pollastri, G., Capriotti, E., Casadio, R., Capella-Gutierrez, S., Cirillo, D., Del Conte, A., Dimopoulos, A. C., Del Angel, V. D., Dopazo, J., Fariselli, P., Fernandez, J. M., Huber, F., Kreshuk, A., Lenaerts, T., Martelli, P. L., Navarro, A., Broin, P. O., Pinero, J., Piovesan, D., Reczko, M., Ronzano, F., Satagopam, V., Savojardo, C., Spiwok, V., Tangaro, M. A., Tartari, G., Salgado, D., Valencia, A., Zambelli, F., Harrow, J., Psomopoulos, F. E., Tosatto, S. C. E., Barcelona Supercomputing Center, Informatics and Applied Informatics, and Artificial Intelligence
- Subjects
- FOS: Computer and information sciences, Computer Science - Machine Learning, Standards, Informàtica::Intel·ligència artificial::Aprenentatge automàtic [Àrees temàtiques de la UPC], Center of excellence, European Regional Development Fund, Guidelines as Topic, Algorithms, Computational Biology, Humans, Models, Biological, Research Design, Supervised Machine Learning, Machine learning, computer.software_genre, Biochemistry, Biologia computacional, Machine Learning (cs.LG), 03 medical and health sciences, 0302 clinical medicine, Models, Agency (sociology), media_common.cataloged_instance, Biomedical research, European union, Molecular Biology, 030304 developmental biology, computer.programming_language, media_common, 0303 health sciences, business.industry, Cell Biology, Other Quantitative Biology (q-bio.OT), Biological, Machine Learning, Artificial Intelligence, Machine Learning in Life Science, Focus group, Quantitative Biology - Other Quantitative Biology, Work (electrical), FOS: Biological sciences, Elixir (programming language), Artificial intelligence, business, computer, Software, 030217 neurology & neurosurgery, Biotechnology, Career development
- Abstract
Supervised machine learning is widely used in biology and deserves more scrutiny. We present a set of community-wide recommendations (DOME) aiming to help establish standards of supervised machine learning validation in biology. Formulated as questions, the DOME recommendations improve the assessment and reproducibility of papers when included as supplementary material. The work of the Machine Learning Focus Group was funded by ELIXIR, the research infrastructure for life-science data. I.W. was funded by the A*STAR Career Development Award (project no. C210112057) from the Agency for Science, Technology and Research (A*STAR), Singapore. D.F. was supported by Estonian Research Council grants (PRG1095, PSG59 and ERA-NET TRANSCAN-2 (BioEndoCar)); Project No 2014-2020.4.01.16-0271, ELIXIR and the European Regional Development Fund through EXCITE Center of Excellence. S.C.E.T. has received funding from the European Union’s Horizon 2020 research and innovation programme under Marie Skłodowska-Curie Grant agreements No. 778247 and No. 823886, and Italian Ministry of University and Research PRIN 2017 grant 2017483NH8. Peer reviewed. "Article signed by 8 authors plus 28 authors from the ELIXIR Machine Learning Focus Group: Emidio Capriotti, Rita Casadio, Salvador Capella-Gutierrez, Davide Cirillo, Alessio Del Conte, Alexandros C. Dimopoulos, Victoria Dominguez Del Angel, Joaquin Dopazo, Piero Fariselli, José Maria Fernández, Florian Huber, Anna Kreshuk, Tom Lenaerts, Pier Luigi Martelli, Arcadi Navarro, Pilib Ó Broin, Janet Piñero, Damiano Piovesan, Martin Reczko, Francesco Ronzano, Venkata Satagopam, Castrense Savojardo, Vojtech Spiwok, Marco Antonio Tangaro, Giacomo Tartari, David Salgado, Alfonso Valencia & Federico Zambelli"
- Published
- 2021
- Full Text
- View/download PDF
47. Computational Resources for Molecular Biology 2022
- Author
-
Casadio, R, Mathews, DH, Sternberg, MJE, and Biotechnology and Biological Sciences Research Council (BBSRC)
- Subjects
- Biochemistry & Molecular Biology, 0304 Medicinal and Biomolecular Chemistry, Structural Biology, Computational Biology, 0601 Biochemistry and Cell Biology, Molecular Biology, 0605 Microbiology
- Published
- 2022
48. Evaluating the predictions of the protein stability change upon single amino acid substitutions for the FXN CAGI5 challenge
- Author
-
Rita Casadio, Maria Petrosino, Paola Turina, Panagiotis Katsonis, Debnath Pal, Alexey Strokach, Lukas Folkman, Emidio Capriotti, Pier Luigi Martelli, Yaoqi Zhou, Valerio Consalvi, Steven E. Brenner, Aditi Garg, Philip M. Kim, Alessandra Pasquo, Giulia Babbi, Roberta Chiaraluce, Samuele Bovo, Piero Fariselli, Olivier Lichtarge, Castrense Savojardo, Mostafa Karimi, Carles Corbi-Verge, Yang Shen, and Gaia Andreoletti
- Subjects
- Models, Molecular, Circular dichroism, Protein Folding, Protein Conformation, Computational biology, free energy change, machine learning, protein folding, protein stability, single amino acid variant, Genome, Article, 03 medical and health sciences, Protein stability, Models, Iron-Binding Proteins, Genetics, Humans, Single amino acid, Genetics (clinical), Algorithms, Circular Dichroism, Protein Stability, Amino Acid Substitution, 030304 developmental biology, chemistry.chemical_classification, 0303 health sciences, biology, 030305 genetics & heredity, Molecular, Amino acid, chemistry, Frataxin, biology.protein, Critical assessment, Protein folding
- Abstract
Frataxin (FXN) is a highly conserved protein found in prokaryotes and eukaryotes that is required for efficient regulation of cellular iron homeostasis. Experimental evidence associates amino acid substitutions of the FXN to Friedreich Ataxia, a neurodegenerative disorder. Recently, new thermodynamic experiments have been performed to study the impact of somatic variations identified in cancer tissues on protein stability. The Critical Assessment of Genome Interpretation (CAGI) data provider at the University of Rome measured the unfolding free energy of a set of variants (FXN challenge data set) with far-UV circular dichroism and intrinsic fluorescence spectra. These values have been used to calculate the change in unfolding free energy between the variant and wild-type proteins at zero concentration of denaturant ($\Delta\Delta G^{H_2O}$). The FXN challenge data set, composed of eight amino acid substitutions, was used to evaluate the performance of the current computational methods for predicting the $\Delta\Delta G^{H_2O}$ value associated with the variants and to classify them as destabilizing and not destabilizing. For the fifth edition of CAGI, six independent research groups from Asia, Australia, Europe, and North America submitted 12 sets of predictions from different approaches. In this paper, we report the results of our assessment and discuss the limitations of the tested algorithms.
- Published
- 2019
49. CAGI SickKids challenges: Assessment of phenotype and variant predictions derived from clinical and genomic data of children with undiagnosed diseases
- Author
-
Zhiqiang Hu, Jesse M. Hunter, Olivier Lichtarge, Sean D. Mooney, Aashish N. Adhikari, Steven E. Brenner, Rita Casadio, Yizhou Yin, Lipika R. Pal, Uma Sunderam, Panagiotis Katsonis, Predrag Radivojac, Thomas Joseph, Giulia Babbi, Naveen Sivadasan, Constantina Bakolitsa, Vangala G. Saipradeep, Laura Kasak, John Moult, Julian Gough, M. Stephen Meyn, Pier Luigi Martelli, Jennifer Poitras, Rupa A Udani, Jan Zaucha, Rafael F. Guerrero, Yuxiang Jiang, Aditya Rao, Sujatha Kotte, and Kunal Kundu
- Subjects
- Male, Adolescent, In silico, Genomic data, Computational biology, Biology, Undiagnosed Diseases, Genome, Article, 03 medical and health sciences, Databases, Genetic, SickKid, pediatric rare disease, Genetics, Humans, Computer Simulation, Genetic Predisposition to Disease, Child, Gene, Genetics (clinical), 030304 developmental biology, Disease gene, 0303 health sciences, Whole Genome Sequencing, variant interpretation, 030305 genetics & heredity, Computational Biology, Genetic Variation, Pathogenicity, Phenotype, ddc, phenotype prediction, Child, Preschool, New disease, CAGI, Female, whole-genome sequencing data
- Abstract
Whole-genome sequencing (WGS) holds great potential as a diagnostic test. However, the majority of patients currently undergoing WGS lack a molecular diagnosis, largely due to the vast number of undiscovered disease genes and our inability to assess the pathogenicity of most genomic variants. The CAGI SickKids challenges attempted to address this knowledge gap by assessing state-of-the-art methods for clinical phenotype prediction from genomes. CAGI4 and CAGI5 participants were provided with WGS data and clinical descriptions of 25 and 24 undiagnosed patients from the SickKids Genome Clinic Project, respectively. Predictors were asked to identify primary and secondary causal variants. In addition, for CAGI5, groups had to match each genome to one of three disorder categories (neurologic, ophthalmologic, and connective), and separately to each patient. The performance of matching genomes to categories was no better than random but two groups performed significantly better than chance in matching genomes to patients. Two of the ten variants proposed by two groups in CAGI4 were deemed to be diagnostic, and several proposed pathogenic variants in CAGI5 are good candidates for phenotype expansion. We discuss implications for improving in silico assessment of genomic variants and identifying new disease genes.
- Published
- 2019
- Full Text
- View/download PDF
50. Assessing predictions on fitness effects of missense variants in calmodulin
- Author
-
Frederick P. Roth, Debnath Pal, Castrense Savojardo, Emidio Capriotti, Lisa N. Kinch, Rita Casadio, Marta Verby, Jing Zhang, Olivier Lichtarge, Qian Cong, Song Sun, Panagiotis Katsonis, Jochen Weile, Aditi Garg, Nick V. Grishin, Pier Luigi Martelli, and Giulia Babbi
- Subjects
- Models, Molecular, calmodulin, Calmodulin, Protein Conformation, Mutation, Missense, Computational and Data Sciences, Computational biology, Biology, Protein Engineering, Genome, Article, Evolution, Molecular, Fungal Proteins, 03 medical and health sciences, Yeasts, Genetics, Humans, Missense mutation, Genetics (clinical), 030304 developmental biology, disease, 0303 health sciences, Binding Sites, Models, Genetic, 030305 genetics & heredity, Computational Biology, missense variant, predictors, Calcium concentration, biology.protein, CAGI, Calcium, Critical assessment, Genetic Fitness, Fitness effects, Algorithms
- Abstract
This paper reports the evaluation of predictions for the "CALM1" challenge in the fifth round of the Critical Assessment of Genome Interpretation held in 2018. In the challenge, the participants were asked to predict effects on yeast growth caused by missense variants of human calmodulin, a highly conserved protein in eukaryotic cells sensing calcium concentration. The performance of predictors implementing different algorithms and methods is similar. Most predictors are able to identify the deleterious or tolerated variants with modest accuracy, with a baseline predictor based purely on sequence conservation slightly outperforming the submitted predictions. Nevertheless, we think that the accuracy of predictions remains far from satisfactory, and the field awaits substantial improvements. The most poorly predicted variants in this round surround functional CALM1 sites that bind calcium or peptide, which suggests that better incorporation of structural analysis may help improve predictions.
- Published
- 2019
- Full Text
- View/download PDF