26 results on '"Fariselli P"'
Search Results
2. Disulfide connectivity prediction with extreme learning machines
- Author
-
Alhamdoosh, M., CASTRENSE SAVOJARDO, Fariselli, P., Casadio, R., PELLEGRINI M, FRED ALN, FILIPE J, GAMBOA H, Alhamdoosh M., Savojardo C., Fariselli P., and Casadio R.
- Subjects
EXTREME LEARNING MACHINE ,BACKPROPAGATION ,DISULFIDE BONDS - Abstract
Our paper emphasizes the relevance of Extreme Learning Machines (ELM) in bioinformatics applications by addressing the problem of predicting disulfide connectivity from protein sequences. We test different activation functions for the hidden neurons and show that, for the task at hand, Radial Basis Functions perform best. We also show that the ELM approach performs better than the Back Propagation learning algorithm in terms of both generalization accuracy and running time. Moreover, we find that for disulfide connectivity prediction it is possible to increase the predictive performance by initializing the Radial Basis Function kernels with a k-means clustering algorithm. Finally, the ELM procedure is not only very fast, but the final networks also achieve accuracies of 0.51 per bond and 0.45 per pattern. Our ELM results are in line with state-of-the-art predictors addressing the same problem.
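Note: the core ELM idea in this abstract (a fixed, untrained hidden layer followed by output weights solved in closed form) can be sketched as below. This is a minimal illustration on a toy regression task, not the authors' predictor. The function names, the kernel width and the toy data are assumptions, and the centres are sampled at random from the data rather than initialized by k-means as the abstract describes.

```python
import numpy as np

def rbf_features(X, centers, width):
    # Gaussian RBF activations: one column per hidden unit.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def elm_fit(X, y, n_hidden=20, width=1.0, seed=0):
    # Hidden layer is fixed (assumption: centres drawn at random from X).
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_hidden, replace=False)]
    H = rbf_features(X, centers, width)
    # Output weights come from a single least-squares solve, no backpropagation.
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return centers, beta

def elm_predict(X, centers, beta, width=1.0):
    return rbf_features(X, centers, width) @ beta

if __name__ == "__main__":
    X = np.linspace(-3, 3, 200).reshape(-1, 1)
    y = np.sin(X[:, 0])
    centers, beta = elm_fit(X, y)
    print("training MSE:", np.mean((elm_predict(X, centers, beta) - y) ** 2))
```

Because training reduces to one linear solve, running time is dominated by the size of the hidden layer, which is consistent with the speed advantage the abstract reports over backpropagation.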
3. Fault tolerance for large scale protein 3D reconstruction from contact maps
- Author
-
Vassura, M., Margara, L., PIETRO DI LENA, Medri, F., Fariselli, P., and Casadio, R.
4. Blurring contact maps of thousands of proteins: what we can learn by reconstructing 3D structure
- Author
-
Rita Casadio, Giovanni Aloisio, Maria Mirto, Piero Fariselli, Pietro Di Lena, Marco Vassura, Luciano Margara, Vassura M., Di Lena P., Margara L., Mirto M., Aloisio G., Fariselli P., Casadio R., Vassura, M, Di Lena, P, Margara, L, Mirto, Maria, Aloisio, Giovanni, Fariselli, P, and Casadio, R.
- Subjects
CONTACT MAP ,Computer science ,Computation ,Context (language use) ,lcsh:Analysis ,PROTEIN STRUCTURE ,ALGORITHM FOR 3-D STRUCTURE RECONSTRUCTION ,GRID TECHNOLOGY ,lcsh:Computer applications to medicine. Medical informatics ,Biochemistry ,Set (abstract data type) ,Dimension (vector space) ,Genetics ,Representation (mathematics) ,Molecular Biology ,Structure (mathematical logic) ,Research ,3D reconstruction ,lcsh:QA299.6-433 ,Protein structure prediction ,Computer Science Applications ,Computational Mathematics ,Computational Theory and Mathematics ,lcsh:R858-859.7 ,Algorithm - Abstract
Background: The present knowledge of protein structures at the atomic level derives from some 60,000 molecules. Yet the exponentially growing set of hypothetical protein sequences comprises some 10 million chains, and this makes protein structure prediction one of the challenging goals of bioinformatics. In this context, the representation of proteins with contact maps is an intermediate step of fold recognition and constitutes the input of contact map predictors. However, contact map representations require fast and reliable methods to reconstruct the specific folding of the protein backbone. Methods: In this paper, by adopting a GRID technology, our algorithm for 3D reconstruction, FT-COMAR, is benchmarked on a huge set of non-redundant proteins (1716), taking random noise into consideration; this makes our computation the largest ever performed for the task at hand. Results: We can observe the effects of introducing random noise on 3D reconstruction and derive some considerations useful for future implementations. The size of the protein set also allows statistical considerations after grouping by SCOP structural class. Conclusions: Altogether, our data indicate that the quality of 3D reconstruction is unaffected by deleting up to an average of 75% of the real contacts, while only a few percent of randomly generated contacts in place of non-contacts are sufficient to hamper 3D reconstruction.
- Published
- 2011
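Note: the two kinds of noise discussed in this abstract (deleting real contacts, and adding false contacts in place of non-contacts) can be sketched as below. This does not implement FT-COMAR or any reconstruction step. The 8 Å distance threshold, the function names and the straight-chain toy coordinates are assumptions made for illustration.

```python
import numpy as np

def contact_map(coords, threshold=8.0):
    # Residues i and j are in contact if their distance is within the threshold.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    cmap = d <= threshold
    np.fill_diagonal(cmap, False)
    return cmap

def perturb(cmap, delete_frac=0.0, add_frac=0.0, seed=0):
    # Work on the upper triangle so the result stays symmetric.
    rng = np.random.default_rng(seed)
    iu, ju = np.triu_indices(cmap.shape[0], k=1)
    upper = cmap[iu, ju].copy()
    contacts = np.flatnonzero(upper)
    non_contacts = np.flatnonzero(~upper)
    n_del = int(round(delete_frac * contacts.size))
    n_add = int(round(add_frac * non_contacts.size))
    # Delete a random subset of real contacts.
    upper[rng.choice(contacts, size=n_del, replace=False)] = False
    # Turn a random subset of non-contacts into false contacts.
    upper[rng.choice(non_contacts, size=n_add, replace=False)] = True
    out = np.zeros_like(cmap)
    out[iu, ju] = upper
    return out | out.T

if __name__ == "__main__":
    # Toy chain: 30 points on a line, 3.8 units apart (hypothetical geometry).
    coords = np.array([[3.8 * i, 0.0, 0.0] for i in range(30)])
    cmap = contact_map(coords)
    sparse = perturb(cmap, delete_frac=0.75)
    noisy = perturb(cmap, add_frac=0.02)
    print(cmap.sum() // 2, sparse.sum() // 2, noisy.sum() // 2)
```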
5. High Throughput Protein Similarity Searches in the LIBI Grid Problem Solving Environment
- Author
-
Rita Casadio, Giovanni Aloisio, Ivan Rossi, Maria Mirto, Sandro Fiore, Piero Fariselli, Italo Epicoco, Mirto M., Rossi I., Epicoco I., Fiore S., Fariselli P., Casadio R., Aloisio G., P. Thulasiraman, X. He, T. Li Xu, M. K. Denko, R. K. Thulasiram, L. T. Yang, Mirto, M, Rossi, I, Epicoco, Italo, Fiore, S, Fariselli, P, Casadio, R, and Aloisio, Giovanni
- Subjects
Bioinformatics, Protein Similarity Searches ,Bioinformatics requirements, Complex applications, High computing power, Problem Solving Environment (PSE) ,Computer science ,Scale (chemistry) ,Distributed computing ,Integration platform ,Problem solving environment ,Biological database ,Grid ,Supercomputer ,Throughput (business) - Abstract
Bioinformatics applications are naturally distributed, owing to the distribution of the data sets, experimental data and biological databases involved. They require high computing power because of the large size of data sets and the complexity of basic computations; they may access heterogeneous data, where heterogeneity lies in data format, access policy, distribution, etc.; and they require a secure infrastructure, because they could access private data owned by different organizations. The Problem Solving Environment (PSE) is an approach and a technology that can fulfil such bioinformatics requirements. The PSE can be used for the definition and composition of complex applications, hiding programming and configuration details from the user, who can concentrate only on the specific problem. Moreover, Grids can be used for building geographically distributed collaborative problem solving environments, and Grid-aware PSEs can search for and use dispersed high performance computing, networking, and data resources. In this work, the PSE solution has been chosen as the integration platform for bioinformatics tools and data sources. In particular, a large-scale multiple sequence alignment experiment supported by the LIBI PSE is presented.
- Published
- 2007
6. A critical review of five machine learning-based algorithms for predicting protein stability changes upon mutation
- Author
-
Piero Fariselli, Castrense Savojardo, Rita Casadio, Pier Luigi Martelli, Savojardo C., Martelli P.L., Casadio R., and Fariselli P.
- Subjects
Computer science ,Stability (learning theory) ,Value (computer science) ,Review Article ,Variation (game tree) ,Overfitting ,Machine learning ,computer.software_genre ,Task (project management) ,Machine Learning ,Structural bioinformatics ,03 medical and health sciences ,0302 clinical medicine ,Protein stability ,Mutant protein ,Robustness (computer science) ,Molecular Biology ,Reliability (statistics) ,030304 developmental biology ,0303 health sciences ,Protein Stability ,business.industry ,Proteins ,A protein ,structural bioinformatics, protein stability prediction, impact of mutations on protein stability, machine learning ,Complement (complexity) ,Fang ,Mutation ,Mutation (genetic algorithm) ,Artificial intelligence ,business ,computer ,Algorithm ,Algorithms ,030217 neurology & neurosurgery ,Information Systems - Abstract
A number of machine learning (ML)-based algorithms have been proposed for predicting mutation-induced stability changes in proteins. In this critical review, we used hypothetical reverse mutations to evaluate the performance of five representative algorithms and found all of them suffer from the problem of overfitting. This approach is based on the fact that if a wild-type protein is more stable than a mutant protein, then the same mutant is less stable than the wild-type protein. We analyzed the underlying issues and suggest that the main causes of the overfitting problem include that the numbers of training cases were too small, and the features used in the models were not sufficiently informative for the task. We make recommendations on how to avoid overfitting in this important research area and improve the reliability and robustness of ML-based algorithms in general.
- Published
- 2019
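Note: the reverse-mutation test described in this abstract relies on the identity ΔΔG(A → B) = −ΔΔG(B → A). One way to quantify how well a predictor respects it is sketched below: the correlation between direct and reverse predictions (ideally −1) and the mean of their sum (ideally 0). The function name and the toy numbers are assumptions and are not taken from the review.

```python
import numpy as np

def antisymmetry_report(ddg_direct, ddg_reverse):
    # ddg_direct[i] is the prediction for A -> B, ddg_reverse[i] for B -> A.
    d = np.asarray(ddg_direct, dtype=float)
    r = np.asarray(ddg_reverse, dtype=float)
    corr = np.corrcoef(d, r)[0, 1]   # -1 for a perfectly anti-symmetric predictor
    bias = np.mean(d + r) / 2.0      # 0 for a perfectly anti-symmetric predictor
    return corr, bias

if __name__ == "__main__":
    direct = [1.0, -0.5, 2.0, 0.3]             # hypothetical predictions
    print(antisymmetry_report(direct, [-x for x in direct]))
    # A predictor that ignores the direction of the variation:
    print(antisymmetry_report(direct, direct))
```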
7. Computer-Aided Prediction of Protein Mitochondrial Localization
- Author
-
Pier Luigi Martelli, Castrense Savojardo, Giacomo Tartari, Piero Fariselli, Rita Casadio, Martelli P.L., Savojardo C., Fariselli P., Tartari G., and Casadio R.
- Subjects
Prediction of subcellular localization ,Targeting peptide ,Peptide ,Cleavage site ,Computational biology ,Web Browser ,Biology ,Ribosome ,03 medical and health sciences ,Deep Learning ,0302 clinical medicine ,Organelle ,Mitochondrial Protein ,030304 developmental biology ,chemistry.chemical_classification ,0303 health sciences ,Computational Biology ,Compartment (chemistry) ,Subcellular localization ,Mitochondria ,Protein Transport ,Order (biology) ,chemistry ,Arginine motif ,UniProt ,Intermembrane space ,Machine and deep learning ,030217 neurology & neurosurgery ,Human - Abstract
Protein sequences, directly translated from genomic data, need functional and structural annotation. Together with molecular function and biological process, subcellular localization is an important feature necessary for understanding the protein role and the compartment where the mature protein is active. In the case of mitochondrial proteins, their precursor sequences translated by the ribosome machinery include specific patterns from which it is possible to recognize not only their final destination within the organelle but also which of the mitochondrial subcompartments the protein is intended for. Four compartments are routinely discriminated: the inner and the outer membranes, the intermembrane space, and the matrix. Here we discuss to what extent it is feasible to develop computational methods for detecting mitochondrial targeting peptides in the precursor sequence and to discriminate their final destination in the organelle. We benchmark two of our methods on the general task of recognizing human mitochondrial proteins endowed with an experimentally characterized targeting peptide (TPpred3) and predicting which submitochondrial compartment is the final destination (DeepMito). We describe how to adopt our web servers in order to discriminate which human proteins are endowed with mitochondrial targeting peptides, the position of cleavage sites, and which submitochondrial compartment they are intended for. By this, we add some other 1788 human proteins to the 450 ones already manually annotated in UniProt with a mitochondrial targeting peptide, also providing for each of them the characterization of the suborganellar localization.
- Published
- 2021
8. Long-term outcomes and predictive ability of non-invasive scoring systems in patients with non-alcoholic fatty liver disease
- Author
-
Maria Jose Garcia Blanco, Piero Fariselli, Dina Tiniakos, Antonio Grieco, Chiara Rosso, Daniela Cabibi, Anna Ludovica Fracanzani, Fabio Maria Vecchio, Tiziana Sanavia, Javier Ampuero, Angelo Armandi, Olivier Govaere, Alastair D. Burt, Rocío Aller, Manuel Romero-Gómez, Elisabetta Bugianesi, Salvatore Petta, Antonio Liguori, Luca Miele, Gian Paolo Caviglia, Marco Maggioni, Luca Valenti, Ezio David, Paolo Francione, Rocío Gallego-Durán, María Jesús Pareja, Quentin M. Anstee, Marco Y W Zaki, Ramy Younes, Grazia Pennisi, European Commission, Newcastle Biomedical Research Centre, Ministero della Salute, Ministero dell'Istruzione, dell'Università e della Ricerca, Younes R., Caviglia G.P., Govaere O., Rosso C., Armandi A., Sanavia T., Pennisi G., Liguori A., Francione P., Gallego-Duran R., Ampuero J., Garcia Blanco M.J., Aller R., Tiniakos D., Burt A., David E., Vecchio F.M., Maggioni M., Cabibi D., Pareja M.J., Zaki M.Y.W., Grieco A., Fracanzani A.L., Valenti L., Miele L., Fariselli P., Petta S., Romero-Gomez M., Anstee Q.M., and Bugianesi E.
- Subjects
0301 basic medicine ,Adult ,Male ,medicine.medical_specialty ,Cirrhosis ,Concordance ,Settore MED/12 - GASTROENTEROLOGIA ,HFS ,Disease ,BARD ,Gastroenterology ,Severity of Illness Index ,Time ,03 medical and health sciences ,0302 clinical medicine ,Fibrosis ,Non-alcoholic Fatty Liver Disease ,Predictive Value of Tests ,Internal medicine ,NFS ,medicine ,Humans ,In patient ,APRI ,NSS ,Hepatology ,business.industry ,Fatty liver ,NASH ,Reproducibility of Results ,Middle Aged ,medicine.disease ,Prognosis ,APRI, BARD, FIB-4, HFS, NASH, NFS, NSS, Adult, Area Under Curve, Cross-Sectional Studies, Female, Humans, Liver, Male,Middle Aged, Non-alcoholic Fatty Liver Disease,Prognosis, ROC Curve,Reproducibility of Results, Research Design, Severity of Illness Index, Predictive Value of Tests, Time ,030104 developmental biology ,Cross-Sectional Studies ,Liver ,ROC Curve ,Research Design ,Area Under Curve ,Cohort ,FIB-4 ,030211 gastroenterology & hepatology ,Female ,business ,Liver cancer - Abstract
[Background & Aims] Non-invasive scoring systems (NSS) are used to identify patients with non-alcoholic fatty liver disease (NAFLD) who are at risk of advanced fibrosis, but their reliability in predicting long-term outcomes for hepatic/extrahepatic complications or death and their concordance in cross-sectional and longitudinal risk stratification remain uncertain. [Methods] The most common NSS (NFS, FIB-4, BARD, APRI) and the Hepamet fibrosis score (HFS) were assessed in 1,173 European patients with NAFLD from tertiary centres. Performance for fibrosis risk stratification and for the prediction of long-term hepatic/extrahepatic events, hepatocarcinoma (HCC) and overall mortality were evaluated in terms of AUC and Harrell’s c-index. For longitudinal data, NSS-based Cox proportional hazard models were trained on the whole cohort with repeated 5-fold cross-validation, sampling for testing from the 607 patients with all NSS available. [Results] Cross-sectional analysis revealed HFS as the best performer for the identification of significant (F0-1 vs. F2-4, AUC = 0.758) and advanced (F0-2 vs. F3-4, AUC = 0.805) fibrosis, while NFS and FIB-4 showed the best performance for detecting histological cirrhosis (range AUCs 0.85-0.88). Considering longitudinal data (follow-up between 62 and 110 months), NFS and FIB-4 were the best at predicting liver-related events (c-indices >0.7), NFS for HCC (c-index = 0.9 on average), and FIB-4 and HFS for overall mortality (c-indices >0.8). All NSS showed limited performance (c-indices, [Conclusions] Overall, NFS, HFS and FIB-4 outperformed APRI and BARD for both cross-sectional identification of fibrosis and prediction of long-term outcomes, confirming that they are useful tools for the clinical management of patients with NAFLD at increased risk of fibrosis and liver-related complications or death. [Lay summary] Non-invasive scoring systems are increasingly being used in patients with non-alcoholic fatty liver disease to identify those at risk of advanced fibrosis and hence clinical complications. Herein, we compared various non-invasive scoring systems and identified those that were best at identifying risk, as well as those that were best for the prediction of long-term outcomes, such as liver-related events, liver cancer and death.
This study has been supported by the EPoS (Elucidating Pathways of Steatohepatitis) consortium funded by the Horizon 2020 Framework Program of the European Union under Grant Agreement 634413 and the Newcastle NIHR Biomedical Research Centre. The authors are contributing members of The European NAFLD Registry. The study was also supported by the Italian Ministry of Health, grant RF-2016-02364358 (Ricerca Finalizzata, Ministero della Salute), and the Italian Ministry for Education, University and Research (Ministero dell’Istruzione, dell’Università e della Ricerca - MIUR) under the programme “Dipartimenti di Eccellenza 2018 – 2022” Project code D15D18000410001.
- Published
- 2020
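Note: Harrell's c-index, used in this abstract to score the longitudinal predictions, is the fraction of comparable patient pairs in which the patient with the earlier observed event also has the higher predicted risk. A simplified sketch follows. It ignores tied event times, and the function name and toy follow-up data are assumptions rather than anything from the study.

```python
def harrell_c_index(times, events, risk):
    # times: follow-up time per patient; events: 1 if the event was observed,
    # 0 if censored; risk: predicted risk score (higher means earlier event).
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the "earlier event" of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

if __name__ == "__main__":
    times = [2, 4, 6, 8]       # hypothetical follow-up, in years
    events = [1, 1, 0, 1]
    print(harrell_c_index(times, events, [0.9, 0.7, 0.5, 0.1]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the thresholds quoted in the abstract (0.7, 0.8, 0.9) should be read.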
9. DDGun: an untrained method for the prediction of protein stability changes upon single and multiple point variations
- Author
-
Piero Fariselli, Emidio Capriotti, Ludovica Montanucci, Yotam Frank, Nir Ben-Tal, Montanucci L., Capriotti E., Frank Y., Ben-Tal N., and Fariselli P.
- Subjects
Property (programming) ,Protein variant ,Value (computer science) ,Multiple site variation ,Unfolding free energy change ,lcsh:Computer applications to medicine. Medical informatics ,Biochemistry ,Evolution, Molecular ,03 medical and health sciences ,symbols.namesake ,0302 clinical medicine ,Thermodynamic ,Structural Biology ,Simple (abstract algebra) ,Humans ,Point Mutation ,Protein stability ,Amino Acid Sequence ,lcsh:QH301-705.5 ,Molecular Biology ,030304 developmental biology ,Mathematics ,0303 health sciences ,Sequence ,Protein Stability ,Applied Mathematics ,Protein ,Research ,Proteins ,Pearson product-moment correlation coefficient ,Computer Science Applications ,Algorithm ,lcsh:Biology (General) ,030220 oncology & carcinogenesis ,Mutation (genetic algorithm) ,Benchmark (computing) ,symbols ,lcsh:R858-859.7 ,Thermodynamics ,Reciprocal ,Algorithms ,Human - Abstract
Background Predicting the effect of single point variations on protein stability constitutes a crucial step toward understanding the relationship between protein structure and function. To this end, several methods have been developed to predict changes in the Gibbs free energy of unfolding (∆∆G) between wild type and variant proteins, using sequence and structure information. Most of the available methods however do not exhibit the anti-symmetric prediction property, which guarantees that the predicted ∆∆G value for a variation is the exact opposite of that predicted for the reverse variation, i.e., ∆∆G(A → B) = −∆∆G(B → A), where A and B are amino acids. Results Here we introduce simple anti-symmetric features, based on evolutionary information, which are combined to define an untrained method, DDGun (DDG untrained). DDGun is a simple approach based on evolutionary information that predicts the ∆∆G for single and multiple variations from sequence and structure information (DDGun3D). Our method achieves remarkable performance without any training on the experimental datasets, reaching Pearson correlation coefficients between predicted and measured ∆∆G values of ~ 0.5 and ~ 0.4 for single and multiple site variations, respectively. Surprisingly, DDGun performances are comparable with those of state of the art methods. DDGun also naturally predicts multiple site variations, thereby defining a benchmark method for both single site and multiple site predictors. DDGun is anti-symmetric by construction predicting the value of the ∆∆G of a reciprocal variation as almost equal (depending on the sequence profile) to -∆∆G of the direct variation. This is a valuable property that is missing in the majority of the methods. Conclusions Evolutionary information alone combined in an untrained method can achieve remarkably high performances in the prediction of ∆∆G upon protein mutation. 
Non-trained approaches like DDGun represent a valid benchmark both for scoring the predictive power of the individual features and for assessing the learning capability of supervised methods. Electronic supplementary material The online version of this article (10.1186/s12859-019-2923-1) contains supplementary material, which is available to authorized users.
- Published
- 2019
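Note: the anti-symmetry "by construction" mentioned in this abstract can be illustrated with any feature defined as a difference of per-residue scores, since swapping wild type and variant flips the sign. The sketch below is not DDGun's feature set. It uses a few Kyte-Doolittle hydropathy values as a stand-in scale, and the function names are assumptions.

```python
# Stand-in per-residue scale (subset of Kyte-Doolittle hydropathy values).
HYDROPATHY = {"A": 1.8, "D": -3.5, "G": -0.4, "I": 4.5, "K": -3.9, "L": 3.8, "V": 4.2}

def antisymmetric_feature(wild_type, variant, scale=HYDROPATHY):
    # f(A -> B) = s(B) - s(A), hence f(B -> A) = -f(A -> B) for any scale s.
    return scale[variant] - scale[wild_type]

def multiple_site_feature(variations, scale=HYDROPATHY):
    # Assumption for illustration: multiple-site variations are scored by summing
    # single-site terms, which preserves the anti-symmetry.
    return sum(antisymmetric_feature(wt, var, scale) for wt, var in variations)

if __name__ == "__main__":
    print(antisymmetric_feature("A", "V"), antisymmetric_feature("V", "A"))
    print(multiple_site_feature([("A", "V"), ("K", "L")]))
```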
10. The prediction of organelle-targeting peptides in eukaryotic proteins with Grammatical-Restrained Hidden Conditional Random Fields
- Author
-
Pier Luigi Martelli, Rita Casadio, Castrense Savojardo, Piero Fariselli, Valentina Indio, Indio V, Martelli PL, Savojardo C, Fariselli P, and Casadio R
- Subjects
Statistics and Probability ,Conditional random field ,Saccharomyces cerevisiae Proteins ,Proteome ,Target peptide ,Peptide ,Computational biology ,Protein Sorting Signals ,Biology ,computer.software_genre ,Biochemistry ,Mitochondrial Proteins ,Chloroplast Proteins ,Sequence Analysis, Protein ,Organelle ,Humans ,Position-Specific Scoring Matrices ,Plastids ,Plastid ,Molecular Biology ,GRAMMATICAL-RESTRAINED CONDITIONAL RANDOM FIELDS ,chemistry.chemical_classification ,Arabidopsis Proteins ,Eukaryota ,TARGETING PEPTIDE ,PROTEIN SUBCELLULAR LOCALIZATION ,Subcellular localization ,Yeast ,Mitochondria ,Computer Science Applications ,Computational Mathematics ,Computational Theory and Mathematics ,chemistry ,Data mining ,Peptides ,computer ,Software - Abstract
Motivation: Targeting peptides are the most important signal controlling the import of nuclear-encoded proteins into mitochondria and plastids. In the absence of experimental information, their prediction is an essential step when proteomes are annotated for inferring both the localization and the sequence of mature proteins. Results: We developed TPpred, a new predictor of organelle-targeting peptides based on Grammatical-Restrained Hidden Conditional Random Fields. TPpred is trained on a non-redundant dataset of proteins where the presence of a target peptide was experimentally validated, comprising 297 sequences. When tested on the 297 positive and some other 8010 negative examples, TPpred outperformed available methods in both accuracy and Matthews correlation index (96% and 0.58, respectively). Given its very low false-positive rate (3.0%), TPpred is, therefore, well suited for large-scale analyses at the proteome level. We predicted that from ∼4 to 9% of the sequences of human, Arabidopsis thaliana and yeast proteomes contain targeting peptides and are, therefore, likely to be localized in mitochondria and plastids. TPpred predictions correlate to a good extent with the experimental annotation of the subcellular localization, when available. TPpred was also trained and tested to predict the cleavage site of the organelle-targeting peptide: on this task, the average error of TPpred on mitochondrial and plastidic proteins is 7 and 15 residues, respectively. This value is lower than the error reported by other methods currently available. Availability: The TPpred datasets are available at http://biocomp.unibo.it/∼valentina/TPpred/. TPpred is available on request from the authors. Contact: gigi@biocomp.unibo.it Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2013
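Note: this abstract reports both accuracy and the Matthews correlation coefficient because the test set is strongly imbalanced (297 positives against 8010 negatives), so accuracy alone can look high even when positives are poorly separated. The sketch below computes both from a confusion matrix. The function name is an assumption and the counts in the example are hypothetical, not the paper's.

```python
import math

def accuracy_and_mcc(tp, tn, fp, fn):
    # Accuracy: fraction of all examples classified correctly.
    acc = (tp + tn) / (tp + tn + fp + fn)
    # MCC: correlation between predicted and true labels, in [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc

if __name__ == "__main__":
    # Hypothetical imbalanced case: always predicting "negative".
    print(accuracy_and_mcc(tp=0, tn=950, fp=0, fn=50))
```

In the hypothetical example, a classifier that never predicts a positive still reaches 95% accuracy while its MCC is 0, which is why the two measures are read together.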
11. BAR-PLUS: the Bologna Annotation Resource Plus for functional and structural annotation of protein sequences
- Author
-
Piero Fariselli, Andrea Zauli, Rita Casadio, Damiano Piovesan, Ivan Rossi, Pier Luigi Martelli, Piovesan D., Martelli P.L., Fariselli P., Zauli A., Rossi I., and Casadio R.
- Subjects
Internet ,PROTEIN FUNCTIONAL ANNOTATION ,Protein Conformation ,PROTEIN STRUCTURAL ANNOTATION ,LARGE SCALE PROTEIN SEQUENCE COMPARISON ,Structural alignment ,Protein Data Bank (RCSB PDB) ,Proteins ,Molecular Sequence Annotation ,Sequence alignment ,Articles ,Computational biology ,Genome project ,Biology ,Bioinformatics ,Annotation ,Template ,Sequence Analysis, Protein ,Genetics ,Cluster Analysis ,UniProt ,Sequence Alignment ,Software - Abstract
We introduce BAR-PLUS (BAR(+)), a web server for functional and structural annotation of protein sequences. BAR(+) is based on a large-scale genome cross comparison and a non-hierarchical clustering procedure characterized by a metric that ensures a reliable transfer of features within clusters. In this version, the method takes advantage of a large-scale pairwise sequence comparison of 13,495,736 protein chains also including 988 complete proteomes. Available sequence annotation is derived from UniProtKB, GO, Pfam and PDB. When PDB templates are present within a cluster (with or without their SCOP classification), profile Hidden Markov Models (HMMs) are computed on the basis of sequence to structure alignment and are cluster-associated (Cluster-HMM). Therefrom, a library of 10,858 HMMs is made available for aligning even distantly related sequences for structural modelling. The server also provides pairwise query sequence-structural target alignments computed from the correspondent Cluster-HMM. BAR(+) in its present version allows three main categories of annotation: PDB [with or without SCOP (*)] and GO and/or Pfam; PDB (*) without GO and/or Pfam; GO and/or Pfam without PDB (*) and no annotation. Each category can further comprise clusters where GO and Pfam functional annotations are or are not statistically significant. BAR(+) is available at http://bar.biocomp.unibo.it/bar2.0.
- Published
- 2011
12. The Prediction of Protein-Protein Interacting Sites in Genome-Wide Protein Interaction Networks: The Test Case of the Human Cell Cycle
- Author
-
Pier Luigi Martelli, Piero Fariselli, Lisa Bartoli, Rita Casadio, Ivan Rossi, Bartoli L., Martelli P.L., Rossi I., Fariselli P., and Casadio R.
- Subjects
Proteome ,Correlation coefficient ,Surface Properties ,CO-EXPRESSION ,PREDICTION OF PROTEIN INTERACTION PATCHES ,Value (computer science) ,Cell Cycle Proteins ,Computational biology ,Biology ,PROTEIN-PROTEIN INTERACTION ,INTERACTOME ,SUBCELLULAR CO-LOCALIZATION ,Machine learning ,computer.software_genre ,Biochemistry ,Interactome ,Correlation ,symbols.namesake ,Human interactome ,Artificial Intelligence ,Interaction network ,Protein Interaction Mapping ,Humans ,Protein Interaction Domains and Motifs ,Databases, Protein ,Molecular Biology ,Organelles ,Degree (graph theory) ,Genome, Human ,business.industry ,Cyclin-Dependent Kinase 2 ,Cell Biology ,General Medicine ,Markov Chains ,Pearson product-moment correlation coefficient ,symbols ,Mutant Proteins ,Artificial intelligence ,business ,computer ,Algorithms - Abstract
In this paper we aim at investigating possible correlations between the number of putative interaction patches of a given protein, as inferred by an algorithm that we have developed, and its degree (number of edges of the protein node in a protein interaction network). We focus on the human cell cycle that, as compared with other biological processes, comprises the largest number of proteins whose structure is known at atomic resolution both as monomers and as interacting complexes. For predicting interaction patches we specifically develop a HM-SVM based method reaching 71% overall accuracy with a correlation coefficient value equal to 0.43 on a non redundant set of protein complexes. To test the biological meaning of our predictions, we also explore whether interacting patches contain energetically important residues and/or disease related mutations and find that predicted patches are endowed with both features. Based on this, we propose that mapping the protein with all the predicted interaction patches bridges the molecule to the interactome at the cell level. To test our hypothesis we downloaded interaction data from interaction data bases and find that the number of predicted interaction patches significantly correlates (Pearson correlation value > 0.3) with the number of the known interactions (edges) per protein in the human interactome, as contained in MINT and IntAct. We also show that the correlation increases (Pearson correlation value > 0.5) when the subcellular co-localization and the co-expression levels of the interacting partners are taken into account.
- Published
- 2010
13. Fast overlapping of protein contact maps by alignment of eigenvectors
- Author
-
Rita Casadio, Piero Fariselli, Pietro Di Lena, Marco Vassura, Luciano Margara, Di Lena P., Fariselli P., Margara L., Vassura M., and Casadio R.
- Subjects
Statistics and Probability ,CONTACT MAP ,Structural alignment ,Structure (category theory) ,PROTEIN CONFORMATION ,Machine learning ,computer.software_genre ,Biochemistry ,Similarity (network science) ,Simple (abstract algebra) ,Molecular Biology ,Eigendecomposition of a matrix ,Eigenvalues and eigenvectors ,Mathematics ,business.industry ,ALGORITHMS ,Computational Biology ,Proteins ,Computer Science Applications ,Computational Mathematics ,Computational Theory and Mathematics ,Key (cryptography) ,Artificial intelligence ,Noise (video) ,business ,Algorithm ,computer - Abstract
Motivation: Searching for structural similarity is a key issue of protein functional annotation. The maximum contact map overlap (CMO) is one of the possible measures of protein structure similarity. Exact and approximate methods known to optimize the CMO are computationally expensive and this hampers their applicability to large-scale comparison of protein structures. Results: In this article, we describe a heuristic algorithm (Al-Eigen) for finding a solution to the CMO problem. Our approach relies on the approximation of contact maps by eigendecomposition. We obtain good overlaps of two contact maps by computing the optimal global alignment of a few principal eigenvectors. Our algorithm is simple and fast, and its running time is independent of the number of contacts in the map. Experimental testing indicates that the algorithm is comparable to exact CMO methods in terms of overlap quality and to structural alignment methods in terms of structure similarity detection, and that it is fast enough to be suited for large-scale comparison of protein structures. Furthermore, our preliminary tests indicate that it is quite robust to noise, which makes it suitable for structural similarity detection even with noisy and incomplete contact maps. Availability: Available at http://bioinformatics.cs.unibo.it/Al-Eigen Contact: dilena@cs.unibo.it Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2010
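Note: the approximation step this abstract relies on (representing a contact map by a few principal eigenvectors) can be sketched as below. This covers only the low-rank approximation of a single symmetric contact map, not the eigenvector alignment that Al-Eigen performs between two maps. The function name, the ranking of eigenvectors by absolute eigenvalue and the toy map are assumptions.

```python
import numpy as np

def low_rank_contact_map(cmap, k):
    # A contact map is a symmetric 0/1 matrix, so eigh applies.
    A = np.asarray(cmap, dtype=float)
    vals, vecs = np.linalg.eigh(A)
    # Keep the k eigenpairs with the largest absolute eigenvalue.
    top = np.argsort(np.abs(vals))[::-1][:k]
    return (vecs[:, top] * vals[top]) @ vecs[:, top].T

if __name__ == "__main__":
    n = 12
    idx = np.arange(n)
    # Toy map: residues within two positions along the chain are in contact.
    gap = np.abs(idx[:, None] - idx[None, :])
    A = ((gap > 0) & (gap <= 2)).astype(float)
    for k in (1, 3, 6, n):
        err = np.linalg.norm(A - low_rank_contact_map(A, k))
        print(k, round(float(err), 3))
```

The reconstruction error shrinks as more eigenvectors are kept and vanishes at full rank, which is the sense in which a few principal eigenvectors summarize the map.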
14. A graph theoretic approach to protein structure selection
- Author
-
Marco Vassura, Luciano Margara, Piero Fariselli, Rita Casadio, Vassura M., Margara L., Fariselli P., and Casadio R.
- Subjects
Protein Conformation ,CONTACT MAPS ,Computer science ,business.industry ,Structure (category theory) ,Proteins ,Medicine (miscellaneous) ,Function (mathematics) ,Protein structure prediction ,PROTEIN STRUCTURE PREDICTION ,Euclidean distance ,GRAPH ALGORITHM ,Protein structure ,PROTEIN FOLDING ,PROTEIN STRUCTURE SELECTION ,Chain (algebraic topology) ,Artificial Intelligence ,Point (geometry) ,Artificial intelligence ,Graph property ,business ,Algorithm - Abstract
Objective: Protein structure prediction (PSP) aims to reconstruct the 3D structure of a given protein starting from its primary structure (chain of amino acid residues). It is a well-known fact that the 3D structure of a protein only depends on its primary structure. PSP is one of the most important and still unsolved problems in computational biology. Protein structure selection (PSS), instead of reconstructing a 3D model for the given chain, aims to select, among a given and possibly large number of 3D structures (called decoys), those that are closer (according to a given notion of distance) to the original (unknown) one. In this paper we address the PSS problem using graph theoretic techniques. Methods and materials: Existing methods for solving PSS make use of suitably defined energy functions which heavily rely on the primary structure of the protein and on protein chemistry. In this paper we present a new approach to PSS which does not take advantage of the knowledge of the primary structure of the protein but only depends on the graph theoretic properties of the decoy graphs (vertices represent residues and edges represent pairs of residues whose Euclidean distance is less than or equal to a fixed threshold). Results: Even though our methods only rely on approximate geometric information, experimental results show that some of the adopted graph properties score similarly to energy-based filtering functions in selecting the best decoys. Conclusion: Our results highlight the principal role of geometric information in PSS, setting a new starting point and filtering method for existing energy function-based techniques.
- Published
- 2009
15. Robust Determinants of Thermostability Highlighted by a Codon Frequency Index Capable of Discriminating Thermophilic from Mesophilic Genomes
- Author
-
Piero Fariselli, Ludovica Montanucci, Pier Luigi Martelli, Rita Casadio, Montanucci L., Martelli P.L., Fariselli P., and Casadio R.
- Subjects
Hot Temperature ,Index (economics) ,Archaeal Proteins ,THERMOSTABILITY DETERMINANTS ,Biology ,Bacterial Physiological Phenomena ,Biochemistry ,Genome ,Bacterial Proteins ,Genome, Archaeal ,Codon ,Organism ,Thermostability ,Genetics ,Base Composition ,PROTEIN THERMOSTABILITY PREDICTION ,Base Sequence ,Thermophile ,Sequence Analysis, DNA ,General Chemistry ,CODON COMPOSITION ,Archaea ,Crystallography ,GENOME ANALYSIS ,PRINCIPAL COMPONENT ANALYSIS ,Genome, Bacterial ,Mesophile - Abstract
Can genome analysis tell us about the lifestyle of an organism? We ask this question considering a thorough cross comparison of thermophilic and mesophilic genomes, since presently the number of available genomes is enough to ensure statistical significance of the results. We analyze, by means of principal component analysis (PCA), the codon composition of a database comprising 116 genomes, selected so as to include one species for each genus and show that a cross genomic approach can allow the extraction of common determinants of thermostability at the genome level. The results of our analysis indicate that all the known features of thermostability can be found in the 64 component loadings of the second principal axis of PCA. By this, we develop an index of thermostability whose discriminative power between mesophiles and thermophiles scores with 98% accuracy at the genome level and with 95% accuracy at the protein sequence level. We also prove that these results are not due to phylogenetic differences between archaea and bacteria.
- Published
- 2007
16. The WWWH of remote homolog detection: The state of the art
- Author
-
Rita Casadio, Piero Fariselli, Emidio Capriotti, Ivan Rossi, Fariselli P., Rossi I., Capriotti E., and Casadio R.
- Subjects
Research groups ,Molecular Sequence Data ,Review Literature as Topic ,Biology ,Structural bioinformatics ,Sequence Analysis, Protein ,Protein methods ,REMOTE HOMOLOG PROTEINS ,THREADING ,FOLD RECOGNITION ,Amino Acid Sequence ,Molecular Biology ,Structure comparison ,Conserved Sequence ,Information retrieval ,Sequence Homology, Amino Acid ,business.industry ,Proteins ,Protein structure prediction ,Sequence homology ,Artificial intelligence ,Threading (protein sequence) ,business ,Sequence Alignment ,Algorithms ,Information Systems - Abstract
The detection of remote homolog pairs of proteins using computational methods is a pivotal problem in structural bioinformatics, aiming to compute protein folds on the basis of information in the database of known structures. In the last 25 years, several methods have been developed to tackle this problem, based on different approaches including sequence-sequence alignments and/or structure comparison. In this article, we will briefly discuss When, Why, Where and How (WWWH) to perform remote homology search, reviewing some of the most widely adopted computational approaches. The specific aim is highlighting the basic criteria implemented by different research groups and commenting on the status of the art as well as on still-open questions.
- Published
- 2006
17. TRAMPLE: the transmembrane protein labelling environment
- Author
-
Piero Fariselli, Michele Finelli, Rita Casadio, Mauro Amico, Pier Luigi Martelli, Ivan Rossi, Andrea Zauli, Fariselli P., Finelli M., Rossi I., Amico M., Zauli A., Martelli P.L., and Casadio R.
- Subjects
Protein Folding ,MEMBRANE PROTEIN TOPOLOGY ,ALL-ALPHA MEMBRANE PROTEINS ,BETA-BARREL MEMBRANE PROTEINS ,PREDICTIVE METHODS ,WEB SERVER ,Web server ,Interface (computing) ,education ,Protein Sorting Signals ,Biology ,computer.software_genre ,Bioinformatics ,Protein Structure, Secondary ,Article ,Annotation ,Sequence Analysis, Protein ,Genetics ,Computer Security ,Internet ,business.industry ,Application server ,Membrane Proteins ,Usability ,Transmembrane protein ,Database Management Systems ,The Internet ,business ,computer ,Algorithms ,Software ,Content management system ,Computer network - Abstract
TRAMPLE (http://gpcr.biocomp.unibo.it/biodec/) is a web application server dedicated to the detection and the annotation of transmembrane protein sequences. TRAMPLE includes different state-of-the-art algorithms for the prediction of signal peptides, transmembrane segments (both beta-strands and alpha-helices), secondary structure and fast fold recognition. TRAMPLE also includes a complete content management system to manage the results of the predictions. Each user of the server has his/her own workplace, where the data can be stored, organized, accessed and annotated with documents through a simple web-based interface. In this manner, TRAMPLE significantly improves usability with respect to other more traditional web servers.
- Published
- 2005
18. A neural-network-based method for predicting protein stability changes upon single point mutations
- Author
-
Emidio Capriotti, Rita Casadio, Piero Fariselli, CAPRIOTTI E., FARISELLI P., and CASADIO R.
- Subjects
Models, Molecular ,Statistics and Probability ,Protein Denaturation ,Protein Folding ,Computer science ,Protein design ,Single-nucleotide polymorphism ,NEURAL NETWORKS ,medicine.disease_cause ,computer.software_genre ,Biochemistry ,Pattern Recognition, Automated ,Protein sequencing ,Protein stability ,Drug Stability ,Sequence Analysis, Protein ,Protein methods ,PROTEIN STABILITY ,PROTEIN MUTATION ,SNPS ,PREDICTIVE METHODS ,medicine ,Computer Simulation ,Molecular Biology ,Native structure ,Mutation ,Artificial neural network ,Point mutation ,Proteins ,Computer Science Applications ,Computational Mathematics ,Models, Chemical ,Computational Theory and Mathematics ,Pattern recognition (psychology) ,Mutation (genetic algorithm) ,Mutagenesis, Site-Directed ,Protein folding ,Neural Networks, Computer ,Data mining ,computer ,Algorithms - Abstract
Motivation: One important requirement for protein design is to be able to predict changes of protein stability upon mutation. Different methods addressing this task have been described and their performance tested considering global linear correlation between predicted and experimental data. Neither is direct statistical evaluation of their prediction performance available, nor is a direct comparison among different approaches possible. Recently, a significant database of thermodynamic data on protein stability changes upon single point mutation has been generated (ProTherm). This allows the application of machine learning techniques to predicting free energy stability changes upon mutation starting from the protein sequence. Results: In this paper, we present a neural-network-based method to predict if a given mutation increases or decreases the protein thermodynamic stability with respect to the native structure. Using a dataset consisting of 1615 mutations, our predictor correctly classifies >80% of the mutations in the database. On the same task and using the same data, our predictor performs better than other methods available on the Web. Moreover, when our system is coupled with energy-based methods, the joint prediction accuracy increases up to 90%, suggesting that it can be used to increase also the performance of pre-existing methods, and generally to improve protein design strategies. Availability: The server is under construction and will be available at http://www.biocomp.unibo.it
- Published
- 2004
19. FT-COMAR: fault tolerant three-dimensional structure reconstruction from protein contact maps
- Author
-
Filippo Medri, Rita Casadio, Piero Fariselli, Marco Vassura, Luciano Margara, Pietro Di Lena, Vassura M., Margara L., Di Lena P., Medri F., Fariselli P., and Casadio R.
- Subjects
Models, Molecular ,PROTEIN STRUCTURE PREDICTION ,CONTACT MAP ,HEURISTIC ALGORITHM ,Statistics and Probability ,Protein Conformation ,Computer science ,Sensitivity and Specificity ,Biochemistry ,Protein Interaction Mapping ,Computer Simulation ,Molecular Biology ,Structure (mathematical logic) ,Binding Sites ,Proteins ,Reproducibility of Results ,Protein structure prediction ,Computer Science Applications ,Computational Mathematics ,Models, Chemical ,Computational Theory and Mathematics ,Algorithm ,Algorithms ,Software ,Protein Binding - Abstract
Summary: Fault Tolerant Contact Map Reconstruction (FT-COMAR) is a heuristic algorithm for the reconstruction of the protein three-dimensional structure from (possibly) incomplete (i.e. containing unknown entries) and noisy contact maps. FT-COMAR runs within minutes, allowing its application to a large-scale number of predictions. Availability: http://bioinformatics.cs.unibo.it/FT-COMAR Contact: vassura@cs.unibo.it Supplementary information: Supplementary data are available on Bioinformatics online.
- Published
- 2008
20. The bologna annotation resource: a non hierarchical method for the functional and structural annotation of protein sequences relying on a comparative large-scale genome analysis
- Author
-
Piero Fariselli, Giacinto Donvito, Ludovica Montanucci, Lisa Bartoli, Luciana Carota, Raffaele Fronza, G. Maggi, Rita Casadio, Pier Luigi Martelli, Bartoli L., Montanucci L., Fronza R., Martelli P.L., Fariselli P., Carota L., Donvito G., Maggi G., and Casadio R.
- Subjects
Alignment coverage ,Cross-genome comparison ,Grid technology ,Protein functional annotation ,PROTEIN FUNCTIONAL ANNOTATION ,Computer science ,Munich Information Center for Protein Sequences ,Vertebrate and Genome Annotation Project ,CROSS-GENOME COMPARISON ,ALIGNMENT COVERAGE ,GRID TECHNOLOGY ,computer.software_genre ,Biochemistry ,Genome ,Structural genomics ,Annotation ,Protein Annotation ,Sequence Analysis, Protein ,Pongo pygmaeus ,Terminology as Topic ,Databases, Genetic ,Protein Interaction Mapping ,Animals ,Cluster Analysis ,Critical Assessment of Function Annotation ,Computational Biology ,Proteins ,Reproducibility of Results ,General Chemistry ,Genomics ,Hierarchical clustering ,Data mining ,computer ,Sequence Alignment - Abstract
Protein sequence annotation is a major challenge in the postgenomic era. Thanks to the availability of complete genomes and proteomes, protein annotation has recently taken invaluable advantage from cross-genome comparisons. In this work, we describe a new non hierarchical clustering procedure characterized by a stringent metric which ensures a reliable transfer of function between related proteins even in the case of multidomain and distantly related proteins. The method takes advantage of the comparative analysis of 599 completely sequenced genomes, both from prokaryotes and eukaryotes, and of a GO and PDB/SCOP mapping over the clusters. A statistical validation of our method demonstrates that our clustering technique captures the essential information shared between homologous and distantly related protein sequences. By this, uncharacterized proteins can be safely annotated by inheriting the annotation of the cluster. We validate our method by blindly annotating other 201 genomes and finally we develop BAR (the Bologna Annotation Resource), a prediction server for protein functional annotation based on a total of 800 genomes (publicly available at http://microserf.biocomp.unibo.it/bar/).
- Published
- 2009
21. Functional annotations improve the predictive score of human disease-related mutations in proteins
- Author
-
Pier Luigi Martelli, Rita Casadio, Remo Calabrese, Emidio Capriotti, Piero Fariselli, Calabrese R., Capriotti E., Fariselli P., Martelli P.L., and Casadio R.
- Subjects
Genetics ,Nonsynonymous substitution ,MISSENSE MUTATION ,SUPPORT VECTOR MACHINE ,GENE ONTOLOGY ,DISEASE-RELATED SNP ,Gene ontology ,Proteins ,Single-nucleotide polymorphism ,Disease ,Syndrome ,Biology ,Matthews correlation coefficient ,Protein sequencing ,Genetic marker ,Mutation ,Missense mutation ,Humans ,Abnormalities, Multiple ,Genetics (clinical) - Abstract
Single nucleotide polymorphisms (SNPs) are the simplest and most frequent form of human DNA variation, also valuable as genetic markers of disease susceptibility. The most investigated SNPs are missense mutations resulting in residue substitutions in the protein. Here we propose SNPs&GO, an accurate method that, starting from a protein sequence, can predict whether a mutation is disease related or not by exploiting the protein functional annotation. The scoring efficiency of SNPs&GO is as high as 82%, with a Matthews correlation coefficient equal to 0.63 over a wide set of annotated nonsynonymous mutations in proteins, including 16,330 disease-related and 17,432 neutral polymorphisms. SNPs&GO collects in a unique framework information derived from protein sequence, evolutionary information, and function as encoded in the Gene Ontology terms, and outperforms other available predictive methods. Hum Mutat 30:1–8, 2009. © 2009 Wiley-Liss, Inc.
- Published
- 2009
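The abstract above reports both an overall accuracy and a Matthews correlation coefficient (MCC). As a minimal sketch of how these two scores are computed from a binary confusion matrix, the following uses the standard textbook formulas; the function names and the example counts are hypothetical and are not taken from the SNPs&GO paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Standard MCC for a 2x2 confusion matrix.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only (not data from the paper):
# "positive" stands for disease-related, "negative" for neutral.
example_acc = accuracy(tp=40, tn=45, fp=5, fn=10)
example_mcc = matthews_corrcoef(tp=40, tn=45, fp=5, fn=10)
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, so it stays informative when the two classes differ in size.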
22. Progress and challenges in predicting protein-protein interaction sites
- Author
-
Alfonso Valencia, Piero Fariselli, Rita Casadio, Michael L. Tress, Iakes Ezkurdia, Lisa Bartoli, Ezkurdia I., Bartoli L., Fariselli P., Casadio R., Valencia A., and Tress M.L.
- Subjects
Interface (Java) ,Computer science ,PREDICTION ,Protein Conformation ,Surface Properties ,Static Electricity ,Feature selection ,Machine learning ,computer.software_genre ,Field (computer science) ,Protein–protein interaction ,PROTEIN COMPLEXES ,Protein structure ,Protein Interaction Mapping ,Set (psychology) ,PROTEIN-PROTEIN INTERACTION ,BINDING SITES ,MACHINE LEARNING ,Databases, Protein ,Molecular Biology ,Binding Sites ,business.industry ,Proteins ,Identification (information) ,Multiprotein Complexes ,Artificial intelligence ,business ,computer ,Protein network ,Algorithms ,Information Systems - Abstract
The identification of protein-protein interaction sites is an essential intermediate step for mutant design and the prediction of protein networks. In recent years a significant number of methods have been developed to predict these interface residues and here we review the current status of the field. Progress in this area requires a clear view of the methodology applied, the data sets used for training and testing the systems, and the evaluation procedures. We have analysed the impact of a representative set of features and algorithms and highlighted the problems inherent in generating reliable protein data sets and in the posterior analysis of the results. Although it is clear that there have been some improvements in methods for predicting interacting sites, several major bottlenecks remain. Proteins in complexes are still under-represented in the structural databases and in particular many proteins involved in transient complexes are still to be crystallized. We provide suggestions for effective feature selection, and make it clear that community standards for testing, training and performance measures are necessary for progress in the field.
- Published
- 2009
23. Prediction of Structurally-Determined Coiled-Coil Domains with Hidden Markov Models
- Author
-
Rita Casadio, Piero Fariselli, Daniele Molinini, Anders Krogh, Fariselli P., Molinini D., Casadio R., and Krogh A.
- Subjects
Coiled coil ,business.industry ,Simple Modular Architecture Research Tool ,HIDDEN MARKOV MODELS ,Protein domain ,Pattern recognition ,Biology ,Protein structure prediction ,Machine learning ,computer.software_genre ,COILED-COIL DOMAINS ,PROTEIN STRUCTURE PREDICTION ,Protein sequencing ,Protein structure ,Artificial intelligence ,business ,Hidden Markov model ,Structural motif ,computer - Abstract
The coiled-coil protein domain is a widespread structural motif known to be involved in a wealth of key interactions in cells and organisms. Coiled-coil recognition and prediction of their location in a protein sequence are important steps for modeling protein structure and function. Nowadays, thanks to the increasing number of experimentally determined protein structures, a significant number of coiled-coil protein domains is available. This enables the development of methods suited to predict the coiled-coil structural motifs starting from the protein sequence. Several methods have been developed to predict classical heptads using manually annotated coiled-coil domains. In this paper we focus on the prediction of structurally-determined coiled-coil segments. We introduce a new method based on hidden Markov models that complements the existing methods and outperforms them in the task of locating structurally-defined coiled-coil segments.
- Published
- 2007
24. A three-state prediction of single point mutations on protein stability changes
- Author
-
Emidio Capriotti, Ivan Rossi, Rita Casadio, Piero Fariselli, Capriotti E., Fariselli P., Rossi I., and Casadio R.
- Subjects
Models, Molecular ,Molecular Sequence Data ,Thermodynamics ,Biology ,lcsh:Computer applications to medicine. Medical informatics ,Quantitative Biology - Quantitative Methods ,Biochemistry ,Stability (probability) ,Standard deviation ,Structure-Activity Relationship ,Protein structure ,Drug Stability ,Structural Biology ,Protein methods ,Sequence Analysis, Protein ,Computer Simulation ,Amino Acid Sequence ,Molecular Biology ,MUTATION ,lcsh:QH301-705.5 ,SUPPORT VECTOR MACHINE ,PROTEIN STABILITY ,SNPS ,Quantitative Methods (q-bio.QM) ,Genetics ,Applied Mathematics ,Point mutation ,Research ,Proteins ,Biomolecules (q-bio.BM) ,Computer Science Applications ,Quantitative Biology - Biomolecules ,Amino Acid Substitution ,Models, Chemical ,lcsh:Biology (General) ,FOS: Biological sciences ,Mutation (genetic algorithm) ,Mutagenesis, Site-Directed ,lcsh:R858-859.7 ,Sign (mathematics) ,Neutral mutation - Abstract
A basic question of protein structural studies is to what extent mutations affect the stability. This question may be addressed starting from sequence and/or from structure. In proteomics and genomics studies prediction of protein stability free energy change (DDG) upon single point mutation may also help the annotation process. The experimental DDG values are affected by uncertainty as measured by standard deviations. Most of the DDG values are nearly zero (about 32% of the DDG data set ranges from -0.5 to 0.5 Kcal/mol) and both the value and sign of DDG may be either positive or negative for the same mutation blurring the relationship among mutations and expected DDG value. In order to overcome this problem we describe a new predictor that discriminates between 3 mutation classes: destabilizing mutations (DDG0.5 Kcal/mol) and neutral mutations (-0.5, Comment: Text: 9 pages, Figures: 9 pages, Tables: 1 page, Supplementary Material: 1 page
- Published
- 2007
25. BaCelLo: a balanced subcellular localization predictor
- Author
-
Piero Fariselli, Rita Casadio, Pier Luigi Martelli, Andrea Pierleoni, Pierleoni A., Martelli P.L., Fariselli P., Casadio R., and P. BOURNE, S. BRUNAK
- Subjects
Statistics and Probability ,PREDICTION ,Saccharomyces cerevisiae ,Decision tree ,PROTEIN SORTING ,Computational biology ,Biology ,medicine.disease_cause ,Bioinformatics ,Biochemistry ,Models, Biological ,Pattern Recognition, Automated ,Structure-Activity Relationship ,EUKARYOTIC CELL ,Artificial Intelligence ,Sequence Analysis, Protein ,Protein targeting ,medicine ,Arabidopsis thaliana ,Animals ,Humans ,Computer Simulation ,Molecular Biology ,Caenorhabditis elegans ,General Environmental Science ,BIOINFORMATICS ,Proteins ,SUBCELLULAR LOCALIZATION ,biology.organism_classification ,Subcellular localization ,Computer Science Applications ,SUPPORT VECTOR MACHINES ,Computational Mathematics ,Computational Theory and Mathematics ,SUPPORT VECTOR MACHINE ,Cytoplasm ,Proteome ,General Earth and Planetary Sciences ,Function (biology) ,Algorithms ,Software ,Subcellular Fractions - Abstract
Compartmentalization plays a major role in eukaryotic cells by making possible the fine regulation of complex biochemical pathways. Each protein needs the right biochemical context to operate; therefore the knowledge of the subcellular localization of a protein is essential in order to understand its functions and its pattern of interactions in protein networks. BaCelLo is a predictor for the subcellular localization of eukaryotic proteins and it is based on several Support Vector Machines (SVMs) arranged in a decision tree (Fig 1). Starting from the residue sequence, BaCelLo discriminates five different localizations: secretory pathway, cytoplasm, nucleus, mitochondrion and chloroplast. The predictor analyzes the protein residue sequence and its evolutionary profile considering information from the whole sequence and from its N- and C-terminal regions. Three different predictors are available for three different eukaryotic kingdoms: Metazoa, Viridiplantae and Fungi. The distinctive features of BaCelLo are: 1. a homology-reduced dataset for training and testing the predictor, in order to avoid redundancy. This dataset was compiled starting from the Swissprot database (release 48) and contains proteins whose subcellular localization was experimentally annotated. The dataset was reduced by similarity so that no two proteins in the dataset share more than 30% identity; 2. the implementation of three kingdom-specific predictors to take into account differences in subcellular localization mechanisms; 3. the evolutionary profile to extract evolutionary information from the residue sequence; 4. a hierarchic tree for the predictions; 5. the introduction of a unique balancing procedure in SVMs that corrects the biases between the different classes due to the disproportions in the training set. BaCelLo proved to outperform all the other state-of-the-art methods publicly available, when validated on a set of protein sequences independent of the training set.
- Published
- 2006
26. Reconstruction of 3D structures from protein contact maps
- Author
-
Pietro Di Lena, Rita Casadio, Filippo Medri, Marco Vassura, Luciano Margara, Piero Fariselli, Vassura M., Margara L., Di Lena P., Medri F., Fariselli P., and Casadio R.
- Subjects
Models, Molecular ,Protein Folding ,Theoretical computer science ,Computational complexity theory ,CONTACT MAP ,Computer science ,Protein Conformation ,COMBINATORIAL OPTIMIZATION ,Combinatorics ,Set (abstract data type) ,PROTEIN STRUCTURE PREDICTION ,Structural bioinformatics ,Protein structure ,Protein Interaction Mapping ,Genetics ,Logical matrix ,Computer Simulation ,MOLECULAR MODELING ,RECONSTRUCTION ALGORITHM ,Mathematics ,Binding Sites ,Heuristic ,Applied Mathematics ,Protein primary structure ,Proteins ,PROTEIN STRUCTURE ,Graph theory ,Function (mathematics) ,Protein structure prediction ,Protein tertiary structure ,Range (mathematics) ,Models, Chemical ,Combinatorial optimization ,Protein folding ,Algorithm ,Biotechnology ,Protein Binding - Abstract
Proteins are large organic compounds made of amino acids arranged in a linear chain (primary structure). Most proteins fold into unique three-dimensional (3D) structures called interchangeably tertiary, folded, or native structures. Discovering the tertiary structure of a protein (Protein Folding Problem) can provide important clues about how the protein performs its function and it is one of the most important problems in Bioinformatics. A contact map of a given protein P is a binary matrix M such that Mi,j = 1 iff the physical distance between amino acids i and j in the native structure is less than or equal to a pre-assigned threshold t. The contact map of each protein is a distinctive signature of its folded structure. Predicting the tertiary structure of a protein directly from its primary structure is a very complex and still unsolved problem. An alternative and probably more feasible approach is to predict the contact map of a protein from its primary structure and then to compute the tertiary structure starting from the predicted contact map. This last problem has been recently proven to be NP-Hard [6]. In this paper we give a heuristic method that is able to reconstruct in a few seconds a 3D model that exactly matches the target contact map. We wish to emphasize that our method computes an exact model for the protein independently of the contact map threshold. To our knowledge, our method outperforms all other techniques in the literature [5,10,17,19] both for the quality of the provided solutions and for the running times. Our experimental results are obtained on a non-redundant data set consisting of 1760 proteins which is by far the largest benchmark set used so far. Average running times range from 3 to 15 seconds depending on the contact map threshold and on the size of the protein. 
Repeated applications of our method (starting from randomly chosen distinct initial solutions) show that the same contact map may admit (depending on the threshold) quite different 3D models. Extensive experimental results show that contact map thresholds ranging from 10 to 18 Angstrom allow the reconstruction of 3D models that are very similar to the protein's native structure. Our heuristic is freely available for testing on the web at the following URL: http://vassura.web.cs.unibo.it/cmap23d/
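The contact map definition given in the abstract (an entry is 1 iff the distance between residues i and j is at most a threshold t) can be illustrated with a short sketch. This covers only the forward direction, from known coordinates to a contact map; it is not the reconstruction heuristic the paper describes. The function name, the toy coordinates and the 8 Angstrom threshold are assumptions chosen for illustration.

```python
import math


def contact_map(coords, threshold):
    """Binary contact map M with M[i][j] = 1 iff the Euclidean distance
    between points i and j is less than or equal to the threshold."""
    n = len(coords)
    return [
        [1 if math.dist(coords[i], coords[j]) <= threshold else 0
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy "protein": one 3D point per residue, in Angstrom.
toy_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (20.0, 0.0, 0.0),
]

# Assumed threshold of 8 Angstrom, for illustration only.
toy_map = contact_map(toy_coords, threshold=8.0)
```

The resulting matrix is symmetric with ones on the diagonal, because every residue is at distance zero from itself.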