23 results on "Radivojac P"
2. Critical assessment of variant prioritization methods for rare disease diagnosis within the rare genomes project
- Author
Stenton, Sarah L., O’Leary, Melanie C., Lemire, Gabrielle, VanNoy, Grace E., DiTroia, Stephanie, Ganesh, Vijay S., Groopman, Emily, O’Heir, Emily, Mangilog, Brian, Osei-Owusu, Ikeoluwa, Pais, Lynn S., Serrano, Jillian, Singer-Berk, Moriel, Weisburd, Ben, Wilson, Michael W., Austin-Tse, Christina, Abdelhakim, Marwa, Althagafi, Azza, Babbi, Giulia, Bellazzi, Riccardo, Bovo, Samuele, Carta, Maria Giulia, Casadio, Rita, Coenen, Pieter-Jan, De Paoli, Federica, Floris, Matteo, Gajapathy, Manavalan, Hoehndorf, Robert, Jacobsen, Julius O. B., Joseph, Thomas, Kamandula, Akash, Katsonis, Panagiotis, Kint, Cyrielle, Lichtarge, Olivier, Limongelli, Ivan, Lu, Yulan, Magni, Paolo, Mamidi, Tarun Karthik Kumar, Martelli, Pier Luigi, Mulargia, Marta, Nicora, Giovanna, Nykamp, Keith, Pejaver, Vikas, Peng, Yisu, Pham, Thi Hong Cam, Podda, Maurizio S., Rao, Aditya, Rizzo, Ettore, Saipradeep, Vangala G., Savojardo, Castrense, Schols, Peter, Shen, Yang, Sivadasan, Naveen, Smedley, Damian, Soru, Dorian, Srinivasan, Rajgopal, Sun, Yuanfei, Sunderam, Uma, Tan, Wuwei, Tiwari, Naina, Wang, Xiao, Wang, Yaqiong, Williams, Amanda, Worthey, Elizabeth A., Yin, Rujie, You, Yuning, Zeiberg, Daniel, Zucca, Susanna, Bakolitsa, Constantina, Brenner, Steven E., Fullerton, Stephanie M., Radivojac, Predrag, Rehm, Heidi L., and O’Donnell-Luria, Anne
- Abstract
Background: A major obstacle faced by families with rare diseases is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years, and causal variants are identified in under 50% of cases, even when capturing variants genome-wide. To aid in the interpretation and prioritization of the vast number of variants detected, computational methods are proliferating, but which tools are most effective remains unclear. To evaluate the performance of computational methods, and to encourage innovation in method development, we designed a Critical Assessment of Genome Interpretation (CAGI) community challenge to place variant prioritization models head-to-head in a real-life clinical diagnostic setting. Methods: We utilized genome sequencing (GS) data from families sequenced in the Rare Genomes Project (RGP), a direct-to-participant research study on the utility of GS for rare disease diagnosis and gene discovery. Challenge predictors were provided with a dataset of variant calls and phenotype terms from 175 RGP individuals (65 families), including 35 solved training set families with causal variants specified, and 30 unlabeled test set families (14 solved, 16 unsolved). We tasked teams to identify causal variants in as many families as possible. Predictors submitted variant predictions with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics: a weighted score based on the rank position of causal variants, and the maximum F-measure, based on precision and recall of causal variants across all EPCR values. Results: Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top performers recalled causal variants in up to 13 of 14 solved families within the top 5 ranked variants.
Newly discovered diagnostic variants were returned to two previously unsolved families following confirmatory RNA sequencing, and two novel disease gene candidates were entered into Matchmaker Exchange. In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant in an unsolved proband with phenotypes consistent with asparagine synthetase deficiency. Conclusions: Model methodology and performance were highly variable. Models weighing call quality, allele frequency, predicted deleteriousness, segregation, and phenotype were effective in identifying causal variants, and models open to phenotype expansion and non-coding variants were able to capture more difficult diagnoses and discover new diagnoses. Overall, computational models can significantly aid variant prioritization. For use in diagnostics, detailed review and conservative assessment of prioritized variants against established criteria is needed.
- Published
- 2024
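The maximum F-measure described in entry 2 can be sketched as follows. This is a minimal illustration, not the challenge's scoring code: it assumes predictions are pooled across families, that every distinct submitted EPCR value is tried as a threshold, and that recall is computed against all known causal variants, including any a model never submitted. The function name and its arguments are hypothetical.

```python
from typing import Sequence


def max_f_measure(epcr: Sequence[float], is_causal: Sequence[bool], n_causal_total: int) -> float:
    """Maximum F-measure over all EPCR thresholds (illustrative sketch).

    epcr: estimated probability of causal relationship for each submitted variant.
    is_causal: whether each submitted variant is a known causal variant.
    n_causal_total: number of causal variants in the truth set (recall denominator),
        including causal variants the model never submitted.
    """
    best = 0.0
    for t in sorted(set(epcr)):
        # Variants "called" causal at this threshold.
        called = [c for p, c in zip(epcr, is_causal) if p >= t]
        tp = sum(called)
        if tp == 0:
            continue  # precision and recall are both zero; F stays 0
        precision = tp / len(called)
        recall = tp / n_causal_total
        best = max(best, 2 * precision * recall / (precision + recall))
    return best
```

For instance, `max_f_measure([0.9, 0.8, 0.3, 0.1], [True, False, True, False], 3)` peaks at the 0.3 threshold, where precision and recall are both 2/3.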
3. Advancing remote homology detection: A step toward understanding and accurately predicting protein function
- Author
Radivojac, Predrag
- Abstract
Identifying homologous proteins with divergent amino acid sequences can add to our understanding of protein evolution, structure, and function. A new study reports the development of a deep-network-based method to identify 6.8 million new Pfam members, a dramatic singular increase that exceeds a decade of accumulation using traditional approaches.
- Published
- 2022
4. Paternal age in rhesus macaques is positively associated with germline mutation accumulation but not with measures of offspring sociability
- Author
Wang, Richard J., Thomas, Gregg W.C., Raveendran, Muthuswamy, Harris, R. Alan, Doddapaneni, Harshavardhan, Muzny, Donna M., Capitanio, John P., Radivojac, Predrag, Rogers, Jeffrey, and Hahn, Matthew W.
- Abstract
Mutation is the ultimate source of all genetic novelty and the cause of heritable genetic disorders. Mutational burden has been linked to complex disease, including neurodevelopmental disorders such as schizophrenia and autism. The rate of mutation is a fundamental genomic parameter and direct estimates of this parameter have been enabled by accurate comparisons of whole-genome sequences between parents and offspring. Studies in humans have revealed that the paternal age at conception explains most of the variation in mutation rate: Each additional year of paternal age in humans leads to approximately 1.5 additional inherited mutations. Here, we present an estimate of the de novo mutation rate in the rhesus macaque (Macaca mulatta) using whole-genome sequence data from 32 individuals in four large pedigrees. We estimated an average mutation rate of 0.58 × 10⁻⁸ per base pair per generation (at an average parental age of 7.5 yr), much lower than found in direct estimates from great apes. As in humans, older macaque fathers transmit more mutations to their offspring, increasing the per generation mutation rate by 4.27 × 10⁻¹⁰ per base pair per year. We found that the rate of mutation accumulation after puberty is similar between macaques and humans, but that a smaller number of mutations accumulate before puberty in macaques. We additionally investigated the role of paternal age on offspring sociability, a proxy for normal neurodevelopment, by studying 203 male macaques in large social groups.
- Published
- 2020
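A minimal sketch of the paternal-age arithmetic reported in entry 4. It assumes, purely for illustration, a linear paternal-age effect anchored at the reported average rate and age (the abstract gives 7.5 yr as the average parental age, not specifically paternal), so outputs are extrapolations rather than estimates from the study. The two rates come from the abstract; the function name is hypothetical.

```python
# Values reported in the abstract.
BASE_RATE = 0.58e-8        # per bp per generation, at an average parental age of 7.5 yr
REFERENCE_AGE = 7.5        # years
PATERNAL_SLOPE = 4.27e-10  # additional mutations per bp per year of paternal age


def predicted_rate(paternal_age_years: float) -> float:
    """Illustrative linear extrapolation of the per-generation mutation rate.

    Assumption: the reported average rate applies at a paternal age of 7.5 yr
    and the paternal-age effect is linear around that point.
    """
    return BASE_RATE + PATERNAL_SLOPE * (paternal_age_years - REFERENCE_AGE)
```

Under this assumption, a 12.5-year-old father would give roughly 0.79 × 10⁻⁸ per base pair per generation (0.58 × 10⁻⁸ plus 5 × 4.27 × 10⁻¹⁰).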
5. Assessment of the evidence yield for the calibrated PP3/BP4 computational recommendations
- Author
Stenton, Sarah L., Pejaver, Vikas, Bergquist, Timothy, Biesecker, Leslie G., Byrne, Alicia B., Nadeau, Emily A.W., Greenblatt, Marc S., Harrison, Steven M., Tavtigian, Sean V., Radivojac, Predrag, Brenner, Steven E., O’Donnell-Luria, Anne, Biesecker, Leslie G., Harrison, Steven M., Tayoun, Ahmad A., Berg, Jonathan S., Brenner, Steven E., Cutting, Garry R., Ellard, Sian, Greenblatt, Marc S., Kang, Peter, Karbassi, Izabela, Karchin, Rachel, Mester, Jessica, O’Donnell-Luria, Anne, Pesaran, Tina, Plon, Sharon E., Rehm, Heidi L., Strande, Natasha T., Tavtigian, Sean V., and Topper, Scott
- Abstract
To investigate the number of rare missense variants observed in human genome sequences by ACMG/AMP PP3/BP4 evidence strength, following the calibrated PP3/BP4 computational recommendations.
- Published
- 2024
6. Searching and visualizing genetic associations of pregnancy traits by using GnuMoM2b
- Author
Yan, Qi, Guerrero, Rafael F, Khan, Raiyan R, Surujnarine, Andy A, Wapner, Ronald J, Hahn, Matthew W, Raja, Anita, Salleb-Aouissi, Ansaf, Grobman, William A, Simhan, Hyagriv, Blue, Nathan R, Silver, Robert, Chung, Judith H, Reddy, Uma M, Radivojac, Predrag, Pe’er, Itsik, and Haas, David M
- Abstract
Adverse pregnancy outcomes (APOs) are major risk factors for women's health during pregnancy and even in the years after pregnancy. Due to the heterogeneity of APOs, only a few genetic associations have been identified. In this report, we conducted genome-wide association studies (GWASs) of 479 traits that are possibly related to APOs using a large and racially diverse study, Nulliparous Pregnancy Outcomes Study: Monitoring Mothers-to-Be (nuMoM2b). To display the extensive results, we developed a web-based tool, GnuMoM2b (https://gnumom2b.cumcobgyn.org/), for searching, visualizing, and sharing results from a GWAS of 479 pregnancy traits as well as phenome-wide association studies of more than 17 million single nucleotide polymorphisms. The genetic results from 3 ancestries (Europeans, Africans, and Admixed Americans) and meta-analyses are populated in GnuMoM2b. In conclusion, GnuMoM2b is a valuable resource for extraction of pregnancy-related genetic results and shows the potential to facilitate meaningful discoveries.
- Published
- 2023
7. Physicochemical sequence characteristics that influence S-palmitoylation propensity
- Author
Reddy, Krishna D., Malipeddi, Jashwanth, DeForte, Shelly, Pejaver, Vikas, Radivojac, Predrag, Uversky, Vladimir N., and Deschenes, Robert J.
- Abstract
Over the past 30 years, several hundred eukaryotic proteins spanning from yeast to man have been shown to be S-palmitoylated. This post-translational modification involves the reversible addition of a 16-carbon saturated fatty acyl chain onto the cysteine residue of a protein, where it regulates protein membrane association and distribution, conformation, and stability. However, the large-scale proteome-wide discovery of new palmitoylated proteins has been hindered by the difficulty of identifying a palmitoylation consensus sequence. Using a bioinformatics approach, we show that the enrichment of hydrophobic and basic residues, the cellular context of the protein, and the structural features of the residues surrounding the palmitoylated cysteine all influence the likelihood of palmitoylation. We developed a new palmitoylation predictor that incorporates these identified features, and this predictor achieves a Matthews Correlation Coefficient of 0.74 using 10-fold cross-validation and significantly outperforms existing predictors on unbiased testing sets. This demonstrates that palmitoylation sites can be predicted with accuracy by taking into account not only physicochemical properties of the modified cysteine and its surrounding residues, but also structural parameters and the subcellular localization of the modified cysteine. This will allow for improved predictions of palmitoylated residues in uncharacterized proteins. A web-based version of this predictor is currently under development.
- Published
- 2017
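The Matthews correlation coefficient quoted in entry 7 is computed from confusion-matrix counts. The sketch below uses hypothetical names and shows only the single-table formula; the abstract does not state whether counts were pooled across the 10 cross-validation folds or the coefficient was averaged per fold.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from one confusion matrix.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to 1 (perfect prediction). Returns 0.0 when any marginal is empty,
    a common convention for the undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

Unlike accuracy, MCC stays informative when palmitoylated cysteines are far rarer than unmodified ones, because every cell of the confusion matrix enters the numerator and denominator.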
8. An expanded evaluation of protein function prediction methods shows an improvement in accuracy
- Author
Jiang, Yuxiang, Oron, Tal Ronnen, Clark, Wyatt T., Bankapur, Asma R., D’Andrea, Daniel, Lepore, Rosalba, Funk, Christopher S., Kahanda, Indika, Verspoor, Karin M., Ben-Hur, Asa, Koo, Da Chen Emily, Penfold-Brown, Duncan, Shasha, Dennis, Youngs, Noah, Bonneau, Richard, Lin, Alexandra, Sahraeian, Sayed M. E., Martelli, Pier Luigi, Profiti, Giuseppe, Casadio, Rita, Cao, Renzhi, Zhong, Zhaolong, Cheng, Jianlin, Altenhoff, Adrian, Skunca, Nives, Dessimoz, Christophe, Dogan, Tunca, Hakala, Kai, Kaewphan, Suwisa, Mehryary, Farrokh, Salakoski, Tapio, Ginter, Filip, Fang, Hai, Smithers, Ben, Oates, Matt, Gough, Julian, Törönen, Petri, Koskinen, Patrik, Holm, Liisa, Chen, Ching-Tai, Hsu, Wen-Lian, Bryson, Kevin, Cozzetto, Domenico, Minneci, Federico, Jones, David T., Chapman, Samuel, BKC, Dukka, Khan, Ishita K., Kihara, Daisuke, Ofer, Dan, Rappoport, Nadav, Stern, Amos, Cibrian-Uhalte, Elena, Denny, Paul, Foulger, Rebecca E., Hieta, Reija, Legge, Duncan, Lovering, Ruth C., Magrane, Michele, Melidoni, Anna N., Mutowo-Meullenet, Prudence, Pichler, Klemens, Shypitsyna, Aleksandra, Li, Biao, Zakeri, Pooya, ElShal, Sarah, Tranchevent, Léon-Charles, Das, Sayoni, Dawson, Natalie L., Lee, David, Lees, Jonathan G., Sillitoe, Ian, Bhat, Prajwal, Nepusz, Tamás, Romero, Alfonso E., Sasidharan, Rajkumar, Yang, Haixuan, Paccanaro, Alberto, Gillis, Jesse, Sedeño-Cortés, Adriana E., Pavlidis, Paul, Feng, Shou, Cejuela, Juan M., Goldberg, Tatyana, Hamp, Tobias, Richter, Lothar, Salamov, Asaf, Gabaldon, Toni, Marcet-Houben, Marina, Supek, Fran, Gong, Qingtian, Ning, Wei, Zhou, Yuanpeng, Tian, Weidong, Falda, Marco, Fontana, Paolo, Lavezzo, Enrico, Toppo, Stefano, Ferrari, Carlo, Giollo, Manuel, Piovesan, Damiano, Tosatto, Silvio C.E., del Pozo, Angela, Fernández, José M., Maietta, Paolo, Valencia, Alfonso, Tress, Michael L., Benso, Alfredo, Di Carlo, Stefano, Politano, Gianfranco, Savino, Alessandro, Rehman, Hafeez Ur, Re, Matteo, Mesiti, Marco, Valentini, Giorgio, Bargsten, Joachim W., van Dijk, Aalt D. J., Gemovic, Branislava, Glisic, Sanja, Perovic, Vladmir, Veljkovic, Veljko, Veljkovic, Nevena, Almeida-e-Silva, Danillo C., Vencio, Ricardo Z. N., Sharan, Malvika, Vogel, Jörg, Kansakar, Lakesh, Zhang, Shanshan, Vucetic, Slobodan, Wang, Zheng, Sternberg, Michael J. E., Wass, Mark N., Huntley, Rachael P., Martin, Maria J., O’Donovan, Claire, Robinson, Peter N., Moreau, Yves, Tramontano, Anna, Babbitt, Patricia C., Brenner, Steven E., Linial, Michal, Orengo, Christine A., Rost, Burkhard, Greene, Casey S., Mooney, Sean D., Friedberg, Iddo, and Radivojac, Predrag
- Abstract
A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.
- Published
- 2016
9. Impact of Amidination on Peptide Fragmentation and Identification in Shotgun Proteomics
- Author
Li, Sujun, Dabir, Aditi, Misal, Santosh A., Tang, Haixu, Radivojac, Predrag, and Reilly, James P.
- Abstract
Peptide amidination labeling using S-methyl thioacetimidate (SMTA) is investigated in an attempt to increase the number and types of peptides that can be detected in a bottom-up proteomics experiment. This derivatization method affects the basicity of lysine residues and is shown here to significantly impact the idiosyncrasies of peptide fragmentation and peptide detectability. The unique and highly reproducible fragmentation properties of SMTA-labeled peptides, such as the strong propensity for forming b1 fragment ions, can be further exploited to modify the scoring of peptide-spectrum pairs and improve peptide identification. To this end, we have developed a supervised postprocessing algorithm to exploit these characteristics of peptides labeled by SMTA. Our experiments show that although the overall number of identifications is similar, the SMTA modification enabled the detection of 16–26% of peptides not previously observed in comparable CID/HCD tandem mass spectrometry experiments without SMTA labeling.
- Published
- 2016
10. XLSearch: a Probabilistic Database Search Algorithm for Identifying Cross-Linked Peptides
- Author
Ji, Chao, Li, Sujun, Reilly, James P., Radivojac, Predrag, and Tang, Haixu
- Abstract
Chemical cross-linking combined with mass spectrometric analysis has become an important technique for probing protein three-dimensional structure and protein–protein interactions. A key step in this process is the accurate identification and validation of cross-linked peptides from tandem mass spectra. The identification of cross-linked peptides, however, presents challenges related to the expanded nature of the search space (all pairs of peptides in a sequence database) and the fact that some peptide-spectrum matches (PSMs) contain one correct and one incorrect peptide but often receive scores that are comparable to those in which both peptides are correctly identified. To address these problems and improve detection of cross-linked peptides, we propose a new database search algorithm, XLSearch, for identifying cross-linked peptides. Our approach is based on a data-driven scoring scheme that independently estimates the probability of correctly identifying each individual peptide in the cross-link given knowledge of the correct or incorrect identification of the other peptide. These conditional probabilities are subsequently used to estimate the joint posterior probability that both peptides are correctly identified. Using the data from two previous cross-link studies, we show the effectiveness of this scoring scheme, particularly in distinguishing between true identifications and those containing one incorrect peptide. We also provide evidence that XLSearch achieves more identifications than two alternative methods at the same false discovery rate (availability: https://github.com/COL-IU/XLSearch).
- Published
- 2016
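The step in entry 10 of turning per-peptide conditional probabilities into a joint posterior can be illustrated generically. The sketch below is not XLSearch's scoring model (its actual implementation is in the linked repository); it only shows that, when the four conditional probabilities are mutually consistent, the law of total probability determines P(α correct), P(β correct), and P(both correct). All names are hypothetical.

```python
def joint_from_conditionals(p_a_given_b: float, p_a_given_not_b: float,
                            p_b_given_a: float, p_b_given_not_a: float):
    """Return (P(A), P(B), P(A and B)) implied by four conditional probabilities.

    A, B: events that peptide alpha / peptide beta of a cross-link is correctly
    identified. Uses the law of total probability:
        P(A) = P(A|B) P(B) + P(A|~B) (1 - P(B))
        P(B) = P(B|A) P(A) + P(B|~A) (1 - P(A))
    and solves this 2x2 linear system for P(A), then P(B).
    """
    da = p_a_given_b - p_a_given_not_b
    db = p_b_given_a - p_b_given_not_a
    denom = 1.0 - da * db
    if denom == 0:
        # A and B are deterministically linked; marginals are not identifiable.
        raise ValueError("conditionals do not determine the marginals")
    p_a = (p_a_given_not_b + da * p_b_given_not_a) / denom
    p_b = p_b_given_not_a + db * p_a
    return p_a, p_b, p_b_given_a * p_a
```

For example, conditionals derived from a joint table with P(both correct) = 0.6, P(only α) = 0.1, P(only β) = 0.2 recover P(α) = 0.7, P(β) = 0.8 and the joint 0.6. A match with one wrong peptide pulls the joint down even when one conditional is high, which is the distinction the abstract emphasizes.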
11. Position of Proline Mediates the Reactivity of S-Palmitoylation
- Author
Khanal, Neelam, Pejaver, Vikas, Li, Zhiyu, Radivojac, Predrag, Clemmer, David E., and Mukhopadhyay, Suchetana
- Abstract
Palmitoylation, a post-translational modification in which a saturated 16-carbon chain is added predominantly to a cysteine residue, participates in various biological functions. The position of proline relative to other residues being post-translationally modified has been previously reported as being important. We determined that proline is statistically enriched around cysteines known to be S-palmitoylated. The goal of this work was to determine how the position of proline influences the palmitoylation of the cysteine residue. We established a mass spectrometry-based approach to investigate time- and temperature-dependent kinetics of autopalmitoylation in vitro and to derive the thermodynamic parameters of the transition state associated with palmitoylation; to the best of our knowledge, our work is the first to study the kinetics and activation properties of the palmitoylation process. We then used these thermochemical parameters to determine if the position of proline relative to the modified cysteine is important for palmitoylation. Our results show that peptides with proline at the −1 position of cysteine in their sequence (PC) have lower enthalpic barriers and higher entropic barriers in comparison to the same peptides with proline at the +1 position of cysteine (CP); interestingly, the free-energy barriers for both pairs are almost identical. Molecular dynamics studies demonstrate that the flexibility of the cysteine backbone in the PC-containing peptide when compared to the CP-containing peptide explains the increased entropic barrier and decreased enthalpic barrier observed experimentally.
- Published
- 2015
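Entry 11 does not say how the transition-state parameters were derived; a common choice for temperature-dependent rate constants is an Eyring (transition-state theory) fit, sketched below under that assumption. Regressing ln(k/T) on 1/T gives ΔH‡ from the slope and ΔS‡ from the intercept, and ΔG‡ = ΔH‡ − TΔS‡ shows how a lower enthalpic barrier paired with a larger entropic penalty can leave the free-energy barrier almost unchanged, as reported for the PC and CP peptides. Function names are hypothetical.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J*s
R = 8.314462618      # gas constant, J/(mol*K)


def eyring_fit(temps_k, rate_constants):
    """Fit ln(k/T) = -dH/(R T) + ln(kB/h) + dS/R by ordinary least squares.

    temps_k: absolute temperatures (K); rate_constants: first-order k (1/s).
    Returns (activation enthalpy in J/mol, activation entropy in J/(mol*K)).
    """
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(k / t) for k, t in zip(rate_constants, temps_k)]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    d_h = -slope * R
    d_s = (intercept - math.log(K_B / H)) * R
    return d_h, d_s


def delta_g(d_h, d_s, temp_k):
    """Activation free energy (J/mol) at temperature temp_k."""
    return d_h - temp_k * d_s
```

Feeding the fit rate constants generated from known ΔH‡ and ΔS‡ returns those values, which is a quick sanity check before applying it to measured kinetics.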
12. Generalized graphlet kernels for probabilistic inference in sparse graphs
- Author
Lugo-Martinez, Jose and Radivojac, Predrag
- Abstract
Graph kernels for learning and inference on sparse graphs have been widely studied. However, the problem of designing robust kernel functions that can effectively compare graph neighborhoods in the presence of noisy and complex data remains less explored. Here we propose a novel graph-based kernel method referred to as an edit distance graphlet kernel. The method was designed to add flexibility in capturing similarities between local graph neighborhoods as a means of probabilistically annotating vertices in sparse and labeled graphs. We report experiments on nine real-life data sets from molecular biology and social sciences and provide evidence that the new kernels perform favorably compared to established approaches. However, when both performance accuracy and run time are considered, we suggest that edit distance kernels are best suited for inference on graphs derived from protein structures. Finally, we demonstrate that the new approach facilitates simple and principled ways of integrating domain knowledge into classification and point out that our methodology extends beyond classification; e.g. to applications such as kernel-based clustering of graphs or approximate motif finding. Availability: www.sourceforge.net/projects/graphletkernels/
- Published
- 2014
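To make the edit-distance idea in entry 12 concrete, here is a deliberately tiny analogue, not the published kernel: it uses only rooted two-vertex graphlets (a vertex label paired with one neighbor's label) and treats a single substitution of the neighbor label as edit distance 1, credited with an assumed down-weighting factor. The published method uses larger graphlets and its own edit operations; all names and the weight are hypothetical.

```python
from collections import Counter


def rooted_edge_graphlets(graph, labels, v):
    """Count rooted two-vertex graphlets at v as (root label, neighbor label) pairs.

    graph: adjacency dict {vertex: [neighbors]}; labels: {vertex: label}.
    """
    return Counter((labels[v], labels[u]) for u in graph[v])


def edit_distance_features(counts, alphabet, weight=0.5):
    """Add down-weighted credit for graphlets one neighbor-label substitution away.

    The root label is kept fixed; `weight` is an illustrative assumption.
    """
    feats = Counter()
    for (root, nbr), c in counts.items():
        feats[(root, nbr)] += c
        for other in alphabet:
            if other != nbr:
                feats[(root, other)] += weight * c
    return feats


def kernel(f1, f2):
    """Linear kernel: dot product of two sparse feature vectors."""
    return sum(f1[k] * f2[k] for k in f1.keys() & f2.keys())
```

With the edit-distance features, two neighborhoods that share no exact graphlet (say, an A-labeled vertex with one B neighbor versus one with a C neighbor) still get a nonzero kernel value, which is the kind of tolerance to noisy labels the abstract describes.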
13. MutPred Splice: machine learning-based prediction of exonic variants that disrupt splicing
- Author
Mort, Matthew, Sterne-Weiler, Timothy, Li, Biao, Ball, Edward, Cooper, David, Radivojac, Predrag, Sanford, Jeremy, and Mooney, Sean
- Abstract
We have developed a novel machine-learning approach, MutPred Splice, for the identification of coding region substitutions that disrupt pre-mRNA splicing. Applying MutPred Splice to human disease-causing exonic mutations suggests that 16% of mutations causing inherited disease and 10 to 14% of somatic mutations in cancer may disrupt pre-mRNA splicing. For inherited disease, the main mechanism responsible for the splicing defect is splice site loss, whereas for cancer the predominant mechanism of splicing disruption is predicted to be exon skipping via loss of exonic splicing enhancers or gain of exonic splicing silencer elements. MutPred Splice is available at http://mutdb.org/mutpredsplice.
- Published
- 2014
14. Quantitative Measurement of Phosphoproteome Response to Osmotic Stress in Arabidopsis Based on Library-Assisted eXtracted Ion Chromatogram (LAXIC)
- Author
Xue, Liang, Wang, Pengcheng, Wang, Lianshui, Renzi, Emily, Radivojac, Predrag, Tang, Haixu, Arnold, Randy, Zhu, Jian-Kang, and Tao, W. Andy
- Abstract
Global phosphorylation changes in plants in response to environmental stress have been relatively poorly characterized to date. Here we introduce a novel mass spectrometry-based label-free quantitation method that facilitates systematic profiling of plant phosphoproteome changes with high efficiency and accuracy. This method employs synthetic peptide libraries tailored specifically as internal standards for complex phosphopeptide samples and, accordingly, a local normalization algorithm, LAXIC, which calculates phosphopeptide abundance normalized locally with co-eluting library peptides. Normalization was achieved in a small time frame centered on each phosphopeptide to compensate for the diverse ion suppression effect across retention time. The label-free LAXIC method was further treated with a linear regression function to accurately measure phosphoproteome responses to osmotic stress in Arabidopsis. Among 2027 unique phosphopeptides identified and 1850 quantified phosphopeptides in Arabidopsis samples, 468 regulated phosphopeptides representing 497 phosphosites showed significant changes. Several known and novel components in the abiotic stress pathway were identified, illustrating the capability of this method to identify critical signaling events among dynamic and complex phosphorylation. Further assessment of those regulated proteins may help shed light on phosphorylation response to osmotic stress in plants.
- Published
- 2013
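The local normalization step in entry 14 can be sketched as dividing each phosphopeptide's intensity by a summary of the library peptides that co-elute within a retention-time window centered on it. The median as the summary statistic, the ±1 min window, and the function name are assumptions for illustration; the abstract does not give LAXIC's exact formula or window width.

```python
from statistics import median


def local_normalize(phospho, library, half_window_min=1.0):
    """Normalize phosphopeptide intensities against co-eluting library peptides.

    phospho, library: lists of (retention_time_min, intensity).
    half_window_min: half-width of the retention-time window (assumed value).
    Returns one normalized abundance per phosphopeptide, or None when no
    library peptide elutes inside its window.
    """
    normalized = []
    for rt, intensity in phospho:
        nearby = [lib_int for lib_rt, lib_int in library
                  if abs(lib_rt - rt) <= half_window_min]
        normalized.append(intensity / median(nearby) if nearby else None)
    return normalized
```

Normalizing only against neighbors in retention time, rather than against a global total, is what lets ion suppression that varies across the gradient cancel out locally.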
15. Computational Methods for Identification of Functional Residues in Protein Structures
- Author
Xin, Fuxiao and Radivojac, Predrag
- Abstract
The recent accumulation of experimentally determined protein 3D structures combined with our ability to computationally model structure from amino acid sequence has resulted in an increased importance of structure-based methods for protein function prediction. Two types of methods for function prediction have been proposed: those that can accurately predict overall biochemical or biological roles of a protein and those that predict its functional residues. Here, we review approaches used for the computational identification of functional residues in protein structures and summarize their applications to a wide variety of problems in functional proteomics, such as the prediction of catalytic residues, posttranslational modifications, or nucleic acid-binding sites. We examine four different problems in order to perform a comparison between several recently proposed methods and, finally, conclude by identifying limitations and future challenges in this field.
- Published
- 2011
16. Intrinsic Disorder and Functional Proteomics
- Author
Radivojac, Predrag, Iakoucheva, Lilia M., Oldfield, Christopher J., Obradovic, Zoran, Uversky, Vladimir N., and Dunker, A. Keith
- Abstract
The recent advances in the prediction of intrinsically disordered proteins and the use of protein disorder prediction in the fields of molecular biology and bioinformatics are reviewed here, especially with regard to protein function. First, a close look is taken at intrinsically disordered proteins and then at the methods used for their experimental characterization. Next, the major statistical properties of disordered regions are summarized, and prediction models developed thus far are described, including their numerous applications in functional proteomics. The future of the prediction of protein disorder and the future uses of such predictions in functional proteomics comprise the last section of this article.
- Published
- 2007
17. Predicting intrinsic disorder from amino acid sequence
- Author
Obradovic, Zoran, Peng, Kang, Vucetic, Slobodan, Radivojac, Predrag, Brown, Celeste J., and Dunker, A. Keith
- Abstract
Blind predictions of intrinsic order and disorder were made on 42 proteins subsequently revealed to contain 9,044 ordered residues, 284 disordered residues in 26 segments of length 30 residues or less, and 281 disordered residues in 2 disordered segments of length greater than 30 residues. The accuracies of the six predictors used in this experiment ranged from 77% to 91% for the ordered regions and from 56% to 78% for the disordered segments. The average of the order and disorder predictions ranged from 73% to 77%. The prediction of disorder in the shorter segments was poor, from 25% to 66% correct, while the prediction of disorder in the longer segments was better, from 75% to 95% correct. Four of the predictors were composed of ensembles of neural networks. This enabled them to deal more efficiently with the large asymmetry in the training data through diversified sampling from the significantly larger ordered set and achieve better accuracy on ordered and long disordered regions. The exclusive use of long disordered regions for predictor training likely contributed to the disparity of the predictions on long versus short disordered regions, while averaging the output values over 61‐residue windows to eliminate short predictions of order or disorder probably contributed to the even greater disparity for three of the predictors. This experiment supports the predictability of intrinsic disorder from amino acid sequence. Proteins 2003;53:566–572. © 2003 Wiley‐Liss, Inc.
- Published
- 2003
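The 61-residue output averaging described in entry 17 can be sketched as a centered moving average over per-residue disorder scores. Edge handling (averaging over whatever residues are available near the termini) is an assumption, as is the function name; the toy call uses a 3-residue window only to keep the numbers readable.

```python
def window_average(scores, window=61):
    """Centered moving average of per-residue prediction scores.

    window must be odd so each residue sits at the center of its window.
    Near the sequence ends, the average is taken over the residues available
    (an assumed edge rule).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        segment = scores[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed
```

For instance, `window_average([0.0, 0.0, 1.0, 0.0, 0.0], window=3)` turns a single strongly disordered residue into three residues at 1/3, all below a 0.5 cutoff. This is the mechanism by which long-window smoothing can erase short predicted segments, consistent with the poorer accuracy on short disordered regions noted above.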
19. Distinct error rates for reference and nonreference genotypes estimated by pedigree analysis
- Author
Wang, Richard J, Radivojac, Predrag, and Hahn, Matthew W
- Abstract
Errors in genotype calling can have perverse effects on genetic analyses, confounding association studies, and obscuring rare variants. Analyses now routinely incorporate error rates to control for spurious findings. However, reliable estimates of the error rate can be difficult to obtain because of their variance between studies. Most studies also report only a single estimate of the error rate even though genotypes can be miscalled in more than one way. Here, we report a method for estimating the rates at which different types of genotyping errors occur at biallelic loci using pedigree information. Our method identifies potential genotyping errors by exploiting instances where the haplotypic phase has not been faithfully transmitted. The expected frequency of inconsistent phase depends on the combination of genotypes in a pedigree and the probability of miscalling each genotype. We develop a model that uses the differences in these frequencies to estimate rates for different types of genotype error. Simulations show that our method accurately estimates these error rates in a variety of scenarios. We apply this method to a dataset from the whole-genome sequencing of owl monkeys (Aotus nancymaae) in three-generation pedigrees. We find significant differences between estimates for different types of genotyping error, with the most common being homozygous reference sites miscalled as heterozygous and vice versa. The approach we describe is applicable to any set of genotypes where haplotypic phase can reliably be called and should prove useful in helping to control for false discoveries.
- Published
- 2021
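Because a genotype can be miscalled in more than one way, error at a biallelic site is better described by a matrix than by a single rate. The sketch below only illustrates that parameterization. It assumes a 3-by-3 misclassification matrix with invented values and propagates it to expected call frequencies. It is not the phase-based pedigree estimator described in the abstract, which works in the opposite direction by inferring such rates from observed data.

```python
from typing import Dict

GENOTYPES = ("0/0", "0/1", "1/1")  # hom-ref, het, hom-alt at a biallelic site


def observed_frequencies(
    true_freqs: Dict[str, float], error: Dict[str, Dict[str, float]]
) -> Dict[str, float]:
    """P(called j) = sum over true i of P(true i) * P(called j | true i)."""
    observed = {g: 0.0 for g in GENOTYPES}
    for true_g, p_true in true_freqs.items():
        row = error[true_g]
        if abs(sum(row.values()) - 1.0) > 1e-9:
            raise ValueError(f"error row for {true_g} must sum to 1")
        for called_g, p_call in row.items():
            observed[called_g] += p_true * p_call
    return observed


# Hypothetical numbers for illustration only (not estimates from the paper).
true_freqs = {"0/0": 0.90, "0/1": 0.08, "1/1": 0.02}
error = {
    "0/0": {"0/0": 0.99, "0/1": 0.01, "1/1": 0.00},  # hom-ref miscalled as het
    "0/1": {"0/0": 0.05, "0/1": 0.95, "1/1": 0.00},  # het miscalled as hom-ref
    "1/1": {"0/0": 0.00, "0/1": 0.01, "1/1": 0.99},  # hom-alt miscalled as het
}
obs = observed_frequencies(true_freqs, error)

# Share of het calls that are not true hets under these assumed rates.
false_het_share = (obs["0/1"] - true_freqs["0/1"] * error["0/1"]["0/1"]) / obs["0/1"]
print({g: round(f, 4) for g, f in obs.items()})
print(round(false_het_share, 3))
```

Even with small per-type rates, a common true genotype (here hom-ref) can contribute a noticeable share of the calls for a rarer one, which a single pooled error rate would hide.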
20. The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens
- Author
-
Zhou, Naihui, Jiang, Yuxiang, Bergquist, Timothy R., Lee, Alexandra J., Kacsoh, Balint Z., Crocker, Alex W., Lewis, Kimberley A., Georghiou, George, Nguyen, Huy N., Hamid, Md Nafiz, Davis, Larry, Dogan, Tunca, Atalay, Volkan, Rifaioglu, Ahmet S., Dalkıran, Alperen, Cetin Atalay, Rengul, Zhang, Chengxin, Hurto, Rebecca L., Freddolino, Peter L., Zhang, Yang, Bhat, Prajwal, Supek, Fran, Fernández, José M., Gemovic, Branislava, Perovic, Vladimir R., Davidović, Radoslav S., Sumonja, Neven, Veljkovic, Nevena, Asgari, Ehsaneddin, Mofrad, Mohammad R.K., Profiti, Giuseppe, Savojardo, Castrense, Martelli, Pier Luigi, Casadio, Rita, Boecker, Florian, Schoof, Heiko, Kahanda, Indika, Thurlby, Natalie, McHardy, Alice C., Renaux, Alexandre, Saidi, Rabie, Gough, Julian, Freitas, Alex A., Antczak, Magdalena, Fabris, Fabio, Wass, Mark N., Hou, Jie, Cheng, Jianlin, Wang, Zheng, Romero, Alfonso E., Paccanaro, Alberto, Yang, Haixuan, Goldberg, Tatyana, Zhao, Chenguang, Holm, Liisa, Törönen, Petri, Medlar, Alan J., Zosa, Elaine, Borukhov, Itamar, Novikov, Ilya, Wilkins, Angela, Lichtarge, Olivier, Chi, Po-Han, Tseng, Wei-Cheng, Linial, Michal, Rose, Peter W., Dessimoz, Christophe, Vidulin, Vedrana, Dzeroski, Saso, Sillitoe, Ian, Das, Sayoni, Lees, Jonathan Gill, Jones, David T., Wan, Cen, Cozzetto, Domenico, Fa, Rui, Torres, Mateo, Warwick Vesztrocy, Alex, Rodriguez, Jose Manuel, Tress, Michael L., Frasca, Marco, Notaro, Marco, Grossi, Giuliano, Petrini, Alessandro, Re, Matteo, Valentini, Giorgio, Mesiti, Marco, Roche, Daniel B., Reeb, Jonas, Ritchie, David W., Aridhi, Sabeur, Alborzi, Seyed Ziaeddin, Devignes, Marie-Dominique, Koo, Da Chen Emily, Bonneau, Richard, Gligorijević, Vladimir, Barot, Meet, Fang, Hai, Toppo, Stefano, Lavezzo, Enrico, Falda, Marco, Berselli, Michele, Tosatto, Silvio C.E., Carraro, Marco, Piovesan, Damiano, Ur Rehman, Hafeez, Mao, Qizhong, Zhang, Shanshan, Vucetic, Slobodan, Black, Gage S., Jo, Dane, Suh, Erica, Dayton, Jonathan B., Larsen, Dallas J., Omdahl, Ashton R., McGuffin, Liam J., Brackenridge, Danielle A., Babbitt, Patricia C., Yunes, Jeffrey M., Fontana, Paolo, Zhang, Feng, Zhu, Shanfeng, You, Ronghui, Zhang, Zihan, Dai, Suyang, Yao, Shuwei, Tian, Weidong, Cao, Renzhi, Chandler, Caleb, Amezola, Miguel, Johnson, Devon, Chang, Jia-Ming, Liao, Wen-Hung, Liu, Yi-Wei, Pascarelli, Stefano, Frank, Yotam, Hoehndorf, Robert, Kulmanov, Maxat, Boudellioua, Imane, Politano, Gianfranco, Di Carlo, Stefano, Benso, Alfredo, Hakala, Kai, Ginter, Filip, Mehryary, Farrokh, Kaewphan, Suwisa, Björne, Jari, Moen, Hans, Tolvanen, Martti E.E., Salakoski, Tapio, Kihara, Daisuke, Jain, Aashish, Šmuc, Tomislav, Altenhoff, Adrian, Ben-Hur, Asa, Rost, Burkhard, Brenner, Steven E., Orengo, Christine A., Jeffery, Constance J., Bosco, Giovanni, Hogan, Deborah A., Martin, Maria J., O’Donovan, Claire, Mooney, Sean D., Greene, Casey S., Radivojac, Predrag, and Friedberg, Iddo
- Abstract
Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results: Here, we report on the results of the third CAFA challenge, CAFA3, which featured an expanded analysis over the previous CAFA rounds, both in the volume of data analyzed and in the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected Drosophila melanogaster genes that we suspected of being involved in long-term memory. Conclusion: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.
- Published
- 2019
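Term-centric evaluation asks how well a method ranks genes for a single ontology term, for example against the genes a screen associates with biofilm formation. One common way to score such a ranking is the area under the ROC curve, where a random ranking averages 0.5. The sketch below assumes that metric and uses invented scores and labels; it is not the CAFA3 assessment code.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. The pairwise loop is quadratic in the number of
    genes, which is fine for a sketch but not for genome-wide term sets.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have equal length")
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


# Hypothetical predictor scores for one term over four genes, with labels
# standing in for the outcome of an experimental screen.
scores = [0.9, 0.8, 0.3, 0.2]
labels = [True, False, True, False]
print(roc_auc(scores, labels))  # 0.75
```

Three of the four positive-negative pairs are ordered correctly here, giving 0.75; a sort-based rank-sum computation gives the same value in O(n log n).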
21. A computational approach toward label-free protein quantification using predicted peptide detectability
- Author
-
Tang, Haixu, Arnold, Randy J., Alves, Pedro, Xun, Zhiyin, Clemmer, David E., Novotny, Milos V., Reilly, James P., and Radivojac, Predrag
- Abstract
Summary: We propose here a new concept of peptide detectability, which could be an important factor in explaining the relationship between a protein's quantity and the peptides identified from it in a high-throughput proteomics experiment. We define peptide detectability as the probability of observing a peptide in a standard sample analyzed by a standard proteomics routine and argue that it is an intrinsic property of the peptide sequence and neighboring regions in the parent protein. To test this hypothesis, we first used publicly available data and data from our own synthetic samples in which quantities of model proteins were controlled. We then applied machine learning approaches to demonstrate that peptide detectability can be predicted from its sequence and the neighboring regions in the parent protein with satisfactory accuracy. The utility of this approach for protein quantification is demonstrated by the observation that, in the synthetic protein mixtures, peptides with higher detectability were generally identified at lower concentrations than those with lower detectability. These results establish a direct link between protein concentration and peptide detectability. We show that for each protein there exists a level of peptide detectability above which peptides are detected and below which peptides are not detected in an experiment. We call this level the minimum acceptable detectability for identified peptides (MDIP), which can be calibrated to predict protein concentration. Triplicate analysis of a biological sample showed that these MDIP values are consistent among the three data sets. Contact: predrag@indiana.edu
- Published
- 2006
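Read literally, the MDIP for a protein is the lowest predicted detectability among its identified peptides. Since highly detectable peptides are the ones still identified at low concentrations, a lower MDIP points to a higher concentration. The sketch below assumes detectability scores are already available (the peptide names and values are invented) and omits the calibration from MDIP to concentration, which the abstract does not specify.

```python
from typing import Dict, Set


def mdip(detectability: Dict[str, float], identified: Set[str]) -> float:
    """Minimum predicted detectability among the peptides identified for one protein."""
    scores = [detectability[p] for p in identified if p in detectability]
    if not scores:
        raise ValueError("no identified peptide has a detectability score")
    return min(scores)


def undetected_above(detectability: Dict[str, float], identified: Set[str]) -> int:
    """Count unidentified peptides scoring above the MDIP (violations of a clean cutoff)."""
    cutoff = mdip(detectability, identified)
    return sum(1 for p, d in detectability.items() if p not in identified and d > cutoff)


# Hypothetical peptides of one protein with made-up predicted detectabilities.
detectability = {"PEPTIDEA": 0.92, "PEPTIDEB": 0.75, "PEPTIDEC": 0.40, "PEPTIDED": 0.15}
low_conc = {"PEPTIDEA"}                            # only the most detectable peptide seen
high_conc = {"PEPTIDEA", "PEPTIDEB", "PEPTIDEC"}   # more peptides seen at higher concentration
print(mdip(detectability, low_conc), mdip(detectability, high_conc))  # 0.92 0.4
print(undetected_above(detectability, high_conc))  # 0
```

The `undetected_above` count is a simple check of how well a single cutoff separates detected from undetected peptides; a count of zero means the observed data match the clean threshold the abstract describes.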
22. Two Sample Logo: a graphical representation of the differences between two sets of sequence alignments
- Author
-
Vacic, Vladimir, Iakoucheva, Lilia M., and Radivojac, Predrag
- Abstract
Summary: Two Sample Logo is a web-based tool that detects and displays statistically significant differences in position-specific symbol compositions between two sets of multiple sequence alignments. In a typical scenario, two groups of aligned sequences will share a common motif but will differ in their functional annotation. The inclusion of the background alignment provides an appropriate underlying amino acid or nucleotide distribution and addresses intersite symbol correlations. In addition, the difference detection process is sensitive to the sizes of the aligned groups. Two Sample Logo extends WebLogo, a widely used sequence logo generator. The source code is distributed under the MIT Open Source license agreement and is available for download free of charge. Availability: http://www.twosamplelogo.org Contact: predrag@indiana.edu
- Published
- 2006
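The core comparison behind a two-sample logo can be sketched as one test per (position, symbol) pair between the two alignments, with a multiple-testing correction. The sketch below uses a two-proportion z-test with a Bonferroni correction as a stand-in. That choice is an assumption, not necessarily the test the tool applies, and the tool's handling of background distributions and logo rendering is not reproduced.

```python
import math
from collections import Counter
from typing import List, Tuple


def position_differences(
    group_a: List[str], group_b: List[str], alpha: float = 0.05
) -> List[Tuple[int, str, float, float]]:
    """Flag (position, symbol) pairs whose frequency differs between two alignments.

    Returns (0-based position, symbol, freq_a - freq_b, p-value) tuples that
    pass a Bonferroni-corrected two-sided two-proportion z-test.
    """
    length = len(group_a[0])
    if any(len(s) != length for s in group_a + group_b):
        raise ValueError("all aligned sequences must have the same length")
    symbols = sorted({c for s in group_a + group_b for c in s})
    n_a, n_b = len(group_a), len(group_b)
    n_tests = length * len(symbols)
    hits = []
    for i in range(length):
        count_a = Counter(s[i] for s in group_a)
        count_b = Counter(s[i] for s in group_b)
        for sym in symbols:
            p_a, p_b = count_a[sym] / n_a, count_b[sym] / n_b
            pooled = (count_a[sym] + count_b[sym]) / (n_a + n_b)
            se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
            if se == 0:  # symbol absent from, or fixed in, both groups
                continue
            z = (p_a - p_b) / se
            p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
            if p_value < alpha / n_tests:
                hits.append((i, sym, p_a - p_b, p_value))
    return hits


# Hypothetical aligned 4-mers: position 1 is mostly S in group A, mostly A in group B.
group_a = ["RSPK"] * 18 + ["RAPK"] * 2
group_b = ["RAPK"] * 17 + ["RSPK"] * 3
for pos, sym, diff, p in position_differences(group_a, group_b):
    print(pos, sym, round(diff, 2), f"{p:.1e}")
```

Only position 1 is reported, once as enriched S and once as depleted A; the fully conserved positions give no evidence either way and are skipped.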
23. DisProt: a database of protein disorder
- Author
-
Vucetic, Slobodan, Obradovic, Zoran, Vacic, Vladimir, Radivojac, Predrag, Peng, Kang, Iakoucheva, Lilia M., Cortese, Marc S., Lawson, J. David, Brown, Celeste J., Sikes, Jason G., Newton, Crystal D., and Dunker, A. Keith
- Abstract
Summary: The Database of Protein Disorder (DisProt) is a curated database that provides structure and function information about proteins that lack a fixed three-dimensional (3D) structure under putatively native conditions, either in their entirety or in part. Starting from the central premise that intrinsic disorder is an important structural class of proteins, and to meet the increasing interest in it, DisProt aims to become a central repository of disorder-related information. For each disordered protein, the database includes the name of the protein, various aliases, accession codes, amino acid sequence, location of the disordered region(s), and methods used for structural (disorder) characterization. Where applicable, most entries also list the biological function(s) of each disordered region and how each region of disorder is used for function, and provide links to PubMed abstracts and major protein databases. Availability: www.disprot.org Contact: kedunker@iupui.edu
- Published
- 2005
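The per-entry fields listed in the DisProt abstract map naturally onto a small record type. The sketch below is a hypothetical in-memory model for illustration only. The class names, field names, and accession format are assumptions and do not reflect DisProt's actual schema or download format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DisorderedRegion:
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    detection_methods: List[str] = field(default_factory=list)  # e.g. "NMR", "CD"
    functions: List[str] = field(default_factory=list)          # annotated roles, if any

    def length(self) -> int:
        return self.end - self.start + 1


@dataclass
class DisorderEntry:
    name: str
    accession: str
    sequence: str
    aliases: List[str] = field(default_factory=list)
    regions: List[DisorderedRegion] = field(default_factory=list)
    pubmed_ids: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.sequence:
            raise ValueError("sequence must not be empty")
        # Reject regions that fall outside the sequence.
        for r in self.regions:
            if not 1 <= r.start <= r.end <= len(self.sequence):
                raise ValueError(f"region {r.start}-{r.end} outside sequence")

    def disordered_fraction(self) -> float:
        """Fraction of residues covered by at least one annotated region."""
        covered = set()
        for r in self.regions:
            covered.update(range(r.start, r.end + 1))
        return len(covered) / len(self.sequence)


entry = DisorderEntry(
    name="Example protein",    # hypothetical record
    accession="EXAMPLE-0001",  # hypothetical accession format
    sequence="M" + "A" * 99,
    regions=[
        DisorderedRegion(1, 30, ["NMR"]),
        DisorderedRegion(21, 40, ["CD"]),
    ],
)
print(entry.disordered_fraction())  # 0.4 (overlapping regions counted once)
```

Keeping detection methods and functions on each region, rather than on the entry, mirrors the abstract's description of per-region annotations.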