Author: "Yana Bromberg" / Topic: humans - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword "Yana Bromberg" returned 44 results.

1. Computational interpretation of human genetic variation

Author: Yana Bromberg and Predrag Radivojac
Subjects: Genome, Human, Databases, Genetic, Genetics, Computational Biology, Genetic Variation, Humans, Genetics (clinical)
Published: 2022

2. Decoding the effects of synonymous variants

Author: Ariel Aptekmann, Zishuo Zeng, and Yana Bromberg
Subjects: Computer science, Genome, Human, AcademicSubjects/SCI00010, RNA Splicing, Evaluation data, Genetic Variation, Proteins, Computational Biology, Variation (game tree), Computational biology, Biology, Unobservable, Set (abstract data type), Machine Learning, Annotation, Narese/24, Genetics, Humans, Disease, Human genome, Extreme gradient boosting, Set (psychology), Decoding methods, Narese/7, Sequence (medicine)
Abstract: Synonymous single nucleotide variants (sSNVs) are common in the human genome but are often overlooked. However, sSNVs can have significant biological impact and may lead to disease. Existing computational methods for evaluating the effect of sSNVs suffer from the lack of gold-standard training/evaluation data and exhibit over-reliance on sequence conservation signals. We developed synVep (synonymous Variant effect predictor), a machine learning-based method that overcomes both of these limitations. Our training data were a combination of variants reported by gnomAD (observed) and those unreported, but possible in the human genome (generated). We used positive-unlabeled learning to purify the generated variant set of any likely unobservable variants. We then trained two sequential extreme gradient boosting models to identify subsets of the remaining variants putatively enriched and depleted in effect. Our method attained 90% precision/recall on a previously unseen set of variants. Furthermore, although synVep does not explicitly use conservation, its scores correlated with evolutionary distances between orthologs in cross-species variation analysis. synVep was also able to differentiate pathogenic vs. benign variants, as well as splice-site disrupting variants (SDV) vs. non-SDVs. Thus, synVep provides an important improvement in annotation of sSNVs, allowing users to focus on variants that most likely harbor effects. Availability: the synVep webserver for online queries is at https://services.bromberglab.org/synvep; for local runs, a Python script (https://bitbucket.org/bromberglab/synvep_local) and a prediction database (https://zenodo.org/record/4763256) are also available.
Published: 2021

3. Evolution of the SARS-CoV-2 proteome in three dimensions (3D) during the first 6 months of the COVID-19 pandemic

Author: Charlotte Labrie-Cleary, Jitendra Singh, Steven Arnold, Andrew Sam, Mark Dresel, Luz Helena Alfaro Alvarado, Rebecca Roberts, Emily Fingar, Jennifer Jiang, Paul Craig, Jean Baum, Eddy Arnold, Christine Zardecki, Grace Brannigan, Julia R. Koeppe, Elizabeth M Hennen, Alan Trudeau, Joseph H Lubin, Thejasvi Venkatachalam, Jonathan K. Williams, Kevin Catalfano, Stephen K. Burley, Brian P. Hudson, Isaac Paredes, Sagar D. Khare, Yana Bromberg, Katherine See, Evan Lenkeit, Shuchismita Dutta, J. Steen Hoyer, Erika McCarthy, Michael J. Pikaart, Santiago Soto Zapata, Jenna Currier, Stephanie Laporte, Jay A. Tischfield, Siobain Duffy, Britney Dyszel, Maria Voigt, Changpeng Lu, Bonnie L. Hall, Jesse Sandberg, Kailey Martin, Aaliyah Khan, Stephen A. Mills, Sophia Staggers, Allison Rupert, Elliott M Dolan, Vidur Sarma, Lindsey Whitmore, Helen Zheng, Ashish Duvvuru, David S. Goodsell, Michael Kirsch, Melanie Ortiz-Alvarez de la Campa, Ali A Khan, Matthew Benedek, Francesc X. Ruiz, John D. Westbrook, Marilyn Orellana, Lingjun Xie, Zhuofan Shen, Baleigh Wheeler, and Brea Tinsley
Subjects: Proteome, databases, Viral protein, coronavirus, Computational biology, pandemics, Biology, medicine.disease_cause, Biochemistry, Article, Virus, SARS‐CoV‐2, Protein structure, COVID‐19, Structural Biology, Molecular evolution, evolution, medicine, Humans, Prospective Studies, molecular, Amino Acids, Molecular Biology, Research Articles, chemistry.chemical_classification, SARS-CoV-2, Drug discovery, COVID-19, Robustness (evolution), computer.file_format, Protein Data Bank, Amino acid, viral proteins, chemistry, protein, computer, Function (biology), Research Article
Abstract: Three-dimensional structures of SARS-CoV-2 and other coronaviral proteins archived in the Protein Data Bank were used to analyze viral proteome evolution during the first six months of the COVID-19 pandemic. Analyses of spatial locations, chemical properties, and structural and energetic impacts of the observed amino acid changes in >48,000 viral proteome sequences showed how each one of the 29 viral study proteins has undergone amino acid changes. Structural models computed for every unique sequence variant revealed that most substitutions map to protein surfaces and boundary layers with a minority affecting hydrophobic cores. Conservative changes were observed more frequently in cores versus boundary layers/surfaces. Active sites and protein-protein interfaces showed modest numbers of substitutions. Energetics calculations showed that the impact of substitutions on the thermodynamic stability of the proteome follows a universal bi-Gaussian distribution. Detailed results are presented for six drug discovery targets and four structural proteins comprising the virion, highlighting substitutions with the potential to impact protein structure, enzyme activity, and functional interfaces. Characterizing the evolution of the virus in three dimensions provides testable insights into viral protein function and should aid in structure-based drug discovery efforts as well as the prospective identification of amino acid substitutions with potential for drug resistance.
Published: 2021

4. Virtual Boot Camp: COVID-19 evolution and structural biology

Author: Christine Zardecki, Paul Craig, Sagar D. Khare, Jennifer Jiang, Siobain Duffy, Stephen K. Burley, Jitendra Singh, Yana Bromberg, Julia R. Koeppe, Jay A. Tischfield, Stephen A. Mills, Shuchismita Dutta, Rebecca Roberts, Bonnie L. Hall, Vidur Sarma, Lingjun Xie, Brian P. Hudson, Michael J. Pikaart, and Joseph H Lubin
Subjects: 2019-20 coronavirus outbreak, Coronavirus disease 2019 (COVID-19), Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), Biology, Biochemistry, Education, Distance, Evolution, Molecular, 03 medical and health sciences, Pandemic, Humans, Pandemics, Molecular Biology, Coronavirus 3C Proteases, 030304 developmental biology, Covid‐19, Boot camp, 0303 health sciences, SARS-CoV-2, Extramural, 05 social sciences, COVID-19, Computational Biology, 050301 education, Virology, Structural biology, Curriculum, 0503 education
Published: 2020

5. Predicting venous thromboembolism risk from exomes in the Critical Assessment of Genome Interpretation (CAGI) challenges

Author: Moses Stamboulian, Rita Casadio, Rajgopal Srinivasan, Emidio Capriotti, Predrag Radivojac, Yana Bromberg, Sadhna Rana, Sean D. Mooney, Castrense Savojardo, Russ B. Altman, Yanran Wang, Panagiotis Katsonis, Steven E. Brenner, Yuxiang Jiang, Roxana Daneshjou, Kymberleigh A. Pagel, Samuele Bovo, John Moult, Gregory McInnes, Lipika R. Pal, Olivier Lichtarge, and Pier Luigi Martelli
Subjects: Male, medicine.medical_specialty, venous thromboembolism, Disease, Biology, Genome, Article, 03 medical and health sciences, Exome Sequencing, Genetics, medicine, Cluster Analysis, Humans, Genetic Predisposition to Disease, cardiovascular diseases, Exome, Allele frequency, Genetics (clinical), Exome sequencing, 030304 developmental biology, 0303 health sciences, 030305 genetics & heredity, Confounding, Warfarin, Computational Biology, prediction challenge, Congresses as Topic, equipment and supplies, machine learning, ROC Curve, phenotype prediction, Family medicine, Female, Venous thromboembolism, exome, Unsupervised Machine Learning, medicine.drug
Abstract: Genetics play a key role in venous thromboembolism (VTE) risk; however, established risk factors in European populations do not translate to individuals of African descent because of differences in allele frequencies between populations. As part of the fifth iteration of the Critical Assessment of Genome Interpretation, participants were asked to predict VTE status in exome data from African American subjects. Participants were provided with 103 unlabeled exomes from patients treated with warfarin for either VTE or non-VTE causes and asked to predict which condition each subject had been treated for. Given the lack of training data, many participants opted to use unsupervised machine learning methods, clustering the exomes by variation in genes known to be associated with VTE. The best performing method using only VTE-related genes achieved an area under the ROC curve of 0.65. Here, we discuss the range of methods used in the prediction of VTE from sequence data and explore some of the difficulties of conducting a challenge with known confounders. In addition, we show that an existing genetic risk score for VTE that was developed in European subjects works well in African Americans.
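The challenge result above is reported as an area under the ROC curve (AUC). As an illustration only, the following minimal sketch computes AUC with the rank-based (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive subject receives the higher score, with ties counted as half. The scores and labels are hypothetical toy values, not challenge data.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 1 for a positive subject (e.g. VTE), 0 for a negative (non-VTE).
    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores for six subjects (illustrative only).
scores = [0.9, 0.4, 0.7, 0.3, 0.6, 0.2]
labels = [1, 1, 0, 0, 1, 0]
print(roc_auc(scores, labels))  # 7 of 9 pairs ranked correctly -> 0.777...
```

An AUC of 0.5 corresponds to random ranking, so the reported 0.65 indicates a modest but non-random separation of VTE from non-VTE subjects.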
Published: 2019

6. Identifying Crohn’s disease signal from variome analysis

Author: Britt-Sabina Petersen, Andre Franke, Stefan Schreiber, Yana Bromberg, Yuri Astrakhan, Maximilian Miller, and Yanran Wang
Subjects: Genetic Markers, 0301 basic medicine, lcsh:QH426-470, lcsh:Medicine, Genome-wide association study, Computational biology, Disease, Biology, Polymorphism, Single Nucleotide, Machine Learning, 03 medical and health sciences, 0302 clinical medicine, Crohn Disease, Genetics, Humans, Exome, Genetic Predisposition to Disease, Molecular Biology, Genetics (clinical), Exome sequencing, Genetic association, Research, lcsh:R, Prognosis, Human genetics, ddc, 3. Good health, lcsh:Genetics, 030104 developmental biology, Variome, Genetic marker, 030220 oncology & carcinogenesis, Metagenome, Molecular Medicine, Genome-Wide Association Study
Abstract: Background: After years of concentrated research efforts, the exact cause of Crohn’s disease (CD) remains unknown. Its accurate diagnosis, however, helps in disease management and in preventing disease onset. Genome-wide association studies have identified 241 CD loci, but these carry small log odds ratios and are thus diagnostically uninformative. Methods: Here, we describe a machine learning method, AVA,Dx (Analysis of Variation for Association with Disease), that uses exonic variants from whole exome or genome sequencing data to extract CD signal and predict CD status. Using the person-specific coding variation in genes from a panel of only 111 individuals, we built disease-prediction models informative of previously undiscovered disease genes. By additionally accounting for batch effects, we were able to accurately predict CD status for thousands of previously unseen individuals from other panels. Results: AVA,Dx highlighted known CD genes, including NOD2, as well as new potential CD genes. In over 3000 individuals from separately sequenced panels, AVA,Dx identified 16% of CD patients at 99% precision (at a strict cutoff) and 58% of patients at 82% precision (at the default cutoff). Conclusions: Larger training panels and additional features, including other types of genetic variants and environmental factors, e.g., human-associated microbiota, may improve model performance. However, the results presented here already position AVA,Dx as both an effective method for revealing pathogenesis pathways and a CD risk analysis tool that can improve clinical diagnostic time and accuracy. Links to the AVA,Dx Docker image and the BitBucket source code are at https://bromberglab.org/project/avadx/.
Published: 2019

7. Inferring Potential Cancer Driving Synonymous Variants

Author: Zishuo Zeng and Yana Bromberg
Subjects: Neoplasms, RNA Splicing, Genetics, Humans, Genomics, Oncogenes, synonymous variants, sSNV, cancer drivers, somatic variants, variant functional impact, Silent Mutation, Genetics (clinical)
Abstract: Synonymous single nucleotide variants (sSNVs) are often considered functionally silent, but a few cases of cancer-causing sSNVs have been reported. From available databases, we collected four categories of sSNVs: germline, somatic in normal tissues, somatic in cancerous tissues, and putative cancer drivers. We found that screening sSNVs for recurrence among patients, conservation of the affected genomic position, and synVep prediction (synVep is a machine learning-based sSNV effect predictor) recovers cancer driver variants (termed proposed drivers) and previously unknown putative cancer genes. Of the 2.9 million somatic sSNVs found in the COSMIC database, we identified 2111 proposed cancer driver sSNVs. Of these, 326 sSNVs could be further tagged for possible RNA splicing effects, RNA structural changes, and affected RBP motifs. This list of proposed cancer driver sSNVs provides computational guidance in prioritizing the experimental evaluation of synonymous mutations found in cancers. Furthermore, our list of novel potential cancer genes, galvanized by synonymous mutations, may highlight yet unexplored cancer mechanisms.
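The screening described above combines three signals: recurrence among patients, conservation of the affected position, and the synVep prediction. A minimal sketch of such a screen follows, assuming (for illustration only) that a variant must pass all three filters. The field names, the score scales, and every threshold are hypothetical; the abstract does not state the cutoffs used.

```python
from dataclasses import dataclass


@dataclass
class SynonymousVariant:
    variant_id: str
    recurrence: int       # number of patients in whom the variant was observed
    conservation: float   # hypothetical conservation score of the genomic position
    synvep_score: float   # hypothetical synVep effect score, assumed to lie in [0, 1]


def screen_candidate_drivers(variants, min_recurrence=2,
                             min_conservation=0.5, min_synvep=0.5):
    """Keep variants passing all three filters (thresholds are illustrative)."""
    return [
        v for v in variants
        if v.recurrence >= min_recurrence
        and v.conservation >= min_conservation
        and v.synvep_score >= min_synvep
    ]


variants = [
    SynonymousVariant("var1", recurrence=5, conservation=0.9, synvep_score=0.8),
    SynonymousVariant("var2", recurrence=1, conservation=0.9, synvep_score=0.9),  # not recurrent
    SynonymousVariant("var3", recurrence=3, conservation=0.2, synvep_score=0.7),  # poorly conserved
    SynonymousVariant("var4", recurrence=4, conservation=0.7, synvep_score=0.3),  # low predicted effect
]
print([v.variant_id for v in screen_candidate_drivers(variants)])  # ['var1']
```

Treating the criteria as a conjunction favors precision over recall; a real screen might instead weight or rank the signals.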
Published: 2022

8. Amino acid encoding for deep learning applications

Author: Yana Bromberg, Hesham ElAbd, Mareike Wendorff, Tobias L. Lenz, Andre Franke, and Adrienne Hoarfrost
Subjects: Theoretical computer science, Computer science, lcsh:Computer applications to medicine. Medical informatics, Deep-learning, Biochemistry, Convolutional neural network, Convoluted-neural network (CNN), Reduction (complexity), 03 medical and health sciences, Matrix (mathematics), Deep Learning, Dimension (vector space), Structural Biology, Human-leukocyte antigen (HLA), Encoding (memory), Humans, HLA-II peptide interaction, Amino Acids, Molecular Biology, lcsh:QH301-705.5, 030304 developmental biology, 0303 health sciences, Training set, business.industry, Applied Mathematics, Deep learning, Methodology Article, 030302 biochemistry & molecular biology, Computational Biology, Computer Science Applications, Recurrent neural network, ComputingMethodologies_PATTERNRECOGNITION, lcsh:Biology (General), Amino acid encoding, Protein-protein interaction (PPI), Embedding, lcsh:R858-859.7, Artificial intelligence, Amino acids embedding, business, Recurrent neural network (RNN), Machine-learning (ML)
Abstract: Background: The number of applications of deep learning algorithms in bioinformatics is increasing, as they usually achieve superior performance over classical approaches, especially when larger training datasets are available. In deep learning applications, discrete data, e.g. words or n-grams in language, or amino acids or nucleotides in bioinformatics, are generally represented as a continuous vector through an embedding matrix. Recently, learning this embedding matrix directly from the data as part of the continuous iteration of the model to optimize the target prediction, a process called ‘end-to-end learning’, has led to state-of-the-art results in many fields. Although usage of embeddings is well described in the bioinformatics literature, the potential of end-to-end learning for single amino acids, as compared to more classical manually-curated encoding strategies, has not been systematically addressed. To this end, we compared classical encoding matrices, namely one-hot, VHSE8 and BLOSUM62, to end-to-end learning of amino acid embeddings for two different prediction tasks using three widely used architectures, namely recurrent neural networks (RNN), convolutional neural networks (CNN), and the hybrid CNN-RNN. Results: Using different deep learning architectures, we show that end-to-end learning is on par with classical encodings for embeddings of the same dimension, even when limited training data is available, and might allow for a reduction in the embedding dimension without performance loss, which is critical when deploying the models to devices with limited computational capacities. We found that the embedding dimension is a major factor in controlling model performance. Surprisingly, we observed that deep learning models are capable of learning from random vectors of appropriate dimension. Conclusion: Our study shows that end-to-end learning is a flexible and powerful method for amino acid encoding. Further, due to the flexibility of deep learning systems, amino acid encoding schemes should be benchmarked against random vectors of the same dimension to disentangle the information content provided by the encoding scheme from the distinguishability effect provided by the scheme.
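The comparison above pits a fixed encoding against random vectors of the same dimension. The following minimal sketch builds two such lookup tables in pure Python: a one-hot table and a random-vector baseline, both 20-dimensional. The residue ordering, the Gaussian initialization, and the seed are assumptions for illustration; VHSE8 and BLOSUM62 values are not reproduced here, and in end-to-end learning the table would instead be a trainable parameter of the model.

```python
import random

# The 20 standard amino acids, in alphabetical one-letter-code order (an arbitrary choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_table():
    """Each residue maps to a 20-dim vector with a single 1.0 at its index."""
    table = {}
    for i, aa in enumerate(AMINO_ACIDS):
        vec = [0.0] * len(AMINO_ACIDS)
        vec[i] = 1.0
        table[aa] = vec
    return table


def random_table(dim, seed=0):
    """Random baseline: each residue maps to a fixed Gaussian vector of size dim."""
    rng = random.Random(seed)
    return {aa: [rng.gauss(0.0, 1.0) for _ in range(dim)] for aa in AMINO_ACIDS}


def encode(peptide, table):
    """Turn a peptide string into a list of per-residue vectors."""
    return [table[aa] for aa in peptide]


one_hot = one_hot_table()
rand20 = random_table(dim=20, seed=42)
x_onehot = encode("SIINFEKL", one_hot)
x_random = encode("SIINFEKL", rand20)
print(len(x_onehot), len(x_onehot[0]), len(x_random[0]))  # 8 20 20
```

Because both tables produce inputs of identical shape, the same downstream model can be trained on each, which is what makes the random-vector benchmark a fair control.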
Published: 2020

9. De Novo Sequence and Copy Number Variants Are Strongly Associated with Tourette Disorder and Implicate Cell Polarity in Pathogenesis

Author: Sheng Wang, Jeffrey D. Mandell, Yogesh Kumar, Nawei Sun, Montana T. Morris, Juan Arbelaez, Cara Nasello, Shan Dong, Clif Duhn, Xin Zhao, Zhiyu Yang, Shanmukha S. Padmanabhuni, Dongmei Yu, Robert A. King, Andrea Dietrich, Najah Khalifa, Niklas Dahl, Alden Y. Huang, Benjamin M. Neale, Giovanni Coppola, Carol A. Mathews, Jeremiah M. Scharf, Thomas V. Fernandez, Joseph D. Buxbaum, Silvia De Rubeis, Dorothy E. Grice, Jinchuan Xing, Gary A. Heiman, Jay A. Tischfield, Peristera Paschou, A. Jeremy Willsey, Matthew W. State, Mohamed Abdulkadir, Benjamin Bodmer, Yana Bromberg, Lawrence W. Brown, Keun-Ah Cheon, Barbara J. Coffey, Li Deng, Lonneke Elzerman, Carolin Fremer, Blanca Garcia-Delgar, Donald L. Gilbert, Julie Hagstrøm, Tammy Hedderly, Isobel Heyman, Pieter J. Hoekstra, Hyun Ju Hong, Chaim Huyser, Eun-Joo Kim, Young Key Kim, Young-Shin Kim, Yun-Joo Koh, Sodahm Kook, Samuel Kuperman, Bennett L. Leventhal, Andrea G. Ludolph, Marcos Madruga-Garrido, Athanasios Maras, Pablo Mir, Astrid Morer, Kirsten Müller-Vahl, Alexander Münchau, Tara L. Murphy, Kerstin J. Plessen, Hannah Poisner, Veit Roessner, Stephan J. Sanders, Eun-Young Shin, Dong-Ho Song, Jungeun Song, Joshua K. Thackray, Jennifer Tübing, Frank Visscher, Sina Wanderer, Martin Woods, Yeting Zhang, Samuel H. Zinner, Christos Androutsos, Csaba Barta, Luca Farkas, Jakub Fichna, Marianthi Georgitsi, Piotr Janik, Iordanis Karagiannidis, Anastasia Koumoula, Peter Nagy, Joanna Puchala, Renata Rizzo, Natalia Szejko, Urszula Szymanska, Zsanett Tarnok, Vaia Tsironi, Tomasz Wolanczyk, Cezary Zekanowski, Cathy L. Barr, James R. Batterson, Cheston Berlin, Ruth D. Bruun, Cathy L. Budman, Danielle C. Cath, Sylvain Chouinard, Nancy J. Cox, Sabrina Darrow, Lea K. Davis, Yves Dion, Nelson B. Freimer, Marco A. Grados, Matthew E. Hirschtritt, Cornelia Illmann, Roger Kurlan, James F. Leckman, Gholson J. Lyon, Irene A. Malaty, William M. MacMahon, Michael S. Okun, Lisa Osiecki, David L. Pauls, Danielle Posthuma, Vasily Ramensky, Mary M. Robertson, Guy A. Rouleau, Paul Sandor, Harvey S. Singer, Jan Smit, Jae-Hoon Sul, Tourette International Collaborative Genetics Study (TIC Genetics), Tourette Syndrome Genetics Southern and Eastern Europe Initiative (TSGENESEE), and Tourette Association of America International Consortium for Genetics (TAAICG)
Subjects: Adult, Male, 0301 basic medicine, medicine.medical_specialty, DNA Copy Number Variations, Receptors, Cell Surface, Biology, Genome, Article, General Biochemistry, Genetics and Molecular Biology, 03 medical and health sciences, 0302 clinical medicine, RARE, SCHIZOPHRENIA, medicine, Humans, Copy-number variation, Child, NEURODEVELOPMENTAL DISORDERS, Gene, lcsh:QH301-705.5, Exome sequencing, 030304 developmental biology, Medicinsk genetik, Sequence (medicine), Genetics, 0303 health sciences, SEVERE INTELLECTUAL DISABILITY, Cadherin, MUTATIONS, AUTISM SPECTRUM DISORDER, Cell Polarity, OBSESSIVE-COMPULSIVE DISORDER, Cadherins, medicine.disease, Pedigree, PREVALENCE, CONGENITAL HEART-DISEASE, GENOME, 030104 developmental biology, lcsh:Biology (General), Schizophrenia, Medical genetics, Female, Cadherins/genetics, Receptors, Cell Surface/genetics, Tourette Syndrome/genetics, Tourette Syndrome/pathology, TIC Genetics, Tourette disorder, cell polarity, copy number variants, de novo variants, gene discovery, microarray genotyping, multiplex, simplex, whole exome sequencing, Medical Genetics, 030217 neurology & neurosurgery, Tourette Syndrome
Abstract: Summary: We previously established the contribution of de novo damaging sequence variants to Tourette disorder (TD) through whole-exome sequencing of 511 trios. Here, we sequence an additional 291 TD trios and analyze the combined set of 802 trios. We observe an overrepresentation of de novo damaging variants in simplex, but not multiplex, families; we identify a high-confidence TD risk gene, CELSR3 (cadherin EGF LAG seven-pass G-type receptor 3); we find that the genes mutated in TD patients are enriched for those related to cell polarity, suggesting a common pathway underlying pathobiology; and we confirm a statistically significant excess of de novo copy number variants in TD. Finally, we identify significant overlap of de novo sequence variants between TD and obsessive-compulsive disorder and of de novo copy number variants between TD and autism spectrum disorder, consistent with shared genetic risk. In Brief: Wang et al. expand their earlier exome-sequencing work in TD, adding 291 trios and conducting combined analyses suggesting that de novo variants carry more risk in individuals with unaffected parents, establishing de novo structural variants as risk factors, identifying CELSR3 as a risk gene, and implicating cell polarity in pathogenesis.
Published: 2018

10. Assessing the performance of in-silico methods for predicting the pathogenicity of variants in the gene CHEK2, among Hispanic females with breast cancer

Author: Rita Casadio, Panagiotis Katsonis, Susan L. Neuhausen, Alin Voskanian, Predrag Radivojac, Yao Yu, Steven E. Brenner, Yue Cao, Yana Bromberg, Yuanfei Sun, Erin Young, Giulia Babbi, Elad Ziv, Castrense Savojardo, Maricel G. Kann, Max Miller, Yanran Wang, Olivier Lichtarge, Aditi Garg, Pier Luigi Martelli, Yang Shen, Emidio Capriotti, Debnath Pal, Gaia Andreoletti, Sean V. Tavtigian, Sean D. Mooney, Vikas Pejaver, Lipika R. Pal, and Chad D. Huff
Subjects: Adult, In silico, Breast Neoplasms, Computational biology, Disease, Biology, Genome, Polymorphism, Single Nucleotide, Article, Odds, 03 medical and health sciences, breast cancer, Breast cancer, SNV, Exome Sequencing, Genetics, medicine, Humans, Computer Simulation, Genetic Predisposition to Disease, CHEK2, Genetics (clinical), 030304 developmental biology, Aged, 0303 health sciences, 030305 genetics & heredity, Computational Biology, Hispanic or Latino, Hispanic women, Middle Aged, medicine.disease, Precision medicine, United States, Checkpoint Kinase 2, Case-Control Studies, Linear Models, CAGI, Identification (biology), Female
Abstract: The availability of disease-specific genomic data is critical for developing new computational methods that predict the pathogenicity of human variants and advance the field of precision medicine. However, the lack of gold standards to properly train and benchmark such methods is one of the greatest challenges in the field. In response to this challenge, the scientific community is invited to participate in the Critical Assessment of Genome Interpretation (CAGI), where unpublished disease variants are available for classification by in silico methods. As part of the CAGI-5 challenge, we evaluated the performance of 18 submissions and three additional methods in predicting the pathogenicity of single nucleotide variants (SNVs) in checkpoint kinase 2 (CHEK2) for cases of breast cancer in Hispanic females. As part of the assessment, the efficacy of the analysis method and the setup of the challenge were also considered. The results indicated that, though the challenge could benefit from additional participant data, the combined generalized linear model analysis and odds of pathogenicity analysis provided a framework to evaluate the methods submitted for SNV pathogenicity identification and for comparison to other available methods. The outcome of this challenge and the approaches used can help guide further advancements in identifying SNV-disease relationships.
Published: 2019

11. Performance of computational methods for the evaluation of Pericentriolar Material 1 missense variants in CAGI-5

Author: Marco Carraro, Maria Kousi, Yanran Wang, Rita Casadio, Pier Luigi Martelli, Castrense Savojardo, Emidio Capriotti, Luigi Chiricosta, Giulia Babbi, Alexander Miguel Monzon, Steven E. Brenner, James Han, Panagiotis Katsonis, Kivilcim Ozturk, Nicholas Katsanis, Emanuela Leonardi, Olivier Lichtarge, Gaia Andreoletti, Hannah Carter, Silvio C. E. Tosatto, John Moult, Carlo Ferrari, Maximilian Miller, Francesco Reggiani, and Yana Bromberg
Subjects: bioinformatics tools, community challenge, critical assessment, effect prediction, missense mutations, variant interpretation, Cell Cycle Proteins, Autoantigens, Databases, Genetic, 2.1 Biological and endogenous factors, Missense mutation, Aetiology, Genetics (clinical), Pericentriolar material, Genetics & Heredity, 0303 health sciences, 030305 genetics & heredity, Single Nucleotide, Mental Health, Phenotype, Mutation (genetic algorithm), Critical assessment, Neural Networks, Clinical Sciences, Mutation, Missense, Single-nucleotide polymorphism, Computational biology, Biology, Polymorphism, Single Nucleotide, Article, Databases, Computer, 03 medical and health sciences, Genetic, Genetics, Humans, Genetic Predisposition to Disease, Polymorphism, Clinical phenotype, Gene, Loss function, 030304 developmental biology, missense mutation, Computational Biology, Brain Disorders, Mutation, bioinformatics tool, Schizophrenia, Neural Networks, Computer, Missense
Abstract: The CAGI-5 pericentriolar material 1 (PCM1) challenge aimed to predict the effect of 38 transgenic human missense mutations in the PCM1 protein implicated in schizophrenia. Participants were provided with 16 benign variants (negative controls), 10 hypomorphic, and 12 loss-of-function variants. Six groups participated and were asked to predict the probability of effect and the standard deviation associated with each mutation. Here, we present the challenge assessment. Prediction performance was evaluated using different measures to arrive at a final ranking, which highlights the strengths and weaknesses of each group. The results show a great variety of predictions, with some methods performing significantly better than others. Benign variants played an important role as negative controls, highlighting predictors biased toward identifying disease phenotypes. The best predictor, from the Bromberg lab, used a neural-network-based method able to discriminate between neutral and non-neutral single nucleotide polymorphisms. The CAGI-5 PCM1 challenge allowed us to evaluate state-of-the-art techniques for interpreting the effect of novel variants for a difficult target protein.
Published: 2019

12. Identifying mutation-driven changes in gene functionality that lead to venous thromboembolism

Author: Yanran Wang and Yana Bromberg
Subjects: Male, Population, MEDLINE, Biology, Bioinformatics, Polymorphism, Single Nucleotide, Article, 03 medical and health sciences, Polymorphism (computer science), Exome Sequencing, Genetics, medicine, Humans, Genetic Predisposition to Disease, cardiovascular diseases, education, Exome, Genetics (clinical), Exome sequencing, Genetic Association Studies, 030304 developmental biology, Genetic association, 0303 health sciences, education.field_of_study, Principal Component Analysis, 030305 genetics & heredity, Case-control study, Warfarin, Computational Biology, Venous Thromboembolism, equipment and supplies, United States, Black or African American, Case-Control Studies, Female, medicine.drug
Abstract: Venous thromboembolism (VTE) is a common hematological disorder. VTE affects millions of people around the world each year and can be fatal. Earlier studies have revealed possible VTE genetic risk factors in Europeans. The 2018 Critical Assessment of Genome Interpretation (CAGI) challenge asked participants to distinguish between 66 VTE and 37 non-VTE African American (AA) individuals based on their exome sequencing data. We used variants from AA VTE association studies and VTE genes from the DisGeNET database to evaluate VTE risk via four different approaches; two of these methods were most successful at the task. Our best performing method represented each exome as a vector of predicted functional effect scores of variants within the known genes. These exome vectors were then clustered with k-means. This approach achieved 70.8% precision and 69.7% recall in identifying VTE patients. Our second-best ranked method collapsed the variant effect scores into gene-level function changes, using the same vector clustering approach for patient/control identification. These results demonstrate that VTE risk is predictable in the AA population and highlight the importance of variant-driven gene functional changes in judging disease status. Nevertheless, a more in-depth understanding of AA VTE pathogenicity is still needed for more precise predictions.
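The best-performing approach above represents each exome as a vector of per-variant effect scores, clusters the vectors with k-means, and reports precision and recall for the VTE call. The following minimal sketch illustrates that pipeline in pure Python with Lloyd's k-means. Several choices here are assumptions for illustration, not the authors' setup: the toy two-gene vectors and labels are hypothetical, the centroids are seeded deterministically with a farthest-point heuristic (standard k-means often uses random or k-means++ seeding), and the cluster with the larger summed centroid score is taken to be the predicted VTE group.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add the farthest one."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean updates."""
    centroids = farthest_point_init(points, k)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_assignment == assignment:
            break  # converged: assignments no longer change
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


def precision_recall(predicted, actual):
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical exome vectors: predicted effect scores for two VTE genes per subject.
exomes = [[0.9, 0.8], [0.8, 0.9], [0.85, 0.7], [0.1, 0.2], [0.2, 0.1], [0.15, 0.25]]
is_vte = [1, 1, 0, 0, 0, 1]  # hypothetical ground truth, used only for evaluation

assignment, centroids = kmeans(exomes, k=2)
vte_cluster = max(range(2), key=lambda c: sum(centroids[c]))  # assumed labeling heuristic
predicted = [1 if a == vte_cluster else 0 for a in assignment]
precision, recall = precision_recall(predicted, is_vte)
print(assignment, predicted)
print(round(precision, 3), round(recall, 3))  # 0.667 0.667
```

Because clustering is unsupervised, the labels are used only after the fact to score the partition; deciding which cluster means "VTE" still requires some rule, here the hypothetical higher-mean-score heuristic.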
Published: 2019

13. What went wrong with variant effect predictor performance for the PCM1 challenge

Author: Yanran Wang, Maximilian Miller, and Yana Bromberg
Subjects: 0303 health sciences, Pericentriolar Material 1 Protein, 030305 genetics & heredity, Mutation, Missense, Computational Biology, Cell Cycle Proteins, Computational biology, Biology, Genome, Autoantigens, Article, 03 medical and health sciences, PCM1, Databases, Genetic, Genetics, Feature (machine learning), Humans, Critical assessment, Genetic Predisposition to Disease, Genetics (clinical), Algorithms, 030304 developmental biology
Abstract: Recent years have seen a drastic increase in the number of available genomic sequences. Alongside this explosion, hundreds of computational tools have been developed to assess the impact of observed genetic variation. Critical Assessment of Genome Interpretation (CAGI) provides a platform to evaluate the performance of these tools in experimentally relevant contexts. In the CAGI-5 challenge assessing the 38 missense variants affecting the human Pericentriolar Material 1 protein (PCM1), our SNAP-based submission was the top performer, although it did worse than expected from other evaluations. Here, we compare the CAGI-5 submissions, and 24 additional commonly used variant effect predictors, to analyze the reasons for this observation. We identified per-residue conservation, structural, and functional PCM1 characteristics that may be responsible. As expected, predictors had a hard time distinguishing effect variants in non-conserved positions. They were also better able to call effect variants in a structurally rich region than in a less-structured one; in the latter, they more often correctly identified benign than effect variants. Curiously, most of the protein was predicted to be functionally robust to mutation – a feature that likely makes it a harder problem for generalized variant effect predictors.
Published: 2019

14. Assessment of methods for predicting the effects of PTEN and TPMT protein variants

Author: Vikas Pejaver, Giulia Babbi, Rita Casadio, Lukas Folkman, Panagiotis Katsonis, Kunal Kundu, Olivier Lichtarge, Pier Luigi Martelli, Maximilian Miller, John Moult, Lipika R. Pal, Castrense Savojardo, Yizhou Yin, Yaoqi Zhou, Predrag Radivojac, and Yana Bromberg
Subjects: Nonsynonymous substitution, VAMP-seq, Computational biology, Article, Stability change, 03 medical and health sciences, Genetics, PTEN, Humans, thiopurine S-methyl transferase, TPMT, Genetics (clinical), 030304 developmental biology, 0303 health sciences, Thiopurine methyltransferase, biology, Protein Stability, 030305 genetics & heredity, PTEN Phosphohydrolase, Computational Biology, High-Throughput Nucleotide Sequencing, Methyltransferases, variant stability profiling, Mutation, Molecular mechanism, CAGI, Critical assessment, Experimental methods, phosphatase and tensin homolog
Abstract: Thermodynamic stability is a fundamental property shared by all proteins. Changes in stability due to mutation are a widespread molecular mechanism in genetic diseases. Methods for the prediction of mutation-induced stability change have typically been developed and evaluated on incomplete and/or biased data sets. As part of the Critical Assessment of Genome Interpretation (CAGI), we explored the utility of high-throughput variant stability profiling (VSP) assay data as an alternative for the assessment of computational methods and evaluated state-of-the-art predictors against over 7,000 non-synonymous variants from two proteins. We found that predictions were modestly correlated with actual experimental values. Predictors fared better when evaluated as classifiers of extreme stability effects. While different methods emerged as top-performers depending on the metric, it is non-trivial to draw conclusions on their adoption or improvement. Our analyses revealed that only 16% of all variants in VSP assays could be confidently defined as stability-affecting. Furthermore, it is unclear to what extent VSP abundance scores were reasonable proxies for the stability-related quantities that participating methods were designed to predict. Overall, our observations underscore the need for clearly defined objectives when developing and using both computational and experimental methods in the context of measuring variant impact.
Published: 2019

15. Assessing computational predictions of the phenotypic effect of cystathionine-beta-synthase variants

Author: Laura Kasak, Constantina Bakolitsa, Zhiqiang Hu, Changhua Yu, Jasper Rine, Dago F. Dimster-Denk, Gaurav Pandey, Greet De Baets, Yana Bromberg, Chen Cao, Emidio Capriotti, Rita Casadio, Joost Van Durme, Manuel Giollo, Rachel Karchin, Panagiotis Katsonis, Emanuela Leonardi, Olivier Lichtarge, Pier Luigi Martelli, David L. Masica, Sean D. Mooney, Ayodeji Olatubosun, Predrag Radivojac, Frederic Rousseau, Lipika R. Pal, Castrense Savojardo, Joost Schymkowitz, Janita Thusberg, Silvio C. E. Tosatto, Mauno Vihinen, Jouni Väliaho, Susanna Repo, John Moult, Steven E. Brenner, and Iddo Friedberg
Subjects: Homocysteine, IMPACT, Transsulfuration pathway, 2.1 Biological and endogenous factors, Single amino acid, Aetiology, Precision Medicine, Genetics (clinical), Genetics & Heredity, PROTEIN FUNCTION, 0303 health sciences, biology, 030305 genetics & heredity, CAGI challenge, SNAP, Phenotype, machine learning, Networking and Information Technology R&D (NITRD), phenotype prediction, critical assessment, Life Sciences & Biomedicine, cystathionine-beta-synthase, ENZYME, Clinical Sciences, Cystathionine beta-Synthase, Homocystinuria, Computational biology, single amino acid substitution, CLASSIFICATION, Article, 03 medical and health sciences, Cystathionine, Genetics, medicine, Humans, Model organism, 030304 developmental biology, SERVER, TOOLS, Science & Technology, MUTATIONS, Computational Biology, Cystathionine beta synthase, Good Health and Well Being, chemistry, Amino Acid Substitution, Generic health relevance, Personalized medicine, PATHOGENICITY
Abstract: Accurate prediction of the impact of genomic variation on phenotype is a major goal of computational biology and an important contributor to personalized medicine. Computational predictions can lead to a better understanding of the mechanisms underlying genetic diseases, including cancer, but their adoption requires thorough and unbiased assessment. Cystathionine-beta-synthase (CBS) is an enzyme that catalyzes the first step of the transsulfuration pathway, from homocysteine to cystathionine, and in which variations are associated with human hyperhomocysteinemia and homocystinuria. We have created a computational challenge under the CAGI framework to evaluate how well different methods can predict the phenotypic effect(s) of CBS single amino acid substitutions using a blinded experimental data set. CAGI participants were asked to predict yeast growth based on the identity of the mutations. The performance of the methods was evaluated using several metrics. The CBS challenge highlighted the difficulty of predicting the phenotype of an ex vivo system in a model organism when classification models were trained on human disease data. We also discuss the variations in difficulty of prediction for known benign and deleterious variants, as well as identify methodological and experimental constraints with lessons to be learned for future challenges.
Published: 2019 (Human Mutation, vol. 40, issue 9, pp. 1530-1545)

16. funtrp: identifying protein positions for variation driven functional tuning

Author: Daniel Vitale, Peter C. Kahn, Burkhard Rost, Yana Bromberg, and Maximilian Miller
Subjects: Models, Molecular, Synthetic protein, Computer science, Protein domain, Stability (learning theory), Computational biology, Variation (game tree), Biology, 03 medical and health sciences, Structure-Activity Relationship, 0302 clinical medicine, Position (vector), Genetics, Humans, Amino Acid Sequence, Databases, Protein, Gene, Conserved Sequence, 030304 developmental biology, 0303 health sciences, Sequence, Base Sequence, Computational Biology, Proteins, ddc, Range (mathematics), Mutation (genetic algorithm), Mutation, Methods Online, Function (biology), 030217 neurology & neurosurgery
Abstract: Evaluating the impact of non-synonymous genetic variants is essential for uncovering disease associations and mechanisms of evolution. Understanding corresponding sequence changes is also fundamental for synthetic protein design and stability assessments. However, the performance gain of variant effect predictors observed in recent years is not in line with the increased complexity of new methods. One likely reason for this might be that most approaches use similar sets of gene/protein features for modeling variant effect, often emphasizing sequence conservation. While high levels of conservation highlight residues essential for protein activity, much of the in vivo observable variation is arguably weaker in its impact and, thus, requires evaluation at a higher level of resolution. Here we describe function Neutral/Toggle/Rheostat predictor (funtrp), a novel computational method that categorizes protein positions based on the position-specific expected range of mutational impacts: Neutral (weak/no effects), Rheostat (function-tuning positions), or Toggle (on/off switches). We show that position types do not correlate strongly with familiar protein features such as conservation or protein disorder. We also find that position type distribution varies across different protein functions. Finally, we demonstrate that position types reflect experimentally determined functional effects and can thus improve performance of existing variant effect predictors and suggest a way forward for the development of new ones.
Published: 2018

17. Ten simple rules for drawing scientific comics

Author: Yana Bromberg, Matthew Partridge, and Jason E. McDermott
Subjects: 0301 basic medicine, Science and Technology Workforce, Facebook, Computer science, Social Sciences, Careers in Research, Infographics, Facial recognition system, Cognition, Learning and Memory, Sociology, Simple (abstract algebra), Psychology, Biology (General), Science in the Arts, Social communication, Ecology, Communication, Social Communication, Professions, Editorial, Social Networks, Computational Theory and Mathematics, Modeling and Simulation, Graphs, Network Analysis, Wit and Humor as Topic, Personality, Computer and Information Sciences, Science Policy, QH301-705.5, Twitter, MEDLINE, Comics, Face Recognition, 03 medical and health sciences, Cellular and Molecular Neuroscience, Memory, Genetics, Humans, Social media, Molecular Biology, Ecology, Evolution, Behavior and Systematics, Information retrieval, Data Visualization, Cognitive Psychology, Computational Biology, Biology and Life Sciences, Communications, 030104 developmental biology, People and Places, Scientists, Cognitive Science, Population Groupings, Perception, Social Media, Neuroscience
Published: 2018

18. Working towards precision medicine: predicting phenotypes from exomes in the Critical Assessment of Genome Interpretation (CAGI) challenges

Author: Roxana Daneshjou, Yanran Wang, Yana Bromberg, Samuele Bovo, Pier Luigi Martelli, Giulia Babbi, Pietro Di Lena, Rita Casadio, Matthew D. Edwards, David K. Gifford, David T. Jones, Laksshman Sundaram, Rajendra Rana Bhat, Xiaolin Li, Lipika R. Pal, Kunal Kundu, Yizhou Yin, John Moult, Yuxiang Jiang, Vikas Pejaver, Kymberleigh A. Pagel, Biao Li, Sean D. Mooney, Predrag Radivojac, Sohela Shah, Marco Carraro, Alessandra Gasparini, Emanuela Leonardi, Manuel Giollo, Carlo Ferrari, Silvio C. E. Tosatto, Eran Bachar, Johnathan Roy Azaria, Yanay Ofran, Ron Unger, Abhishek Niroula, Mauno Vihinen, Billy Chang, Maggie Haitian Wang, Andre Franke, Britt-Sabina Petersen, Mehdi Pirooznia, Peter P. Zandi, Richard W. McCombie, James B. Potash, Russ B. Altman, Teri E. Klein, Roger A. Hoskins, Susanna Repo, Steven E. Brenner, and Alexander A. Morgan
Subjects: 0301 basic medicine, Bipolar Disorder, Pharmacogenomic Variants, Information Dissemination, Disease, Biology, Bioinformatics, Genome, Whole Exome Sequencing, Article, 03 medical and health sciences, 0302 clinical medicine, Genetic, Crohn Disease, bipolar disorder, Crohn's disease, exomes, machine learning, phenotype prediction, warfarin, Genetics, Genetics (clinical), Databases, Genetic, Exome Sequencing, Humans, Genetic Predisposition to Disease, Precision Medicine, Exome, Exome sequencing, Interpretation (philosophy), Computational Biology, Precision medicine, Data science, Phenotype, 030104 developmental biology, Pharmacogenomic Variant, Warfarin, exome, 030217 neurology & neurosurgery, Human
Abstract: Precision medicine aims to predict a patient's disease risk and best therapeutic options by using that individual's genetic sequencing data. The Critical Assessment of Genome Interpretation (CAGI) is a community experiment consisting of genotype–phenotype prediction challenges; participants build models, undergo assessment, and share key findings. For CAGI 4, three challenges involved using exome-sequencing data: Crohn's disease, bipolar disorder, and warfarin dosing. Previous CAGI challenges included prior versions of the Crohn's disease challenge. Here, we discuss the range of techniques used for phenotype prediction as well as the methods used for assessing predictive models. Additionally, we outline some of the difficulties associated with making predictions and evaluating them. The lessons learned from the exome challenges can be applied to both research and clinical efforts to improve phenotype prediction from genotype. In addition, these challenges serve as a vehicle for sharing clinical and research exome data in a secure manner with scientists who have a broad range of expertise, contributing to a collaborative effort to advance our understanding of genotype–phenotype relationships.
Published: 2017

19. Functional sequencing read annotation for high precision microbiome analysis

Author: Chengsheng Zhu, Maximilian Miller, Srinayani Marpaka, Pavel Vaysberg, Malte C Rühlemann, Guojun Wu, Femke-Anouska Heinsen, Marie Tempel, Liping Zhao, Wolfgang Lieb, Andre Franke, and Yana Bromberg
Subjects: Bacterial Proteins, Crohn Disease, Microbiota, Humans, Methods Online, Molecular Sequence Annotation, Metagenomics, Child, Prader-Willi Syndrome, Sequence Alignment, Algorithms
Abstract: The vast majority of microorganisms on Earth reside in often-inseparable environment-specific communities: microbiomes. Meta-genomic/-transcriptomic sequencing could reveal the otherwise inaccessible functionality of microbiomes. However, existing analytical approaches focus on attributing sequencing reads to known genes/genomes, often failing to make maximal use of available data. We created faser (functional annotation of sequencing reads), an algorithm that is optimized to map reads to molecular functions encoded by the read-correspondent genes. The mi-faser microbiome analysis pipeline, combining faser with our manually curated reference database of protein functions, accurately annotates microbiome molecular functionality. mi-faser’s minutes-per-microbiome processing speed is significantly faster than that of other methods, allowing for large-scale comparisons. Microbiome function vectors can be compared between different conditions to highlight environment-specific and/or time-dependent changes in functionality. Here, we identified previously unseen oil degradation-specific functions in BP oil-spill data, as well as functional signatures of individual-specific gut microbiome responses to a dietary intervention in children with Prader–Willi syndrome. Our method also revealed variability in Crohn's disease (CD) patient microbiomes and clearly distinguished them from those of related healthy individuals. Our analysis highlighted the role of the microbiome in CD pathogenicity, demonstrating enrichment of patient microbiomes in functions that promote inflammation and that help bacteria survive it.
Published: 2017

20. Performance of in silico tools for the evaluation of p16INK4a (CDKN2A) variants in CAGI

Author: Marco Carraro, Giovanni Minervini, Manuel Giollo, Yana Bromberg, Emidio Capriotti, Rita Casadio, Roland L. Dunbrack, Lisa Elefanti, Pietro Fariselli, Carlo Ferrari, Julian Gough, Panagiotis Katsonis, Emanuela Leonardi, Olivier Lichtarge, Chiara Menin, Pier Luigi Martelli, Abhishek Niroula, Lipika R. Pal, Susanna Repo, Maria Chiara Scaini, Mauno Vihinen, Qiong Wei, Qifang Xu, Yuedong Yang, Yizhou Yin, Jan Zaucha, Huiying Zhao, Yaoqi Zhou, Steven E. Brenner, John Moult, and Silvio C. E. Tosatto
Subjects: 0301 basic medicine, Bioinformatics tools, Pathogenicity predictors, In silico, Context (language use), Computational biology, Biology, Genome, Article, Machine Learning, 03 medical and health sciences, CDKN2A, Cell Line, Tumor, Databases, Genetic, Genetics, medicine, CAGI experiment, cancer, Cyclin-Dependent Kinase Inhibitor p18, Humans, Computer Simulation, Genetic Predisposition to Disease, Genetics (clinical), Reliability (statistics), Variant interpretation, Cyclin-Dependent Kinase Inhibitor p16, Cancer, Cell Proliferation, Bioinformatics and Systems Biology, Protein Stability, pathogenicity predictor, variant interpretation, Computational Biology, Genetic Variation, bioinformatics tools, pathogenicity predictors, Variety (cybernetics), 030104 developmental biology, Ranking, bioinformatics tool, Medical genetics, Medical Genetics
Abstract: Correct phenotypic interpretation of variants of unknown significance for cancer-associated genes is a diagnostic challenge as genetic screenings gain in popularity in the next-generation sequencing era. The Critical Assessment of Genome Interpretation (CAGI) experiment aims to test and define the state of the art of genotype-phenotype interpretation. Here, we present the assessment of the CAGI p16INK4a challenge. Participants were asked to predict the effect on cellular proliferation of ten variants for the p16INK4a tumor suppressor, a cyclin-dependent kinase inhibitor encoded by the CDKN2A gene. Twenty-two pathogenicity predictors were assessed with a variety of accuracy measures for reliability in a medical context. Different assessment measures were combined in an overall ranking to provide more robust results. The R scripts used for assessment are publicly available from a GitHub repository for future use in similar assessment exercises. Despite a limited test-set size, our findings show a variety of results, with some methods performing significantly better. Methods combining different strategies frequently outperform simpler approaches. The best predictor, Yang&Zhou lab, uses a machine learning method combining an empirical energy function measuring protein stability with an evolutionary conservation term. The p16INK4a challenge highlights how subtle structural effects can neutralize otherwise deleterious variants.
Published: 2017

21. De Novo Coding Variants Are Strongly Associated with Tourette Disorder

Author: A. Jeremy Willsey, Thomas V. Fernandez, Dongmei Yu, Robert A. King, Andrea Dietrich, Jinchuan Xing, Stephan J. Sanders, Jeffrey D. Mandell, Alden Y. Huang, Petra Richer, Louw Smith, Shan Dong, Kaitlin E. Samocha, Benjamin M. Neale, Giovanni Coppola, Carol A. Mathews, Jay A. Tischfield, Jeremiah M. Scharf, Matthew W. State, Gary A. Heiman, Mohamed Abdulkadir, Julia Bohnenpoll, Yana Bromberg, Lawrence W. Brown, Keun-Ah Cheon, Barbara J. Coffey, Li Deng, Lonneke Elzerman, Odette Fründt, Blanca Garcia-Delgar, Erika Gedvilaite, Donald L. Gilbert, Dorothy E. Grice, Julie Hagstrøm, Tammy Hedderly, Isobel Heyman, Pieter J. Hoekstra, Hyun Ju Hong, Chaim Huyser, Laura Ibanez-Gomez, Young Key Kim, Young-Shin Kim, Yun-Joo Koh, Sodahm Kook, Samuel Kuperman, Andreas Lamerz, Bennett Leventhal, Andrea G. Ludolph, Claudia Lühr da Silva, Marcos Madruga-Garrido, Athanasios Maras, Pablo Mir, Astrid Morer, Alexander Münchau, Tara L. Murphy, Cara Nasello, Thaïra J.C. Openneer, Kerstin J. Plessen, Veit Roessner, Stephan Sanders, Eun-Young Shin, Deborah A. Sival, Dong-Ho Song, Jungeun Song, Anne Marie Stolte, Nawei Sun, Jennifer Tübing, Frank Visscher, Michael F. Walker, Sina Wanderer, Shuoguo Wang, Martin Woods, Yeting Zhang, Anbo Zhou, Samuel H. Zinner, Cathy L. Barr, James R. Batterson, Cheston Berlin, Ruth D. Bruun, Cathy L. Budman, Danielle C. Cath, Sylvain Chouinard, Nancy J. Cox, Sabrina Darrow, Lea K. Davis, Yves Dion, Nelson B. Freimer, Marco A. Grados, Matthew E. Hirschtritt, Cornelia Illmann, Roger Kurlan, James F. Leckman, Gholson J. Lyon, Irene A. Malaty, William M. McMahon, Michael S. Okun, Lisa Osiecki, David L. Pauls, Danielle Posthuma, Vasily Ramensky, Mary M. Robertson, Guy A. Rouleau, Paul Sandor, Harvey S. Singer, Jan Smit, and Jae-Hoon Sul
Subjects: 0301 basic medicine, Proband, Adult, Male, Parents, INTELLECTUAL DISABILITY, FUNCTIONAL GENOMICS EXPERIMENTS, DNA-SEQUENCING DATA, AUTISM SPECTRUM DISORDERS, Cell Cycle Proteins, Receptors, Cell Surface, Biology, Bioinformatics, Tourette syndrome, Article, TIC DISORDERS, Cohort Studies, 03 medical and health sciences, medicine, Odds Ratio, Missense mutation, Humans, Genetic Predisposition to Disease, Copy-number variation, Child, Gene, Exome sequencing, Genetics, COPY NUMBER VARIANTS, Mutation, MUTATIONS, General Neuroscience, PSYCHIATRIC-DISORDERS, Intracellular Signaling Peptides and Proteins, Genetic Variation, Proteins, NIPBL, OBSESSIVE-COMPULSIVE DISORDER, LANGE-SYNDROME, Sequence Analysis, DNA, Cadherins, Phosphoproteins, Fibronectins, 030104 developmental biology, Female, Tourette Syndrome
Abstract: Whole-exome sequencing (WES) and de novo variant detection have proven a powerful approach to gene discovery in complex neurodevelopmental disorders. We have completed WES of 325 Tourette disorder trios from the Tourette International Collaborative Genetics cohort and a replication sample of 186 trios from the Tourette Syndrome Association International Consortium on Genetics (511 total). We observe strong and consistent evidence for the contribution of de novo likely gene-disrupting (LGD) variants (rate ratio [RR] 2.32, p = 0.002). Additionally, de novo damaging variants (LGD and probably damaging missense) are overrepresented in probands (RR 1.37, p = 0.003). We identify four likely risk genes with multiple de novo damaging variants in unrelated probands: WWC1 (WW and C2 domain containing 1), CELSR3 (Cadherin EGF LAG seven-pass G-type receptor 3), NIPBL (Nipped-B-like), and FN1 (fibronectin 1). Overall, we estimate that de novo damaging variants in approximately 400 genes contribute risk in 12% of clinical cases.
Published: 2017

22. News from the Protein Mutability Landscape

Author: Yana Bromberg, Burkhard Rost, and Maximilian Hecht
Subjects: Models, Molecular, In silico, Mutation, Missense, Protein Data Bank (RCSB PDB), Single-nucleotide polymorphism, Computational biology, Biology, complete single mutagenesis, Deep sequencing, Structural Biology, Humans, Point Mutation, Genetic Predisposition to Disease, Molecular Biology, Genetic Association Studies, Genetics, Protein Stability, Point mutation, Computational Biology, Proteins, Robustness (evolution), exome-wide mutagenesis, Alanine scanning, Protein Data Bank, alanine scanning, in silico mutagenesis, SNP effects
Abstract: Some mutations of protein residues matter more than others, and these are often conserved evolutionarily. The explosion of deep sequencing and genotyping increasingly requires the distinction between effect and neutral variants. The simplest approach predicts all mutations of conserved residues to have an effect; however, this works poorly, at best. Many computational tools that are optimized to predict the impact of point mutations provide more detail. Here, we expand the perspective from the view of single variants to the level of sketching the entire mutability landscape. This landscape is defined by the impact of substituting every residue at each position in a protein by each of the 19 non-native amino acids. We review some of the powerful conclusions about protein function, stability and their robustness to mutation that can be drawn from such an analysis. Large-scale experimental and computational mutagenesis experiments are increasingly furthering our understanding of protein function and of the genotype–phenotype associations. We also discuss how these can be used to improve predictions of protein function and pathogenicity of missense variants.
Published: 2013

23. Building a Genome Analysis Pipeline to Predict Disease Risk and Prevent Disease

Author: Yana Bromberg
Subjects: Risk, Whole genome sequencing, Genetics, Genome, Human, Context (language use), Sequence Analysis, DNA, Computational biology, Phenome, Biology, ENCODE, Genome, Variome, Structural Biology, Mutation, Humans, Genetic Predisposition to Disease, Copy-number variation, Molecular Biology, Exome sequencing
Abstract: Reduced costs and increased speed and accuracy of sequencing can bring the genome-based evaluation of individual disease risk to the bedside. While past efforts have identified a number of actionable mutations, the bulk of genetic risk remains hidden in sequence data. The biggest challenge facing genomic medicine today is the development of new techniques to predict the specifics of a given human phenome (set of all expressed phenotypes) encoded by each individual variome (full set of genome variants) in the context of the given environment. Numerous tools exist for the computational identification of the functional effects of a single variant. However, the pipelines taking advantage of full genomic, exomic, transcriptomic (and other) sequences have only recently become a reality. This review looks at the building of methodologies for predicting "variome"-defined disease risk. It also discusses some of the challenges for incorporating such a pipeline into everyday medical practice.
Published: 2013

24. Association Between Variants of PRDM1 and NDP52 and Crohn's Disease, Based on Exome Sequencing and Functional Studies

Author: Severine Vermeire, Morten H. Vatn, Simone Lipinski, Xiao Liu, Sebastian Zeissig, Manuel A. Rivas, Hu Zhang, Mauro D'Amato, Manfred Kayser, Bernhard O. Boehm, Jeremy D. Sanderson, Cisca Wijmenga, Anna Latiano, Nadezhda Tsankova Doncheva, Ulla Vogel, Rinse K. Weersma, Jurgita Skieceviciene, Markus M. Nöthen, Stefan Schreiber, Tom H. Karlsen, Vito Annese, Carlo Berzuini, Fuman Jiang, James Lee, Mario Albrecht, Tao Jiang, Susanna Nikolaus, Thomas Illig, Qing Liu, Stephan Brand, Carsten Büning, David P. Strachan, Andreas Keller, Jun Wang, Michael Nothnagel, Andreas Till, Juliane Winkelmann, Miquel Sans, Jane C. Goodall, Philip Rosenstiel, David Ellinghaus, Andre Franke, Michael Krawczak, Richard H. Duerr, Cyriel Y. Ponsioen, Miles Parkes, Jonas Halfvarson, Björn Stade, Michael Forster, Vibeke Andersen, Suresh Subramani, Eva Ellinghaus, Limas Kupčinskas, Gabriele Mayr, Jürgen Glas, Paul Rutgeerts, Christopher G. Mathew, Robert Häsler, Wendy L. McArdle, Yana Bromberg, and Mark J. Daly
Subjects: Male, EXPRESSION, AUTOPHAGY RECEPTOR, SUSCEPTIBILITY LOCI, Single-nucleotide polymorphism, Genome-wide association study, Biology, Article, Crohn Disease, Inflammatory Bowel Disease Whole-Exome Sequencing Complex Disease, PRDM1, T-CELL HOMEOSTASIS, Missense mutation, Humans, Whole-Exome Sequencing, GENOME-WIDE ASSOCIATION, Exome, Exome sequencing, Genetic association, Genetics, Hepatology, TRANSCRIPTIONAL REPRESSOR BLIMP-1, Inflammatory Bowel Disease, Gastroenterology, Nuclear Proteins, LYMPHOCYTE MIGRATION, Repressor Proteins, Complex Disease, ULCERATIVE-COLITIS, Expression quantitative trait loci, L-SELECTIN, Colitis, Ulcerative, Female, INFLAMMATORY-BOWEL-DISEASE
Abstract: Background & Aims: Genome-wide association studies (GWAS) have identified 140 Crohn's disease (CD) susceptibility loci. For most loci, the variants that cause disease are not known and the genes affected by these variants have not been identified. We aimed to identify variants that cause CD through detailed sequencing, genetic association, expression, and functional studies. Methods: We sequenced whole exomes of 42 unrelated subjects with CD and 5 healthy subjects (controls) and then filtered single nucleotide variants by incorporating association results from meta-analyses of CD GWAS and in silico mutation effect prediction algorithms. We then genotyped 9348 subjects with CD, 2868 subjects with ulcerative colitis, and 14,567 control subjects, and analyzed the associated variants in functional studies using materials from subjects and controls and in vitro model systems. Results: We identified rare missense mutations in PR domain-containing 1 (PRDM1) and associated these with CD. These mutations increased proliferation of T cells and secretion of cytokines on activation and increased expression of the adhesion molecule L-selectin. A common CD risk allele, identified in GWAS, correlated with reduced expression of PRDM1 in ileal biopsy specimens and peripheral blood mononuclear cells (combined P = 1.6 × 10⁻⁸). We identified an association between CD and a common missense variant, Val248Ala, in nuclear domain 10 protein 52 (NDP52) (P = 4.83 × 10⁻⁹). We found that this variant impairs the regulatory functions of NDP52 to inhibit nuclear factor κB activation of genes that regulate inflammation and affect the stability of proteins in Toll-like receptor pathways. Conclusions: We have extended the results of GWAS and provide evidence that variants in PRDM1 and NDP52 determine susceptibility to CD. PRDM1 maps adjacent to a CD interval identified in GWAS and encodes a transcription factor expressed by T and B cells. NDP52 is an adaptor protein that functions in selective autophagy of intracellular bacteria and signaling molecules, supporting the role of autophagy in the pathogenesis of CD.
Published: 2013

25. Protein function in precision medicine: deep understanding with machine learning

Author: Predrag Radivojac, Yana Bromberg, and Burkhard Rost
Subjects: 0301 basic medicine, Biophysics, Personalized health, Variation (game tree), Biology, Machine learning, computer.software_genre, Biochemistry, Article, Machine Learning, 03 medical and health sciences, Structural Biology, Genetics, Humans, Protein Interaction Maps, Precision Medicine, Molecular Biology, Protein function, Focus (computing), business.industry, Computational Biology, Proteins, Cell Biology, Precision medicine, 030104 developmental biology, Protein Interaction Networks, Artificial intelligence, business, computer
Abstract: Precision medicine and personalized health efforts propose leveraging complex molecular, medical and family history, along with other types of personal data, toward a better life. We argue that this ambitious objective will require advanced and specialized machine learning solutions. Simply skimming some low-hanging results off the data wealth might have limited potential. Instead, we need to better understand all parts of the system to define medically relevant causes and effects: how do particular sequence variants affect particular proteins and pathways? How do these effects, in turn, cause the health or disease-related phenotype? Toward this end, deeper understanding will not simply diffuse from deeper machine learning, but from more explicit focus on understanding protein function, context-specific protein interaction networks, and impact of variation on both.
Published: 2016

26. Whole exome sequencing identifies novel candidate genes that modify chronic obstructive pulmonary disease susceptibility

Author: Hong-Seok Ha, Clifford Qualls, Julia Klensney-Tait, Michael P. Moreau, Maria A. Picchi, Jenny T. Mao, Steven A. Belinsky, Joseph Zabner, Jun Ho Jang, Yana Bromberg, Jinchuan Xing, Yong Lin, Toru Nyunoya, Raymond J. Langley, Shuguang Leng, Shannon Bruse, and Nan Wang
Subjects: Male, 0301 basic medicine, Candidate gene, Cell Survival, Primary Cell Culture, Biology, Bioinformatics, Pulmonary Disease, Chronic Obstructive, 03 medical and health sciences, Drug Discovery, Genetics, Genetic predisposition, Humans, COPD, Gene silencing, Exome, Genetic Predisposition to Disease, Molecular Biology, Gene, Genetic Association Studies, Exome sequencing, Aged, Genetic association, Resistant smokers, Smoking, Whole exome sequencing, High-Throughput Nucleotide Sequencing, Correction, Cigarette smoke, Middle Aged, Human genetics, respiratory tract diseases, Susceptible smokers, 3. Good health, Phenotype, 030104 developmental biology, Molecular Medicine, Female, Primary Research
Abstract: Background: Chronic obstructive pulmonary disease (COPD) is characterized by an irreversible airflow limitation in response to inhalation of noxious stimuli, such as cigarette smoke. However, only 15–20% of smokers manifest COPD, suggesting a role for genetic predisposition. Although genome-wide association studies have identified common genetic variants that are associated with susceptibility to COPD, effect sizes of the identified variants are modest, as is the total heritability accounted for by these variants. In this study, an extreme phenotype exome sequencing study was combined with in vitro modeling to identify COPD candidate genes. Results: We performed whole exome sequencing of 62 highly susceptible smokers and 30 exceptionally resistant smokers to identify rare variants that may contribute to disease risk or resistance to COPD. This was a cross-sectional case-control study without therapeutic intervention or longitudinal follow-up information. We identified candidate genes based on rare variant analyses and evaluated exonic variants to pinpoint individual genes whose function was computationally established to be significantly different between susceptible and resistant smokers. Top scoring candidate genes from these analyses were further filtered by requiring that each gene be expressed in human bronchial epithelial cells (HBECs). A total of 81 candidate genes were thus selected for in vitro functional testing in cigarette smoke extract (CSE)-exposed HBECs. Using small interfering RNA (siRNA)-mediated gene silencing experiments, we showed that silencing of several candidate genes augmented CSE-induced cytotoxicity in vitro. Conclusions: Our integrative analysis through both genetic and functional approaches identified two candidate genes (TACC2 and MYO1E) that augment cigarette smoke (CS)-induced cytotoxicity and, potentially, COPD susceptibility.
Published: 2016

27. fusionDB: assessing microbial diversity and environmental preferences via functional similarity networks

Author: Maximilian Miller, Chengsheng Zhu, Yana Bromberg, and Yannick Mahlich
Subjects: 0301 basic medicine, Databases, Factual, Gene Transfer, Horizontal, Niche, Computational biology, Biology, Bacterial Physiological Phenomena, Genome, 03 medical and health sciences, User-Computer Interface, Bacterial Proteins, Similarity (psychology), Databases, Genetic, Genetics, Environmental Microbiology, Database Issue, Humans, Organism, Phylogeny, 030304 developmental biology, Synechococcus, 0303 health sciences, Internet, Metadata, Bacteria, 030306 microbiology, Ecology, Microbiota, Biodiversity, 030104 developmental biology, Taxon, Horizontal gene transfer, Proteome, Metagenomics, Corrigendum
Abstract: Microbial functional diversification is driven by environmental factors, i.e. microorganisms inhabiting the same environmental niche tend to be more functionally similar than those from different environments. In some cases, even closely phylogenetically related microbes differ more across environments than across taxa. While microbial similarities are often reported in terms of taxonomic relationships, no existing database directly links microbial functions to the environment. We previously developed a method for comparing microbial functional similarities on the basis of proteins translated from the sequenced genomes. Here we describe fusionDB, a novel database that uses our functional data to represent 1,374 taxonomically distinct bacteria annotated with available metadata: habitat/niche, preferred temperature, and oxygen use. Each microbe is encoded as a set of functions represented by its proteome, and individual microbes are connected via common functions. Users can search fusionDB via combinations of organism names and metadata. Moreover, the web interface allows mapping new microbial genomes to the functional spectrum of reference bacteria, rendering interactive similarity networks that highlight shared functionality. fusionDB provides a fast means of comparing microbes, identifying potential horizontal gene transfer events, and highlighting key environment-specific functionality. fusionDB is publicly available at http://services.bromberglab.org/fusiondb/.
Published: 2016

28. In silico mutagenesis: a case study of the melanocortin 4 receptor

Author: Rudolph L. Leibel, Yana Bromberg, Christian Vaisse, Burkhard Rost, and John D. Overton
Subjects: obesity, Mutant, MC4R, Biology, Biochemistry, Research Communications, Receptors, G-Protein-Coupled, Mice, 03 medical and health sciences, 0302 clinical medicine, Melanocortin receptor, MC1R, Genetics, Animals, Humans, Point Mutation, Coding region, Computer Simulation, Amino Acid Sequence, Receptor, Molecular Biology, 030304 developmental biology, Sequence (medicine), G protein-coupled receptor, 0303 health sciences, diabetes, active functional site, SNAP, Melanocortin 3 receptor, Melanocortin 4 receptor, Mutagenesis, Receptor, Melanocortin, Type 4, 030217 neurology & neurosurgery, Biotechnology
Abstract: The melanocortin 4 receptor (MC4R) is a G-protein-coupled receptor (GPCR) and a key molecule in the regulation of energy homeostasis. At least 159 substitutions in the coding region of human MC4R (hMC4R) have been described experimentally; over 80 of those occur naturally, and many have been implicated in obesity. However, assessment of the presumably functionally essential residues remains incomplete. Here we have performed a complete in silico mutagenesis analysis to assess the functional essentiality of all possible nonnative point mutants in the entire hMC4R protein (332 residues). We applied SNAP, which is a method for quantifying functional consequences of single amino acid (AA) substitutions, to calculate the effects of all possible substitutions at each position in the hMC4R AA sequence. We compiled a mutability score that reflects the degree to which a particular residue is likely to be functionally important. We performed the same experiment for a paralogue human melanocortin receptor (hMC1R) and a mouse orthologue (mMC4R) in order to compare computational evaluations of highly related sequences. Three results are most salient: 1) our predictions largely agree with the available experimental annotations; 2) this analysis identified several AAs that are likely to be functionally critical, but have not yet been studied experimentally; and 3) the differential analysis of the receptors implicates a number of residues as specifically important to MC4Rs vs. other GPCRs, such as hMC1R.—Bromberg, Y., Overton, J., Vaisse, C., Leibel, R. L., Rost, B. In silico mutagenesis: a case study of the melanocortin 4 receptor.
Published: 2009

29. SNAP: predict effect of non-synonymous polymorphisms on function

Author: Yana Bromberg and Burkhard Rost
Subjects: Protein Conformation, Sequence analysis, Single-nucleotide polymorphism, Biology, medicine.disease_cause, Polymorphism, Single Nucleotide, 03 medical and health sciences, 0302 clinical medicine, Sequence Analysis, Protein, Polymorphism (computer science), Genetic variation, Genetics, medicine, Humans, SNP, 030304 developmental biology, 0303 health sciences, Mutation, Genetic Diseases, Inborn, Snap, Computational Biology, Proteins, Reproducibility of Results, Phenotype, Amino Acid Substitution, Neural Networks, Computer, 030217 neurology & neurosurgery
Abstract: Many genetic variations are single nucleotide polymorphisms (SNPs). Non-synonymous SNPs are 'neutral' if the resulting point-mutated protein is not functionally discernible from the wild type and 'non-neutral' otherwise. The ability to identify non-neutral substitutions could significantly aid targeting disease-causing detrimental mutations, as well as SNPs that increase the fitness of particular phenotypes. Here, we introduced comprehensive data sets to assess the performance of methods that predict SNP effects. Along with these, we introduced SNAP (screening for non-acceptable polymorphisms), a neural network-based method for the prediction of the functional effects of non-synonymous SNPs. SNAP needs only sequence information as input, but benefits from functional and structural annotations, if available. In a cross-validation test on over 80,000 mutants, SNAP identified 80% of the non-neutral substitutions at 77% accuracy and 76% of the neutral substitutions at 80% accuracy. This constituted an important improvement over other methods; the improvement rose to over ten percentage points for mutants for which existing methods disagreed. Possibly even more importantly, SNAP introduced a well-calibrated measure for the reliability of each prediction. This measure will allow users to focus on the most accurate predictions and/or the most severe effects. Available at http://www.rostlab.org/services/SNAP.
Published: 2007

30. Better prediction of functional effects for sequence variants

Author: Maximilian Hecht, Yana Bromberg, and Burkhard Rost
Subjects: Evolution, Molecular, neural network, SNP effect, Research, from sequence, functional effect prediction, Computational Biology, Genetic Variation, Humans, Protein Isoforms, variant effect, Neural Networks, Computer, Software
Abstract: Elucidating the effects of naturally occurring genetic variation is one of the major challenges for personalized health and personalized medicine. Here, we introduce SNAP2, a novel neural network based classifier that improves over the state-of-the-art in distinguishing between effect and neutral variants. Our method's improved performance results from screening many potentially relevant protein features and from refining our development data sets. Cross-validated on >100k experimentally annotated variants, SNAP2 significantly outperformed other methods, attaining a two-state accuracy (effect/neutral) of 83%. SNAP2 also outperformed combinations of other methods. Performance increased for human variants but much more so for other organisms. Our method's carefully calibrated reliability index informs selection of variants for experimental follow-up, with the most strongly predicted half of all effect variants predicted at over 96% accuracy. As expected, the evolutionary information from automatically generated multiple sequence alignments gave the strongest signal for the prediction. However, we also optimized our new method to perform surprisingly well even without alignments. This feature reduces prediction runtime by over two orders of magnitude, enables cross-genome comparisons, and makes our new method the best solution for the 10-20% of sequence orphans. SNAP2 is available at: https://rostlab.org/services/snap2web. Definitions used: Delta, input feature that results from computing the difference between the feature scores for the native amino acid and the feature scores for the variant amino acid; nsSNP, non-synonymous SNP; PMD, Protein Mutant Database; SNAP, Screening for non-acceptable polymorphisms; SNP, single nucleotide polymorphism; variant, any amino acid changing sequence variant.
Published: 2015

31. The young PI Buzz

Author: Theodore Alexandrov, Venkata P. Satagopam, Manuel Corpas, Jeroen de Ridder, Geoff Macintyre, Yana Bromberg, Magali Michaut, and Luxembourg Centre for Systems Biomedicine (LCSB): Bioinformatics Core (R. Schneider Group) [research center]
Subjects: Operations research, QH301-705.5, Computer science, academic journal papers, Passions, Library science, Biochemistry, biophysics & molecular biology [F05] [Life sciences], Key issues, GeneralLiterature_MISCELLANEOUS, Social Networking, 03 medical and health sciences, Cellular and Molecular Neuroscience, 0302 clinical medicine, Genetics, Humans, Social media, Cooperative Behavior, Biology (General), Biochimie, biophysique & biologie moléculaire [F05] [Sciences du vivant], GeneralLiterature_REFERENCE(e.g.,dictionaries,encyclopedias,glossaries), Molecular Biology, Ecology, Evolution, Behavior and Systematics, ComputingMethodologies_COMPUTERGRAPHICS, 030304 developmental biology, Accreditation, 0303 health sciences, Marketing buzz, Ecology, Mentors, Principal (computer security), Computational Biology, Research Personnel, Research career, Computational Theory and Mathematics, Modeling and Simulation, Perspective, Tracking (education), CWTS 0.75 <= JFIS < 2.00, 030217 neurology & neurosurgery
Abstract: If you are a young principal investigator (PI) in the field of computational biology or bioinformatics, you may have noticed recently there is a buzz surrounding you: a plethora of meetings and seminars are being organized specifically for young PIs (P2P workshop at ISMB 2012, An Excellent Research Career Workshop 2012, EMBO Young Scientists' Forum, Young PI Forum at Weizmann Institute 2009–2013, Young Investigators' Meeting by NCI). The challenges faced by young PIs are being discussed widely [1], particularly across social media [2]; funding agencies are searching for new ways to encourage young investigators; new awards are being created; and novel journals, such as EuPA Open Proteomics, provide opportunities tailored for junior scientists. Picking up on this buzz and recognizing the need for a discussion platform, PLOS has established the About My Lab collection of publications. This article is a part of this collection, highlighting the latest event organized by, and for, young PIs: the Junior PI (JPI) meeting. The JPI meeting took place in Berlin, Germany, at this year's ISMB-ECCB 2013, the flagship conference of the International Society for Computational Biology (ISCB). With the support of the ISCB Board of Directors, the meeting was conceived and organized by a group of ISCB's young PIs, most of whom are former ISCB Student Council leaders. The meeting was a mixture of scientific talks, round-table discussions, and peer-to-peer interaction. To facilitate discussion and interaction, all participants introduced themselves during the joint breakfast. This was followed by three Frontiers in Science talks, in which researchers who recently started their own group gave a review-like overview of their research field and the challenges ahead. The keynote, by Jean Peccoud, dealt with how to run a research lab as a business [3], and how to use tracking tools to account for the productivity of lab members, which invoked plenty of discussion. 
In the afternoon, several round-table discussions ensued, with summaries presented to the entire audience at the end of the meeting. Since the prospective participants were asked in advance for topics of importance, these discussions were precisely tailored to reflect the interests of the audience. The meeting turned out to strike the right balance between scientific talks, experience exchange, getting to know each other, and networking opportunities. The success of the JPI meeting, while critically dependent on the input of the participants, may also be credited to its organizers, each of whom brought his/her own experience, questions, and passions. Interestingly, some of the organizers are still in the postdoc-PI transition phase, which may explain why they are highly motivated to improve the life of a young PI. Moreover, it is becoming increasingly common in modern science for many postdocs to be involved in supervision of research staff, blurring the conventional distinction between a postdoc and PI. This rise of the postdoc as principal investigator was reflected in a recent report by the European University Institute [4]. This article differs from other About My Lab articles, each of which follows the approach “one author—one interview.” Inspired by the experimental approach of the JPI meeting itself, we present you with six short interviews with the JPI meeting organizers, carried out by the Guest Editor of About My Lab (TA). By providing different opinions, these interviews shed light on some of the key issues of a young PI's career.
Published: 2013

32. Neutral and weakly nonneutral sequence variants may define individuality

Author: Burkhard Rost, Peter C. Kahn, and Yana Bromberg
Subjects: Genetics, Multidisciplinary, Sequence analysis, Genome, Human, Individuality, Genetic Variation, Biology, Biological Sciences, Phenotype, Genome, Genetic variation, Mutation (genetic algorithm), Mutation, Humans, Human genome, Sequence Analysis, Function (biology), Sequence (medicine)
Abstract: Large-scale computational analyses of the growing wealth of genome-variation data consistently tell two distinct stories. The first is expected: coding variants reported in disease-related databases significantly alter the function of affected proteins. The second is surprising: the genomes of healthy individuals appear to carry many variants that are predicted to have some effect on function. As long as the complete experimental analysis of all human genome variants remains impossible, computational methods, such as PolyPhen, SNAP, and SIFT, might provide important insights. These methods capture the effects of particular variants very well and can highlight trends in populations of variants. Diseases are, arguably, extreme phenotypic variations and are often attributable to one or a few severely functionally disruptive variants. Our findings suggest a genomic basis of the different nondisease phenotypes. Prediction methods indicate that variants in seemingly healthy individuals tend to be neutral or weakly disruptive for protein molecular function. These variant effects are predicted to be largely either experimentally undetectable or not deemed significant enough to be published. This may suggest that nondisease phenotypes arise through combinations of many variants whose effects are weakly nonneutral (damaging or enhancing) to the molecular protein function but fall within the wild-type range of overall physiological function.
Published: 2013

33. Chapter 15: disease gene prioritization

Author: Yana Bromberg
Subjects: Candidate gene, QH301-705.5, Single-nucleotide polymorphism, Disease, Computational biology, Biology, Bioinformatics, Data type, Education, 03 medical and health sciences, Cellular and Molecular Neuroscience, 0302 clinical medicine, Protein Interaction Mapping, Genetics, Animals, Humans, Computer Simulation, Genetic Predisposition to Disease, Biology (General), Molecular Biology, Gene, Ecology, Evolution, Behavior and Systematics, 030304 developmental biology, Disease gene, 0303 health sciences, Ecology, Computational Biology, Computational gene, Exons, Models, Theoretical, Phenotype, Computational Theory and Mathematics, Modeling and Simulation, Mutation, Mutation (genetic algorithm), Software, 030217 neurology & neurosurgery
Abstract: Disease-causing aberrations in the normal function of a gene define that gene as a disease gene. Proving a causal link between a gene and a disease experimentally is expensive and time-consuming. Comprehensive prioritization of candidate genes prior to experimental testing drastically reduces the associated costs. Computational gene prioritization is based on various pieces of correlative evidence that associate each gene with the given disease and suggest possible causal links. A fair amount of this evidence comes from high-throughput experimentation. Thus, well-developed methods are necessary to reliably deal with the quantity of information at hand. Existing gene prioritization techniques already significantly improve the outcomes of targeted experimental studies. Faster and more reliable techniques that account for novel data types are necessary for the development of new diagnostics, treatments, and cures for many diseases.
Published: 2013

34. Collective judgment predicts disease-associated single nucleotide variants

Author: Russ B. Altman, Emidio Capriotti, and Yana Bromberg
Subjects: dbSNP, Correlation coefficient, Mutation, Missense, Genomics, Single-nucleotide polymorphism, Disease, Computational biology, Biology, Polymorphism, Single Nucleotide, Correlation, Set (abstract data type), 03 medical and health sciences, 0302 clinical medicine, Predictive Value of Tests, Databases, Genetic, Genetics, Humans, Selection (genetic algorithm), 030304 developmental biology, 0303 health sciences, Research, Genetic Diseases, Inborn, ROC Curve, Area Under Curve, 030220 oncology & carcinogenesis, Algorithms, Biotechnology
Abstract: In recent years the number of human genetic variants deposited into the publicly available databases has been increasing exponentially. The latest version of dbSNP, for example, contains ~50 million validated Single Nucleotide Variants (SNVs). SNVs make up most of human variation and are often the primary causes of disease. The non-synonymous SNVs (nsSNVs) result in single amino acid substitutions and may affect protein function, often causing disease. Although several methods for the detection of nsSNV effects have already been developed, the consistent increase in annotated data is offering the opportunity to improve prediction accuracy. Here we present a new approach for the detection of disease-associated nsSNVs (Meta-SNP) that integrates four existing methods: PANTHER, PhD-SNP, SIFT and SNAP. We first tested the accuracy of each method using a dataset of 35,766 disease-annotated mutations from 8,667 proteins extracted from the SwissVar database. The four methods reached overall accuracies of 64%-76% with a Matthews correlation coefficient (MCC) of 0.38-0.53. We then used the outputs of these methods to develop a machine learning based approach that discriminates between disease-associated and polymorphic variants (Meta-SNP). In testing, the combined method reached 79% overall accuracy and 0.59 MCC, ~3% higher accuracy and ~0.05 higher correlation with respect to the best-performing method. Moreover, for the hardest-to-define subset of nsSNVs, i.e. variants for which half of the predictors disagreed with the other half, Meta-SNP attained 8% higher accuracy than the best predictor. Here we find that the Meta-SNP algorithm achieves better performance than the best single predictor. This result suggests that the methods used for the prediction of variant-disease associations are orthogonal, encoding different biologically relevant relationships. Careful combination of predictions from various resources is therefore a good strategy for the selection of high reliability predictions. Indeed, for the subset of nsSNVs where all predictors were in agreement (46% of all nsSNVs in the set), our method reached 87% overall accuracy and 0.73 MCC. Meta-SNP server is freely accessible at http://snps.biofold.org/meta-snp .
Published: 2013

35. Thoughts from SNP-SIG 2012: future challenges in the annotation of genetic variations

Author: Emidio Capriotti and Yana Bromberg
Subjects: Genetics, dbSNP, Research, Genetic Variation, Molecular Sequence Annotation, Genomics, Genome-wide association study, Context (language use), Computational biology, Biology, Polymorphism, Single Nucleotide, Annotation, Phenotype, Humans, SNP, Personal genomics, Biotechnology
Abstract: Overview: Advances in high-throughput sequencing, genotyping, and characterization of haplotype diversity are consistently generating vast amounts of genomic data. Single Nucleotide Polymorphisms (SNPs) are the most common type of genetic variation [1]. In recent years the number of known SNPs has been increasing exponentially [2]; the latest release of NCBI’s dbSNP database [3] contained more than 55 million human SNPs. SNPs are interesting both as markers of evolutionary history and in the context of their phenotypic manifestations (e.g. characteristic traits and diseases). However, due to their sheer number, detailed experimental annotations are impossible and computational inference is severely limited by the required resources. SNPs also present a challenge for visualization and storage. In line with the increasing interest in genetic variation analysis and annotation, on July 14th, 2012 [4] we organized the second SNP Special Interest Group (SNP-SIG) meeting at ISMB’12 in Long Beach, CA. This meeting attempted to summarize the field’s research advances in the directions of “Annotation and prediction of structural/functional impacts of coding SNPs” and “SNPs and Personal Genomics: GWAS, populations and phylogenetic analysis”. The discrepancy between the wide availability of SNP data and the current lack of its interpretation requires the development of new computational annotation methods. The analysis of genetic variation is a key factor for the understanding of genomic information. The SNP-SIG provides a forum necessary for the organization of a research network facilitating the exchange of ideas and for the establishment of new
Published: 2013

36. Bioinformatics for personal genome interpretation

Author: Nathan L. Nehrt, Emidio Capriotti, Maricel G. Kann, and Yana Bromberg
Subjects: Genotype, Deleterious variant, Genomic variant database, Biology, Bioinformatics, Genome, Databases, Genetic, Human Genome Project, Genetic variation, Humans, Genetic variability, Gene, Molecular Biology, Genomic variation, Special Issue Papers, Genome interpretation, Computational Biology, Genetic Variation, Phenotypic trait, Phenotype, Variome, Gene prioritization, Human genome, Personal genomics, Information Systems
Abstract: An international consortium released the first draft sequence of the human genome 10 years ago. Although the analysis of this data has suggested the genetic underpinnings of many diseases, we have not yet been able to fully quantify the relationship between genotype and phenotype. Thus, a major current effort of the scientific community focuses on evaluating individual predispositions to specific phenotypic traits given their genetic backgrounds. Many resources aim to identify and annotate the specific genes responsible for the observed phenotypes. Some of these use intra-species genetic variability as a means for better understanding this relationship. In addition, several online resources are now dedicated to collecting single nucleotide variants and other types of variants, and annotating their functional effects and associations with phenotypic traits. This information has enabled researchers to develop bioinformatics tools to analyze the rapidly increasing amount of newly extracted variation data and to predict the effect of uncharacterized variants. In this work, we review the most important developments in the field: the databases and bioinformatics tools that will be of utmost importance in our concerted effort to interpret the human variome. © The Author 2012. Published by Oxford University Press.
Published: 2012

37. Functional analyses of variants reveal a significant role for dominant negative and common alleles in oligogenic Bardet–Biedl syndrome

Author: Norann A. Zaghloul, Edwin C. Oh, Jantje M. Gerdes, Cecilia Gascue, Jose L. Badano, Rudolph L. Leibel, Jonathan Binkley, Yangjian Liu, Carmen C. Leitch, Nicholas Katsanis, Arend Sidow, Yana Bromberg, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Institut Pasteur de Montevideo, Réseau International des Instituts Pasteur (RIIP), Center for Human Disease Modeling, Duke University [Durham], Johns Hopkins University (JHU), Department of Biochemistry and Molecular Biophysics, Columbia University [New York], Columbia University Center for Computational Biology and Bioinformatics, Department of Genetics [Stanford], Stanford Medicine, Stanford University-Stanford University, Department of Pathology [Stanford], and Division of Molecular Genetics and Naomi Berrie Diabetes Center
Subjects: Male, MESH: Mutation, MESH: Pedigree, Biology, MESH: Phenotype, medicine.disease_cause, 03 medical and health sciences, 0302 clinical medicine, MESH: Bardet-Biedl Syndrome, Bardet–Biedl syndrome, Genetic variation, medicine, Animals, Humans, MESH: Animals, Allele, MESH: Zebrafish, Bardet-Biedl Syndrome, Alleles, Zebrafish, 030304 developmental biology, Genetics, [SDV.GEN]Life Sciences [q-bio]/Genetics, 0303 health sciences, Mutation, MESH: Humans, Multidisciplinary, MESH: Alleles, Biological Sciences, medicine.disease, MESH: Gene Expression Regulation, MESH: Male, Human genetics, Pedigree, Genetic load, Ciliopathy, Phenotype, MESH: Models, Animal, Gene Expression Regulation, Models, Animal, Epistasis, Female, MESH: Female, 030217 neurology & neurosurgery
Abstract: Technological advances hold the promise of rapidly catalyzing the discovery of pathogenic variants for genetic disease. However, this possibility is tempered by limitations in interpreting the functional consequences of genetic variation at candidate loci. Here, we present a systematic approach, grounded on physiologically relevant assays, to evaluate the mutational content (125 alleles) of the 14 genes associated with Bardet–Biedl syndrome (BBS). A combination of in vivo assays with subsequent in vitro validation suggests that a significant fraction of BBS-associated mutations have a dominant-negative mode of action. Moreover, we find that a subset of common alleles, previously considered to be benign, are, in fact, detrimental to protein function and can interact with strong rare alleles to modulate disease presentation. These data represent a comprehensive evaluation of genetic load in a multilocus disease. Importantly, superimposition of these results onto human genetics data suggests a previously underappreciated complexity in disease architecture that might be shared among diverse clinical phenotypes.
Published: 2010

38. Conserved amino acids within the adenovirus 2 E3/19K protein differentially affect downregulation of MHC class I and MICA/B proteins

Author: Ed May, Thomas Spies, Katja Koebernick, Douglas P. Owen, Lawrence P. Andrews, Beatrice Menz, Veronika Groh, Martina Sester, Hans-Gerhard Burgert, Yana Bromberg, Emily L. Stock, Alexander Steinle, and Minghui Ao
Subjects: Adenoviridae Infections, Immunology, Down-Regulation, Major histocompatibility complex, Transfection, Polymerase Chain Reaction, Protein Structure, Secondary, Conserved sequence, Protein structure, Downregulation and upregulation, MHC class I, Adenovirus E3 Proteins, Immunology and Allergy, Humans, Immunoprecipitation, Amino Acid Sequence, Peptide sequence, Conserved Sequence, Alanine, chemistry.chemical_classification, biology, Histocompatibility Antigens Class I, Flow Cytometry, Amino acid, Cell biology, Biochemistry, chemistry, biology.protein
Abstract: Successful establishment and persistence of adenovirus (Ad) infections are facilitated by immunosubversive functions encoded in the early transcription unit 3 (E3). The E3/19K protein has a dual role, preventing cell surface transport of MHC class I/HLA class I (MHC-I/HLA-I) Ags and the MHC-I–like molecules (MHC-I chain-related chain A and B [MICA/B]), thereby inhibiting recognition by both CD8 T cells and NK cells. Although some crucial functional elements in E3/19K have been identified, a systematic analysis of the functional importance of individual amino acids is missing. We have now substituted alanine for each of 21 amino acids in the luminal domain of Ad2 E3/19K that are conserved among Ads and investigated the effects on HLA-I downregulation by coimmunoprecipitation, pulse-chase analysis, and/or flow cytometry. Potential structural alterations were monitored using conformation-dependent E3/19K-specific mAbs. The results revealed that only a small number of mutations abrogated HLA-I complex formation (e.g., substitutions W52, M87, and W96). Mutants M87 and W96 were particularly interesting, as they exhibited only minimal structural changes, suggesting that these amino acids make direct contacts with HLA-I. The considerable number of substitutions with few functional defects implied that E3/19K may have additional cellular target molecules. Indeed, when assessing MICA/B cell-surface expression, we found that mutation of T14 and M82 selectively compromised MICA/B downregulation with essentially no effect on HLA-I modulation. In general, downregulation of HLA-I was more severely affected than that of MICA/B; for example, substitutions W52, M87, and W96 essentially abrogated HLA-I modulation while largely retaining the ability to sequester MICA/B. Thus, distinct conserved amino acids seem preferentially important for a particular functional activity of E3/19K.
Published: 2009

39. Association of functionally significant Melanocortin-4 but not Melanocortin-3 receptor mutations with severe adult obesity in a large North American case–control study

Author: John P. Kane, Clive R. Pullinger, Melissa A. Calton, Baran A. Ersoy, Robert Dent, Yana Bromberg, Christian Vaisse, Ruth McPherson, Sumei Zhang, Mary J. Malloy, Len A. Pennacchio, and Nadav Ahituv
Subjects: Adult, Male, medicine.medical_specialty, Population, Biology, medicine.disease_cause, Cell Line, Cohort Studies, Melanocortin receptor, Thinness, Molecular genetics, Internal medicine, Genetics, medicine, Humans, Genetic Predisposition to Disease, education, Association Studies Article, Molecular Biology, Genetics (clinical), education.field_of_study, Mutation, Case-control study, Computational Biology, General Medicine, Middle Aged, medicine.disease, Obesity, Melanocortin 3 receptor, Obesity, Morbid, Endocrinology, Case-Control Studies, North America, Receptor, Melanocortin, Type 4, Female, Mutant Proteins, Melanocortin, Receptor, Melanocortin, Type 3
Abstract: Functionally significant heterozygous mutations in the Melanocortin-4 receptor (MC4R) have been implicated in 2.5% of early onset obesity cases in European cohorts. The role of mutations in this gene in severely obese adults, particularly in smaller North American patient cohorts, has been less convincing. More recently, it has been proposed that mutations in a phylogenetically and physiologically related receptor, the Melanocortin-3 receptor (MC3R), could also be a cause of severe human obesity. The objectives of this study were to determine if mutations impairing the function of MC4R or MC3R were associated with severe obesity in North American adults. We studied MC4R and MC3R mutations detected in a total of 1821 adults (889 severely obese and 932 lean controls) from two cohorts. We systematically and comparatively evaluated the functional consequences of all mutations found in both MC4R and MC3R. The total prevalence of rare MC4R variants in severely obese North American adults was 2.25% (CI(95%): 1.44-3.47) compared with 0.64% (CI(95%): 0.26-1.43) in lean controls (P < 0.005). After classification of functional consequence, the prevalence of MC4R mutations with functional alterations was significantly greater when compared with controls (P < 0.005). In contrast, the prevalence of rare MC3R variants was not significantly increased in severely obese adults [0.67% (CI(95%): 0.27-1.50) versus 0.32% (CI(95%): 0.06-0.99)] (P = 0.332). Our results confirm that mutations in MC4R are a significant cause of severe obesity, extending this finding to North American adults. However, our data suggest that MC3R mutations are not associated with severe obesity in this population.
Published: 2008

40. VarI-SIG 2014 - From SNPs to variants: interpreting different types of genetic variants

Author: Yana Bromberg, Emidio Capriotti, CENTRO INTERDIPARTIMENTALE ALMA MATER RESEARCH INSTITUTE ON GLOBAL CHALLENGES AND CLIMATE CHANGE (ALMA CLIMATE), DIPARTIMENTO DI FARMACIA E BIOTECNOLOGIE, and AREA MIN. 05 - Scienze biologiche
Subjects: Genetics, Introduction, dbSNP, business.industry, Research, Genetic variants, Protein Isoform, Single-nucleotide polymorphism, Computational biology, Single Nucleotide, Biology, Polymorphism, Single Nucleotide, Annotation, Humans, Protein Isoforms, Disease, Personalized medicine, Polymorphism, business, Biotechnology, Human
Abstract: The decreasing cost of high-throughput sequencing is still rapidly increasing the number of known genetic variants [1,2]. For example, the size of the dbSNP database [3] has nearly doubled over the past years to 110 million human single nucleotide polymorphisms and short genomic variants. Unfortunately, the characterization, annotation, and interpretation of these variants are still lagging. In particular, the interpretation of genetic variants and their implication in disease is one of the major challenges in personalized medicine [4-6]. The 4th edition of the Variant Interpretation Special Interest Group (VarI-SIG, formerly SNP-SIG) meeting [7-9] was held on July 12 at ISMB 2014 in Boston (MA). The central meeting themes were "Annotation and prediction of structural/functional impacts of coding variants" and "Genetic variants as effectors of change: disease and evolution". The VarI-SIG is organized as a venue for the development of a research network of scientists, necessary for facilitating the exchange of ideas and establishing new collaborations. This year's meeting attracted over 100 participants, with seven research talks and five presentations from leading scientists in the field.
Published: 2015

41. SNAP predicts effect of mutations on protein function

Author: Yana Bromberg, Guy Yachdav, and Burkhard Rost
Subjects: Statistics and Probability, Biology, medicine.disease_cause, Polymorphism, Single Nucleotide, Biochemistry, medicine, Humans, Single amino acid, Databases, Protein, Molecular Biology, Supplementary data, Genetics, Mutation, Protein function, Extramural, Snap, Computational Biology, Proteins, Amino acid substitution, Computer Science Applications, Applications Note, Computational Mathematics, Amino Acid Substitution, Computational Theory and Mathematics, Sequence Analysis, Software, Neutral mutation
Abstract: Summary: Many non-synonymous single nucleotide polymorphisms (nsSNPs) in humans are suspected to impact protein function. Here, we present a publicly available server implementation of the method SNAP (screening for non-acceptable polymorphisms) that predicts the functional effects of single amino acid substitutions. SNAP identifies over 80% of the non-neutral mutations at 77% accuracy and over 76% of the neutral mutations at 80% accuracy at its default threshold. Each prediction is associated with a reliability index that correlates with accuracy and thereby enables experimentalists to zoom into the most promising predictions. Availability: Web-server: http://www.rostlab.org/services/SNAP; downloadable program available upon request. Contact: bromberg@rostlab.org Supplementary information: Supplementary data are available at Bioinformatics online.
Published: 2008

42. Mapping of Mcs30, a New Mammary Carcinoma Susceptibility Quantitative Trait Locus (QTL30) on Rat Chromosome 12: Identification of Fry as a Candidate Mcs Gene

Author: Helmut Zarbl, Xiuling Shang, Cynthia Friedman, Andrea S. Kim, Jenny Pan Lew, Hong Xie, Xuefeng Ren, Lichen Jing, Yuan Gao, Yana Bromberg, Mingzhu Fang, Jessica C. Graham, Graham Vail, and Andrei M. Mikheev
Subjects: Male, Genotype, Sequence analysis, Quantitative Trait Loci, lcsh:Medicine, Mammary Neoplasms, Animal, Locus (genetics), Single-nucleotide polymorphism, Biology, Quantitative trait locus, Polymerase Chain Reaction, 03 medical and health sciences, 0302 clinical medicine, Genetic linkage, Cell Line, Tumor, Animals, Humans, Genetic Predisposition to Disease, lcsh:Science, Gene, Cells, Cultured, In Situ Hybridization, Fluorescence, Chromosome 12, 030304 developmental biology, Synteny, Genetics, 0303 health sciences, Multidisciplinary, lcsh:R, Proteins, Blotting, Northern, Molecular biology, Rats, Inbred F344, Rats, Phenotype, 030220 oncology & carcinogenesis, Female, lcsh:Q, Research Article
Abstract: Rat strains differ dramatically in their susceptibility to mammary carcinogenesis. On the assumption that susceptibility genes are conserved across mammalian species and hence inform human carcinogenesis, numerous investigators have used genetic linkage studies in rats to identify genes responsible for differential susceptibility to carcinogenesis. Using a genetic backcross between the resistant Copenhagen (Cop) and susceptible Fischer 344 (F344) strains, we mapped a novel mammary carcinoma susceptibility (Mcs30) locus to the centromeric region on chromosome 12 (LOD score of ∼8.6 at the D12Rat59 marker). The Mcs30 locus comprises approximately 12 Mbp on the long arm of rat RNO12 whose synteny is conserved on human chromosome 13q12 to 13q13. After analyzing numerous genes comprising this locus, we identified Fry, the rat ortholog of the furry gene of Drosophila melanogaster, as a candidate Mcs gene. We cloned and determined the complete nucleotide sequence of the 13 kbp Fry mRNA. Sequence analysis indicated that the Fry gene was highly conserved across evolution, with 90% similarity of the predicted amino acid sequence among eutherian mammals. Comparison of the Fry sequence in the Cop and F344 strains identified two non-synonymous single nucleotide polymorphisms (SNPs), one of which creates a putative, de novo phosphorylation site. Further analysis showed that the expression of the Fry gene is reduced in a majority of rat mammary tumors. Our results also suggested that FRY activity was reduced in human breast carcinoma cell lines as a result of reduced levels or mutation. This study is the first to identify the Fry gene as a candidate Mcs gene. Our data suggest that the SNPs within the Fry gene contribute to the genetic susceptibility of the F344 rat strain to mammary carcinogenesis. These results provide the foundation for analyzing the role of the human FRY gene in cancer susceptibility and progression.
Published: 2013

43. VarI-SIG 2015: methods for personalized medicine – the role of variant interpretation in research and diagnostics

Author: Hannah Carter, Yana Bromberg, Emidio Capriotti, CENTRO INTERDIPARTIMENTALE ALMA MATER RESEARCH INSTITUTE ON GLOBAL CHALLENGES AND CLIMATE CHANGE (ALMA CLIMATE), and DIPARTIMENTO DI FARMACIA E BIOTECNOLOGIE
Subjects: 0301 basic medicine, dbSNP, Biology, Regulatory Sequences, Nucleic Acid, Polymorphism, Single Nucleotide, Evolution, Molecular, 03 medical and health sciences, Annotation, Alzheimer Disease, Cancer genome, Neoplasms, Genetics, Humans, Precision Medicine, business.industry, Protein Stability, Interpretation (philosophy), Genetic variants, Special Interest Group, Congresses as Topic, Precision medicine, Data science, 030104 developmental biology, Phenotype, Editorial, Genomic Structural Variation, Personalized medicine, Biotechnology, business, Ireland, Protein Kinases
Abstract: The growing availability of high-throughput sequencing continues to increase the number of identified genetic variants [1, 2]. For example, the size of the dbSNP database [3] has grown exponentially over the past years to ~150 million human single nucleotide polymorphisms and short genomic variants. Unfortunately, the characterization, annotation, and interpretation of these variants are still lagging. In particular, their implication in disease is one of the major challenges in personalized medicine [4–8]. The 5th edition of the Variant Interpretation Special Interest Group (VarI-SIG, formerly SNP-SIG) meeting [9–12] was held on July 11th, 2015 at the joint ISMB/ECCB meeting in Dublin, Ireland. The central VarI-SIG themes were “Annotation and prediction of structural/functional impacts of coding variants” and “Genetic variants as effectors of change: disease and evolution”. Our meeting is organized as a venue for the development of a research network of scientists, necessary for facilitating the exchange of ideas and establishing new collaborations. This year’s meeting attracted over 60 participants, with eight research talks and five presentations from the leading scientists in the field.

44. Disease-related mutations predicted to impact protein function

Author: Christian Schaefer, Burkhard Rost, Yana Bromberg, and Dominik Achten
Subjects: Genetics, Mutation, Protein function, Computational Biology, Proteins, Single-nucleotide polymorphism, Functional impact, Disease, Biology, Proteomics, medicine.disease_cause, Polymorphism, Single Nucleotide, Proceedings, Protein sequencing, medicine, Humans, DNA microarray, Biotechnology
Abstract: Background: Non-synonymous single nucleotide polymorphisms (nsSNPs) alter the protein sequence and can cause disease. Their impact has been described by reliable experiments for relatively few mutations. Here, we study predictions of the functional impact of disease-annotated mutations from OMIM, PMD and Swiss-Prot and of variants not linked to disease. Results: Most disease-causing mutations were predicted to impact protein function. More surprisingly, the raw prediction scores for disease-causing mutations were higher than the scores for the function-altering data set originally used for developing the prediction method (here SNAP). We might expect that diseases are caused by change-of-function mutations. However, it is surprising how well prediction methods developed for different purposes identify this link. Conversely, our predictions suggest that the set of nsSNPs not currently linked to diseases contains very few strong disease associations to be discovered. Conclusions: Firstly, annotations of disease-causing nsSNPs are on average so reliable that they can be used as proxies for functional impact. Secondly, disease-causing nsSNPs can be identified very well by methods that predict the impact of mutations on protein function. This implies that the existing prediction methods provide a very good means of choosing a set of suspect SNPs relevant for disease.
