Descriptor: "Human Phenotype Ontology" - Searchworks@Jio Institute Digital Library Search Results

Your search for "Human Phenotype Ontology" returned 598 results.

1. An evaluation of GPT models for phenotype concept recognition.

Author: Groza, Tudor, Caufield, Harry, Gration, Dylan, Baynam, Gareth, Haendel, Melissa, Robinson, Peter, Mungall, Christopher, and Reese, Justin
Subjects: Artificial intelligence, Generative pretrained transformer, Human Phenotype Ontology, Large language models, Phenotype concept recognition, Humans, Knowledge, Language, Machine Learning, Phenotype, Rare Diseases
Abstract: OBJECTIVE: Clinical deep phenotyping and phenotype annotation play a critical role in both the diagnosis of patients with rare disorders as well as in building computationally-tractable knowledge in the rare disorders field. These processes rely on using ontology concepts, often from the Human Phenotype Ontology, in conjunction with a phenotype concept recognition task (supported usually by machine learning methods) to curate patient profiles or existing scientific literature. With the significant shift in the use of large language models (LLMs) for most NLP tasks, we examine the performance of the latest Generative Pre-trained Transformer (GPT) models underpinning ChatGPT as a foundation for the tasks of clinical phenotyping and phenotype annotation. MATERIALS AND METHODS: The experimental setup of the study included seven prompts of various levels of specificity, two GPT models (gpt-3.5-turbo and gpt-4.0) and two established gold standard corpora for phenotype recognition, one consisting of publication abstracts and the other clinical observations. RESULTS: The best run, using in-context learning, achieved 0.58 document-level F1 score on publication abstracts and 0.75 document-level F1 score on clinical observations, as well as a mention-level F1 score of 0.7, which surpasses the current best in class tool. Without in-context learning, however, performance is significantly below the existing approaches. CONCLUSION: Our experiments show that gpt-4.0 surpasses the state of the art performance if the task is constrained to a subset of the target ontology where there is prior knowledge of the terms that are expected to be matched. While the results are promising, the non-deterministic nature of the outcomes, the high cost and the lack of concordance between different runs using the same prompt and input make the use of these LLMs challenging for this particular task.
Published: 2024

2. GA4GH Phenopackets: A Practical Introduction

Author: Ladewig, Markus S, Jacobsen, Julius OB, Wagner, Alex H, Danis, Daniel, Kassaby, Baha El, Gargano, Michael, Groza, Tudor, Baudis, Michael, Steinhaus, Robin, Seelow, Dominik, Bechrakis, Nikolaos E, Mungall, Christopher J, Schofield, Paul N, Elemento, Olivier, Smith, Lindsay, McMurry, Julie A, Munoz‐Torres, Monica, Haendel, Melissa A, and Robinson, Peter N
Subjects: Biological Sciences, Biomedical and Clinical Sciences, Genetics, Health Services and Systems, Health Sciences, Biotechnology, Networking and Information Technology R&D (NITRD), Rare Diseases, Human Genome, Cancer, Generic health relevance, Good Health and Well Being, FAIR data, Global Alliance for Genomics and Health, Human Phenotype Ontology, Phenopacket Schema, deep phenotyping
Abstract: The Global Alliance for Genomics and Health (GA4GH) is developing a suite of coordinated standards for genomics for healthcare. The Phenopacket is a new GA4GH standard for sharing disease and phenotype information that characterizes an individual person, linking that individual to detailed phenotypic descriptions, genetic information, diagnoses, and treatments. A detailed example is presented that illustrates how to use the schema to represent the clinical course of a patient with retinoblastoma, including demographic information, the clinical diagnosis, phenotypic features and clinical measurements, an examination of the extirpated tumor, therapies, and the results of genomic analysis. The Phenopacket Schema, together with other GA4GH data and technical standards, will enable data exchange and provide a foundation for the computational analysis of disease and phenotype information to improve our ability to diagnose and conduct research on all types of disorders, including cancer and rare diseases.
Published: 2023

3. A corpus of GA4GH phenopackets: Case-level phenotyping for genomic diagnostics and discovery

Author: Daniel Danis, Michael J. Bamshad, Yasemin Bridges, Andrés Caballero-Oteyza, Pilar Cacheiro, Leigh C. Carmody, Leonardo Chimirri, Jessica X. Chong, Ben Coleman, Raymond Dalgleish, Peter J. Freeman, Adam S.L. Graefe, Tudor Groza, Peter Hansen, Julius O.B. Jacobsen, Adam Klocperk, Maaike Kusters, Markus S. Ladewig, Anthony J. Marcello, Teresa Mattina, Christopher J. Mungall, Monica C. Munoz-Torres, Justin T. Reese, Filip Rehburg, Bárbara C.S. Reis, Catharina Schuetz, Damian Smedley, Timmy Strauss, Jagadish Chandrabose Sundaramurthi, Sylvia Thun, Kyran Wissink, John F. Wagstaff, David Zocche, Melissa A. Haendel, and Peter N. Robinson
Subjects: human phenotype ontology, global alliance for genomics and health, phenopacket schema, Genetics, QH426-470
Abstract: Summary: The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms. Here, we present Phenopacket Store. Phenopacket Store v.0.1.19 includes 6,668 phenopackets representing 475 Mendelian and chromosomal diseases associated with 423 genes and 3,834 unique pathogenic alleles curated from 959 different publications. This represents the first large-scale collection of case-level, standardized phenotypic information derived from case reports in the literature with detailed descriptions of the clinical data and will be useful for many purposes, including the development and testing of software for prioritizing genes and diseases in diagnostic genomics, machine learning analysis of clinical phenotype data, patient stratification, and genotype-phenotype correlations. This corpus also provides best-practice examples for curating literature-derived data using the GA4GH Phenopacket Schema.
Published: 2025

4. Generalisable long COVID subtypes: Findings from the NIH N3C and RECOVER programmes

Author: Reese, Justin T, Blau, Hannah, Casiraghi, Elena, Bergquist, Timothy, Loomba, Johanna J, Callahan, Tiffany J, Laraway, Bryan, Antonescu, Corneliu, Coleman, Ben, Gargano, Michael, Wilkins, Kenneth J, Cappelletti, Luca, Fontana, Tommaso, Ammar, Nariman, Antony, Blessy, Murali, TM, Caufield, J Harry, Karlebach, Guy, McMurry, Julie A, Williams, Andrew, Moffitt, Richard, Banerjee, Jineta, Solomonides, Anthony E, Davis, Hannah, Kostka, Kristin, Valentini, Giorgio, Sahner, David, Chute, Christopher G, Madlock-Brown, Charisse, Haendel, Melissa A, Robinson, Peter N, Consortium, N3C, Spratt, Heidi, Visweswaran, Shyam, Flack, Joseph Eugene, Yoo, Yun Jae, Gabriel, Davera, Alexander, G Caleb, Mehta, Hemalkumar B, Liu, Feifan, Miller, Robert T, Wong, Rachel, Hill, Elaine L, Consortium, RECOVER, Thorpe, Lorna E, and Divers, Jasmin
Subjects: Biomedical and Clinical Sciences, Clinical Sciences, Infectious Diseases, Precision Medicine, Coronaviruses, Emerging Infectious Diseases, Machine Learning and Artificial Intelligence, Networking and Information Technology R&D (NITRD), Good Health and Well Being, Humans, COVID-19, Disease Progression, Post-Acute COVID-19 Syndrome, SARS-CoV-2, N3C Consortium, RECOVER Consortium, Human Phenotype Ontology, Long COVID, Machine learning, Precision medicine, Semantic similarity, Public Health and Health Services, Clinical sciences, Epidemiology
Abstract: Background: Stratification of patients with post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies. However, long COVID is incompletely understood and characterised by a wide range of manifestations that are difficult to analyse computationally. Additionally, the generalisability of machine learning classification of COVID-19 clinical outcomes has rarely been tested. Methods: We present a method for computationally modelling PASC phenotype data based on electronic healthcare records (EHRs) and for assessing pairwise phenotypic similarity between patients using semantic similarity. Our approach defines a nonlinear similarity function that maps from a feature space of phenotypic abnormalities to a matrix of pairwise patient similarity that can be clustered using unsupervised machine learning. Findings: We found six clusters of PASC patients, each with distinct profiles of phenotypic abnormalities, including clusters with distinct pulmonary, neuropsychiatric, and cardiovascular abnormalities, and a cluster associated with broad, severe manifestations and increased mortality. There was significant association of cluster membership with a range of pre-existing conditions and measures of severity during acute COVID-19. We assigned new patients from other healthcare centres to clusters by maximum semantic similarity to the original patients, and showed that the clusters were generalisable across different hospital systems. The increased mortality rate originally identified in one cluster was consistently observed in patients assigned to that cluster in other hospital systems. Interpretation: Semantic phenotypic clustering provides a foundation for assigning patients to stratified subgroups for natural history or therapy studies on PASC. Funding: NIH (TR002306/OT2HL161847-01/OD011883/HG010860), U.S.D.O.E. (DE-AC02-05CH11231), Donald A. Roux Family Fund at Jackson Laboratory, Marsico Family at CU Anschutz.
Published: 2023

5. Objectivizing issues in the diagnosis of complex rare diseases: lessons learned from testing existing diagnosis support systems on ciliopathies.

Author: Faviez, Carole, Chen, Xiaoyi, Garcelon, Nicolas, Zaidan, Mohamad, Billot, Katy, Petzold, Friederike, Faour, Hassan, Douillet, Maxime, Rozet, Jean-Michel, Cormier-Daire, Valérie, Attié-Bitach, Tania, Lyonnet, Stanislas, Saunier, Sophie, and Burgun, Anita
Subjects: *RARE diseases, *DELAYED diagnosis, *DIAGNOSIS, *HUMAN phenotype, *ELECTRONIC health records
Abstract: Background: There are approximately 8,000 different rare diseases that affect roughly 400 million people worldwide. Many of them suffer from delayed diagnosis. Ciliopathies are rare monogenic disorders characterized by a significant phenotypic and genetic heterogeneity that raises an important challenge for clinical diagnosis. Diagnosis support systems (DSS) applied to electronic health record (EHR) data may help identify undiagnosed patients, which is of paramount importance to improve patients' care. Our objective was to evaluate three online-accessible rare disease DSSs using phenotypes derived from EHRs for the diagnosis of ciliopathies. Methods: Two datasets of ciliopathy cases, either proven or suspected, and two datasets of controls were used to evaluate the DSSs. Patient phenotypes were automatically extracted from their EHRs and converted to Human Phenotype Ontology terms. We tested the ability of the DSSs to diagnose cases in contrast to controls based on Orphanet ontology. Results: A total of 79 cases and 38 controls were selected. Performances of the DSSs on ciliopathy real world data (best DSS with area under the ROC curve = 0.72) were not as good as published performances on the test set used in the DSS development phase. None of these systems obtained results which could be described as "expert-level". Patients with multisystemic symptoms were generally easier to diagnose than patients with isolated symptoms. Diseases easily confused with ciliopathy generally affected multiple organs and had overlapping phenotypes. Four challenges need to be considered to improve the performances: to make the DSSs interoperable with EHR systems, to validate the performances in real-life settings, to deal with data quality, and to leverage methods and resources for rare and complex diseases. 
Conclusion: Our study provides insights into the complexities of diagnosing highly heterogeneous rare diseases and offers lessons derived from evaluating existing DSSs in real-world settings. These insights are not only beneficial for ciliopathy diagnosis but also hold relevance for the enhancement of DSSs for various complex rare disorders, by guiding the development of more clinically relevant rare disease DSSs that could support early diagnosis and ultimately make more patients eligible for treatment. [ABSTRACT FROM AUTHOR]
Published: 2024

6. Genetic variant reanalysis reveals a case of Sandhoff disease with onset of infantile epileptic spasm syndrome.

Author: Zhang, Qi, Zou, Liping, Lu, Qian, Wang, Qiuhong, Dun, Shuo, and Wang, Jing
Subjects: SPHINGOLIPIDOSES, INFANTILE spasms, ELECTROENCEPHALOGRAPHY, TREATMENT effectiveness, BIOINFORMATICS, GENETIC mutation, GENETIC testing, PHENOTYPES, VIDEO recording
Abstract: Background: Sandhoff disease (SD) is an autosomal recessive lysosomal disease with clinical manifestations such as epilepsy, psychomotor retardation and developmental delay. However, infantile SD with onset of infantile epilepsy spasm syndrome (IESS) is extremely rare. Case presentation: The case presented here was a 22-month-old boy, who presented with IESS and psychomotor retardation/regression at 6 months of age. The patient showed progressive aggravation of seizures and excessive startle responses. The whole exome sequencing data, which initially revealed negative results, were reanalyzed and indicated a homozygous mutation at the c.1613 + 4del splice site of the HEXB gene. The activities of β-hexosaminidase A and total hexosaminidase were significantly decreased. The fundus examination showed cherry red spots at the macula. Conclusions: IESS can be an epileptic phenotype of infantile SD. Clinical phenotypes should be adequately collected in genetic testing. In the case of negative sequencing results, gene variant reanalysis can be performed when the patients show clinically suspicious indications. [ABSTRACT FROM AUTHOR]
Published: 2024

7. An evaluation of GPT models for phenotype concept recognition

Author: Tudor Groza, Harry Caufield, Dylan Gration, Gareth Baynam, Melissa A. Haendel, Peter N. Robinson, Christopher J. Mungall, and Justin T. Reese
Subjects: Large language models, Generative pretrained transformer, Artificial intelligence, Phenotype concept recognition, Human Phenotype Ontology, Computer applications to medicine. Medical informatics, R858-859.7
Abstract: Objective: Clinical deep phenotyping and phenotype annotation play a critical role in both the diagnosis of patients with rare disorders as well as in building computationally-tractable knowledge in the rare disorders field. These processes rely on using ontology concepts, often from the Human Phenotype Ontology, in conjunction with a phenotype concept recognition task (supported usually by machine learning methods) to curate patient profiles or existing scientific literature. With the significant shift in the use of large language models (LLMs) for most NLP tasks, we examine the performance of the latest Generative Pre-trained Transformer (GPT) models underpinning ChatGPT as a foundation for the tasks of clinical phenotyping and phenotype annotation. Materials and methods: The experimental setup of the study included seven prompts of various levels of specificity, two GPT models (gpt-3.5-turbo and gpt-4.0) and two established gold standard corpora for phenotype recognition, one consisting of publication abstracts and the other clinical observations. Results: The best run, using in-context learning, achieved 0.58 document-level F1 score on publication abstracts and 0.75 document-level F1 score on clinical observations, as well as a mention-level F1 score of 0.7, which surpasses the current best in class tool. Without in-context learning, however, performance is significantly below the existing approaches. Conclusion: Our experiments show that gpt-4.0 surpasses the state of the art performance if the task is constrained to a subset of the target ontology where there is prior knowledge of the terms that are expected to be matched. While the results are promising, the non-deterministic nature of the outcomes, the high cost and the lack of concordance between different runs using the same prompt and input make the use of these LLMs challenging for this particular task.
Published: 2024

8. Prenatal phenotyping: A community effort to enhance the Human Phenotype Ontology

Author: Dhombres, Ferdinand, Morgan, Patricia, Chaudhari, Bimal P, Filges, Isabel, Sparks, Teresa N, Lapunzina, Pablo, Roscioli, Tony, Agarwal, Umber, Aggarwal, Shagun, Beneteau, Claire, Cacheiro, Pilar, Carmody, Leigh C, Collardeau‐Frachon, Sophie, Dempsey, Esther A, Dufke, Andreas, Duyzend, Michael Henri, Ghosh, Mirna, Giordano, Jessica L, Glad, Ragnhild, Grinfelde, Ieva, Iliescu, Dominic G, Ladewig, Markus S, Munoz‐Torres, Monica C, Pollazzon, Marzia, Radio, Francesca Clementina, Rodo, Carlota, Silva, Raquel Gouveia, Smedley, Damian, Sundaramurthi, Jagadish Chandrabose, Toro, Sabrina, Valenzuela, Irene, Vasilevsky, Nicole A, Wapner, Ronald J, Zemet, Roni, Haendel, Melissa A, and Robinson, Peter N
Subjects: Genetics, Human Genome, Clinical Research, Congenital Structural Anomalies, Prevention, Pediatric, Perinatal Period - Conditions Originating in Perinatal Period, Detection, screening and diagnosis, 4.1 Discovery and preclinical testing of markers and technologies, Neurological, Reproductive health and childbirth, Good Health and Well Being, Infant, Newborn, Humans, Female, Pregnancy, Placenta, Computational Biology, Phenotype, Rare Diseases, Exome Sequencing, HPO, human phenotype ontology, GA4GH Phenopacket, prenatal diagnosis, fetal pathology, prenatal phenotyping, Clinical Sciences, Genetics & Heredity
Abstract: Technological advances in both genome sequencing and prenatal imaging are increasing our ability to accurately recognize and diagnose Mendelian conditions prenatally. Phenotype-driven early genetic diagnosis of fetal genetic disease can help to strategize treatment options and clinical preventive measures during the perinatal period, to plan in utero therapies, and to inform parental decision-making. Fetal phenotypes of genetic diseases are often unique and at present are not well understood; more comprehensive knowledge about prenatal phenotypes and computational resources have an enormous potential to improve diagnostics and translational research. The Human Phenotype Ontology (HPO) has been widely used to support diagnostics and translational research in human genetics. To better support prenatal usage, the HPO consortium conducted a series of workshops with a group of domain experts in a variety of medical specialties, diagnostic techniques, as well as diseases and phenotypes related to prenatal medicine, including perinatal pathology, musculoskeletal anomalies, neurology, medical genetics, hydrops fetalis, craniofacial malformations, cardiology, neonatal-perinatal medicine, fetal medicine, placental pathology, prenatal imaging, and bioinformatics. We expanded the representation of prenatal phenotypes in HPO by adding 95 new phenotype terms under the Abnormality of prenatal development or birth (HP:0001197) grouping term, and revised definitions, synonyms, and disease annotations for most of the 152 terms that existed before the beginning of this effort. The expansion of prenatal phenotypes in HPO will support phenotype-driven prenatal exome and genome sequencing for precision genetic diagnostics of rare diseases to support prenatal care.
Published: 2022

9. Systematising and scaling literature curation for genetically determined developmental disorders

Author: Yates, Thabo Michael, Fitzpatrick, David, and Simpson, Ian
Subjects: genetically determined developmental disorders, genomic sequencing, GDD, Phenotypic data, Human Phenotype Ontology, HPO
Abstract: The widespread availability of genomic sequencing has transformed the diagnosis of genetically-determined developmental disorders (GDD). However, this type of test often generates a number of genetic variants, which have to be reviewed and related back to the clinical features (phenotype) of the individual being tested. This frequently entails a time-consuming review of the peer-reviewed literature to look for case reports describing variants in the gene(s) of interest. This is particularly true for newly described and/or very rare disorders not covered in phenotype databases. Therefore, there is a need for scalable, automated literature curation to increase the efficiency of this process. This should lead to improvements in the speed in which diagnosis is made, and an increase in the number of individuals who are diagnosed through genomic testing. Phenotypic data in case reports/case series is not usually recorded in a standardised, computationally-tractable format. Plain text descriptions of similar clinical features may be recorded in several different ways. For example, a technical term such as 'hypertelorism', may be recorded as its synonym 'widely spaced eyes'. In addition, case reports are found across a wide range of journals, with different structures and file formats for each publication. The Human Phenotype Ontology (HPO) was developed to store phenotypic data in a computationally-accessible format. Several initiatives have been developed to link diseases to phenotype data, in the form of HPO terms. However, these rely on manual expert curation and therefore are not inherently scalable, and cannot be updated automatically. Methods of extracting phenotype data from text at scale developed to date have relied on abstracts or open access papers. At the time of writing, Europe PubMed Central (EPMC, https://europepmc.org/) contained approximately 39.5 million articles, of which only 3.8 million were open access. 
Therefore, there is likely a significant volume of phenotypic data which has not been used previously at scale, due to difficulties accessing non-open access manuscripts. In this thesis, I present a method for literature curation which can utilise all relevant published full text through a newly developed package which can download almost all manuscripts licenced by a university or other institution. This is scalable to the full spectrum of GDD. Using manuscripts identified through manual literature review, I use a full text download pipeline and NLP (natural language processing) based methods to generate disease models. These are comprised of HPO terms weighted according to their frequency in the literature. I demonstrate iterative refinement of these models, and use a custom annotated corpus of 50 papers to show the text mining process has high precision and recall. I demonstrate that these models clinically reflect true disease expressivity, as defined by manual comparison with expert literature reviews, for three well-characterised GDD. I compare these disease models to those in the most commonly used genetic disease phenotype databases. I show that the automated disease models have increased depth of phenotyping, i.e. there are more terms than those which are manually-generated. I show that, in comparison to 'real life' prospectively gathered phenotypic data, automated disease models outperform existing phenotype databases in predicting diagnosis, as defined by increased area under the curve (by 0.05 and 0.08 using different similarity measures) on ROC curve plots. I present a method for automated PubMed search at scale, to use as input for disease model generation. I annotated a corpus of 6500 abstracts. Using this corpus I show a high precision (up to 0.80) and recall (up to 1.00) for machine learning classifiers used to identify manuscripts relevant to GDD. These use hand-picked domain-specific features, for example utilising specific MeSH terms. 
This method can be used to scale automated literature curation to the full spectrum of GDD. I also present an analysis of the phenotypic terms used in one year of GDD-relevant papers in a prominent journal. This shows that use of supplemental data and parsing clinical report sections from manuscripts is likely to result in more patient-specific phenotype extraction in future. In summary, I present a method for automated curation of full text from the peer-reviewed literature in the context of GDD. I demonstrate that this method is robust, reflects clinical disease expressivity, outperforms existing manual literature curation, and is scalable. Applying this process to clinical testing in future should improve the efficiency and accuracy of diagnosis.
Published: 2022

10. Individualised human phenotype ontology gene panels improve clinical whole exome and genome sequencing analytical efficacy in a cohort of developmental and epileptic encephalopathies.

Author: Henry, Olivia J., Stödberg, Tommy, Båtelson, Sofia, Rasi, Chiara, Stranneheim, Henrik, and Wedell, Anna
Subjects: *NUCLEOTIDE sequencing, *WHOLE genome sequencing, *HUMAN phenotype, *GENOMICS, *GENE ontology, *PEOPLE with epilepsy
Abstract: Background: The majority of genetic epilepsies remain unsolved in terms of specific genotype. Phenotype‐based genomic analyses have shown potential to strengthen genomic analysis in various ways, including improving analytical efficacy. Methods: We have tested a standardised phenotyping method termed 'Phenomodels' for integrating deep‐phenotyping information with our in‐house developed clinical whole exome/genome sequencing analytical pipeline. Phenomodels includes a user‐friendly epilepsy phenotyping template and an objective measure for selecting which template terms to include in individualised Human Phenotype Ontology (HPO) gene panels. In a pilot study of 38 previously solved cases of developmental and epileptic encephalopathies, we compared the sensitivity and specificity of the individualised HPO gene panels with the clinical epilepsy gene panel. Results: The Phenomodels template showed high sensitivity for capturing relevant phenotypic information, where 37/38 individuals' HPO gene panels included the causative gene. The HPO gene panels also had far fewer variants to assess than the epilepsy gene panel. Conclusion: We have demonstrated a viable approach for incorporating standardised phenotype information into clinical genomic analyses, which may enable more efficient analysis. [ABSTRACT FROM AUTHOR]
Published: 2023

11. Semantic Similarity Analysis Reveals Robust Gene-Disease Relationships in Developmental and Epileptic Encephalopathies

Author: Galer, Peter D, Ganesan, Shiva, Lewis-Smith, David, McKeown, Sarah E, Pendziwiat, Manuela, Helbig, Katherine L, Ellis, Colin A, Rademacher, Annika, Smith, Lacey, Poduri, Annapurna, Seiffert, Simone, von Spiczak, Sarah, Muhle, Hiltrud, van Baalen, Andreas, Group, NCEE Study, Investigators, EPGP, Consortium, EuroEPINOMICS-RES, Network, Genomics Research and Innovation, Thomas, Rhys H, Krause, Roland, Weber, Yvonne, and Helbig, Ingo
Subjects: Biological Sciences, Genetics, Neurodegenerative, Epilepsy, Prevention, Neurosciences, Brain Disorders, Pediatric, Aetiology, 2.1 Biological and endogenous factors, Good Health and Well Being, Child, Preschool, Cohort Studies, Female, GABA Plasma Membrane Transport Proteins, Gene Expression, Gene Ontology, Humans, Male, Munc18 Proteins, Mutation, NAV1.1 Voltage-Gated Sodium Channel, Phenotype, Seizures, Semantics, Shab Potassium Channels, Spasms, Infantile, Speech Disorders, Terminology as Topic, Exome Sequencing, NCEE Study Group, EPGP Investigators, EuroEPINOMICS-RES Consortium, Genomics Research and Innovation Network, Human Phenotype Ontology, childhood epilepsies, computational phenotypes, developmental and epileptic encephalopathies, electronic medical records, neurogenetic disorders, whole-exome sequencing, Medical and Health Sciences, Genetics & Heredity, Biological sciences, Biomedical and clinical sciences, Health sciences
Abstract: More than 100 genetic etiologies have been identified in developmental and epileptic encephalopathies (DEEs), but correlating genetic findings with clinical features at scale has remained a hurdle because of a lack of frameworks for analyzing heterogenous clinical data. Here, we analyzed 31,742 Human Phenotype Ontology (HPO) terms in 846 individuals with existing whole-exome trio data and assessed associated clinical features and phenotypic relatedness by using HPO-based semantic similarity analysis for individuals with de novo variants in the same gene. Gene-specific phenotypic signatures included associations of SCN1A with "complex febrile seizures" (HP:0011172; p = 2.1 × 10⁻⁵) and "focal clonic seizures" (HP:0002266; p = 8.9 × 10⁻⁶), STXBP1 with "absent speech" (HP:0001344; p = 1.3 × 10⁻¹¹), and SLC6A1 with "EEG with generalized slow activity" (HP:0010845; p = 0.018). Of 41 genes with de novo variants in two or more individuals, 11 genes showed significant phenotypic similarity, including SCN1A (n = 16, p < 0.0001), STXBP1 (n = 14, p = 0.0021), and KCNB1 (n = 6, p = 0.011). Including genetic and phenotypic data of control subjects increased phenotypic similarity for all genetic etiologies, whereas the probability of observing de novo variants decreased, emphasizing the conceptual differences between semantic similarity analysis and approaches based on the expected number of de novo events. We demonstrate that HPO-based phenotype analysis captures unique profiles for distinct genetic etiologies, reflecting the breadth of the phenotypic spectrum in genetic epilepsies. Semantic similarity can be used to generate statistical evidence for disease causation analogous to the traditional approach of primarily defining disease entities through similar clinical features.
Published: 2020

12. An Improved Phenotype-Driven Tool for Rare Mendelian Variant Prioritization: Benchmarking Exomiser on Real Patient Whole-Exome Data.

Author: Cipriani, Valentina, Pontikos, Nikolas, Arno, Gavin, Sergouniotis, Panagiotis I, Lenassi, Eva, Thawong, Penpitcha, Danis, Daniel, Michaelides, Michel, Webster, Andrew R, Moore, Anthony T, Robinson, Peter N, Jacobsen, Julius OB, and Smedley, Damian
Subjects: bioinformatics, human phenotype ontology, inherited retinal disease, phenotypic similarity, rare disease, variant prioritization, whole-exome sequencing, whole-genome sequencing, Genetics
Abstract: Next-generation sequencing has revolutionized rare disease diagnostics, but many patients remain without a molecular diagnosis, particularly because many candidate variants usually survive despite strict filtering. Exomiser was launched in 2014 as a Java tool that performs an integrative analysis of patients' sequencing data and their phenotypes encoded with Human Phenotype Ontology (HPO) terms. It prioritizes variants by leveraging information on variant frequency, predicted pathogenicity, and gene-phenotype associations derived from human diseases, model organisms, and protein-protein interactions. Early published releases of Exomiser were able to prioritize disease-causative variants as top candidates in up to 97% of simulated whole-exomes. The real patient datasets tested and published so far are very limited in size. Here, we present the latest Exomiser version 12.0.1 with many new features. We assessed the performance using a set of 134 whole-exomes from patients with a range of rare retinal diseases and known molecular diagnosis. Using default settings, Exomiser ranked the correct diagnosed variants as the top candidate in 74% of the dataset and top 5 in 94%; not using the patients' HPO profiles (i.e., variant-only analysis) decreased the performance to 3% and 27%, respectively. In conclusion, Exomiser is an effective support tool for rare Mendelian phenotype-driven variant prioritization.
Published: 2020

13. Individualised human phenotype ontology gene panels improve clinical whole exome and genome sequencing analytical efficacy in a cohort of developmental and epileptic encephalopathies

Author: Olivia J. Henry, Tommy Stödberg, Sofia Båtelson, Chiara Rasi, Henrik Stranneheim, and Anna Wedell
Subjects: deep phenotyping, epilepsy, human phenotype ontology, precision medicine, whole exome sequencing, whole genome sequencing, Genetics, QH426-470
Abstract: Background: The majority of genetic epilepsies remain unsolved in terms of specific genotype. Phenotype‐based genomic analyses have shown potential to strengthen genomic analysis in various ways, including improving analytical efficacy. Methods: We have tested a standardised phenotyping method termed ‘Phenomodels’ for integrating deep‐phenotyping information with our in‐house developed clinical whole exome/genome sequencing analytical pipeline. Phenomodels includes a user‐friendly epilepsy phenotyping template and an objective measure for selecting which template terms to include in individualised Human Phenotype Ontology (HPO) gene panels. In a pilot study of 38 previously solved cases of developmental and epileptic encephalopathies, we compared the sensitivity and specificity of the individualised HPO gene panels with the clinical epilepsy gene panel. Results: The Phenomodels template showed high sensitivity for capturing relevant phenotypic information, where 37/38 individuals' HPO gene panels included the causative gene. The HPO gene panels also had far fewer variants to assess than the epilepsy gene panel. Conclusion: We have demonstrated a viable approach for incorporating standardised phenotype information into clinical genomic analyses, which may enable more efficient analysis.
Published: 2023
Full Text: View/download PDF

14. Encoding Clinical Data with the Human Phenotype Ontology for Computational Differential Diagnostics

Author: Köhler, Sebastian, Øien, N Christine, Buske, Orion J, Groza, Tudor, Jacobsen, Julius OB, McNamara, Craig, Vasilevsky, Nicole, Carmody, Leigh C, Gourdine, JP, Gargano, Michael, McMurry, Julie A, Danis, Daniel, Mungall, Christopher J, Smedley, Damian, Haendel, Melissa, and Robinson, Peter N
Subjects: Human Genome, Clinical Research, Genetics, Networking and Information Technology R&D (NITRD), 4.1 Discovery and preclinical testing of markers and technologies, Detection, screening and diagnosis, Biological Ontologies, Computational Biology, Databases, Genetic, Diagnosis, Differential, Exome, Genetic Diseases, Inborn, Humans, Phenotype, Software, Whole Genome Sequencing, HPO, Human Phenotype Ontology, differential diagnosis, exome, phenotype, Genetics & Heredity
Abstract: The Human Phenotype Ontology (HPO) is a standardized set of phenotypic terms that are organized in a hierarchical fashion. It is a widely used resource for capturing human disease phenotypes for computational analysis to support differential diagnostics. The HPO is frequently used to create a set of terms that accurately describe the observed clinical abnormalities of an individual being evaluated for suspected rare genetic disease. This profile is compared with computational disease profiles in the HPO database with the aim of identifying genetic diseases with comparable phenotypic profiles. The computational analysis can be coupled with the analysis of whole-exome or whole-genome sequencing data through applications such as Exomiser. This article explains how to choose an optimal set of HPO terms for these cases and enter them with software, such as PhenoTips and PatientArchive, and demonstrates how to use Phenomizer and Exomiser to generate a computational differential diagnosis. © 2019 by John Wiley & Sons, Inc.
Published: 2019

15. A Recurrent Missense Variant in AP2M1 Impairs Clathrin-Mediated Endocytosis and Causes Developmental and Epileptic Encephalopathy.

Author: Helbig, Ingo, Lopez-Hernandez, Tania, Shor, Oded, Galer, Peter, Ganesan, Shiva, Pendziwiat, Manuela, Rademacher, Annika, Ellis, Colin, Hümpfer, Nadja, Schwarz, Niklas, Seiffert, Simone, Peeden, Joseph, Štěrbová, Katalin, Hammer, Trine, Møller, Rikke, Shinde, Deepali, Tang, Sha, Smith, Lacey, Poduri, Annapurna, Krause, Roland, Benninger, Felix, Helbig, Katherine, Haucke, Volker, Weber, Yvonne, and Shen, Joseph
Subjects: Human Phenotype Ontology, clathrin-mediated endocytosis, computational phenotypes, developmental and epileptic encephalopathy, neurodevelopmental disorders, synaptic transmission, Adaptor Protein Complex 2, Adaptor Protein Complex mu Subunits, Adolescent, Animals, Brain Diseases, Child, Child, Preschool, Clathrin, Endocytosis, Epilepsy, Female, Humans, Infant, Mice, Mice, Knockout, Mutation, Missense, Neurodevelopmental Disorders, Exome Sequencing
Abstract: The developmental and epileptic encephalopathies (DEEs) are heterogeneous disorders with a strong genetic contribution, but the underlying genetic etiology remains unknown in a significant proportion of individuals. To explore whether statistical support for genetic etiologies can be generated on the basis of phenotypic features, we analyzed whole-exome sequencing data and phenotypic similarities by using Human Phenotype Ontology (HPO) in 314 individuals with DEEs. We identified a de novo c.508C>T (p.Arg170Trp) variant in AP2M1 in two individuals with a phenotypic similarity that was higher than expected by chance (p = 0.003) and a phenotype related to epilepsy with myoclonic-atonic seizures. We subsequently found the same de novo variant in two individuals with neurodevelopmental disorders and generalized epilepsy in a cohort of 2,310 individuals who underwent diagnostic whole-exome sequencing. AP2M1 encodes the μ-subunit of the adaptor protein complex 2 (AP-2), which is involved in clathrin-mediated endocytosis (CME) and synaptic vesicle recycling. Modeling of protein dynamics indicated that the p.Arg170Trp variant impairs the conformational activation and thermodynamic entropy of the AP-2 complex. Functional complementation of both the μ-subunit carrying the p.Arg170Trp variant in human cells and astrocytes derived from AP-2μ conditional knockout mice revealed a significant impairment of CME of transferrin. In contrast, stability, expression levels, membrane recruitment, and localization were not impaired, suggesting a functional alteration of the AP-2 complex as the underlying disease mechanism. We establish a recurrent pathogenic variant in AP2M1 as a cause of DEEs with distinct phenotypic features, and we implicate dysfunction of the early steps of endocytosis as a disease mechanism in epilepsy.
Published: 2019

16. GA4GH Phenopackets: A Practical Introduction

Author: Markus S. Ladewig, Julius O. B. Jacobsen, Alex H. Wagner, Daniel Danis, Baha El Kassaby, Michael Gargano, Tudor Groza, Michael Baudis, Robin Steinhaus, Dominik Seelow, Nikolaos E. Bechrakis, Christopher J. Mungall, Paul N. Schofield, Olivier Elemento, Lindsay Smith, Julie A. McMurry, Monica Munoz‐Torres, Melissa A. Haendel, and Peter N. Robinson
Subjects: deep phenotyping, FAIR data, Global Alliance for Genomics and Health, Human Phenotype Ontology, Phenopacket Schema, Genetics, QH426-470
Abstract: The Global Alliance for Genomics and Health (GA4GH) is developing a suite of coordinated standards for genomics for healthcare. The Phenopacket is a new GA4GH standard for sharing disease and phenotype information that characterizes an individual person, linking that individual to detailed phenotypic descriptions, genetic information, diagnoses, and treatments. A detailed example is presented that illustrates how to use the schema to represent the clinical course of a patient with retinoblastoma, including demographic information, the clinical diagnosis, phenotypic features and clinical measurements, an examination of the extirpated tumor, therapies, and the results of genomic analysis. The Phenopacket Schema, together with other GA4GH data and technical standards, will enable data exchange and provide a foundation for the computational analysis of disease and phenotype information to improve our ability to diagnose and conduct research on all types of disorders, including cancer and rare diseases.
Published: 2023
Full Text: View/download PDF

17. Clinical free text to HPO codes

Author: Gabrielle Stinton, Jane A. Lieviant, Sylvia Kam, Jiin Ying Lim, Jasmine Chew-Yin Goh, Weng Khong Lim, Gareth Baynam, Tele Tan, Duc-Son Pham, and Saumya Shekhar Jamuar
Subjects: Rare disease, Human phenotype ontology, Phenotypic concept extraction, Named entity recognition, Human-in-the-loop, Medicine, Genetics, QH426-470
Abstract: Leveraging Artificial Intelligence (AI) within the rare disease diagnostic odyssey can facilitate a decrease in diagnostic times and an increase in diagnostic rates. Among the steps involved in the odyssey, this project focused on utilizing AI to automate the standardized capturing of clinical free text into Human Phenotype Ontology (HPO) codes. This research project was conducted at both the KK Women’s and Children’s Hospital (KKH), Singapore and the Rare Care Centre at Perth Children’s Hospital, Western Australia (WA), via the Curtin New Colombo Plan (NCP) Scholarship. The outcome of the project saw the development of a Streamlit web application that utilized two (2) pre-trained AI models – PhenoTagger and PhenoBERT – with a human-in-the-loop design. A case study conducted with ten (10) de-identified clinical reports demonstrated a reduction in the HPO extraction task time from ten (10) to twenty (20) minutes per report to less than five (5) minutes.
Published: 2023
Full Text: View/download PDF

18. Generalisable long COVID subtypes: Findings from the NIH N3C and RECOVER programmes

Author: Justin T. Reese, Hannah Blau, Elena Casiraghi, Timothy Bergquist, Johanna J. Loomba, Tiffany J. Callahan, Bryan Laraway, Corneliu Antonescu, Ben Coleman, Michael Gargano, Kenneth J. Wilkins, Luca Cappelletti, Tommaso Fontana, Nariman Ammar, Blessy Antony, T.M. Murali, J. Harry Caufield, Guy Karlebach, Julie A. McMurry, Andrew Williams, Richard Moffitt, Jineta Banerjee, Anthony E. Solomonides, Hannah Davis, Kristin Kostka, Giorgio Valentini, David Sahner, Christopher G. Chute, Charisse Madlock-Brown, Melissa A. Haendel, Peter N. Robinson, Heidi Spratt, Shyam Visweswaran, Joseph Eugene Flack, IV, Yun Jae Yoo, Davera Gabriel, G. Caleb Alexander, Hemalkumar B. Mehta, Feifan Liu, Robert T. Miller, Rachel Wong, Elaine L. Hill, Lorna E. Thorpe, and Jasmin Divers
Subjects: Long COVID, COVID-19, Semantic similarity, Machine learning, Precision medicine, Human Phenotype Ontology, Medicine, Medicine (General), R5-920
Abstract: Summary: Background: Stratification of patients with post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies. However, long COVID is incompletely understood and characterised by a wide range of manifestations that are difficult to analyse computationally. Additionally, the generalisability of machine learning classification of COVID-19 clinical outcomes has rarely been tested. Methods: We present a method for computationally modelling PASC phenotype data based on electronic healthcare records (EHRs) and for assessing pairwise phenotypic similarity between patients using semantic similarity. Our approach defines a nonlinear similarity function that maps from a feature space of phenotypic abnormalities to a matrix of pairwise patient similarity that can be clustered using unsupervised machine learning. Findings: We found six clusters of PASC patients, each with distinct profiles of phenotypic abnormalities, including clusters with distinct pulmonary, neuropsychiatric, and cardiovascular abnormalities, and a cluster associated with broad, severe manifestations and increased mortality. There was significant association of cluster membership with a range of pre-existing conditions and measures of severity during acute COVID-19. We assigned new patients from other healthcare centres to clusters by maximum semantic similarity to the original patients, and showed that the clusters were generalisable across different hospital systems. The increased mortality rate originally identified in one cluster was consistently observed in patients assigned to that cluster in other hospital systems. Interpretation: Semantic phenotypic clustering provides a foundation for assigning patients to stratified subgroups for natural history or therapy studies on PASC. Funding: NIH (TR002306/OT2HL161847-01/OD011883/HG010860), U.S.D.O.E. (DE-AC02-05CH11231), Donald A. Roux Family Fund at Jackson Laboratory, Marsico Family at CU Anschutz.
Published: 2023
Full Text: View/download PDF

19. IMPROVE-DD: Integrating multiple phenotype resources optimizes variant evaluation in genetically determined developmental disorders

Author: Stuart Aitken, Helen V. Firth, Caroline F. Wright, Matthew E. Hurles, David R. FitzPatrick, and Colin A. Semple
Subjects: human phenotype ontology, phenotype, genotype, developmental disease, growth, developmental milestones, Genetics, QH426-470
Abstract: Summary: Diagnosing rare developmental disorders using genome-wide sequencing data commonly necessitates review of multiple plausible candidate variants, often using ontologies of categorical clinical terms. We show that Integrating Multiple Phenotype Resources Optimizes Variant Evaluation in Developmental Disorders (IMPROVE-DD) by incorporating additional classes of data commonly available to clinicians and recorded in health records. In doing so, we quantify the distinct contributions of sex, growth, and development in addition to Human Phenotype Ontology (HPO) terms and demonstrate added value from these readily available information sources. We use likelihood ratios for nominal and quantitative data and propose a classifier for HPO terms in this framework. This Bayesian framework results in more robust diagnoses. Using data systematically collected in the Deciphering Developmental Disorders study, we considered 77 genes with pathogenic/likely pathogenic variants in ≥10 individuals. All genes showed at least a satisfactory prediction by receiver operating characteristic when testing on training data (AUC ≥ 0.6), and HPO terms were the best predictor for the majority of genes, though a minority (13/77) of genes were better predicted by other phenotypic data types. Overall, classifiers based upon multiple integrated phenotypic data sources performed better than those based upon any individual source, and importantly, integrated models produced notably fewer false positives. Finally, we show that IMPROVE-DD models with good predictive performance on cross-validation can be constructed from relatively few individuals. This suggests new strategies for candidate gene prioritization and highlights the value of systematic clinical data collection to support diagnostic programs.
Published: 2023
Full Text: View/download PDF

20. Seltene Erkrankungen in den Daten sichtbar machen – Kodierung [Making rare diseases visible in the data: coding].

Author: Martin, Tamara, Rommel, Kathrin, Thomas, Carina, Eymann, Jutta, Kretschmer, Tanita, Berner, Reinhard, Lee-Kirsch, Min Ae, and Hebestreit, Helge
Abstract: Copyright of Bundesgesundheitsblatt - Gesundheitsforschung - Gesundheitsschutz is the property of Springer Nature and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Published: 2022
Full Text: View/download PDF

21. Computational analysis of neurodevelopmental phenotypes: Harmonization empowers clinical discovery.

Author: Lewis‐Smith, David, Parthasarathy, Shridhar, Xian, Julie, Kaufman, Michael C., Ganesan, Shiva, Galer, Peter D., Thomas, Rhys H., and Helbig, Ingo
Abstract: Making a specific diagnosis in neurodevelopmental disorders is traditionally based on recognizing clinical features of a distinct syndrome, which guides testing of its possible genetic etiologies. Scalable frameworks for genomic diagnostics, however, have struggled to integrate meaningful measurements of clinical phenotypic features. While standardization has enabled generation and interpretation of genomic data for clinical diagnostics at unprecedented scale, making the equivalent breakthrough for clinical data has proven challenging. However, increasingly clinical features are being recorded using controlled dictionaries with machine readable formats such as the Human Phenotype Ontology (HPO), which greatly facilitates their use in the diagnostic space. Improving the tractability of large‐scale clinical information will present new opportunities to inform genomic research and diagnostics from a clinical perspective. Here, we describe novel approaches for computational phenotyping to harmonize clinical features, improve data translation through revising domain‐specific dictionaries, quantify phenotypic features, and determine clinical relatedness. We demonstrate how these concepts can be applied to longitudinal phenotypic information, which represents a critical element of developmental disorders and pediatric conditions. Finally, we expand our discussion to clinical data derived from electronic medical records, a largely untapped resource of deep clinical information with distinct strengths and weaknesses.
Published: 2022
Full Text: View/download PDF

22. OARD: Open annotations for rare diseases and their phenotypes based on real-world data.

Author: Liu, Cong, Ta, Casey N., Havrilla, Jim M., Nestor, Jordan G., Spotnitz, Matthew E., Geneslaw, Andrew S., Hu, Yu, Chung, Wendy K., Wang, Kai, and Weng, Chunhua
Subjects: *RARE diseases, *ELECTRONIC health records, *HEALTH facilities, *ANNOTATIONS, *GENETIC disorders, *KNOWLEDGE base, *CO-sleeping
Abstract: Diagnosis for rare genetic diseases often relies on phenotype-driven methods, which hinge on the accuracy and completeness of the rare disease phenotypes in the underlying annotation knowledgebase. Existing knowledgebases are often manually curated with additional annotations found in published case reports. Despite their potential, real-world data such as electronic health records (EHRs) have not been fully exploited to derive rare disease annotations. Here, we present open annotation for rare diseases (OARD), a real-world-data-derived resource with annotation for rare-disease-related phenotypes. This resource is derived from the EHRs of two academic health institutions containing more than 10 million individuals spanning wide age ranges and different disease subgroups. By leveraging ontology mapping and advanced natural-language-processing (NLP) methods, OARD automatically and efficiently extracts concepts for both rare diseases and their phenotypic traits from billing codes and lab tests as well as over 100 million clinical narratives. The rare disease prevalence derived by OARD is highly correlated with those annotated in the original rare disease knowledgebase. By performing association analysis, we identified more than 1 million novel disease-phenotype association pairs that were previously missed by human annotation, and >60% were confirmed true associations via manual review of a list of sampled pairs. Compared to the manual curated annotation, OARD is 100% data driven and its pipeline can be shared across different institutions. By supporting privacy-preserving sharing of aggregated summary statistics, such as term frequencies and disease-phenotype associations, it fills an important gap to facilitate data-driven research in the rare disease community. The open annotation for rare diseases (OARD) is a publicly accessible, data-driven resource containing summary statistics including phenotype frequencies and associations for rare diseases. 
It was derived from over 10 million individuals' electronic health records from two academic health institutions, which span wide age ranges and different disease subgroups.
Published: 2022
Full Text: View/download PDF

23. The clinical and genetic spectrum of paediatric speech and language disorders.

Author: Magielski JH, Ruggiero SM, Xian J, Parthasarathy S, Galer PD, Ganesan S, Back A, McKee JL, McSalley I, Gonzalez AK, Morgan A, Donaher J, and Helbig I
Abstract: Speech and language disorders are known to have a substantial genetic contribution. Although frequently examined as components of other conditions, research on the genetic basis of linguistic differences as separate phenotypic subgroups has been limited so far. Here, we performed an in-depth characterization of speech and language disorders in 52 143 individuals, reconstructing clinical histories using a large-scale data-mining approach of the electronic medical records from an entire large paediatric healthcare network. The reported frequency of these disorders was the highest between 2 and 5 years old and spanned a spectrum of 26 broad speech and language diagnoses. We used natural language processing to assess the degree to which clinical diagnoses in full-text notes were reflected in ICD-10 diagnosis codes. We found that aphasia and speech apraxia could be retrieved easily through ICD-10 diagnosis codes, whereas stuttering as a speech phenotype was coded in only 12% of individuals through appropriate ICD-10 codes. We found significant comorbidity of speech and language disorders in neurodevelopmental conditions (30.31%) and, to a lesser degree, with epilepsies (6.07%) and movement disorders (2.05%). The most common genetic disorders retrievable in our analysis of electronic medical records were STXBP1 (n = 21), PTEN (n = 20) and CACNA1A (n = 18). When assessing associations of genetic diagnoses with specific linguistic phenotypes, we observed associations of STXBP1 and aphasia (P = 8.57 × 10⁻⁷, 95% confidence interval = 18.62-130.39) and MYO7A with speech and language development delay attributable to hearing loss (P = 1.24 × 10⁻⁵, 95% confidence interval = 17.46-infinity). 
Finally, in a sub-cohort of 726 individuals with whole-exome sequencing data, we identified an enrichment of rare variants in neuronal receptor pathways, in addition to associations of UQCRC1 and KIF17 with expressive aphasia, MROH8 and BCHE with poor speech, and USP37, SLC22A9 and UMODL1 with aphasia. In summary, our study outlines the landscape of paediatric speech and language disorders, confirming the phenotypic complexity of linguistic traits and novel genotype-phenotype associations. Subgroups of paediatric speech and language disorders differ significantly with respect to the composition of monogenic aetiologies. (© The Author(s) 2024. Published by Oxford University Press on behalf of the Guarantors of Brain. All rights reserved.)
Published: 2024
Full Text: View/download PDF

24. Leveraging Clinical Intuition to Improve Accuracy of Phenotype-Driven Prioritization.

Author: Beckwith MA, Danis D, Bridges Y, Jacobsen JOB, Smedley D, and Robinson PN
Abstract: Purpose: Clinical intuition is commonly incorporated into the differential diagnosis as an assessment of the likelihood of candidate diagnoses based either on the patient population being seen in a specific clinic or on the signs and symptoms of the initial presentation. Algorithms to support diagnostic sequencing in individuals with a suspected rare genetic disease do not yet incorporate intuition and instead assume that each Mendelian disease has an equal pretest probability. Methods: The LIRICAL algorithm calculates the likelihood ratio of clinical manifestations represented by Human Phenotype Ontology (HPO) terms to rank candidate diagnoses. The initial version of LIRICAL assumed an equal pretest probability for each disease in its calculation of the posttest probability (where the test is diagnostic exome or genome sequencing). We introduce Clinical Intuition for Likelihood Ratios (ClintLR), an extension of the LIRICAL algorithm that boosts the pretest probability of groups of related diseases deemed to be more likely. Results: The average rank of the correct diagnosis in simulations using ClintLR showed a statistically significant improvement over a range of adjustment factors. Conclusion: ClintLR successfully encodes clinical intuition to improve ranking of rare diseases in diagnostic sequencing. ClintLR is freely available at https://github.com/TheJacksonLaboratory/ClintLR. (Copyright © 2024. Published by Elsevier Inc.)
Published: 2024
Full Text: View/download PDF

25. A corpus of GA4GH phenopackets: Case-level phenotyping for genomic diagnostics and discovery.

Author: Danis D, Bamshad MJ, Bridges Y, Caballero-Oteyza A, Cacheiro P, Carmody LC, Chimirri L, Chong JX, Coleman B, Dalgleish R, Freeman PJ, Graefe ASL, Groza T, Hansen P, Jacobsen JOB, Klocperk A, Kusters M, Ladewig MS, Marcello AJ, Mattina T, Mungall CJ, Munoz-Torres MC, Reese JT, Rehburg F, Reis BCS, Schuetz C, Smedley D, Strauss T, Sundaramurthi JC, Thun S, Wissink K, Wagstaff JF, Zocche D, Haendel MA, and Robinson PN
Abstract: The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms. Here, we present Phenopacket Store. Phenopacket Store v.0.1.19 includes 6,668 phenopackets representing 475 Mendelian and chromosomal diseases associated with 423 genes and 3,834 unique pathogenic alleles curated from 959 different publications. This represents the first large-scale collection of case-level, standardized phenotypic information derived from case reports in the literature with detailed descriptions of the clinical data and will be useful for many purposes, including the development and testing of software for prioritizing genes and diseases in diagnostic genomics, machine learning analysis of clinical phenotype data, patient stratification, and genotype-phenotype correlations. This corpus also provides best-practice examples for curating literature-derived data using the GA4GH Phenopacket Schema. Competing Interests: M.A.H. is a founder of Alamya Health. M.J.B. and J.X.C. are the Editor-in-Chief and Deputy Editor of HGG Advances, respectively, and were recused from the editorial handling of this manuscript. (Copyright © 2024 The Author(s). Published by Elsevier Inc. All rights reserved.)
Published: 2024
Full Text: View/download PDF

26. Learning Weighted Association Rules in Human Phenotype Ontology

Author: Agapito, Giuseppe, Cannataro, Mario, Guzzi, Pietro H., Milano, Marianna, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Cazzaniga, Paolo, editor, Besozzi, Daniela, editor, Merelli, Ivan, editor, and Manzoni, Luca, editor
Published: 2020
Full Text: View/download PDF

27. Parallel Learning of Weighted Association Rules in Human Phenotype Ontology

Author: Agapito, Giuseppe, Cannataro, Mario, Guzzi, Pietro Hiram, Milano, Marianna, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Schwardmann, Ulrich, editor, Boehme, Christian, editor, B. Heras, Dora, editor, Cardellini, Valeria, editor, Jeannot, Emmanuel, editor, Salis, Antonio, editor, Schifanella, Claudio, editor, Manumachu, Ravi Reddy, editor, Schwamborn, Dieter, editor, Ricci, Laura, editor, Sangyoon, Oh, editor, Gruber, Thomas, editor, Antonelli, Laura, editor, and Scott, Stephen L., editor
Published: 2020
Full Text: View/download PDF

28. Deep semi-supervised learning ensemble framework for classifying co-mentions of human proteins and phenotypes

Author: Morteza Pourreza Shahri and Indika Kahanda
Subjects: Biomedical relationship extraction, Protein phenotype relationships, Human phenotype ontology, Semi-supervised learning, Ensemble learning, Deep learning, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
Abstract: Background: Identifying human protein-phenotype relationships has attracted researchers in bioinformatics and biomedical natural language processing due to its importance in uncovering rare and complex diseases. Since experimental validation of protein-phenotype associations is prohibitive, automated tools capable of accurately extracting these associations from the biomedical text are in high demand. However, while the manual annotation of protein-phenotype co-mentions required for training such models is highly resource-consuming, extracting millions of unlabeled co-mentions is straightforward. Results: In this study, we propose a novel deep semi-supervised ensemble framework that combines deep neural networks, semi-supervised, and ensemble learning for classifying human protein-phenotype co-mentions with the help of unlabeled data. This framework allows the ability to incorporate an extensive collection of unlabeled sentence-level co-mentions of human proteins and phenotypes with a small labeled dataset to enhance overall performance. We develop PPPredSS, a prototype of our proposed semi-supervised framework that combines sophisticated language models, convolutional networks, and recurrent networks. Our experimental results demonstrate that the proposed approach provides a new state-of-the-art performance in classifying human protein-phenotype co-mentions by outperforming other supervised and semi-supervised counterparts. Furthermore, we highlight the utility of PPPredSS in powering a curation assistant system through case studies involving a group of biologists. Conclusions: This article presents a novel approach for human protein-phenotype co-mention classification based on deep, semi-supervised, and ensemble learning. The insights and findings from this work have implications for biomedical researchers, biocurators, and the text mining community working on biomedical relationship extraction.
Published: 2021
Full Text: View/download PDF

29. Human phenotype ontology annotation and cluster analysis for pulmonary atresia to unravel clinical outcomes

Author: Bingyan Shu, Huayan Shen, Xinyang Shao, Fengming Luo, Tianjiao Li, and Zhou Zhou
Subjects: pulmonary atresia, Human Phenotype Ontology, unsupervised cluster analysis, Kaplan-Meier curves, Cox proportional hazards regression, Diseases of the circulatory (Cardiovascular) system, RC666-701
Abstract: Background: Pulmonary atresia (PA) is a heterogeneous congenital heart defect and ventricular septal defect (VSD) is the most vital factor for the conventional classification of PA patients. The simple dichotomy could not fully describe the cardiac morphologies and pathophysiology in such a complex disease. We utilized the Human Phenotype Ontology (HPO) database to explore the phenotypic patterns of PA and the phenotypic influence on prognosis. Methods: We recruited 786 patients with diagnoses of PA between 2008 and 2016 at Fuwai Hospital. According to cardiovascular phenotypes of patients, we retrieved 52 HPO terms for further analyses. The patients were classified into three clusters based on unsupervised hierarchical clustering. We used Kaplan–Meier curves to estimate survival, the log-rank test to compare survival between clusters, and univariate and multivariate Cox proportional hazards regression modeling to investigate potential risk factors. Results: According to HPO term distribution, we observed significant differences of morphological abnormalities in 3 clusters. We defined cluster 1 as being associated with Tetralogy of Fallot (TOF), VSD, right ventricular hypertrophy (RVH), and aortopulmonary collateral arteries (ACA). ACA was not included in the cluster classification because it was not an HPO term. Cluster 2 was associated with hypoplastic right heart (HRH), atrial septal defect (ASD) and tricuspid disease as the main morphological abnormalities. Cluster 3 presented higher frequency of single ventricle (SV), dextrocardia, and common atrium (CA). The mortality rate in cluster 1 was significantly lower than the rates in cluster 2 and 3 (p = 0.04). 
Multivariable analysis revealed that abnormal atrioventricular connection (AAC, p = 0.011) and persistent left superior vena cava (LSVC, p = 0.003) were associated with an increased risk of mortality.ConclusionsOur study reported a large cohort with clinical phenotypic, surgical strategy and long time follow-up. In addition, we provided a precise classification and successfully risk stratification for patients with PA.
Published: 2022
Full Text: View/download PDF

30. Advances in big data and omics: Paving the way for discovery in childhood epilepsies.

Author: Magielski, Jan, McSalley, Ian, Parthasarathy, Shridhar, McKee, Jillian, Ganesan, Shiva, and Helbig, Ingo
Abstract: The insights gained from big data and omics approaches have transformed the field of childhood genetic epilepsy. With an increasing number of individuals receiving genetic testing for seizures, we are provided with an opportunity to identify clinically relevant subgroups and extract meaningful observations from this large-scale clinical data. However, the volume of data from electronic medical records and omics (e.g., genomics, transcriptomics) is so vast that standardized methods, such as the Human Phenotype Ontology, are necessary for reliable and comprehensive characterization. Here, we explore the integration of clinical and omics data, highlighting how these approaches pave the way for discovery in childhood epilepsies. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

31. Assessing the landscape of STXBP1-related disorders in 534 individuals.

Author: Xian, Julie, Parthasarathy, Shridhar, Ruggiero, Sarah M, Balagura, Ganna, Fitch, Eryn, Helbig, Katherine, Gan, Jing, Ganesan, Shiva, Kaufman, Michael C, Ellis, Colin A, Lewis-Smith, David, Galer, Peter, Cunningham, Kristin, O'Brien, Margaret, Cosico, Mahgenn, Baker, Kate, Darling, Alejandra, Goes, Fernanda Veiga de, Achkar, Christelle M El, and Doering, Jan Henje
Subjects: *RESEARCH, *INFANTILE spasms, *ELECTROENCEPHALOGRAPHY, *EPILEPSY, *RESEARCH methodology, *RETROSPECTIVE studies, *EVALUATION research, *COMPARATIVE studies, *RESEARCH funding, *MEMBRANE proteins, *SEIZURES (Medicine)
Abstract: Disease-causing variants in STXBP1 are among the most common genetic causes of neurodevelopmental disorders. However, the phenotypic spectrum in STXBP1-related disorders is wide and clear correlations between variant type and clinical features have not been observed so far. Here, we harmonized clinical data across 534 individuals with STXBP1-related disorders and analysed 19 973 derived phenotypic terms, including phenotypes of 253 individuals previously unreported in the scientific literature. The overall phenotypic landscape in STXBP1-related disorders is characterized by neurodevelopmental abnormalities in 95% and seizures in 89% of individuals, including focal-onset seizures as the most common seizure type (47%). More than 88% of individuals with STXBP1-related disorders have seizure onset in the first year of life, including neonatal seizure onset in 47%. Individuals with protein-truncating variants and deletions in STXBP1 (n = 261) were almost twice as likely to present with West syndrome and were more phenotypically similar than expected by chance. Five genetic hotspots with recurrent variants were identified in more than 10 individuals, including p.Arg406Cys/His (n = 40), p.Arg292Cys/His/Leu/Pro (n = 30), p.Arg551Cys/Gly/His/Leu (n = 24), p.Pro139Leu (n = 12), and p.Arg190Trp (n = 11). None of the recurrent variants were significantly associated with distinct electroclinical syndromes, single phenotypic features, or showed overall clinical similarity, indicating that the baseline variability in STXBP1-related disorders is too high for discrete phenotypic subgroups to emerge. We then reconstructed the seizure history in 62 individuals with STXBP1-related disorders in detail, retrospectively assigning seizure type and seizure frequency monthly across 4433 time intervals, and retrieved 251 anti-seizure medication prescriptions from the electronic medical records. 
We demonstrate a dynamic pattern of seizure control and complex interplay with response to specific medications particularly in the first year of life when seizures in STXBP1-related disorders are the most prominent. Adrenocorticotropic hormone and phenobarbital were more likely to initially reduce seizure frequency in infantile spasms and focal seizures compared to other treatment options, while the ketogenic diet was most effective in maintaining seizure freedom. In summary, we demonstrate how the multidimensional spectrum of phenotypic features in STXBP1-related disorders can be assessed using a computational phenotype framework to facilitate the development of future precision-medicine approaches. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

32. An expanded phenotype centric benchmark of variant prioritisation tools.

Author: Anderson, Denise and Lassmann, Timo
Abstract: Identifying the causal variant for diagnosis of genetic diseases is challenging when using next‐generation sequencing approaches and variant prioritization tools can assist in this task. These tools provide in silico predictions of variant pathogenicity, however they are agnostic to the disease under study. We previously performed a disease‐specific benchmark of 24 such tools to assess how they perform in different disease contexts. We found that the tools themselves show large differences in performance, but more importantly that the best tools for variant prioritization are dependent on the disease phenotypes being considered. Here we expand the assessment to 37 tools and refine our assessment by separating performance for nonsynonymous single nucleotide variants (nsSNVs) and missense variants (i.e., excluding nonsense variants). We found differences in performance for missense variants compared to nsSNVs and recommend three tools that stand out in terms of their performance (BayesDel, CADD, and ClinPred). [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

33. Etiologic Classification of Diffuse Parenchymal (Interstitial) Lung Diseases.

Author: Griese, Matthias
Subjects: *LUNG diseases, *INTERSTITIAL lung diseases, *INDIVIDUALIZED medicine, *PULMONARY fibrosis, *NOSOLOGY
Abstract: Interstitial lung diseases (ILD) or diffuse parenchymal lung diseases (DPLD) comprise a large number of disorders. Disease definition and classification allow advanced and personalized judgements on clinical disease, risks for genetic or environmental transmissions, and precision medicine treatments. Registers collect specific rare entities and use ontologies for a precise description of complex phenotypes. Here we present a brief history of ILD classification systems from adult and pediatric pneumology. We center on an etiologic classification, with four main categories: lung-only (native parenchymal) disorders, systemic disease-related disorders, exposure-related disorders, and vascular disorders. Splitting diseases into molecularly defined entities is key for precision medicine and the identification of novel entities. Lumping diseases targeted by similar diagnostic or therapeutic principles is key for clinical practice and register work, as our experience with the European children's ILD register (chILD-EU) demonstrates. The etiologic classification favored combines pediatric and adult lung diseases in a single system and considers genomics and other -omics as central steps towards the solution of "idiopathic" lung diseases. Future tasks focus on a systems' medicine approach integrating all data and bringing precision medicine closer to the patients. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

34. Identifying Clinical Terms in Free-Text Notes Using Ontology-Guided Machine Learning

Author: Arbabi, Aryan, Adams, David R., Fidler, Sanja, Brudno, Michael, Hutchison, David, Editorial Board Member, Kanade, Takeo, Editorial Board Member, Kittler, Josef, Editorial Board Member, Kleinberg, Jon M., Editorial Board Member, Mattern, Friedemann, Editorial Board Member, Mitchell, John C., Editorial Board Member, Naor, Moni, Editorial Board Member, Pandu Rangan, C., Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Terzopoulos, Demetri, Editorial Board Member, Tygar, Doug, Editorial Board Member, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, and Cowen, Lenore J., editor
Published: 2019
Full Text: View/download PDF

35. Ensembling Descendant Term Classifiers to Improve Gene - Abnormal Phenotype Predictions

Author: Notaro, Marco, Schubach, Max, Frasca, Marco, Mesiti, Marco, Robinson, Peter N., Valentini, Giorgio, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Bartoletti, Massimo, editor, Barla, Annalisa, editor, Bracciali, Andrea, editor, Klau, Gunnar W., editor, Peterson, Leif, editor, Policriti, Alberto, editor, and Tagliaferri, Roberto, editor
Published: 2019
Full Text: View/download PDF

36. Hybrid approach for disease comorbidity and disease gene prediction using heterogeneous dataset.

Author: S., Lakshmi K. and G., Vadivu
Subjects: COMORBIDITY, GENE regulatory networks, HUMAN phenotype, RANDOM walks, PROTEIN-protein interactions
Abstract: High-throughput analysis and large-scale integration of biological data have driven much recent research in bioinformatics. Recent years have witnessed the development of various methods for disease-associated gene prediction and disease comorbidity prediction. Most existing techniques use network-based or similarity-based approaches for these predictions. Even though network-based approaches perform better, these methods rely on text data from OMIM records and PubMed abstracts. In this work, a novel algorithm (HDCDGP) is proposed for disease comorbidity prediction and disease-associated gene prediction. A disease comorbidity network and a disease gene network were constructed using data from Gene Ontology (GO), Human Phenotype Ontology (HPO), protein-protein interaction (PPI) and pathway datasets. A modified random walk with restart algorithm was applied on these networks to extract novel disease-gene associations. Experimental results showed that the hybrid approach performs better than existing systems, with an overall accuracy of around 85%. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

37. Significantly different clinical phenotypes associated with mutations in synthesis and transamidase+remodeling glycosylphosphatidylinositol (GPI)-anchor biosynthesis genes

Author: Leigh C. Carmody, Hannah Blau, Daniel Danis, Xingman A. Zhang, Jean-Philippe Gourdine, Nicole Vasilevsky, Peter Krawitz, Miles D. Thompson, and Peter N. Robinson
Subjects: GPI-anchor, Glycosylphosphatidylinositols, Congenital disorders of glycosylation, Human phenotype ontology, Medicine
Abstract: Background: Defects in the glycosylphosphatidylinositol (GPI) biosynthesis pathway can result in a group of congenital disorders of glycosylation known as the inherited GPI deficiencies (IGDs). To date, defects in 22 of the 29 genes in the GPI biosynthesis pathway have been identified in IGDs. The early phase of the biosynthetic pathway assembles the GPI anchor (Synthesis stage) and the late phase transfers the GPI anchor to a nascent peptide in the endoplasmic reticulum (ER) (Transamidase stage), stabilizes the anchor in the ER membrane using fatty acid remodeling and then traffics the GPI-anchored protein to the cell surface (Remodeling stage). Results: We addressed the hypothesis that disease-associated variants in either the Synthesis stage or Transamidase+Remodeling-stage GPI pathway genes have distinct phenotypic spectra. We reviewed clinical data from 58 publications describing 152 individual patients and encoded the phenotypic information using the Human Phenotype Ontology (HPO). We showed statistically significant differences between the Synthesis and Transamidase+Remodeling Groups in the frequencies of phenotypes in the musculoskeletal system, cleft palate, nose phenotypes, and cognitive disability. Finally, we hypothesized that phenotypic defects in the IGDs are likely to be at least partially related to defective GPI anchoring of their target proteins. Twenty-two of one hundred forty-two proteins that receive a GPI anchor are associated with one or more Mendelian diseases and 12 show some phenotypic overlap with the IGDs, represented by 34 HPO terms. Interestingly, GPC3 and GPC6, members of the glypican family of heparan sulfate proteoglycans bound to the plasma membrane through a covalent GPI linkage, are associated with 25 of these phenotypic abnormalities. Conclusions: IGDs associated with Synthesis and Transamidase+Remodeling stages of the GPI biosynthesis pathway have significantly different phenotypic spectra. GPC3 and GPC6 genes may represent a GPI target of general disruption to the GPI biosynthesis pathway that contributes to the phenotypes of some IGDs.
Published: 2020
Full Text: View/download PDF

38. Deep Phenotypic Analysis for Transposition of the Great Arteries and Prognosis Implication

Author: Huayan Shen, Qiyu He, Xinyang Shao, Shoujun Li, and Zhou Zhou
Subjects: human phenotype ontology, prognosis, risk stratification, surgery, transposition of the great arteries, Diseases of the circulatory (Cardiovascular) system, RC666-701
Abstract: Background: Transposition of the great arteries (TGA) consists of about 3% of all congenital heart diseases and 20% of cyanotic congenital heart diseases. It is always accompanied by a series of other cardiac malformations that affect the surgical intervention strategy as well as prognosis. In this study, we comprehensively analyzed the phenotypes of the patients who had TGA with concordant atrioventricular and discordant ventriculoarterial connections and explored their association with prognosis. Methods and Results: We retrospectively reviewed 666 patients with a diagnosis of TGA with concordant atrioventricular and discordant ventriculoarterial connections in Fuwai Hospital from 1997 to 2019. Under the guidance of the Human Phenotype Ontology database, patients were classified into 3 clusters. The Kaplan‐Meier method was used to analyze the prognosis, and the Cox proportional regression model was used to investigate the risk factors. In this 666‐patient TGA cohort, the overall 5‐year survival rate was 94.70% (92.95%–96.49%). Three clusters with distinct phenotypes were obtained by the Human Phenotype Ontology database. Kaplan‐Meier analysis revealed a significant difference in freedom from reintervention among 3 clusters (P
Published: 2022
Full Text: View/download PDF

39. Cardiovascular Phenotypes Profiling for L-Transposition of the Great Arteries and Prognosis Analysis

Author: Qiyu He, Huayan Shen, Xinyang Shao, Wen Chen, Yafeng Wu, Rui Liu, Shoujun Li, and Zhou Zhou
Subjects: congenitally corrected transposition of the great arteries, human phenotype ontology, surgery, risk stratification, prognosis, Diseases of the circulatory (Cardiovascular) system, RC666-701
Abstract: Objectives: Congenitally corrected transposition of the great arteries (ccTGA) is a rare and complex congenital heart disease characterized by double discordance. Numerous co-existing anomalies complicate prognosis evaluation and clinical decision-making. We aim to delineate a novel ccTGA clustering modality under Human Phenotype Ontology (HPO) instruction and to elucidate the relationship between phenotypes and prognosis in patients with ccTGA. Methods: A retrospective review of 270 patients diagnosed with ccTGA in Fuwai Hospital from 2009 to 2020 and a cross-sectional follow-up were performed. An HPO-instructed clustering method was applied to ccTGA risk stratification. Kaplan-Meier survival analysis, landmark analysis, and Cox regression analysis were used to investigate the difference in outcomes among clusters. Results: The median follow-up time was 4.29 (2.07–7.37) years. A total of three distinct phenotypic clusters were obtained after HPO-instructed clustering, with 21 patients in cluster 1, 136 in cluster 2, and 113 in cluster 3. Landmark analysis revealed significantly worse mid-term outcomes in all-cause mortality (p = 0.021) and composite endpoints (p = 0.004) in cluster 3 compared with clusters 1 and 2. Multivariate analysis indicated that pulmonary arterial hypertension (PAH), atrioventricular septal defect (AVSD), and arrhythmia were risk factors for composite endpoints. Moreover, the surgical treatment was significantly different among the three groups (p < 0.001), and surgical strategies had different effects on the prognosis of the different phenotypic clusters. Conclusions: Human Phenotype Ontology-instructed clustering can be a potentially powerful tool for phenotypic risk stratification in patients with complex congenital heart diseases, which may improve prognosis prediction and clinical decision-making.
Published: 2022
Full Text: View/download PDF

40. HPOAnnotator: improving large-scale prediction of HPO annotations by low-rank approximation with HPO semantic similarities and multiple PPI networks

Author: Junning Gao, Lizhi Liu, Shuwei Yao, Xiaodi Huang, Hiroshi Mamitsuka, and Shanfeng Zhu
Subjects: Low-rank approximation, Human phenotype ontology, Protein-protein interaction networks, Hierarchical structure, Internal medicine, RC31-1245, Genetics, QH426-470
Abstract: Background: As a standardized vocabulary of phenotypic abnormalities associated with human diseases, the Human Phenotype Ontology (HPO) has been widely used by researchers to annotate phenotypes of genes/proteins. For saving the cost and time spent on experiments, many computational approaches have been proposed. They are able to alleviate the problem to some extent, but their performances are still far from satisfactory. Method: For inferring large-scale protein-phenotype associations, we propose HPOAnnotator that incorporates multiple Protein-Protein Interaction (PPI) information and the hierarchical structure of HPO. Specifically, we use a dual graph to regularize Non-negative Matrix Factorization (NMF) in a way that the information from different sources can be seamlessly integrated. In essence, HPOAnnotator solves the sparsity problem of a protein-phenotype association matrix by using a low-rank approximation. Results: By combining the hierarchical structure of HPO and co-annotations of proteins, our model can well capture the HPO semantic similarities. Moreover, graph Laplacian regularizations are imposed in the latent space so as to utilize multiple PPI networks. The performance of HPOAnnotator has been validated under cross-validation and independent test. Experimental results have shown that HPOAnnotator outperforms the competing methods significantly. Conclusions: Through extensive comparisons with the state-of-the-art methods, we conclude that the proposed HPOAnnotator is able to achieve the superior performance as a result of using a low-rank approximation with a graph regularization. It is promising in that our approach can be considered as a starting point to study more efficient matrix factorization-based algorithms.
Published: 2019
Full Text: View/download PDF

41. Characterizing Long COVID: Deep Phenotype of a Complex Condition

Author: Rachel R Deer, Madeline A Rock, Nicole Vasilevsky, Leigh Carmody, Halie Rando, Alfred J Anzalone, Marc D Basson, Tellen D Bennett, Timothy Bergquist, Eilis A Boudreau, Carolyn T Bramante, James Brian Byrd, Tiffany J Callahan, Lauren E Chan, Haitao Chu, Christopher G Chute, Ben D Coleman, Hannah E Davis, Joel Gagnier, Casey S Greene, William B Hillegass, Ramakanth Kavuluru, Wesley D Kimble, Farrukh M Koraishy, Sebastian Köhler, Chen Liang, Feifan Liu, Hongfang Liu, Vithal Madhira, Charisse R Madlock-Brown, Nicolas Matentzoglu, Diego R Mazzotti, Julie A McMurry, Douglas S McNair, Richard A Moffitt, Teshamae S Monteith, Ann M Parker, Mallory A Perry, Emily Pfaff, Justin T Reese, Joel Saltz, Robert A Schuff, Anthony E Solomonides, Julian Solway, Heidi Spratt, Gary S Stein, Anupam A Sule, Umit Topaloglu, George D. Vavougios, Liwei Wang, Melissa A Haendel, and Peter N Robinson
Subjects: COVID-19, post-acute sequelae of SARS-CoV-2, human phenotype ontology, long COVID, phenotyping, Medicine, Medicine (General), R5-920
Abstract: Background: Numerous publications describe the clinical manifestations of post-acute sequelae of SARS-CoV-2 (PASC or “long COVID”), but they are difficult to integrate because of heterogeneous methods and the lack of a standard for denoting the many phenotypic manifestations. Patient-led studies are of particular importance for understanding the natural history of COVID-19, but integration is hampered because they often use different terms to describe the same symptom or condition. This significant disparity in patient versus clinical characterization motivated the proposed ontological approach to specifying manifestations, which will improve capture and integration of future long COVID studies. Methods: The Human Phenotype Ontology (HPO) is a widely used standard for exchange and analysis of phenotypic abnormalities in human disease but has not yet been applied to the analysis of COVID-19. Findings: We identified 303 articles published before April 29, 2021, curated 59 relevant manuscripts that described clinical manifestations in 81 cohorts three weeks or more following acute COVID-19, and mapped 287 unique clinical findings to HPO terms. We present layperson synonyms and definitions that can be used to link patient self-report questionnaires to standard medical terminology. Long COVID clinical manifestations are not assessed consistently across studies, and most manifestations have been reported with a wide range of synonyms by different authors. Across at least 10 cohorts, authors reported 31 unique clinical features corresponding to HPO terms; the most commonly reported feature was Fatigue (median 45.1%) and the least commonly reported was Nausea (median 3.9%), but the reported percentages varied widely between studies. Interpretation: Translating long COVID manifestations into computable HPO terms will improve analysis, data capture, and classification of long COVID patients. If researchers, clinicians, and patients share a common language, then studies can be compared/pooled more effectively. Furthermore, mapping lay terminology to HPO will help patients assist clinicians and researchers in creating phenotypic characterizations that are computationally accessible, thereby improving the stratification, diagnosis, and treatment of long COVID. Funding: U24TR002306; UL1TR001439; P30AG024832; GBMF4552; R01HG010067; UL1TR002535; K23HL128909; UL1TR002389; K99GM145411.
Published: 2021
Full Text: View/download PDF

42. Early-Onset Dementia Associated with a Heterozygous, Nonsense, and de novo Variant in the MBD5 Gene.

Author: González-Ortega, Guillermo, Llamas-Velasco, Sara, Arteche-López, Ana, Quesada-Espinosa, Juan Francisco, Puertas-Martín, Verónica, Gómez-Grande, Adolfo, López-Álvarez, Jorge, Saiz Díaz, Rosa Ana, Lezana-Rosales, José Miguel, Villarejo-Galende, Alberto, and González de la Aleja, Jesús
Subjects: *GENETIC variation, *NEUROBEHAVIORAL disorders, *DEMENTIA, *PERSONALITY disorders, *INTELLECTUAL disabilities, *PROTEIN domains, *EPILEPSY, *GENETIC mutation, *GENETIC carriers, *NEUROPSYCHOLOGICAL tests, *DNA-binding proteins, *PHENOTYPES, *DISEASE complications
Abstract: The haploinsufficiency of the methyl-binding domain protein 5 (MBD5) gene has been identified as the determinant cause of the neuropsychiatric disorders grouped under the name MBD5-neurodevelopment disorders (MAND). MAND includes patients with intellectual disability, behavioral problems, and seizures with a static clinical course. However, a few reports have suggested regression. We describe a non-intellectually disabled female, with previous epilepsy and personality disorder, who developed early-onset dementia. The extensive etiologic study revealed a heterozygous nonsense de novo pathogenic variant in the MBD5 gene. This finding could support including the MBD5 gene in the study of patients with atypical early-onset dementia. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

43. Deep semi-supervised learning ensemble framework for classifying co-mentions of human proteins and phenotypes.

Author: Pourreza Shahri, Morteza and Kahanda, Indika
Subjects: *HUMAN phenotype, *DEEP learning, *SUPERVISED learning, *NATURAL language processing, *SUPPLY & demand, *LEARNING
Abstract: Background: Identifying human protein-phenotype relationships has attracted researchers in bioinformatics and biomedical natural language processing due to its importance in uncovering rare and complex diseases. Since experimental validation of protein-phenotype associations is prohibitive, automated tools capable of accurately extracting these associations from the biomedical text are in high demand. However, while the manual annotation of protein-phenotype co-mentions required for training such models is highly resource-consuming, extracting millions of unlabeled co-mentions is straightforward. Results: In this study, we propose a novel deep semi-supervised ensemble framework that combines deep neural networks, semi-supervised, and ensemble learning for classifying human protein-phenotype co-mentions with the help of unlabeled data. This framework allows the ability to incorporate an extensive collection of unlabeled sentence-level co-mentions of human proteins and phenotypes with a small labeled dataset to enhance overall performance. We develop PPPredSS, a prototype of our proposed semi-supervised framework that combines sophisticated language models, convolutional networks, and recurrent networks. Our experimental results demonstrate that the proposed approach provides a new state-of-the-art performance in classifying human protein-phenotype co-mentions by outperforming other supervised and semi-supervised counterparts. Furthermore, we highlight the utility of PPPredSS in powering a curation assistant system through case studies involving a group of biologists. Conclusions: This article presents a novel approach for human protein-phenotype co-mention classification based on deep, semi-supervised, and ensemble learning. The insights and findings from this work have implications for biomedical researchers, biocurators, and the text mining community working on biomedical relationship extraction. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

44. SARAEasy: A Mobile App for Cerebellar Syndrome Quantification and Characterization

Author: Maarouf, Haitham, López, Vanessa, Sobrido, Maria J., Martínez, Diego, Taboada, Maria, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Rojas, Ignacio, editor, and Ortuño, Francisco, editor
Published: 2018
Full Text: View/download PDF

45. A data-driven architecture using natural language processing to improve phenotyping efficiency and accelerate genetic diagnoses of rare disorders

Author: Jignesh R. Parikh, Casie A. Genetti, Asli Aykanat, Catherine A. Brownstein, Klaus Schmitz-Abe, Morgan Danowski, Andrew Quitadomo, Jill A. Madden, Calum Yacoubian, Richard Gain, Tessa Williams, Mary Meskell, Andrew Brown, Alison Frith, Shira Rockowitz, Piotr Sliz, Pankaj B. Agrawal, Thomas Defay, Paul McDonagh, John Reynders, Sebastien Lefebvre, and Alan H. Beggs
Subjects: natural language processing, genetics, Human Phenotype Ontology, electronic health records, Genetics, QH426-470
Abstract: Summary: Effective genetic diagnosis requires the correlation of genetic variant data with detailed phenotypic information. However, manual encoding of clinical data into machine-readable forms is laborious and subject to observer bias. Natural language processing (NLP) of electronic health records has great potential to enhance reproducibility at scale but suffers from idiosyncrasies in physician notes and other medical records. We developed methods to optimize NLP outputs for automated diagnosis. We filtered NLP-extracted Human Phenotype Ontology (HPO) terms to more closely resemble manually extracted terms and identified filter parameters across a three-dimensional space for optimal gene prioritization. We then developed a tiered pipeline that reduces manual effort by prioritizing smaller subsets of genes to consider for genetic diagnosis. Our filtering pipeline enabled NLP-based extraction of HPO terms to serve as a sufficient replacement for manual extraction in 92% of prospectively evaluated cases. In 75% of cases, the correct causal gene was ranked higher with our applied filters than without any filters. We describe a framework that can maximize the utility of NLP-based phenotype extraction for gene prioritization and diagnosis. The framework is implemented within a cloud-based modular architecture that can be deployed across health and research institutions.
Published: 2021
Full Text: View/download PDF

46. Clinical, neuroimaging, and molecular spectrum of TECPR2‐associated hereditary sensory and autonomic neuropathy with intellectual disability.

Author: Neuser, Sonja, Brechmann, Barbara, Heimer, Gali, Brösse, Ines, Schubert, Susanna, O'Grady, Lauren, Zech, Michael, Srivastava, Siddharth, Sweetser, David A., Dincer, Yasemin, Mall, Volker, Winkelmann, Juliane, Behrends, Christian, Darras, Basil T., Graham, Robert J., Jayakar, Parul, Byrne, Barry, Bar‐Aluma, Bat El, Haberman, Yael, and Szeinberg, Amir
Abstract: Bi‐allelic TECPR2 variants have been associated with a complex syndrome with features of both a neurodevelopmental and neurodegenerative disorder. Here, we provide a comprehensive clinical description and variant interpretation framework for this genetic locus. Through international collaboration, we identified 17 individuals from 15 families with bi‐allelic TECPR2‐variants. We systemically reviewed clinical and molecular data from this cohort and 11 cases previously reported. Phenotypes were standardized using Human Phenotype Ontology terms. A cross‐sectional analysis revealed global developmental delay/intellectual disability, muscular hypotonia, ataxia, hyporeflexia, respiratory infections, and central/nocturnal hypopnea as core manifestations. A review of brain magnetic resonance imaging scans demonstrated a thin corpus callosum in 52%. We evaluated 17 distinct variants. Missense variants in TECPR2 are predominantly located in the N‐ and C‐terminal regions containing β‐propeller repeats. Despite constituting nearly half of disease‐associated TECPR2 variants, classifying missense variants as (likely) pathogenic according to ACMG criteria remains challenging. We estimate a pathogenic variant carrier frequency of 1/1221 in the general and 1/155 in the Jewish Ashkenazi populations. Based on clinical, neuroimaging, and genetic data, we provide recommendations for variant reporting, clinical assessment, and surveillance/treatment of individuals with TECPR2‐associated disorder. This sets the stage for future prospective natural history studies. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

47. Multilayer concept of autoimmune mechanisms and manifestations in inborn errors of immunity: Relevance for precision therapy.

Author: Seidel, Markus G. and Hauck, Fabian
Abstract: Autoimmunity in inborn errors of immunity (IEIs) has a multifactorial pathogenesis and develops subsequent to a genetic predisposition in conjunction with gene regulation, environmental modifiers, and infectious triggers. On the basis of incremental data availability owing to upfront application of omics technologies, a more granular and dynamic view of mechanisms and manifestations is warranted. Here, we present a comprehensive novel concept of autoimmunity in IEIs that considers multiple layers of interdependent elements and connects 101 causative genes or deletions according to the quality of the allelic variants with 47 molecular pathways and 22 immune effector mechanisms. Furthermore, we list 50 resulting manifestations together with the corresponding Human Phenotype Ontology terms and review the types and frequencies of the most relevant clinical presentations. When all of its elements are taken together, this concept (1) extends the historical anatomic view of central versus peripheral tolerance toward multiple interdependent mechanisms of immune tolerance, (2) delineates the mechanisms underlying the protean clinical manifestations, and thereby, (3) points toward the most suitable precision therapy for autoimmunity in IEIs. The multilayer concept of autoimmune mechanisms and manifestations in IEIs will facilitate research design and provide clinical guidance on the use of precision medicine irrespective of the data depth available in each health care scenario. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

48. An ontological foundation for ocular phenotypes and rare eye diseases

Author: Panagiotis I. Sergouniotis, Emmanuel Maxime, Dorothée Leroux, Annie Olry, Rachel Thompson, Ana Rath, Peter N. Robinson, Hélène Dollfus, and for the ERN-EYE Ontology Study Group
Subjects: Evidence-based precision medicine, Rare eye disease, Human phenotype ontology, Orphanet rare disease ontology, Medicine
Abstract: BACKGROUND: The optical accessibility of the eye and technological advances in ophthalmic diagnostics have put ophthalmology at the forefront of data-driven medicine. The focus of this study is rare eye disorders, a group of conditions whose clinical heterogeneity and geographic dispersion make data-driven, evidence-based practice particularly challenging. Inter-institutional collaboration and information sharing is crucial but the lack of standardised terminology poses an important barrier. Ontologies are computational tools that include sets of vocabulary terms arranged in hierarchical structures. They can be used to provide robust terminology standards and to enhance data interoperability. Here, we discuss the development of the ophthalmology-related component of two well-established biomedical ontologies, the Human Phenotype Ontology (HPO; includes signs, symptoms and investigation findings) and the Orphanet Rare Disease Ontology (ORDO; includes rare disease nomenclature/nosology). METHODS: A variety of approaches were used including automated matching to existing resources and extensive manual curation. To achieve the latter, a study group including clinicians, patient representatives and ontology developers from 17 countries was formed. A broad range of terms was discussed and validated during a dedicated workshop attended by 60 members of the group. RESULTS: A comprehensive, structured and well-defined set of terms has been agreed on including 1106 terms relating to ocular phenotypes (HPO) and 1202 terms relating to rare eye disease nomenclature (ORDO). These terms and their relevant annotations can be accessed at http://www.human-phenotype-ontology.org/ and http://www.orpha.net/; comments, corrections, suggestions and requests for new terms can be made through these websites. This is an ongoing, community-driven endeavour and both HPO and ORDO are regularly updated. CONCLUSIONS: To our knowledge, this is the first effort of such scale to provide terminology standards for the rare eye disease community. We hope that this work will not only improve coding and standardise information exchange in clinical care and research, but will also catalyse the transition to an evidence-based precision ophthalmology paradigm.
Published: 2019
Full Text: View/download PDF

49. The clinical and genetic spectrum of paediatric speech and language disorders in 52,143 individuals.

Author: Magielski J, Ruggiero SM, Xian J, Parthasarathy S, Galer P, Ganesan S, Back A, McKee J, McSalley I, Gonzalez AK, Morgan A, Donaher J, and Helbig I
Abstract: Speech and language disorders are known to have a substantial genetic contribution. Although frequently examined as components of other conditions, research on the genetic basis of linguistic differences as separate phenotypic subgroups has been limited so far. Here, we performed an in-depth characterization of speech and language disorders in 52,143 individuals, reconstructing clinical histories using a large-scale data mining approach of the Electronic Medical Records (EMR) from an entire large paediatric healthcare network. The reported frequency of these disorders was the highest between 2 and 5 years old and spanned a spectrum of twenty-six broad speech and language diagnoses. We used Natural Language Processing to assess the degree to which clinical diagnoses in full-text notes were reflected in ICD-10 diagnosis codes. We found that aphasia and speech apraxia could be easily retrieved through ICD-10 diagnosis codes, while stuttering as a speech phenotype was only coded in 12% of individuals through appropriate ICD-10 codes. We found significant comorbidity of speech and language disorders in neurodevelopmental conditions (30.31%) and to a lesser degree with epilepsies (6.07%) and movement disorders (2.05%). The most common genetic disorders retrievable in our EMR analysis were STXBP1 (n = 21), PTEN (n = 20), and CACNA1A (n = 18). When assessing associations of genetic diagnoses with specific linguistic phenotypes, we observed associations of STXBP1 and aphasia (P = 8.57 × 10⁻⁷, CI = 18.62-130.39) and MYO7A with speech and language development delay due to hearing loss (P = 1.24 × 10⁻⁵, CI = 17.46-Inf). Finally, in a sub-cohort of 726 individuals with whole exome sequencing data, we identified an enrichment of rare variants in synaptic protein and neuronal receptor pathways and associations of UQCRC1 with expressive aphasia and WASHC4 with abnormality of speech or vocalization. In summary, our study outlines the landscape of paediatric speech and language disorders, confirming the phenotypic complexity of linguistic traits and novel genotype-phenotype associations. Subgroups of paediatric speech and language disorders differ significantly with respect to the composition of monogenic aetiologies. Competing interests: The authors report no competing interests.
Published: 2024
Full Text: View/download PDF

50. GestaltMML: Enhancing Rare Genetic Disease Diagnosis through Multimodal Machine Learning Combining Facial Images and Clinical Texts.

Author: Wu D, Yang J, Liu C, Hsieh TC, Marchi E, Blair J, Krawitz P, Weng C, Chung W, Lyon GJ, Krantz ID, Kalish JM, and Wang K
Abstract: Individuals with suspected rare genetic disorders often undergo multiple clinical evaluations, imaging studies, laboratory tests and genetic tests, to find a possible answer over a prolonged period of time. Addressing this "diagnostic odyssey" thus has substantial clinical, psychosocial, and economic benefits. Many rare genetic diseases have distinctive facial features, which can be used by artificial intelligence algorithms to facilitate clinical diagnosis, in prioritizing candidate diseases to be further examined by lab tests or genetic assays, or in helping the phenotype-driven reinterpretation of genome/exome sequencing data. Existing methods using frontal facial photos were built on conventional Convolutional Neural Networks (CNNs), rely exclusively on facial images, and cannot capture non-facial phenotypic traits and demographic information essential for guiding accurate diagnoses. Here we introduce GestaltMML, a multimodal machine learning (MML) approach solely based on the Transformer architecture. It integrates facial images, demographic information (age, sex, ethnicity), and clinical notes (optionally, a list of Human Phenotype Ontology terms) to improve prediction accuracy. We also evaluated GestaltMML on a diverse range of datasets, including 528 diseases from the GestaltMatcher Database, several in-house datasets of Beckwith-Wiedemann syndrome (BWS, overgrowth syndrome with distinct facial features), Sotos syndrome (overgrowth syndrome with overlapping features with BWS), NAA10-related neurodevelopmental syndrome, Cornelia de Lange syndrome (multiple malformation syndrome), and KBG syndrome (multiple malformation syndrome). Our results suggest that GestaltMML effectively incorporates multiple modalities of data, greatly narrowing candidate genetic diagnoses of rare diseases and may facilitate the reinterpretation of genome/exome sequencing data. Competing interests: The authors declare no competing interests.
Published: 2024
