838 results on '"Ensembl"'
Search Results
2. Genome sequencing and analysis of black flounder (Paralichthys orbignyanus) reveals new insights into Pleuronectiformes genomic size and structure
- Author
-
Fernando Villarreal, Germán F. Burguener, Ezequiel J. Sosa, Nicolas Stocchi, Gustavo M. Somoza, Adrián G. Turjanski, Andrés Blanco, Jordi Viñas, and Alejandro S. Mechaly
- Subjects
Pleuronectiformes; Paralichthys orbignyanus; Black flounder; Genome; Intron size; Ensembl; Biotechnology; TP248.13-248.65; Genetics; QH426-470
- Abstract
Black flounder (Paralichthys orbignyanus, Pleuronectiformes) is a commercially significant marine fish with promising aquaculture potential in Argentina. Despite extensive studies on Black flounder aquaculture, the limited genetic information available hampers the crucial role genetics plays in the development of this activity. In this study, we first employed Illumina sequencing technology to sequence the entire genome of Black flounder. Utilizing two independent libraries (one from a female and another from a male) with 150 bp paired-end reads, a mean insert length of 350 bp, and over 35-fold coverage, we achieved assemblies resulting in a genome size of ~538 Mbp. Analysis of the assemblies revealed that more than 98% of the core genes were present, with more than 78% of them having more than 50% coverage. This indicates a somewhat complete and accurate genome at the coding sequence level. This genome contains 25,231 protein-coding genes, 445 tRNAs, 3 rRNAs, and more than 1,500 non-coding RNAs of other types. Black flounder, along with pufferfishes, seahorses, pipefishes, and anabantid fish, displays a smaller genome compared to most other teleost groups. In vertebrates, the number of transposable elements (TEs) is often correlated with genome size. However, it remains unclear whether the sizes of introns and exons also play a role in determining genome size. Hence, to elucidate the potential factors contributing to this reduced genome size, we conducted a comparative genomic analysis between Black flounder and other teleost orders to determine if the small genomic size could be explained by repetitive elements or gene features, including genome-wide gene and intron sizes. We show that the smaller genome size of flounders can be attributed to several factors, including changes in the number of repetitive elements and decreased gene size, particularly due to a lower number of very large and small introns. Thus, these components appear to be involved in the genome reduction in Black flounder. Despite these insights, the full implications and potential benefits of genome reduction in Black flounder for reproduction and aquaculture remain incompletely understood, necessitating further research.
- Published
- 2024
3. Genome sequencing and analysis of black flounder (Paralichthys orbignyanus) reveals new insights into Pleuronectiformes genomic size and structure.
- Author
-
Villarreal, Fernando, Burguener, Germán F., Sosa, Ezequiel J., Stocchi, Nicolas, Somoza, Gustavo M., Turjanski, Adrián G., Blanco, Andrés, Viñas, Jordi, and Mechaly, Alejandro S.
- Subjects
*PARALICHTHYS; *MARINE fishes; *FLATFISHES; *WHOLE genome sequencing; *FISH farming; *NUCLEOTIDE sequencing; *TRANSFER RNA; *GENOMES
- Abstract
Black flounder (Paralichthys orbignyanus, Pleuronectiformes) is a commercially significant marine fish with promising aquaculture potential in Argentina. Despite extensive studies on Black flounder aquaculture, the limited genetic information available hampers the crucial role genetics plays in the development of this activity. In this study, we first employed Illumina sequencing technology to sequence the entire genome of Black flounder. Utilizing two independent libraries (one from a female and another from a male) with 150 bp paired-end reads, a mean insert length of 350 bp, and over 35-fold coverage, we achieved assemblies resulting in a genome size of ~538 Mbp. Analysis of the assemblies revealed that more than 98% of the core genes were present, with more than 78% of them having more than 50% coverage. This indicates a somewhat complete and accurate genome at the coding sequence level. This genome contains 25,231 protein-coding genes, 445 tRNAs, 3 rRNAs, and more than 1,500 non-coding RNAs of other types. Black flounder, along with pufferfishes, seahorses, pipefishes, and anabantid fish, displays a smaller genome compared to most other teleost groups. In vertebrates, the number of transposable elements (TEs) is often correlated with genome size. However, it remains unclear whether the sizes of introns and exons also play a role in determining genome size. Hence, to elucidate the potential factors contributing to this reduced genome size, we conducted a comparative genomic analysis between Black flounder and other teleost orders to determine if the small genomic size could be explained by repetitive elements or gene features, including genome-wide gene and intron sizes. We show that the smaller genome size of flounders can be attributed to several factors, including changes in the number of repetitive elements and decreased gene size, particularly due to a lower number of very large and small introns. Thus, these components appear to be involved in the genome reduction in Black flounder. Despite these insights, the full implications and potential benefits of genome reduction in Black flounder for reproduction and aquaculture remain incompletely understood, necessitating further research. [ABSTRACT FROM AUTHOR]
- Published
- 2024
4. Distribution of human gene polymorphisms allele frequencies associated with viral infections
- Author
-
Natalia V. Vlasenko, Mikhail D. Chanyshev, Dmitriy V. Dubodelov, Artem A. Serkov, Galina G. Solopova, Anastasija V. Sacuk, Artem V. Snicar, Tatiana А. Semenenko, Stanislav N. Kuzin, and Vasily G. Akimkin
- Subjects
genetic polymorphism; snp; allele frequency; ensembl; information systems; sampling; Microbiology; QR1-502
- Abstract
Introduction. The design of studies aimed at finding the association between the genetic factor and the studied feature (disease) involves a comparison of the ratio of genotypes or allelic proportions in the study group with those in the control group. At the stage of determining the ratio of genotypes of the studied polymorphisms in the reference group, researchers meet a number of problems, which are the subject of the present work. Aim of the work is to provide scientific rationale for the feasibility of creating a national information system comprising genetic data of the relatively healthy population of Russia, incorporating its ethnic diversity. Materials and methods. The study group, total 1020 people, was genotyped for a number of single nucleotide polymorphisms of human genes. A comparative characteristic of the frequency distribution of the studied polymorphisms with those presented in international databases as reference data was carried out using χ2 index. Results. The frequency of SNP rs4986790 of the TLR4 gene significantly differs from the EUR population (p = 0.032) and the CEU subpopulation (p = 0.047). The allele frequencies of the rs1800795 (IL6) and rs1800896 (IL10) polymorphisms in the study population differ from the CEU subgroup (p = 0.030 and 0.012, respectively). The frequency of SNP rs2295119 (HLA-DPA2) in the study group is significantly different from the EUR population (p = 0.034). Conclusion. The analysis carried out in this work confirms the need to create a domestic information system containing data on the occurrence of SNP alleles and genotypes for a conditionally healthy population and in subgroups with various pathological conditions.
- Published
- 2023
5. Lost in translation: the pitfalls of Ensembl gene annotations between human genome assemblies and their impact on diagnostics.
- Author
-
Abdallah, Mohammed O. E., Koko, Mahmoud, and Ramesar, Raj
- Abstract
Gene models based on GRCh37 human genome assembly are preferred by many international projects over other updated assemblies (GRCh38 and T2T). Discrepant genes (DGs), those recognized as protein coding in the new but not the old assembly, are ignored by several genomic resources and discarded by variant prioritization tools relying on information based on GRCh37. We curated a set of Ensembl genes with discrepant annotations between GRCh37 and GRCh38, additionally matching their RefSeq transcripts. Furthermore, we examined their clinical and phenotypic relevance. A total of 337 genes were reclassified as 'protein-coding' in GRCh38 but not in GRCh37, with 194 having a discrepant HGNC gene symbol. Many remain missing from the current known RefSeq gene models (N = 73). We found many clinically relevant genes in this group of neglected genes, and we anticipate that many more will be found relevant in the future. Important additional annotations such as evolutionary constraint metrics are also not calculated for these genes, further relegating them into oblivion. For discrepant genes, the inaccurate label of 'non-protein-coding' has relevant ramifications on clinical genetics. Accurate collation of these genes allows for manual curation in clinically relevant scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2023
6. In Silico Analysis of BMAL1 and CLOCK SNPs in the Ensembl Database.
- Author
-
Gul, Seref
- Subjects
*CIRCADIAN rhythms; *BLOOD pressure; *METABOLISM; *AFFECTIVE disorders; *PATHOGENIC bacteria
- Abstract
Objective: A circadian rhythm in mammals controls the sleep-wake cycle, blood pressure, hormone secretion, metabolism and many other physiological processes. The circadian clock mechanism is regulated by four genes: Bmal1, Clock, Cry, and Per. Mutations in these regulatory genes are associated with sleep and mood disorders, obesity, and cancer. Several PER2 and CRY2 SNPs are associated with advanced sleep phase syndrome. It is, therefore, critical to understand the effect of clock genes' SNPs on the circadian clock. In this study, we determined "pathogenic" BMAL1 and CLOCK SNPs in the Ensembl database for biochemical characterization. Materials and Methods: BMAL1 and CLOCK SNPs in the Ensembl database were filtered to retain only missense mutations. Among the missense mutations, pathogenic ones were determined according to SIFT, PolyPhen, and CADD scores, and the REVEL, MetaLR, Mutation Assessor, I-Mutant, PROVEAN, and FireDock programs. BMAL1 and CLOCK SNPs were visualized by using PyMol. Results: Thousands of BMAL1 and CLOCK missense SNP mutations were reported in the Ensembl database. After the classification of those SNPs according to their SIFT, PolyPhen, and CADD pathogenicity, twelve SNPs for each protein remained as pathogenic. A further analysis with all in silico tools revealed that BMAL1 SNPs causing Ala154Val, Arg166Gln, and Val440Gly mutations; and CLOCK SNPs causing Gly120Val, Asp119Val, Gly120Ser, Ala117Val, and Cys371Gly mutations were predicted as the most "pathogenic" ones. Conclusion: Overall, by using in silico tools, we provided a starting point for experimental studies for determining the effect of pathogenic BMAL1 and CLOCK SNPs on the circadian clock mechanism. [ABSTRACT FROM AUTHOR]
- Published
- 2022
7. Evolutionary perspective of Big tau structure: 4a exon variants of MAPT.
- Author
-
Fischer, Itzhak
- Subjects
TAU proteins; ALTERNATIVE RNA splicing; PERIPHERAL nervous system; ALZHEIMER'S disease; MICROTUBULE-associated proteins
- Abstract
The MAPT gene encoding the microtubule-associated protein tau can generate multiple isoforms by alternative splicing giving rise to proteins which are differentially expressed in specific areas of the nervous system and at different developmental stages. Tau plays important roles in modulating microtubule dynamics, axonal transport, synaptic plasticity, and DNA repair, and has also been associated with neurodegenerative diseases (tauopathies) including Alzheimer’s disease and frontotemporal dementia. A unique high-molecular-weight isoform of tau, originally found to be expressed in the peripheral nervous system and projecting neurons, has been termed Big tau and has been shown to uniquely contain the large exon 4a that significantly increases the size and 3D structure of tau. With little progress since the original discovery of Big tau, more than 25 years ago, we have now completed a comprehensive comparative study to analyze the structure of the MAPT gene against available databases with respect to the composition of the tau exons as they evolved from early vertebrates to primates and humans. We focused the analysis on the evolution of the 4a exon variants and their homology relative to humans. We discovered that the 4a exon defining Big tau appears to be present early in vertebrate evolution as a large insert that dramatically changed the size of the tau protein with low sequence conservation despite a stable size range of about 250aa, and in some species a larger 4a-L exon of 355aa. We suggest that 4a exon variants evolved independently in different species by an exonization process using new alternative splicing to address the growing complexities of the evolving nervous systems. Thus, the appearance of a significantly larger isoform of tau independently repeated itself multiple times during evolution, accentuating the need across vertebrate species for an elongated domain that likely endows Big tau with novel physiological functions as well as properties related to neurodegeneration. [ABSTRACT FROM AUTHOR]
- Published
- 2022
8. Accessing Livestock Resources in Ensembl
- Author
-
Fergal J. Martin, Astrid Gall, Michal Szpak, and Paul Flicek
- Subjects
Ensembl; genome browser; annotation; tutorial; livestock; farmed animals; Genetics; QH426-470
- Abstract
Genome assembly is cheaper, more accurate and more automated than it has ever been. This is due to a combination of more cost-efficient chemistries, new sequencing technologies and better algorithms. The livestock community has been at the forefront of this new wave of genome assembly, generating some of the highest quality vertebrate genome sequences. Ensembl’s goal is to add functional and comparative annotation to these genomes, through our gene annotation, genomic alignments, gene trees, regulatory, and variation data. We run computationally complex analyses in a high throughput and consistent manner to help accelerate downstream science. Our livestock resources are continuously growing in both breadth and depth. We annotate reference genome assemblies for newly sequenced species and regularly update annotation for existing genomes. We are the only major resource to support the annotation of breeds and other non-reference assemblies. We currently provide resources for 13 pig breeds, maternal and paternal haplotypes for hybrid cattle and various other non-reference or wild type assemblies for livestock species. Here, we describe the livestock data present in Ensembl and provide protocols for how to view data in our genome browser, download it via our FTP site, manipulate it via our tools and interact with it programmatically via our REST API.
- Published
- 2021
9. Accessing Livestock Resources in Ensembl.
- Author
-
Martin, Fergal J., Gall, Astrid, Szpak, Michal, and Flicek, Paul
- Subjects
LIVESTOCK; HAPLOTYPES; CATTLE genetics; GENOMES
- Abstract
Genome assembly is cheaper, more accurate and more automated than it has ever been. This is due to a combination of more cost-efficient chemistries, new sequencing technologies and better algorithms. The livestock community has been at the forefront of this new wave of genome assembly, generating some of the highest quality vertebrate genome sequences. Ensembl's goal is to add functional and comparative annotation to these genomes, through our gene annotation, genomic alignments, gene trees, regulatory, and variation data. We run computationally complex analyses in a high throughput and consistent manner to help accelerate downstream science. Our livestock resources are continuously growing in both breadth and depth. We annotate reference genome assemblies for newly sequenced species and regularly update annotation for existing genomes. We are the only major resource to support the annotation of breeds and other non-reference assemblies. We currently provide resources for 13 pig breeds, maternal and paternal haplotypes for hybrid cattle and various other non-reference or wild type assemblies for livestock species. Here, we describe the livestock data present in Ensembl and provide protocols for how to view data in our genome browser, download it via our FTP site, manipulate it via our tools and interact with it programmatically via our REST API. [ABSTRACT FROM AUTHOR]
- Published
- 2021
10. Analysing Point Mutations in Protein Cleavage Sites by Using Enzyme Specificity Matrices
- Author
-
Jakob Triebel, Sandeep Silawal, Maximilian Willauschus, Gundula Schulze-Tanzil, and Thomas Bertsch
- Subjects
MEROPS; Ensembl; UniProt; MEGA; cleavage sites; enzymes; Diseases of the endocrine glands. Clinical endocrinology; RC648-665
- Published
- 2019
11. The genome sequence of the peach blossom moth, Thyatira batis (Linnaeus, 1758)
- Author
-
Douglas Boyes, Peter W. H. Holland, University of Oxford and Wytham Woods Genome Acquisition Lab, Darwin Tree of Life Barcoding collective, Wellcome Sanger Institute Tree of Life programme, Wellcome Sanger Institute Scientific Operations: DNA Pipelines collective, Tree of Life Core Informatics collective, and Darwin Tree of Life Consortium
- Subjects
Genetics; Whole genome sequencing; Mitochondrial DNA; biology; Medicine (miscellaneous); Chromosome; Sequence assembly; Gene Annotation; biology.organism_classification; General Biochemistry, Genetics and Molecular Biology; Biology and Microbiology; Ensembl; Drepanidae; Gene
- Abstract
We present a genome assembly from an individual male Thyatira batis (the peach-blossom moth; Arthropoda; Insecta; Lepidoptera; Drepanidae). The genome sequence is 315 megabases in span. The majority of the assembly (99.68%) is scaffolded into 31 chromosomal pseudomolecules, with the Z sex chromosome assembled. The mitochondrial genome was also assembled and is 15.4 kilobases in length. Gene annotation of this assembly on Ensembl has identified 12,238 protein coding genes.
- Published
- 2023
12. Bridging the gap by discerning SNPs in linkage disequilibrium and their role in breast cancer.
- Author
-
Maqbool, Sundus Naila, Nazeer, Haleema Saadiya, Rafiq, Mehak, Javed, Aneela, and Hanif, Rumeza
- Subjects
*SINGLE nucleotide polymorphisms; *GENETICS of breast cancer; *ONCOGENES; *TRANSCRIPTION factors; *DISEASE susceptibility
- Abstract
Breast cancer is the most common cancer among women, with several genes involved in disease susceptibility. As the majority of genome-wide significant variants fall outside the coding region, it is likely that some of them alter specific gene functions. GWAS database was used to interpret the regulatory functions of these genetic variants. A total of 320 SNPs for breast cancer were selected via GWAS, which were entered into the SNAP web portal tool, to determine the ones found to be in Linkage Disequilibrium (r2 < 0.80). The resulting 2024 proxy SNPs were processed in RegulomeDB to predict their regulatory role. Of these, 1440 produced a score ranging from 1–6, whereas the remaining produced no data. Only the variants under score 4 (cut-off value) in RegulomeDB were studied further. From these variants, 221 had scores of less than 4, indicating a high degree of potential regulatory role associated with them. Further study revealed that 61 of the 221 SNPs were reported to be genome-wide significant for breast cancer, 52 to be associated with other diseases, 99 as unconfirmed for association with breast cancer, leaving only 9 to be novel proxy SNPs linked to breast cancer. Therefore, the study further confirmed the postulation of non-coding variants being linked to disease risk, thereby requiring additional validation through genome-wide association studies to substantiate their underlying mechanism. Highlights • Variants lying outside the coding region may alter gene function and play a role in disease pathogenesis. • Nine novel SNPs have been associated with breast cancer. • Novel SNPs regulate expression of oncogenes either directly or by interfering with binding of transcription factors. [ABSTRACT FROM AUTHOR]
- Published
- 2018
13. ENSEMBL
- Author
-
Arnemann, J., Gressner, Axel M., editor, and Arndt, Torsten, editor
- Published
- 2019
14. Genomicus in 2022: comparative tools for thousands of genomes and reconstructed ancestors
- Author
-
Jean-François Dufayard, Pierre Vincens, Hugues Roest Crollius, Nga Nguyen, and Alexandra Louis
- Subjects
AcademicSubjects/SCI00010; [SDV]Life Sciences [q-bio]; Computational biology; Biology; Synteny; Genome; Evolution, Molecular; 03 medical and health sciences; 0302 clinical medicine; Type (biology); Extant taxon; Ensembl Genomes; [SDV.BBM.GTP]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Genomics [q-bio.GN]; Databases, Genetic; Genetics; Database Issue; Humans; Ensembl; Taxonomic rank; Phylogeny; 030304 developmental biology; Comparative genomics; Internet; 0303 health sciences; Eukaryota; Genomics; [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]; Software; 030217 neurology & neurosurgery
- Abstract
Genomicus is a database and web-server dedicated to comparative genomics in eukaryotes. Its main functionality is to graphically represent the conservation of genomic blocks between multiple genomes, locally around a specific gene of interest or genome-wide through karyotype comparisons. Since its first release in 2010, Genomicus has synchronized with 60 Ensembl releases and seen the addition of functions that have expanded the type of analyses that users can perform. Today, five public instances of Genomicus support a total of 1029 extant genomes and 621 ancestral reconstructions from all eukaryotic kingdoms available in the Ensembl and Ensembl Genomes databases, complemented by four additional instances specific to taxonomic groups of interest. New visualization and query tools are described in this manuscript. Genomicus is freely available at http://www.genomicus.bio.ens.psl.eu/genomicus.
- Published
- 2021
15. The European Bioinformatics Institute (EMBL-EBI) in 2021
- Author
-
Gaia Cantelli, Rahuman S Malik-Sheriff, Paul Flicek, Ewan Birney, Cath Brooksbank, Alex Bateman, Johanna McEntyre, Anton I. Petrov, Henning Hermjakob, Michele Ide-Smith, and Rolf Apweiler
- Subjects
InterPro; Service (systems architecture); RNA, Untranslated; Databases, Factual; AcademicSubjects/SCI00010; Databases, Pharmaceutical; Information Storage and Retrieval; Biology; Bioinformatics; Data type; Artificial Intelligence; Component (UML); Genetics; Ensembl; Database Issue; Humans; Databases, Protein; Balanced scorecard; Genome, Human; SARS-CoV-2; Academies and Institutes; COVID-19; Computational Biology; Metadata; Europe; UniProt
- Abstract
The European Bioinformatics Institute (EMBL-EBI) maintains a comprehensive range of freely available and up-to-date molecular data resources, which includes over 40 resources covering every major data type in the life sciences. This year's service update for EMBL-EBI includes new resources, PGS Catalog and AlphaFold DB, and updates on existing resources, including the COVID-19 Data Platform, trRosetta and RoseTTAfold models introduced in Pfam and InterPro, and the launch of Genome Integrations with Function and Sequence by UniProt and Ensembl. Furthermore, we highlight projects through which EMBL-EBI has contributed to the development of community-driven data standards and guidelines, including the Recommended Metadata for Biological Images (REMBI), and the BioModels Reproducibility Scorecard. Training is one of EMBL-EBI’s core missions and a key component of the provision of bioinformatics services to users: this year's update includes many of the improvements that have been developed to EMBL-EBI’s online training offering.
- Published
- 2021
16. Distribution of human gene polymorphisms allele frequencies associated with viral infections.
- Author
-
Vlasenko NV, Chanyshev MD, Dubodelov DV, Serkov AA, Solopova GG, Sacuk AV, Snicar AV, Semenenko TА, Kuzin SN, and Akimkin VG
- Subjects
- Humans, Gene Frequency, Genotype, Alleles, Case-Control Studies, Polymorphism, Single Nucleotide, Virus Diseases genetics
- Abstract
Introduction: The design of studies aimed at finding the association between the genetic factor and the studied feature (disease) involves a comparison of the ratio of genotypes or allelic proportions in the study group with those in the control group. At the stage of determining the ratio of genotypes of the studied polymorphisms in the reference group, researchers meet a number of problems, which are the subject of the present work. Aim of the work is to provide scientific rationale for the feasibility of creating a national information system comprising genetic data of the relatively healthy population of Russia, incorporating its ethnic diversity. Materials and Methods: The study group, total 1020 people, was genotyped for a number of single nucleotide polymorphisms of human genes. A comparative characteristic of the frequency distribution of the studied polymorphisms with those presented in international databases as reference data was carried out using χ2 index. Results: The frequency of SNP rs4986790 of the TLR4 gene significantly differs from the EUR population (p = 0.032) and the CEU subpopulation (p = 0.047). The allele frequencies of the rs1800795 (IL6) and rs1800896 (IL10) polymorphisms in the study population differ from the CEU subgroup (p = 0.030 and 0.012, respectively). The frequency of SNP rs2295119 (HLA-DPA2) in the study group is significantly different from the EUR population (p = 0.034). Conclusion: The analysis carried out in this work confirms the need to create a domestic information system containing data on the occurrence of SNP alleles and genotypes for a conditionally healthy population and in subgroups with various pathological conditions.
- Published
- 2023
17. Sharing mutants and experimental information prepublication using FgMutantDb (https://scabusa.org/FgMutantDb).
- Author
-
Baldwin, Thomas T., Basenko, Evelina, Harb, Omar, Brown, Neil A., Urban, Martin, Hammond-Kosack, Kim E., and Bregitzer, Phil P.
- Subjects
*FUSARIUM; *FILAMENTOUS fungi; *FUNGAL genetics; *GENETIC databases; *SCIENTIFIC community
- Abstract
There is no comprehensive storage for generated mutants of Fusarium graminearum or data associated with these mutants. Instead, researchers relied on several independent and non-integrated databases. FgMutantDb was designed as a simple spreadsheet that is accessible globally on the web that will function as a centralized source of information on F. graminearum mutants. FgMutantDb aids in the maintenance and sharing of mutants within a research community. It will serve also as a platform for disseminating prepublication results as well as negative results that often go unreported. Additionally, the highly curated information on mutants in FgMutantDb will be shared with other databases (FungiDB, Ensembl, PhytoPath, and PHI-base) through updating reports. Here we describe the creation and potential usefulness of FgMutantDb to the F. graminearum research community, and provide a tutorial on its use. This type of database could be easily emulated for other fungal species. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
18. GeneSeqToFamily: a Galaxy workflow to find gene families based on the Ensembl Compara GeneTrees pipeline.
- Subjects
- *
GENE families , *GENES , *WORKFLOW software , *WORKFLOW , *PIPELINES , *CHROMOSOME duplication , *GALAXIES - Abstract
Background: Gene duplication is a major factor contributing to evolutionary novelty, and the contraction or expansion of gene families has often been associated with morphological, physiological, and environmental adaptations. The study of homologous genes helps us to understand the evolution of gene families. It plays a vital role in finding ancestral gene duplication events as well as identifying genes that have diverged from a common ancestor under positive selection. There are various tools available, such as MSOAR, OrthoMCL, and HomoloGene, to identify gene families and visualize syntenic information between species, providing an overview of syntenic regions evolution at the family level. Unfortunately, none of them provide information about structural changes within genes, such as the conservation of ancestral exon boundaries among multiple genomes. The Ensembl GeneTrees computational pipeline generates gene trees based on coding sequences, provides details about exon conservation, and is used in the Ensembl Compara project to discover gene families. Findings: A certain amount of expertise is required to configure and run the Ensembl Compara GeneTrees pipeline via command line. Therefore, we converted this pipeline into a Galaxy workflow, called GeneSeqToFamily, and provided additional functionality. This workflow uses existing tools from the Galaxy ToolShed, as well as providing additional wrappers and tools that are required to run the workflow. Conclusions: GeneSeqToFamily represents the Ensembl GeneTrees pipeline as a set of interconnected Galaxy tools, so they can be run interactively within the Galaxy's user-friendly workflow environment while still providing the flexibility to tailor the analysis by changing configurations and tools if necessary. Additional tools allow users to subsequently visualize the gene families produced by the workflow, using the Aequatus.js interactive tool, which has been developed as part of the Aequatus software project. 
[ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
19. Cervical cancer risk prediction with robust ensemble and explainable black boxes method
- Author
-
Francesco Curia
- Subjects
Information privacy ,020205 medical informatics ,Computer science ,business.industry ,Deep learning ,Biomedical Engineering ,Intelligent decision support system ,Bioengineering ,Context (language use) ,02 engineering and technology ,Machine learning ,computer.software_genre ,Applied Microbiology and Biotechnology ,Clinical decision support system ,Order (exchange) ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Sensitivity (control systems) ,Artificial intelligence ,business ,computer ,Cervical cancer ,Ensembl ,Interpretable AI ,Risk prediction ,Biotechnology ,Interpretability - Abstract
Clinical decision support systems (CDSS) that rely on algorithms from intelligent systems, such as machine learning or deep learning, suffer from the fact that the methods used are often hard to interpret, and it is difficult to understand how some decisions are made. The opacity of some methods, sometimes deliberate because of data privacy concerns or techniques used to protect intellectual property, makes these systems very complicated. In addition, the results obtained are themselves difficult to interpret; the clinical context therefore requires methods that are as accurate as possible, transparent techniques, and explainable results. This work addresses the development of cervical cancer, a disease that mainly affects the female population. To introduce advanced machine learning techniques into a clinical decision support system that can be transparent and explainable, a robust, accurate ensemble method is presented, evaluated in terms of error and sensitivity in classifying the possible development of the disease, together with advanced explainability and interpretability techniques (Explainable Machine Learning), such as LIME and Shapley, applied to the context of CDSS. The results obtained, as well as being interesting, are understandable and can be applied in the treatment of this type of problem.
- Published
- 2021
- Full Text
- View/download PDF
20. Targeted sequencing reveals candidate causal variants for dairy bull subfertility
- Author
-
Francisco Peñagaricano, Rostam Abdollahi-Arpanahi, and H. A. Pacheco
- Subjects
0301 basic medicine ,Male ,animal diseases ,Short Communication ,Nonsense mutation ,Short Communications ,Biology ,Deep sequencing ,DNA sequencing ,Frameshift mutation ,03 medical and health sciences ,causal mutation ,Genetics ,Missense mutation ,Ensembl ,Animals ,Gene ,reproductive and urinary physiology ,Genome ,Sire ,0402 animal and dairy science ,04 agricultural and veterinary sciences ,General Medicine ,040201 dairy & animal science ,Dairying ,030104 developmental biology ,Fertility ,sire conception rate ,Fertilization ,Infertility ,Animal Science and Zoology ,Cattle - Abstract
Bull fertility is a key factor for successful reproductive performance in dairy cattle. Since the semen from a single bull can be used to inseminate hundreds of cows, one subfertile bull could have a major impact on herd reproductive efficiency. We have previously identified five genomic regions, located on BTA8 (72.2 Mb), BTA9 (43.7 Mb), BTA13 (60.2 Mb), BTA17 (63.3 Mb), and BTA27 (34.7 Mb), that show large dominance effects on bull fertility. Each of these regions explained about 5-8% of the observed differences in sire conception rate between Holstein bulls. Here, we aimed to identify candidate causal variants responsible for this variation using targeted sequencing (10 Mb per region). For each genomic region, two DNA pools were constructed from n ≈ 20 high-fertility and n ≈ 20 low-fertility Holstein bulls. The DNA-sequencing analysis included reads quality control (using FastQC), genome alignment (using BWA and ARS-UCD1.2), variant calling (using GATK) and variant annotation (using Ensembl). The sequencing depth per pool varied from 39× to 51×. We identified a set of nonsense mutations, missense mutations, and frameshift variants carried by low-fertility bulls. Notably, some of these variants were classified as strong candidate causal variants, i.e., mutations with deleterious effects located on genes exclusively/highly expressed in testis. Genes affected by these candidate causal variants include AK9, TTLL9, TCHP, and FOXN4. These results could aid in the development of novel genomic tools that allow early detection and culling of subfertile bull calves.
- Published
- 2021
21. MTR3D: identifying regions within protein tertiary structures under purifying selection
- Author
-
Moshe Olshansky, Douglas E. V. Pires, Michael Silk, David B. Ascher, Natalie P. Thorne, Elston N D'Souza, and Carlos H M Rodrigues
- Subjects
AcademicSubjects/SCI00010 ,Mutation, Missense ,Computational biology ,Genomics ,Biology ,Protein Structure, Tertiary ,Negative selection ,Structural Homology, Protein ,Neoplasms ,Web Server Issue ,Genetics ,Human proteome project ,RefSeq ,Missense mutation ,Ensembl ,Humans ,Human genome ,Gene ,Exome ,Software - Abstract
The identification of disease-causal variants is non-trivial. By mapping population variation from over 448,000 exome and genome sequences to over 81,000 experimental structures and homology models of the human proteome, we have calculated regional intolerance to missense variation (Missense Tolerance Ratio, MTR), using a sliding window of 21–41 codons, and introduce a new 3D spatial intolerance to missense variation score (3D Missense Tolerance Ratio, MTR3D), using spheres of 5–8 Å. We show that the MTR3D is less biased by regions with limited data and more accurately identifies regions under purifying selection than estimates relying on the sequence alone. Intolerant regions were highly enriched for both ClinVar pathogenic and COSMIC somatic missense variants (Mann–Whitney U test P < 2.2 × 10⁻¹⁶). Further, we combine sequence- and spatial-based scores to generate a consensus score, MTRX, which distinguishes pathogenic from benign variants more accurately than either score separately (AUC = 0.85). The MTR3D server enables easy visualisation of population variation, MTR, MTR3D and MTRX scores across the entire gene and protein structure for >17,000 human genes and >42,000 alternative transcripts, including both Ensembl and RefSeq transcripts. MTR3D is freely available through a user-friendly web interface and API at http://biosig.unimelb.edu.au/mtr3d/.
- Published
- 2021
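The sliding-window idea behind the MTR score in the entry above can be sketched as follows. The abstract does not state the formula, so this sketch assumes the commonly cited definition: the observed missense fraction, missense / (missense + synonymous), divided by the fraction expected from all possible variants in the same window. The function name `mtr_sliding_window`, the per-codon input lists, and the toy counts are hypothetical; this is not the MTR3D server's implementation, and the 3D (sphere-based) variant is not covered.

```python
def mtr_sliding_window(obs_mis, obs_syn, pos_mis, pos_syn, window=31):
    """Per-codon missense tolerance ratio over a centred sliding window.

    obs_mis / obs_syn: observed missense / synonymous variant counts per codon.
    pos_mis / pos_syn: possible missense / synonymous variant counts per codon.
    Returns one score per codon, or None where the window holds no usable data.
    """
    n = len(obs_mis)
    if not (len(obs_syn) == len(pos_mis) == len(pos_syn) == n):
        raise ValueError("all per-codon lists must have the same length")
    half = window // 2
    scores = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)  # window is clipped at the ends
        om, osyn = sum(obs_mis[lo:hi]), sum(obs_syn[lo:hi])
        pm, psyn = sum(pos_mis[lo:hi]), sum(pos_syn[lo:hi])
        if om + osyn == 0 or pm == 0:
            scores.append(None)  # no observed variation, or no possible missense changes
            continue
        observed_fraction = om / (om + osyn)
        expected_fraction = pm / (pm + psyn)
        scores.append(observed_fraction / expected_fraction)
    return scores


# Toy data for five codons (illustrative only), scored with a 3-codon window.
toy_scores = mtr_sliding_window(
    obs_mis=[0, 0, 1, 2, 2],
    obs_syn=[1, 1, 1, 1, 1],
    pos_mis=[6, 6, 6, 6, 6],
    pos_syn=[3, 3, 3, 3, 3],
    window=3,
)
print(toy_scores)
```

Under this definition, scores below 1 indicate fewer missense variants than expected in the window, which is the signal read as intolerance to missense variation.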
22. The 2021 Nucleic Acids Research database issue and the online molecular biology database collection
- Author
-
Xosé M Fernández and Daniel J. Rigden
- Subjects
AcademicSubjects/SCI00010 ,WikiPathways : Pathways for the people ,Rfam ,Genomics ,Genome browser ,Biology ,computer.software_genre ,03 medical and health sciences ,0302 clinical medicine ,Nucleic Acids ,Genetics ,Humans ,Ensembl ,KEGG ,Epidemics ,FlyBase : A Database of Drosophila Genes & Genomes ,Molecular Biology ,030304 developmental biology ,Comparative genomics ,Internet ,0303 health sciences ,Database ,SARS-CoV-2 ,Research ,COVID-19 ,Computational Biology ,Molecular biology ,Editorial ,Periodicals as Topic ,Databases, Nucleic Acid ,computer ,030217 neurology & neurosurgery - Abstract
The 2021 Nucleic Acids Research database Issue contains 189 papers spanning a wide range of biological fields and investigation. It includes 89 papers reporting on new databases and 90 covering recent changes to resources previously published in the Issue. A further ten are updates on databases most recently published elsewhere. Seven new databases focus on COVID-19 and SARS-CoV-2 and many others offer resources for studying the virus. Major returning nucleic acid databases include NONCODE, Rfam and RNAcentral. Protein family and domain databases include COG, Pfam, SMART and Panther. Protein structures are covered by RCSB PDB and dispersed proteins by PED and MobiDB. In metabolism and signalling, STRING, KEGG and WikiPathways are featured, along with returning KLIFS and new DKK and KinaseMD, all focused on kinases. IMG/M and IMG/VR update in the microbial and viral genome resources section, while human and model organism genomics resources include Flybase, Ensembl and UCSC Genome Browser. Cancer studies are covered by updates from canSAR and PINA, as well as newcomers CNCdatabase and Oncovar for cancer drivers. Plant comparative genomics is catered for by updates from Gramene and GreenPhylDB. The entire Database Issue is freely available online on the Nucleic Acids Research website (https://academic.oup.com/nar). The NAR online Molecular Biology Database Collection has been substantially updated, revisiting nearly 1000 entries, adding 90 new resources and eliminating 86 obsolete databases, bringing the current total to 1641 databases. It is available at https://www.oxfordjournals.org/nar/database/c/.
- Published
- 2020
- Full Text
- View/download PDF
23. Inclusion of a music learner with ASD through play-full intersubjectivity
- Author
-
Lauri A Hogle
- Subjects
media_common.quotation_subject ,05 social sciences ,050109 social psychology ,Empathy ,06 humanities and the arts ,medicine.disease ,060404 music ,Education ,Developmental psychology ,Autism spectrum disorder ,medicine ,Choir ,Ensembl ,0501 psychology and cognitive sciences ,Psychology ,Inclusion (education) ,0604 arts ,Music ,Intersubjectivity ,media_common - Abstract
Through a case study of Jad (pseudonym), a music learner with autism spectrum disorder (ASD), I sought to understand his experiences as he engaged in peer scaffolding activities of a choral ensemble. The study illuminated the role of intersubjectivity (or shared understanding) in socially mediated music learning within an environment of inclusion. Through inclusive, play-full, intersubjective attunement of younger children to Jad, he increasingly took on a role as an empathetic teacher-helper, initially with his younger sister, then with other young children, then with the entire ensemble. Jad also increasingly displayed musical agency through physical movement during music-making, contributing to others’ understanding and musical agency. The findings describe intersections of play with intersubjectivity, focusing on learner attunement to affect and emotion in fostering an inclusive music education experience. Making space for peer scaffolding and playfulness within this music learning environment fostered shared understanding and empathy among all learners, including one with ASD.
- Published
- 2020
- Full Text
- View/download PDF
24. GENCODE 2021
- Author
-
Fabio C. P. Navarro, Jonathan M. Mudge, S. Mohanan, Adam Frankish, Joel Armstrong, Tiago Grego, Irwin Jungreis, Roderic Guigó, Jinrui Xu, Benedict Paten, Cristina Sisu, Daniel R. Zerbino, Julien Lagarde, Mark Diekhans, José M. González, Michael L. Tress, E. Stapleton, Osagie G. Izuogu, Mark Gerstein, Ian T. Fiddes, Toby Hunt, Sarah Donaldson, Marie Marthe Suner, Fernando Pozo, Andrew D. Yates, S. Carbonell Sala, T. Di Domenico, Matthew Hardy, Barbara Uszczynska-Ratajczak, Fiona Cunningham, Andrew Berry, Anne Parker, Laura Martinez, Alexandra Bignell, Bianca M. Schmitt, Yan Zhang, Jane E. Loveland, Baikang Pei, Jyoti S. Choudhary, F. C. Riera, Paul R. Muir, C. Garcia Giron, Tim Hubbard, Fergal J. Martin, Rory Johnson, Magali Ruffier, If Barnes, James C. Wright, I. Sycheva, Manolis Kellis, Carles Boix, Thibaut Hourlier, Paul Flicek, Maxim Y Wolf, Y. T. Yang, and Kerstin Howe
- Subjects
Transcription, Genetic ,AcademicSubjects/SCI00010 ,Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) ,610 Medicine & health ,Computational biology ,Genome browser ,Biology ,Genome ,Mice ,03 medical and health sciences ,Annotation ,0302 clinical medicine ,Databases, Genetic ,Genetics ,Database Issue ,Animals ,Humans ,Ensembl ,Epidemics ,GeneralLiterature_REFERENCE(e.g.,dictionaries,encyclopedias,glossaries) ,030304 developmental biology ,Internet ,0303 health sciences ,SARS-CoV-2 ,GENCODE ,COVID-19 ,Computational Biology ,Molecular Sequence Annotation ,Genomics ,ComputingMethodologies_PATTERNRECOGNITION ,Genome Biology ,RNA, Long Noncoding ,Pseudogenes ,030217 neurology & neurosurgery ,Reference genome - Abstract
© The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research. The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org. National Human Genome Research Institute of the National Institutes of Health [U41HG007234]; the content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health; Wellcome Trust [WT108749/Z/15/Z, WT200990/Z/16/Z]; European Molecular Biology Laboratory; Swiss National Science Foundation through the National Center of Competence in Research ‘RNA & Disease’ (to R.J.); Medical Faculty of the University of Bern (to R.J). Funding for open access charge: National Institutes of Health.
- Published
- 2020
- Full Text
- View/download PDF
25. WormBase ParaSite − a comprehensive resource for helminth genomics.
- Author
-
Howe, Kevin L., Bolt, Bruce J., Shafie, Myriam, Kersey, Paul, and Berriman, Matthew
- Subjects
- *
HELMINTHS , *NUCLEOTIDE sequence , *PARASITES , *GENE expression , *GENOMICS - Abstract
The number of publicly available parasitic worm genome sequences has increased dramatically in the past three years, and research interest in helminth functional genomics is now quickly gathering pace in response to the foundation that has been laid by these collective efforts. A systematic approach to the organisation, curation, analysis and presentation of these data is clearly vital for maximising the utility of these data to researchers. We have developed a portal called WormBase ParaSite ( http://parasite.wormbase.org ) for interrogating helminth genomes on a large scale. Data from over 100 nematode and platyhelminth species are integrated, adding value by way of systematic and consistent functional annotation (e.g. protein domains and Gene Ontology terms), gene expression analysis (e.g. alignment of life-stage specific transcriptome data sets), and comparative analysis (e.g. orthologues and paralogues). We provide several ways of exploring the data, including genome browsers, genome and gene summary pages, text search, sequence search, a query wizard, bulk downloads, and programmatic interfaces. In this review, we provide an overview of the back-end infrastructure and analysis behind WormBase ParaSite, and the displays and tools available to users for interrogating helminth genomic data. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
26. Pharmacogenomics and Pharmacogenetics: In Silico Prediction of Drug Effects in Treatments for Novel Coronavirus SARS-CoV2 Disease
- Author
-
Salvatore Pisconti, Pier Luigi Surico, Claudia Fabrizio, Andrea Cacciamani, Raffaele Palmirotta, Agnese Re, Francesca Romano, Raffaele Di Francia, Gerardo D'Amato, Alessandra Micera, Concetta Cafiero, and Delio Monaco
- Subjects
0301 basic medicine ,Pharmacology ,PharmGKB ,business.industry ,In silico ,Context (language use) ,Genome browser ,Computational biology ,Precision medicine ,03 medical and health sciences ,030104 developmental biology ,0302 clinical medicine ,030220 oncology & carcinogenesis ,Pharmacogenomics ,Molecular Medicine ,Ensembl ,Medicine ,business ,Pharmacogenetics - Abstract
The latest developments in precision medicine allow the modulation of therapeutic approaches in different pathologies on the basis of the specific molecular characterization of the patient. This review of the literature, coupled with in silico analysis, aims to provide a selected screening of interactions between single-nucleotide polymorphisms (SNPs) and drugs (repurposed, investigational, and biological agents) showing efficacy and toxicity in counteracting Covid-19 infection. In silico analysis of genetic variants related to each drug was performed on such databases as PharmGKB, Ensembl Genome Browser, www.drugs.com, and SNPedia, with an extensive literature review of papers (to May 10, 2020) on Covid-19 treatments using Medline, Embase, International Pharmaceutical Abstracts, PharmGKB, and Google Scholar. The clinical relevance of SNPs, known as both drug targets and markers, considering genetic variations with known drug responses, and the therapeutic consequences are discussed. In the context of clinical treatment of Covid-19, including infection prevention, control measures, and supportive care, this review highlights the importance of a personalized approach in the final selection of therapy, which is probably essential in the management of the Covid-19 pandemic.
- Published
- 2020
- Full Text
- View/download PDF
27. StoneMod: a database for kidney stone modulatory proteins with experimental evidence
- Author
-
Paleerath Peerapen, Visith Thongboonkerd, and Supatcha Sassanarakkit
- Subjects
0301 basic medicine ,Renal calculi ,Urology ,030232 urology & nephrology ,Human Protein Atlas ,lcsh:Medicine ,Biology ,computer.software_genre ,Article ,Bioinorganic chemistry ,03 medical and health sciences ,Kidney Calculi ,0302 clinical medicine ,medicine ,Ensembl ,Humans ,Databases, Protein ,lcsh:Science ,Multidisciplinary ,Stone formation ,Database ,Calcium Oxalate ,Protein databases ,lcsh:R ,Proteins ,medicine.disease ,030104 developmental biology ,Kidney stone disease ,Kidney stones ,lcsh:Q ,UniProt ,PeptideAtlas ,Crystallization ,computer ,Software - Abstract
Better understanding of the molecular mechanisms of kidney stone formation is required to improve management of kidney stone disease with better therapeutic outcome. Recent kidney stone research has indicated critical roles of a group of proteins, namely ‘stone modulators’, in promotion or inhibition of stone formation. Nevertheless, such information is currently dispersed and difficult to obtain. Herein, we present the kidney stone modulator database (StoneMod), a curated resource that compiles the necessary information on such stone modulatory proteins, which can act as stone promoters or inhibitors, with experimental evidence from previously published studies. Currently, the StoneMod database contains 10, 16, 13, and 8 modulatory proteins that affect calcium oxalate crystallization, crystal growth, crystal aggregation, and crystal adhesion on renal tubular cells, respectively. Informative details of each modulatory protein and PubMed links to the published articles are provided. Additionally, hyperlinks to other protein/gene databases (e.g., UniProtKB, Swiss-Prot, Human Protein Atlas, PeptideAtlas, and Ensembl) are made available for users to obtain additional in-depth information on each protein. Moreover, this database provides a user-friendly web interface through which users can freely access the information and/or submit their own data for deposit or update. Database URL: https://www.stonemod.org.
- Published
- 2020
- Full Text
- View/download PDF
28. Genome-wide analysis of LATERAL ORGAN BOUNDARIES DOMAIN-in Physcomitrella patens and stress responses
- Author
-
Yanjing Liu, Xiaolong Huang, Huiqing Yan, and Yin Yi
- Subjects
0106 biological sciences ,0301 basic medicine ,Computational biology ,Physcomitrella patens ,01 natural sciences ,Biochemistry ,Genome ,Transcriptome ,03 medical and health sciences ,Stress, Physiological ,Genetics ,Gene family ,Ensembl ,Molecular Biology ,Gene ,Disease Resistance ,Plant Proteins ,biology ,Abiotic stress ,biology.organism_classification ,Bryopsida ,Human genetics ,030104 developmental biology ,Genome, Plant ,Transcription Factors ,010606 plant biology & botany - Abstract
LBDs, a plant-specific gene family, play essential roles in lateral organ development, plant regeneration, and abiotic stress and pathogen responses. However, the number and characteristics of LBD genes in Physcomitrella patens were still obscure. This study was performed to identify the LBD family genes in moss and to determine the expression profiles of LBDs under abiotic and pathogen stress. Complete genome sequences and transcriptomes of P. patens were downloaded from the Ensembl Plants database. The hidden Markov model-based profile of the conserved LOB domain was submitted as a query to identify all potential LOB domain sequences with HMMER software. Expression profiles of PpLBDs were obtained from the GEO public database and qRT-PCR analysis. In this study, a total of 31 LBDs were identified in the P. patens genome, divided into two classes based on the presence of the leucine zipper-like coiled-coil motif. A phylogenetic relationship was obtained between 31 proteins from P. patens and 43 proteins from the Arabidopsis thaliana genome, providing insights into their conserved and potential functions. Furthermore, the exon–intron organization of each PpLBD was analyzed. All PpLBDs contain the conserved DNA binding motif (CX2CX6CX3C zinc finger-like motif) and were predicted to be located in the cell nucleus. The 31 PpLBD genes were unevenly assigned to 18 out of 27 chromosomes based on their physical positions. Among these genes, PpLBD27 not only showed the highest expression under desiccation but also appeared to be a susceptibility gene to pathogens through the jasmonic acid-mediated signaling pathway. Most PpLBDs were up-regulated by mannitol treatment. These results show that PpLBDs are differentially induced and suggest potential functions in the response of early terrestrial colonizers to environmental stimuli. Despite significant differences between the life cycles of P. patens and flowering plants, LBD-regulated functions in abiotic and biotic stress responses have been identified and appear to be conserved in the two lineages. These results provide a comprehensive analysis of PpLBDs and pave the way for studies aimed at a better understanding of PpLBDs.
- Published
- 2020
- Full Text
- View/download PDF
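The CX2CX6CX3C zinc finger-like motif mentioned in the abstract of the entry above is written in a compact notation in which X followed by a number stands for that many arbitrary residues. As a minimal sketch, assuming that reading of the notation, the motif can be translated into a regular expression and searched for in a protein sequence. The helper names `motif_to_regex` and `find_motif` and the toy sequence are hypothetical; the study itself identified LOB domains with an HMM profile in HMMER, not with a regular expression.

```python
import re


def motif_to_regex(motif):
    """Turn a motif such as 'CX2CX6CX3C' into a regex such as 'C.{2}C.{6}C.{3}C'."""
    return re.sub(r"X(\d+)", lambda m: ".{" + m.group(1) + "}", motif)


def find_motif(sequence, motif="CX2CX6CX3C"):
    """Return (start, matched_text) for every occurrence, overlapping ones included."""
    # The lookahead lets matches that overlap each other all be reported.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]


# Made-up toy sequence, not a real PpLBD protein.
toy_protein = "MASCAACKFLRRKCVPGCIFAPYF"
print(motif_to_regex("CX2CX6CX3C"))
print(find_motif(toy_protein))
```

A regular expression only checks the spacing of the cysteines, so it is best treated as a quick sanity check on candidates that a profile search has already returned.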
29. Association of Peroxisome Proliferator-Activated Receptors (PPARs) with Diabetic Retinopathy in Human and Animal Models: Analysis of the Literature and Genome Browsers
- Author
-
Špela Tajnšek, Tanja Kunej, Danijel Petrovič, and Mojca Globočnik Petrovič
- Subjects
0301 basic medicine ,chemistry.chemical_classification ,Peroxisome proliferator-activated receptor gamma ,QH301-705.5 ,Peroxisome proliferator-activated receptor ,Review Article ,Genome browser ,Diabetic retinopathy ,Biology ,Bioinformatics ,medicine.disease ,Genome ,03 medical and health sciences ,030104 developmental biology ,0302 clinical medicine ,Nuclear receptor ,chemistry ,Drug Discovery ,030221 ophthalmology & optometry ,medicine ,Ensembl ,Pharmacology (medical) ,Biology (General) ,Gene - Abstract
Diabetic retinopathy (DR) is a condition that develops after long-lasting and poorly handled diabetes and is presently the main reason for blindness among elderly and youth. Peroxisome proliferator-activated receptors (PPARs) are nuclear receptors that are involved in carbohydrate and fatty-acid metabolism and have also been associated with DR. Three PPAR isoforms are known: PPARG, PPARA, and PPARD. In the present study, we retrieved articles reporting associations between PPARs and DR from PubMed database and compiled the data in two catalogues, for human and animal models. Extracted data was then complemented with additional relevant genomic information. Seven retrieved articles reported testing an association between PPARs with DR in human. Four of them concluded association of PPARG and PPARA with DR in European and Asian populations, having a protective role on DR development. One study reported pathogenic role of PPARG, while two articles reported no association between PPARG and DR among Indian and Chinese populations. Six retrieved articles reported testing of involvement of PPARG and PPARA in DR in animal models, including mouse and rat. The review includes case-control studies, meta-analysis, expression studies, animal models, and cell line studies. Despite a large number of documented sequence variants of the PPAR genes available in genome browsers, researchers usually focus on a small set of previously reported variants. Data extraction from Ensembl genome browser revealed several sequence variants with predicted deleterious effect on protein function which present candidates for further experimental validation. Results of the present analysis will enable more holistic approach for understanding of PPARs in DR development. Additionally, developed catalogues present a baseline for standardized reporting of PPAR-phenotype association in upcoming studies.
- Published
- 2020
- Full Text
- View/download PDF
30. Human gene and disease associations for clinical‐genomics and precision medicine research
- Author
-
Saman Zeeshan, Zeeshan Ahmed, XinQi Dong, and Dinesh Mendhe
- Subjects
0301 basic medicine ,precision medicine ,Medicine (miscellaneous) ,Genome-wide association study ,Genomics ,Computational biology ,Disease ,Biology ,drugs ,GeneCards ,diseases ,03 medical and health sciences ,0302 clinical medicine ,Ensembl ,genes ,Research Articles ,database ,lcsh:R5-920 ,business.industry ,GENCODE ,Precision medicine ,030104 developmental biology ,030220 oncology & carcinogenesis ,somatic mutations ,Molecular Medicine ,germline mutations ,Personalized medicine ,business ,lcsh:Medicine (General) ,clinical‐genomics ,Research Article - Abstract
We are entering the era of personalized medicine in which an individual's genetic makeup will eventually determine how a doctor can tailor his or her therapy. Therefore, it is becoming critical to understand the genetic basis of common diseases, for example, which genes predispose to disease and which rare genetic variants contribute to it. Our study focuses on helping researchers, medical practitioners, and pharmacists gain a broad view of genetic variants that may be implicated in the likelihood of developing certain diseases. Our focus here is to create a comprehensive database with mobile access to all available, authentic and actionable genes, SNPs, and classified diseases and drugs collected from different clinical and genomics databases worldwide, including Ensembl, GenCode, ClinVar, GeneCards, DISEASES, HGMD, OMIM, GTR, CNVD, Novoseek, Swiss‐Prot, LncRNADisease, Orphanet, GWAS Catalog, SwissVar, COSMIC, WHO, and FDA. We present a new cutting‐edge gene‐SNP‐disease‐drug mobile database with a smartphone application, integrating information about classified diseases and related genes, germline and somatic mutations, and drugs. Its database includes over 59 000 protein‐coding and noncoding genes; over 67 000 germline SNPs and over a million somatic mutations reported for over 19 000 protein‐coding genes located in over 1000 regions, published in over 3000 articles in over 415 journals available in PubMed; over 80 000 ICDs; over 123 000 NDCs; and over 100 000 classified gene‐SNP‐disease associations. We present an application that can provide new insights into the genetic basis of human complex diseases and contribute to integrating genomic with phenotypic data. The availability of gene‐based designer drugs, precise targeting of molecular fingerprints for tumors, appropriate drug therapy, prediction of individual susceptibility to disease, and diagnosis and treatment of rare illnesses are a few of the many transformations expected in the decade to come.
- Published
- 2020
31. Mass spectrometry-based identification and characterization of human hypothetical proteins highlighting the inconsistency across the protein databases
- Author
-
Johny Ijaq, Medicharla V. Jagannadham, and Neeraja Bethi
- Subjects
Gel electrophoresis ,0303 health sciences ,Hypothetical protein ,Genome browser ,Computational biology ,Biology ,Proteomics ,03 medical and health sciences ,0302 clinical medicine ,Human proteome project ,Ensembl ,UniProt ,030217 neurology & neurosurgery ,030304 developmental biology ,Reference genome - Abstract
A myriad of predicted proteins have been described at the genome scale, but their existence has not been confirmed at the protein level. These proteins, which are predicted to be expressed from an open reading frame (ORF) but for which translation has not been demonstrated, are known as hypothetical proteins and constitute a major fraction of the human proteome. In this study, we aim to identify and characterize hypothetical proteins from human tumor cell lines, viz., HeLa, MCF7, and BT474, thus providing the analytical basis for their expression. We used gel electrophoresis followed by in-gel digestion of the selected protein lanes and subsequent LC–MS/MS analysis of protein tryptic digests. The ENSEMBL genome browser was used for genomic alignment. On searching against human hypothetical protein data from the NCBI database, 110 common proteins were identified across the three selected cell lines. Of these, 88 proteins were already functionally characterized and the remaining 22 were still unreviewed in UniProt, lacking evidence of expression at the protein level. To explore them further, following HPP guidelines, 15 proteins were selected and aligned against the human reference genome. Five hypothetical proteins were confirmed as isoforms of known proteins. We conclude that the proteomic approach used would serve as a suitable tool to validate the existence of predicted or hypothetical proteins at the protein level. The MS proteomics data have been deposited to the ProteomeXchange Consortium via PRIDE with the data set identifier PXD014258.
- Published
- 2020
32. Inferences of Individual Drug Response-Related Long Non-coding RNAs Based on Integrating Multi-omics Data in Breast Cancer
- Author
-
Dandan Zhang, Jiawei Tian, Fuhui Peng, Lei Zhang, Chunjing Wang, Hao Cui, and Hanqing Kong
- Subjects
0301 basic medicine ,Drug ,media_common.quotation_subject ,drug response ,Computational biology ,Drug resistance ,Article ,03 medical and health sciences ,long non-coding RNAs ,0302 clinical medicine ,Breast cancer ,breast cancer ,Drug Discovery ,microRNA ,medicine ,Ensembl ,skin and connective tissue diseases ,Gene ,media_common ,business.industry ,lcsh:RM1-950 ,Omics ,medicine.disease ,030104 developmental biology ,lcsh:Therapeutics. Pharmacology ,030220 oncology & carcinogenesis ,Molecular Medicine ,multi-omics integration ,prognosis ,business ,Tamoxifen ,medicine.drug - Abstract
Differences in individual drug responses are obstacles in breast cancer (BRCA) treatment, so predicting responses would help to plan treatment strategies. The accumulation of cancer molecular profiling and drug response data provide opportunities and challenges to identify novel molecular signatures and mechanisms of tumor responsiveness to drugs in BRCA. This study evaluated drug responses with a multi-omics integrated system that depended on long non-coding RNAs (lncRNAs). We identified drug response-related lncRNAs (DRlncs) by combining expression data of lncRNA, microRNA, messenger RNA, methylation levels, somatic mutations, and the survival data of cancer patients treated with drugs. We constructed an integrated and computational multi-omics approach to identify DRlncs for diverse chemotherapeutic drugs in BRCA. Some DRlncs were identified with Adriamycin, Cytoxan, Tamoxifen, and all samples for BRCA patients. These DRlncs showed specific features regarding both expression and computational accuracies. The DRlnc-gene co-expression networks were constructed and analyzed. Key DRlncs, such as HOXA-AS2 (Ensembl: ENSG00000253552), in the drug Adriamycin were characterized. The experimental analysis also suggested that HOXA-AS2 (Ensembl: ENSG00000253552) was a key DRlnc in Adriamycin drug resistance in BRCA patients. Some DRlncs were associated with survival and some specific functions. A possible mechanism of DRlnc HOXA-AS2 (Ensembl: ENSG00000253552) in the Adriamycin drug response for BRCA resistance was inferred. In summary, this study provides a framework for lncRNA-based evaluation of clinical drug responses in BRCA. Understanding the underlying molecular mechanisms of drug responses will facilitate improved responses to chemotherapy and outcomes of BRCA treatment.
- Published
- 2020
33. Impact of gene annotation choice on the quantification of RNA-seq data
- Author
-
Yang Liao, Wei Shi, and David Chisanga
- Subjects
Base Sequence ,Sequence Analysis, RNA ,Applied Mathematics ,RNA-Seq ,Molecular Sequence Annotation ,Gene Annotation ,Computational biology ,Biology ,Biochemistry ,Computer Science Applications ,Annotation ,Structural Biology ,Gene expression ,Exome Sequencing ,RefSeq ,Ensembl ,Humans ,Molecular Biology ,Gene ,Reference genome ,Uncategorized - Abstract
Background RNA sequencing is currently the method of choice for genome-wide profiling of gene expression. A popular approach to quantify expression levels of genes from RNA-seq data is to map reads to a reference genome and then count mapped reads to each gene. Gene annotation data, which include chromosomal coordinates of exons for tens of thousands of genes, are required for this quantification process. There are several major sources of gene annotations that can be used for quantification, such as Ensembl and RefSeq databases. However, there is very little understanding of the effect that the choice of annotation has on the accuracy of gene expression quantification in an RNA-seq analysis. Results In this paper, we present results from our comparison of Ensembl and RefSeq human annotations on their impact on gene expression quantification using a benchmark RNA-seq dataset generated by the SEQC consortium. We show that the use of RefSeq gene annotation models led to better quantification accuracy, based on the correlation with ground truths including expression data from >800 real-time PCR validated genes, known titration ratios of gene expression and microarray expression data. We also found that the recent expansion of the RefSeq annotation has led to a decrease in its annotation accuracy. Finally, we demonstrated that the RNA-seq quantification differences observed between different annotations were not affected by the use of different normalization methods. Conclusion In conclusion, our study found that the use of the conservative RefSeq gene annotation yields better RNA-seq quantification results than the more comprehensive Ensembl annotation. We also found that, surprisingly, the recent expansion of the RefSeq database, which was primarily driven by the incorporation of sequencing data into the gene annotation process, resulted in a reduction in the accuracy of RNA-seq quantification.
- Published
- 2022
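The read-counting step described in the abstract above (map reads, then count mapped reads per gene) can be illustrated with a minimal sketch. It is not the authors' pipeline: the gene name, coordinates and reads are hypothetical toy data, and `count_reads` is an illustrative helper. It only shows why two annotations with different exon sets can give different counts for the same reads.

```python
from collections import Counter

def count_reads(annotation, reads):
    """Count reads that overlap at least one exon of each gene.

    annotation: dict mapping gene -> list of (chrom, start, end) exons
    reads:      list of (chrom, start, end) mapped read positions
    Coordinates are 1-based and inclusive (an assumption of this sketch).
    """
    counts = Counter({gene: 0 for gene in annotation})
    for chrom, start, end in reads:
        for gene, exons in annotation.items():
            # A read is counted once per gene if it overlaps any exon.
            if any(c == chrom and start <= e_end and end >= e_start
                   for c, e_start, e_end in exons):
                counts[gene] += 1
    return dict(counts)

# Hypothetical annotations: B lists one more exon than A for the same gene.
ANNOTATION_A = {"geneX": [("chr1", 100, 200)]}
ANNOTATION_B = {"geneX": [("chr1", 100, 200), ("chr1", 300, 400)]}
READS = [("chr1", 150, 199), ("chr1", 350, 399), ("chr1", 500, 549)]

print(count_reads(ANNOTATION_A, READS))
print(count_reads(ANNOTATION_B, READS))
```

With the more comprehensive toy annotation, the read at 350-399 is assigned to the gene; with the conservative one it is left uncounted.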
34. Phylonium: fast estimation of evolutionary distances from large samples of similar genomes
- Author
-
Fabian Klötzl and Bernhard Haubold
- Subjects
Statistics and Probability ,Unix ,0303 health sciences ,Sequence ,Genome ,Computer science ,Genomics ,Sequence Analysis, DNA ,Genome Analysis ,Original Papers ,Biochemistry ,Computer Science Applications ,03 medical and health sciences ,Computational Mathematics ,0302 clinical medicine ,Computational Theory and Mathematics ,Ensembl ,Line (text file) ,Molecular Biology ,Algorithm ,Algorithms ,Software ,030217 neurology & neurosurgery ,030304 developmental biology - Abstract
Motivation Tracking disease outbreaks by whole-genome sequencing leads to the collection of large samples of closely related sequences. Five years ago, we published a method to accurately compute all pairwise distances for such samples by indexing each sequence. Since indexing is slow, we now ask whether it is possible to achieve similar accuracy when indexing only a single sequence. Results We have implemented this idea in the program phylonium and show that it is as accurate as its predecessor and roughly 100 times faster when applied to all 2678 Escherichia coli genomes contained in ENSEMBL. One of the best published programs for rapidly computing pairwise distances, mash, analyzes the same dataset four times faster but, with default settings, it is less accurate than phylonium. Availability and implementation Phylonium runs under the UNIX command line; its C++ sources and documentation are available from github.com/evolbioinf/phylonium. Supplementary information Supplementary data are available at Bioinformatics online.
- Published
- 2019
35. Prevalence and Implications of Contamination in Public Genomic Resources: A Case Study of 43 Reference Arthropod Assemblies
- Author
-
Emeric Figuet, Clémentine M. Francois, Faustine Durand, Nicolas Galtier, Laboratoire d'Ecologie des Hydrosystèmes Naturels et Anthropisés (LEHNA), Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-École Nationale des Travaux Publics de l'État (ENTPE)-Centre National de la Recherche Scientifique (CNRS), Institut des Sciences de l'Evolution de Montpellier (UMR ISEM), Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-École Pratique des Hautes Études (EPHE), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Université de Montpellier (UM)-Institut de recherche pour le développement [IRD] : UR226-Centre National de la Recherche Scientifique (CNRS), École pratique des hautes études (EPHE), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Université de Montpellier (UM)-Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-Centre National de la Recherche Scientifique (CNRS)-Institut de recherche pour le développement [IRD] : UR226, Université de Lyon-Université de Lyon-École Nationale des Travaux Publics de l'État (ENTPE)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-École pratique des hautes études (EPHE)-Université de Montpellier (UM)-Institut de recherche pour le développement [IRD] : UR226-Centre National de la Recherche Scientifique (CNRS), and setec ITS
- Subjects
Genomic data ,Genome, Insect ,Sequence assembly ,Computational biology ,QH426-470 ,Biology ,Investigations ,Genome ,03 medical and health sciences ,0302 clinical medicine ,Sequence annotation ,Databases, Genetic ,Genetics ,Ensembl ,curation of genomic databases ,Animals ,Humans ,Molecular Biology ,Arthropods ,Genetics (clinical) ,ComputingMilieux_MISCELLANEOUS ,automated detection pipeline ,Phylogeny ,030304 developmental biology ,Synteny ,0303 health sciences ,Computational Biology ,High-Throughput Nucleotide Sequencing ,Reproducibility of Results ,Molecular Sequence Annotation ,Genomics ,Sequence Analysis, DNA ,Contamination ,DNA Contamination ,Reference database ,contaminant sequences ,horizontal gene transfer ,[SDE.BE]Environmental Sciences/Biodiversity and Ecology ,030217 neurology & neurosurgery - Abstract
Thanks to huge advances in sequencing technologies, genomic resources are increasingly being generated and shared by the scientific community. The quality of such public resources is therefore of critical importance. Errors due to contamination are particularly worrying; they are widespread, propagate across databases, and can compromise downstream analyses, especially the detection of horizontally transferred sequences. However, we still lack consistent and comprehensive assessments of contamination prevalence in public genomic data. Here we applied a standardized procedure for foreign sequence annotation to 43 published arthropod genomes from the widely used Ensembl Metazoa database. This method combines information on sequence similarity and synteny to identify contaminant and putative horizontally transferred sequences in any genome assembly, provided that an adequate reference database is available. We uncovered considerable heterogeneity in quality among arthropod assemblies, some being devoid of contaminant sequences, whereas others included hundreds of contaminant genes. Contaminants far outnumbered horizontally transferred genes and were a major confounder of their detection, quantification and analysis. We strongly recommend that automated standardized decontamination procedures be systematically embedded into the submission process to genomic databases.
- Published
- 2019
36. The 27th annual Nucleic Acids Research database issue and molecular biology database collection
- Author
-
Daniel J. Rigden and Xosé M Fernández
- Subjects
Data management ,MEDLINE ,Genomics ,Genome browser ,Biology ,Web Browser ,computer.software_genre ,Genome ,03 medical and health sciences ,0302 clinical medicine ,Ensembl Genomes ,Databases, Genetic ,Genetics ,Ensembl ,Humans ,Molecular Biology ,030304 developmental biology ,Data Management ,0303 health sciences ,MiRTarBase ,Database ,business.industry ,Computational Biology ,Molecular biology ,Editorial ,business ,computer ,030217 neurology & neurosurgery - Abstract
The 2020 Nucleic Acids Research Database Issue contains 148 papers spanning molecular biology. They include 59 papers reporting on new databases and 79 covering recent changes to resources previously published in the issue. A further ten papers are updates on databases most recently published elsewhere. This issue contains three breakthrough articles: AntiBodies Chemically Defined (ABCD) curates antibody sequences and their cognate antigens; SCOP returns with a new schema and breaks away from a purely hierarchical structure; while the new Alliance of Genome Resources brings together a number of Model Organism databases to pool knowledge and tools. Major returning nucleic acid databases include miRDB and miRTarBase. Databases for protein sequence analysis include CDD, DisProt and ELM, alongside no fewer than four newcomers covering proteins involved in liquid–liquid phase separation. In metabolism and signaling, Pathway Commons, Reactome and Metabolights all contribute papers. PATRIC and MicroScope update in microbial genomes while human and model organism genomics resources include Ensembl, Ensembl Genomes and UCSC Genome Browser. Immune-related proteins are covered by updates from IPD-IMGT/HLA and AFND, as well as newcomers VDJbase and OGRDB. Drug design is catered for by updates from the IUPHAR/BPS Guide to Pharmacology and the Therapeutic Target Database. The entire Database Issue is freely available online on the Nucleic Acids Research website (https://academic.oup.com/nar). The NAR online Molecular Biology Database Collection has been revised, updating 305 entries, adding 65 new resources and eliminating 125 discontinued URLs, bringing the current total to 1637 databases. It is available at http://www.oxfordjournals.org/nar/database/c/.
- Published
- 2019
37. The genome sequence of the painted lady, Vanessa cardui Linnaeus 1758
- Author
-
Jonathan Threlfall, Charlotte Wright, Aurora García-Berro, Gerard Talavera, Konrad Lohse, and Mark Blaxter
- Subjects
Whole genome sequencing ,biology ,Medicine (miscellaneous) ,Sequence assembly ,Chromosome ,Gene Annotation ,Painted lady ,biology.organism_classification ,General Biochemistry, Genetics and Molecular Biology ,Lepidoptera ,Chromosomal ,Evolutionary biology ,Vanessa cardui ,Genome sequence ,Ensembl ,Gene - Abstract
We present a genome assembly from an individual female Vanessa cardui (the painted lady; Arthropoda; Insecta; Lepidoptera; Nymphalidae). The genome sequence is 425 megabases in span. The majority of the assembly is scaffolded into 32 chromosomal pseudomolecules, with the W and Z sex chromosomes assembled. Gene annotation of this assembly on Ensembl has identified 12,821 protein coding genes.
- Published
- 2021
38. Integration and visualization of regulatory elements and variations of the EPAS1 gene in human
- Author
-
Tanja Kunej, Nataša Debeljak, and Aleša Kristan
- Subjects
Transcriptional Activation ,EPAS1 gene ,Genomics ,Computational biology ,Biology ,QH426-470 ,Proteomics ,Article ,Oxygen homeostasis ,hipoksija ,Genetics ,Basic Helix-Loop-Helix Transcription Factors ,Ensembl ,Humans ,Promoter Regions, Genetic ,Transcription factor ,Genetics (clinical) ,MiRTarBase ,Polymorphism, Genetic ,EPAS1 gen ,EPAS1 Gene ,EPAS1 ,Computational Biology ,HIF1A ,hypoxia-inducible factor (HIF) ,genetika ,udc:61:575 ,regulatory elements ,Protein Processing, Post-Translational ,medicina - Abstract
Endothelial PAS domain-containing protein 1 (EPAS1), also known as HIF2α, is an alpha subunit of hypoxia-inducible transcription factor (HIF), which mediates the cellular and systemic response to hypoxia. EPAS1 has an important role in the transcription of many hypoxia-responsive genes; however, it has been less researched than HIF1α. The aim of this study was to integrate the growing body of data on EPAS1 into a map of diverse OMICs elements. Publications, databases, and bioinformatics tools were examined, including Ensembl, MethPrimer, STRING, miRTarBase, COSMIC, and LOVD. EPAS1 expression, stability, and activity are tightly regulated on several OMICs levels to maintain complex oxygen homeostasis. In the integrative EPAS1 map we included 31 promoter-binding proteins, 13 interacting miRNAs and one lncRNA, and 16 post-translational modifications regulating EPAS1 protein abundance. EPAS1 has been associated with various cancer types and other diseases. The development of neuroendocrine tumors and erythrocytosis was shown to be associated with 11 somatic and 20 germline variants. The integrative map also includes 12 EPAS1 target genes and 27 interacting proteins. The study introduced the first integrative map of diverse genomics, transcriptomics, proteomics, regulomics, and interactomics data associated with EPAS1, to enable a better understanding of EPAS1 activity and regulation and support future research.
- Published
- 2021
39. The genome sequence of the small copper, Lycaena phlaeas (Linnaeus, 1760)
- Author
-
Jonathan Threlfall, Roger Vila, Konrad Lohse, Dominik Laetsch, and Mark Blaxter
- Subjects
Genetics ,Whole genome sequencing ,small copper ,viruses ,genome sequence ,Lycaenidae ,virus diseases ,Medicine (miscellaneous) ,Chromosome ,Sequence assembly ,Gene Annotation ,biochemical phenomena, metabolism, and nutrition ,Biology ,biology.organism_classification ,digestive system diseases ,General Biochemistry, Genetics and Molecular Biology ,Ensembl ,Lycaena phlaeas ,chromosomal ,Gene - Abstract
We present a genome assembly from an individual male Lycaena phlaeas (the small copper; Arthropoda; Insecta; Lepidoptera; Lycaenidae). The genome sequence is 420 megabases in span. The whole of the assembly is scaffolded into 24 chromosomal pseudomolecules, with the Z sex chromosome assembled. Gene annotation of this assembly on Ensembl has identified 12,147 protein coding genes.
- Published
- 2021
40. Annotating and prioritizing genomic variants using the Ensembl Variant Effect Predictor-A tutorial
- Author
-
Benjamin Moore, Andrew Parton, Diana Lemos, Anja Thormann, Ridwan M Amode, Aleena Mushtaq, Michal Szpak, Andrew D. Yates, Helen Schuilenburg, Fiona Cunningham, Irina M. Armean, Stephen J. Trevanion, Emily Perry, Paul Flicek, and Sarah E. Hunt
- Subjects
Application programming interface ,Interface (Java) ,GENCODE ,Access method ,Computational Biology ,Molecular Sequence Annotation ,Computational biology ,Genomics ,Biology ,Annotation ,Phenotype ,Gene Frequency ,Databases, Genetic ,Genetics ,RefSeq ,Ensembl ,Humans ,User interface ,Genetics (clinical) ,Software - Abstract
The Ensembl Variant Effect Predictor (VEP) is a freely available, open source tool for the annotation and filtering of genomic variants. It predicts variant molecular consequence using the Ensembl/GENCODE or RefSeq gene sets. It also reports phenotype associations from databases such as ClinVar, allele frequencies from studies including gnomAD, and predictions of deleteriousness from tools such as Sorting Intolerant From Tolerant (SIFT) and Combined Annotation Dependent Depletion (CADD). Ensembl VEP includes filtering options to customise variant prioritisation. It is well supported and updated roughly quarterly to incorporate the latest gene, variant and phenotype association information. Ensembl VEP analysis can be performed using a highly configurable, extensible command-line tool, a Representational State Transfer (REST) application programming interface (API) and a user-friendly web interface. These access methods are designed to suit different levels of bioinformatics experience and meet different needs in terms of data size, visualisation and flexibility. In this tutorial, we will describe performing variant annotation using the Ensembl VEP web tool, which enables sophisticated analysis through a simple interface.
- Published
- 2021
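The REST access method mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not part of the tutorial: it assumes the public server at https://rest.ensembl.org and its `vep/:species/hgvs/:hgvs_notation` endpoint, and that each returned record carries a `most_severe_consequence` field. The helper names are hypothetical, and `fetch_vep` needs network access.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

ENSEMBL_REST = "https://rest.ensembl.org"  # assumed public REST server

def vep_hgvs_url(hgvs, species="human"):
    """Build the URL for a VEP lookup by HGVS notation (percent-encoded)."""
    return f"{ENSEMBL_REST}/vep/{quote(species)}/hgvs/{quote(hgvs, safe='')}"

def fetch_vep(hgvs, species="human"):
    """Query the REST API and return the decoded JSON (requires network)."""
    request = Request(vep_hgvs_url(hgvs, species),
                      headers={"Content-Type": "application/json"})
    with urlopen(request, timeout=30) as response:
        return json.load(response)

def most_severe(records):
    """Extract the most severe consequence term from each VEP record."""
    return [record.get("most_severe_consequence") for record in records]

# Offline usage with a hand-written record shaped like a VEP response.
sample = [{"input": "ENST00000366667:c.803C>T",
           "most_severe_consequence": "missense_variant"}]
print(vep_hgvs_url("ENST00000366667:c.803C>T"))
print(most_severe(sample))
```

For large inputs the command-line tool is usually the better fit; the REST route suits small, scripted lookups.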
41. The genome sequence of the Glanville fritillary, Melitaea cinxia (Linnaeus, 1758)
- Author
-
Jonathan Threlfall, Charlotte Wright, Roger Vila, Konrad Lohse, and Mark Blaxter
- Subjects
Whole genome sequencing ,Glanville fritillary ,0303 health sciences ,genome sequence ,0206 medical engineering ,Medicine (miscellaneous) ,Chromosome ,Sequence assembly ,02 engineering and technology ,Gene Annotation ,Biology ,biology.organism_classification ,Melitaea cinxia ,General Biochemistry, Genetics and Molecular Biology ,03 medical and health sciences ,Melitaea ,Evolutionary biology ,Ensembl ,chromosomal ,Gene ,020602 bioinformatics ,030304 developmental biology - Abstract
We present a genome assembly from an individual male Melitaea cinxia (the Glanville fritillary; Arthropoda; Insecta; Lepidoptera; Nymphalidae). The genome sequence is 499 megabases in span. The complete assembly is scaffolded into 31 chromosomal pseudomolecules, with the Z sex chromosome assembled. Gene annotation of this assembly on Ensembl has identified 13,666 protein coding genes.
- Published
- 2021
42. Integrative Map of HIF1A Regulatory Elements and Variations
- Author
-
Tanja Kunej
- Subjects
HIF1A Gene ,Computational biology ,Biology ,QH426-470 ,eritrocitoza ,microRNA ,rak ,hipoksija ,Genetics ,erythrocytosis ,Ensembl ,HIF ,cancer ,Transcription factor ,Gene ,Genetics (clinical) ,MiRTarBase ,hypoxia ,EPAS1 ,HIF1A ,hypoxia-inducible factor (HIF) ,HIF3A ,genetika ,udc:61:575 ,medicina - Abstract
The hypoxia-inducible factor (HIF) family of transcription factors (HIF1A, EPAS1, and HIF3A) regulates the cellular response to hypoxia. These factors have been shown to be involved in the development of various diseases such as cancer, diabetes, and erythrocytosis. A complete map of connections between the HIF family of genes and various omics types has not yet been developed. The main aim of the present analysis was to construct an integrative map of genomic elements associated with the HIF1A gene and to prioritize potentially deleterious variants. Various genomic databases and bioinformatics tools were used, including Ensembl, miRTarBase, STRING, Cytoscape, MethPrimer, CADD, SIFT, and UALCAN. The integrative HIF1A gene map was visualized and includes transcriptional and post-transcriptional regulators, downstream targets, and genetic variants. One CpG island overlaps the transcription start site of the HIF1A gene. Out of over 450 missense variants, four have a predicted deleterious effect on protein function according to at least five bioinformatics tools. Currently there are 85 miRNAs reported to target HIF1A. HIF1A downstream targets include protein-coding genes, long noncoding RNAs, and microRNAs (hypoxamiRs). The study presents the first integration of heterogeneous molecular interactions associated with the HIF1A gene, enabling a holistic view of the gene, and lays the groundwork for supplementing the data in the future.
- Published
- 2021
43. Trabajo de investigación: tráfico de ADN entre el genoma mitocondrial y el nuclear: estudio de las inserciones de Numts en la filogenia de los primates
- Author
-
Roldán Rodríguez, Xabier, and Universidade da Coruña. Facultade de Ciencias
- Abstract
[Abstract]: The human genome is subject to the evolutionary forces that shape its architecture. Specifically, the integration of fragments of the mitochondrial genome into nuclear chromosomes (Numts) shapes those nuclear genomes. Numts have been described in various families of eukaryotes, such as the hominids. Because Numts are generated continuously, they have been considered good markers for phylogenetic studies, as well as potentially informative loci for reconstructing evolutionary history. For this study, we selected five primate species (human, chimpanzee, orangutan, gorilla and macaque), in which we will estimate the number of Numts present in each genome and perform a comparative analysis between the sequences of their Numts, as well as between the corresponding mitochondrial sequences. In addition, we will obtain the synteny relationships between their chromosomes. Of the Numts identified in the human species, we will choose three as case studies and determine whether they are also present in the genomes of the other four species (orthologous copies), in order to estimate their minimum age. We will also estimate the degree of nucleotide identity of each Numt insertion with the mitochondrial sequence of its species, to obtain the percentage of similarity between the Numt insertion and the corresponding contemporary mitochondrial sequence.
- Published
- 2021
44. Comparative Transcriptomic Response Of Pancreatic And Breast Cancer Cells To Anacardic Acid And Olaparib
- Author
-
Ogu Stephen, Deborah Oganya Ogenyi, Ujah Moses Okwori, Joseph Luper Tsenum, and Abah Moses Owoicho
- Subjects
Cancer ,Computational biology ,Biology ,medicine.disease ,Olaparib ,Bioconductor ,chemistry.chemical_compound ,chemistry ,Gene expression ,medicine ,Ensembl ,KEGG ,Gene ,Reference genome - Abstract
The study seeks to compare the transcriptomic response of pancreatic and breast cancer cells to anacardic acid and olaparib. Pancreatic cancer cell cultures will be prepared by seeding PANC-1 cells in 6-well plates (5 × 10^5 cells per well). After 24 hours, cells will be left untreated or treated with 5 mM anacardic acid, 2 mM olaparib, or a combination of anacardic acid (5 mM) and olaparib (2 mM) for 48 hours, after which the pancreatic cancer cells' mRNA library will be prepared using the Illumina TruSeq™ RNA Sample Prep Kit v2 and sequenced. Samples will be sequenced on the Illumina HiSeq 2500 with 2 × 100 bp paired-end reads, to a minimum depth of 30 million reads per sample. Thereafter, computational analysis of the pancreatic cancer RNA-seq data will be carried out on a total of 240 million high-quality clean reads, which will be mapped to the human reference genome and annotated using the Bioconductor package biomaRt (http://www.bioconductor.org) (Durinck et al 2009). Mapped reads with a mapping quality of 10 or more will be defined as uniquely mapped reads and used in the downstream analyses. Biological networks and pathways related to anacardic acid, olaparib and the combination will be analyzed with Ingenuity Pathway Analysis (IPA) software (Qiagen, CA, USA). The lists of all genes identified in the gene expression analysis will be uploaded into the IPA software. For the analysis of networks and pathways, the cutoff values will be set at P ≤ 1 × 10^−5 and |FC| ≥ 2, respectively. RNA-seq results will be validated by qRT-PCR, with mRNA expression determined in all 4 samples using the Power SYBR® Green RNA-to-CT™ 1-Step Kit (Life Technologies, CA, USA). Western blotting for the selected proteins will be performed as described by Yue (Yue et al 2015). Thereafter, the breast cancer cell culture will be prepared and treated, and the breast cancer cells' mRNA will be prepared for RNA-seq. 
The TruSeq Stranded mRNA kit (Illumina) will be used to prepare mRNA libraries from 1 µg total RNA. Libraries will be confirmed on the Agilent 2100 Bioanalyzer and quantitated using the Illumina Library Quantification Kit, ABI Prism qPCR Mix from Kapa Biosystems and the ABI7900HT real-time PCR instrument. For differential gene expression analysis, RNA-seq reads will be assembled according to the hg19.gtf annotation file (downloaded from ENSEMBL) (Flicek et al 2014) using Cufflinks (version 2.2.1) (Trapnell et al 2012). For each comparison, both Cufflinks assemblies will be merged, and the resulting merged gtf file will serve as the transcript input for differential gene expression analysis in Gene Ontology and KEGG pathways. For three of the comparisons, a p-value cutoff ≤ 0.05 will be used to determine differential expression. In-silico pathway and network analysis of differentially expressed genes will be performed in MetaCore version 6.27 (GeneGO, Thomson Reuters, New York, N.Y.) (Bolser et al 2012). The results obtained will be statistically analysed. The results of RT-PCR will be normalized to the expression of GAPDH using the formula 2^ΔCT. One-way ANOVA will be used to compare treatment with the combination of anacardic acid and olaparib to the untreated control. A P value less than 0.05 will be considered statistically significant.
- Published
- 2021
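The network and pathway cutoffs stated in the abstract above (P ≤ 1 × 10^−5 and a fold change of at least 2) can be sketched as a simple filter. This is an illustrative reading, not the authors' code: it assumes the fold-change cutoff applies to the absolute value of a signed fold change (negative values meaning down-regulation), and the gene names and numbers are made up.

```python
def passes_cutoffs(p_value, fold_change, p_max=1e-5, fc_min=2.0):
    """True if a gene meets both the P-value and |fold change| cutoffs."""
    return p_value <= p_max and abs(fold_change) >= fc_min

def select_genes(results, **cutoffs):
    """Return the genes whose (p_value, fold_change) pair passes the cutoffs."""
    return [gene for gene, (p_value, fold_change) in results.items()
            if passes_cutoffs(p_value, fold_change, **cutoffs)]

# Hypothetical results: gene -> (P value, signed fold change).
TOY_RESULTS = {
    "GENE_A": (1e-6, 2.5),    # passes both cutoffs
    "GENE_B": (1e-6, -3.0),   # passes: strongly down-regulated
    "GENE_C": (1e-3, 4.0),    # fails the P-value cutoff
    "GENE_D": (1e-7, 1.2),    # fails the fold-change cutoff
}

print(select_genes(TOY_RESULTS))
```

Passing `p_max=0.05, fc_min=0.0` to `select_genes` would mimic the looser p-value cutoff that the abstract applies to differential expression.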
45. MVAR
- Author
-
Govindarajan Kunde-Ramamoorthy, Bahá El Kassaby, Francisco Castellanos, and Carol J. Bult
- Subjects
Annotation ,dbSNP ,Data access ,Computer science ,Ensembl ,Human genome ,Computational biology ,Mouse Genome Informatics ,JSON ,computer ,Genome ,computer.programming_language - Abstract
Model organisms are essential to understanding the biological and disease consequences of human genome variation. Bioinformatics resources that support meaningful comparisons of mouse and human genotype-to-phenotype data and knowledge are needed to support the translation from bench to bedside and back again [1]. There is no genome variation resource for mouse comparable to resources available for human genome variation data such as EXAC [2], ClinVar [3], or ClinGen [4]. NCBI resources such as dbSNP and ClinVar no longer accept data from model organisms. While the European Variation Archive (EVA) serves as a repository of SNP data for mouse, the resource does not accept imputed variation data or curated phenotype annotations associated with variation data, both of which are central to data interpretation and analysis. Although the Mouse Genome Informatics database (MGI) [5] serves as a comprehensive mouse allele registry and curates information about the association of mouse variants with phenotypes and disease, the variation data in MGI are not currently available in a format consistent with the Human Genome Variation Society (HGVS) standards [6]. The Mouse Variation Registry (MVAR) will represent the integration of all mouse genome variation data and includes processes to automatically canonicalize variants so that they are uniquely represented in the database with comprehensive annotation and their distribution across strains. The starting dataset used as input into MVAR was downloaded in VCF format [7] (as a 42GB gzipped file) from the Mouse Genomes Project [8] and contains about 81M Single-Nucleotide Variants (SNV), ~9M Deletions and ~8M Insertions. Other data will be obtained from MGI, the Mouse Mutant Repository Database (MMRDB), the Diversity Outbred Database (DODB), and from computationally imputed SNP data. The MVAR data ingest workflow has been developed to normalize, prepare and annotate input variation data.
The first step of the pipeline uses the GATK framework [9] to normalize the data, i.e., to left-align each variant and decompose the multi-allelic variants (where there is more than one variation in a row of data). The next step uses the Ensembl Variant Effect Predictor (VEP) [10], which annotates the variation data with the corresponding HGVS nomenclature and existing external ID. The final step uses the Jannovar library [11] to enrich the data with Functional Consequence annotations. After the data have been pre-processed through the pipeline, they are inserted into a MySQL database with the help of custom tools developed to create the canonical variant representations. MVAR supports programmatic data access to the registry through an API for interoperability. This API is used by a user-friendly web application with rich user interfaces to query the database and display results. The API is also available as a resource for other services or applications over HTTP with JSON data payloads. Widely used industry frameworks such as Angular and Groovy Grails were leveraged to build the MVAR web application. To conclude, the lack of a comprehensive, annotated genome variation resource for mouse is a significant barrier to comparing variation and its biological consequences between mouse and human and limits the impact of many research and resource development programs. The MVAR project seeks to address this resource gap by bringing together investigators that have active projects in the area of genome variation in either mouse or human or both. Many of the investigators on this project have developed independent resources to curate or manage genome variation. This project aims to unify these efforts and build a common data resource. Future work will include the incorporation of structural variants into the MVAR registry.
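The decomposition of multi-allelic variants described in the first pipeline step can be illustrated with a short Python sketch. It is not the MVAR or GATK implementation: the Variant type and the decompose and trim functions are hypothetical names, and the trimming shown here only removes bases shared by the two alleles. Full left-alignment of indels additionally requires the reference genome sequence and is not attempted.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position of the first REF base, as in VCF
    ref: str
    alt: str


def trim(v: Variant) -> Variant:
    """Remove bases shared by REF and ALT, keeping at least one base in each."""
    pos, ref, alt = v.pos, v.ref, v.alt
    # Trim shared trailing bases first; the position is unchanged.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then trim shared leading bases, advancing the position each time.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return Variant(v.chrom, pos, ref, alt)


def decompose(chrom: str, pos: int, ref: str, alts: str) -> List[Variant]:
    """Split a comma-separated VCF ALT field into one trimmed variant per allele."""
    return [trim(Variant(chrom, pos, ref, alt)) for alt in alts.split(",")]


if __name__ == "__main__":
    # Hypothetical record with two alternate alleles in a single row.
    for variant in decompose("1", 100, "CAG", "CG,TAG"):
        print(variant)
```

Splitting before trimming matters because the shared padding differs per allele: in the example, one row yields a one-base deletion (CA to C) and a single-nucleotide change (C to T) at the same position.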
- Published
- 2021
46. L-Type Calcium Channel: Predicting Pathogenic/Likely Pathogenic Status for Variants of Uncertain Clinical Significance
- Author
-
Boris S. Zhorov, Anna Kostareva, and S. I. Tarnovskaya
- Subjects
sequence analysis ,Sequence analysis ,Long QT syndrome ,In silico ,Filtration and Separation ,TP1-1185 ,Biology ,Article ,disease informatics ,Chemical engineering ,medicine ,Chemical Engineering (miscellaneous) ,Missense mutation ,Ensembl ,Clinical significance ,L-type calcium channel ,protein structure ,Gene ,Genetics ,paralogue ,Process Chemistry and Technology ,Chemical technology ,medicine.disease ,missense variants ,variant annotation ,TP155-156 - Abstract
(1) Background: Defects in gene CACNA1C, which encodes the pore-forming subunit of the human Cav1.2 channel (hCav1.2), are associated with cardiac disorders such as atrial fibrillation, long QT syndrome, conduction disorders, cardiomyopathies, and congenital heart defects. Clinical manifestations are known only for 12% of CACNA1C missense variants, which are listed in public databases. Bioinformatics approaches can be used to predict the pathogenic/likely pathogenic status for variants of uncertain clinical significance. Choosing a bioinformatics tool and pathogenicity threshold that are optimal for specific protein families increases the reliability of such predictions. (2) Methods and Results: We used databases ClinVar, Humsavar, gnomAD, and Ensembl to compose a dataset of pathogenic/likely pathogenic and benign variants of hCav1.2 and its 20 paralogues: voltage-gated sodium and calcium channels. We further tested the performance of sixteen in silico tools in predicting pathogenic variants. ClinPred demonstrated the best performance, followed by REVEL and MCap. In the subset of 309 uncharacterized variants of hCav1.2, ClinPred predicted the pathogenicity for 188 variants. Among these, 36 variants were also categorized as pathogenic/likely pathogenic in at least one paralogue of hCav1.2. (3) Conclusions: The bioinformatics tool ClinPred and the paralogue annotation method consensually predicted the pathogenic/likely pathogenic status for 36 uncharacterized variants of hCav1.2. An analogous approach can be used to classify missense variants of other calcium channels and novel variants of hCav1.2.
- Published
- 2021
47. A Computational Approach to Evaluate the Combined Effect of SARS-CoV-2 RBD Mutations and ACE2 Receptor Genetic Variants on Infectivity: The COVID-19 Host-Pathogen Nexus
- Author
-
M. Dahmani Fathallah, Dana Ashoor, Sadok Chlif, Maryam Marzouq, Noureddine Ben Khalaf, and Hamdi Al Jarjanazi
- Subjects
Microbiology (medical) ,Immunology ,Protein Data Bank (RCSB PDB) ,Plasma protein binding ,Biology ,Molecular Dynamics Simulation ,Peptidyl-Dipeptidase A ,medicine.disease_cause ,Microbiology ,Virus ,Cellular and Infection Microbiology ,Polymorphism (computer science) ,medicine ,Ensembl ,Humans ,Receptor ,Pathogen ,Original Research ,Infectivity ,Genetics ,Mutation ,receptor binding domain ,virus-host interaction ,SARS-CoV-2 ,Point mutation ,angiotensin converting enzyme 2 receptor ,COVID-19 ,spike ,QR1-502 ,Infectious Diseases ,SARS-CoV-2 mutations ,Spike Glycoprotein, Coronavirus ,hormones, hormone substitutes, and hormone antagonists ,Protein Binding - Abstract
SARS-CoV-2 infectivity is largely determined by the binding of the virus Spike protein to the ACE2 receptor. Meanwhile, marked differences in infection rate have been reported between populations and individuals. To understand the disease dynamics, we developed a computational approach to study the implications of both SARS-CoV-2 RBD mutations and ACE2 polymorphism for the stability of the virus-receptor complex. We used the 6LZG PDB RBD/ACE2 3D model, the mCSM platform, and the LigPlot+ and PyMol software to analyze the data on SARS-CoV-2 mutations and ACE2 variants retrieved from the GISAID and Ensembl/GnomAD repositories. We observed that out of 351 RBD point mutations, 83% destabilize the complex according to free energy (ΔΔG) differences. We also found variations in the patterns of polar and hydrophobic interactions among the mutations occurring in 15 out of 18 contact residues. Similarly, comparison of the effect of different ACE2 variants on complex stability showed that the pattern of molecular interactions and the complex stability also vary according to ACE2 polymorphism. We infer that it is important to consider both ACE2 variants and circulating SARS-CoV-2 RBD mutations to assess the stability of the virus-receptor association and evaluate infectivity. This approach might offer a good molecular basis for mitigating the spread of the virus.
- Published
- 2021
48. The value of primary transcripts to the clinical and non‐clinical genomics community: Survey results and roadmap for improvements
- Author
-
Jane E. Loveland, Adam Frankish, Sarah E. Hunt, Fiona Cunningham, Joannella Morales, Aoife McMahon, Emily Perry, Irina M. Armean, and Paul Flicek
- Subjects
Computer science ,Genomics ,Computational biology ,QH426-470 ,Web Browser ,Genome ,Set (abstract data type) ,Annotation ,transcript annotation ,Databases, Genetic ,Genetics ,RefSeq ,Ensembl ,Animals ,Humans ,survey ,RNA, Messenger ,Molecular Biology ,Genetics (clinical) ,default transcript ,GENCODE ,variant interpretation ,Computational Biology ,Original Articles ,Original Article ,UniProt ,Biomarkers ,Software - Abstract
Background: Variant interpretation is dependent on transcript annotation and remains time consuming and challenging. There are major obstacles for historical data reuse and for interpretation of new variants. First, both RefSeq and Ensembl/GENCODE produce transcript sets in common use, but there is currently no easy way to translate between the two. Second, the resources often used for variant interpretation (e.g. ClinVar, gnomAD, UniProt) do not use the same transcript set, nor the same default transcript or protein sequence. Method: Ensembl ran a survey in 2018 to sample attitudes to choosing one default transcript per locus, and to gather data on reference sequences used by the scientific community. This was publicised on the Ensembl and UCSC genome browsers, by email and on social media. Results: The survey had 788 responses from 32 different countries, the results of which we report here. Conclusions: We present our roadmap to create an effective default set of transcripts for resources, and for reporting interpretation of clinical variants. After decades of avoiding the demand to highlight one transcript per locus in Ensembl, we ran a survey in 2018 to assay opinions across the scientific community. Ignoring the problem of ‘one transcript’ was not making the issue go away; many important genomic resources had instead adopted their own methods of selecting one transcript (e.g. HGMD, Ensembl, gnomAD, UniProt, ClinVar, etc.). Here we report our results and roadmap to create an effective default set of transcripts for resources, and for reporting interpretation of clinical variants.
- Published
- 2021
49. Heterogeneity of Accompanying Phenotypes and Genomic Variants Involved in Microtia
- Author
-
Bo Pan, Xin Huang, Peipei Guo, Nuo Si, Changchen Wang, and Zhensheng Hu
- Subjects
Genetics ,DNA Copy Number Variations ,Genotype ,business.industry ,Microtia ,General Medicine ,Pathogenicity ,medicine.disease ,Phenotype ,The integument ,Otorhinolaryngology ,Chromosome 3 ,Mutation ,medicine ,Ensembl ,Humans ,Surgery ,Copy-number variation ,business ,Congenital Microtia - Abstract
OBJECTIVES: The symptoms associated with microtia are constantly changing and are not set in stone. The aim of this article was to describe the various phenotypes from multiple systems found in microtia patients included in the DatabasE of genomiC varIation and Phenotype in Humans using Ensembl Resources database, and to analyze possible pathogenic mutations. METHODS: DatabasE of genomiC varIation and Phenotype in Humans using Ensembl Resources is an interactive web-based database, which incorporates a suite of tools designed to aid the interpretation of genomic variants. The term "microtia" was used as the search term, and the data extracted from the DatabasE of genomiC varIation and Phenotype in Humans using Ensembl Resources for this study were current as of October 2020. The Pearson chi-squared test was used to test associations between types of genomic variants and the pathogenicity of variants. RESULTS: Of the 386 cases enrolled in the study, 99% (n = 382) had 1 or more associated abnormalities. The most frequently detected abnormalities were those of the face and neck (n = 362 [93.8% of all cases]); musculoskeletal system (n = 337 [87.3%]); and nervous system (n = 334 [86.5%]), followed by abnormalities of limbs (n = 252 [65.3%]); the eye (n = 212 [54.9%]); and the integument (n = 200 [51.8%]). In addition, a total of 479 genomic variants were identified, including sequence variants and copy number variants (loss and gain). The pathogenicity of loss-type variants was significantly higher than that of other types (P
- Published
- 2021
50. EyeG2P: an automated variant filtering approach improves efficiency of diagnostic genomic testing for inherited ophthalmic disorders
- Author
-
Fiona Cunningham, Tracy Fletcher, David R. FitzPatrick, Sarah E. Hunt, Panagiotis I. Sergouniotis, Ana Carvalho, Graeme C.M. Black, Claire Hardcastle, Eva Lenassi, Anja Thormann, Jamie M Ellingford, Simon C Ramsden, Andrew R Webster, and Michel Michaelides
- Subjects
Routine testing ,business.industry ,Medicine ,Diagnostic test ,Ensembl ,Computational biology ,Personalized medicine ,Prospective cohort study ,business ,OPHTHALMIC DISORDERS - Abstract
Purpose: The widespread adoption of genomic testing for individuals with ophthalmic disorders has increased demand on diagnostic genomic services for these conditions. Moreover, the clinical utility of a molecular diagnosis for individuals with inherited ophthalmic disorders is increasingly placing pressure on the speed and accuracy of genomic testing. Methods: We created EyeG2P, a publicly available resource to assist diagnostic filtering of genomic datasets for ophthalmic conditions, utilising the Ensembl Variant Effect Predictor. We assessed the sensitivity of EyeG2P for 1234 individuals with a broad range of conditions, who had previously received a confirmed molecular diagnosis through routine genomic diagnostic approaches. For a prospective cohort of 83 individuals, we also assessed the precision of EyeG2P in comparison to routine genomic diagnostic approaches. Results: We observed that EyeG2P had a 99.5% sensitivity for genomic variants previously identified as a molecular diagnosis for 1234 individuals. EyeG2P enabled a significant increase in precision in comparison to routine testing strategies (p Conclusion: Automated filtering of genomic variants through EyeG2P can increase the efficiency of diagnostic testing for individuals with a broad range of inherited ophthalmic disorders.
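The two evaluation metrics named in the abstract, sensitivity and precision, can be computed from confusion counts as in the Python sketch below. The function names and the counts in the usage example are hypothetical and are not taken from the EyeG2P study; the abstract reports only the resulting percentages.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of known diagnostic variants that the filter retains."""
    return true_positives / (true_positives + false_negatives)


def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of variants passing the filter that are truly diagnostic."""
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Toy counts for illustration only.
    print(sensitivity(true_positives=95, false_negatives=5))
    print(precision(true_positives=95, false_positives=20))
```

A filter can score high on sensitivity while still passing many non-diagnostic variants, which is why the abstract reports precision separately.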
- Published
- 2021