Descriptor: "Genome, Human" - Searchworks@Jio Institute Digital Library Search Results

Your search for "Genome, Human" returned 53,368 results


101. NCBench: providing an open, reproducible, transparent, adaptable, and continuous benchmark approach for DNA-sequencing-based variant calling.

Author: Hanssen F, Gabernet G, Bäuerle F, Stöcker B, Wiegand F, Smith NH, Mertes C, Neogi AG, Brandhoff L, Ossowski A, Altmueller J, Becker K, Petzold A, Sturm M, Stöcker T, Sivalingam S, Brand F, Schmidt A, Buness A, Probst AJ, Motameny S, and Köster J
Subjects: Humans, Software, Genome, Human, Genetic Variation, Reproducibility of Results, Genomics methods, Benchmarking methods, High-Throughput Nucleotide Sequencing methods, Sequence Analysis, DNA methods, Sequence Analysis, DNA standards
Abstract: We present the results of the human genomic small variant calling benchmarking initiative of the German Research Foundation (DFG) funded Next Generation Sequencing Competence Network (NGS-CN) and the German Human Genome-Phenome Archive (GHGA). In this effort, we developed NCBench, a continuous benchmarking platform for the evaluation of small genomic variant callsets in terms of recall, precision, and false positive/negative error patterns. NCBench is implemented as a continuously re-evaluated open-source repository. We show that it is possible to entirely rely on public free infrastructure (GitHub, GitHub Actions, Zenodo) in combination with established open-source tools. NCBench is agnostic to the dataset used and can evaluate an arbitrary number of given callsets, while reporting the results in a visual and interactive way. We used NCBench to evaluate over 40 callsets generated by various variant calling pipelines available in the participating groups that were run on three exome datasets from different enrichment kits and at different coverages. While all pipelines achieve high overall quality, subtle systematic differences between callers and datasets exist and are made apparent by NCBench. These insights are useful to improve existing pipelines and develop new workflows. NCBench is meant to be open for the contribution of any given callset. Most importantly, for authors, it will enable the omission of repeated re-implementation of paper-specific variant calling benchmarks for the publication of new tools or pipelines, while readers will benefit from being able to (continuously) observe the performance of tools and pipelines at the time of reading instead of at the time of writing., Competing Interests: No competing interests were disclosed., (Copyright: © 2024 Hanssen F et al.)
Published: 2024
Full Text: View/download PDF
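
Illustrative note on record 101: the recall and precision mentioned in the abstract can be sketched as below. This is a minimal sketch, not NCBench's implementation. It assumes variants can be compared as exact (chromosome, position, ref, alt) tuples, whereas real benchmarks must also reconcile differing variant representations. The names `Variant`, `Metrics`, and `evaluate_callset` are hypothetical, and the example variants are made up.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical variant key: (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


class Metrics(NamedTuple):
    true_positives: int
    false_positives: int
    false_negatives: int
    precision: float
    recall: float


def evaluate_callset(calls: Set[Variant], truth: Set[Variant]) -> Metrics:
    """Compare a callset against a truth set by exact tuple matching."""
    tp = len(calls & truth)   # called and present in the truth set
    fp = len(calls - truth)   # called but absent from the truth set
    fn = len(truth - calls)   # in the truth set but not called
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return Metrics(tp, fp, fn, precision, recall)


# Made-up example data, for illustration only.
truth = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 40, "G", "GA")}
calls = {("chr1", 100, "A", "G"), ("chr2", 40, "G", "GA"), ("chr3", 7, "T", "C")}
metrics = evaluate_callset(calls, truth)
print(metrics)
```

Exact tuple matching is the simplest possible comparison and will undercount matches whenever two callers write the same variant differently.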

102. Targeted long-read sequencing enriches disease-relevant genomic regions of interest to provide complete Mendelian disease diagnostics.

Author: Nakamichi K, Huey J, Sangermano R, Place EM, Bujakowska KM, Marra M, Everett LA, Yang P, Chao JR, Van Gelder RN, and Mustafi D
Subjects: Humans, Genetic Testing methods, Retinal Diseases genetics, Retinal Diseases diagnosis, Genome, Human, Genetic Diseases, Inborn genetics, Genetic Diseases, Inborn diagnosis, Genomics methods, Computational Biology methods, Sequence Analysis, DNA methods, Male, Haplotypes genetics, Female, High-Throughput Nucleotide Sequencing methods
Abstract: Despite advances in sequencing technologies, a molecular diagnosis remains elusive in many patients with Mendelian disease. Current short-read clinical sequencing approaches cannot provide chromosomal phase information or epigenetic information without further sample processing, which is not routinely done and can result in an incomplete molecular diagnosis in patients. The ability to provide phased genetic and epigenetic information from a single sequencing run would improve the diagnostic rate of Mendelian conditions. Here, we describe targeted long-read sequencing of Mendelian disease genes (TaLon-SeqMD) using a real-time adaptive sequencing approach. Optimization of bioinformatic targeting enabled selective enrichment of multiple disease-causing regions of the human genome. Haplotype-resolved variant calling and simultaneous resolution of epigenetic base modification could be achieved in a single sequencing run. The TaLon-SeqMD approach was validated in a cohort of 18 individuals with previous genetic testing targeting 373 inherited retinal disease (IRD) genes, yielding the complete molecular diagnosis in each case. This approach was then applied in 2 IRD cases with inconclusive testing, which uncovered noncoding and structural variants that were difficult to characterize by standard short-read sequencing. Overall, these results demonstrate TaLon-SeqMD as an approach to provide rapid phased-variant calling to provide the molecular basis of Mendelian diseases.
Published: 2024
Full Text: View/download PDF

103. Genomic dynamics of the Lower Yellow River Valley since the Early Neolithic.

Author: Du P, Zhu K, Wang M, Sun Z, Tan J, Sun B, Sun B, Wang P, He G, Xiong J, Huang Z, Meng H, Sun C, Xie S, Wang B, Ge D, Ma Y, Sheng P, Ren X, Tao Y, Xu Y, Qin X, Allen E, Zhang B, Chang X, Wang K, Bao H, Yu Y, Wang L, Ma X, Du Z, Guo J, Yang X, Wang R, Ma H, Li D, Pan Y, Li B, Zhang Y, Zheng X, Han S, Jin L, Chen G, Li H, Wang CC, and Wen S
Subjects: Humans, China, History, Ancient, DNA, Ancient analysis, Human Migration history, Rivers, Genetics, Population, Archaeology, Genetic Variation, Genomics, Genome, Human
Abstract: The Yellow River Delta played a vital role in the development of the Neolithic civilization of China. However, the population history of this region from the Neolithic transitions to the present remains poorly understood due to the lack of ancient human genomes. This especially holds for key Neolithic transitions and tumultuous turnovers of dynastic history. Here, we report genome-wide data from 69 individuals dating to 5,410-1,345 years before present (BP) at 0.008 to 2.49× coverages, along with 325 present-day individuals collected from 16 cities across Shandong. During the Middle to Late Dawenkou period, we observed a significant influx of ancestry from Neolithic Yellow River farmers in central China and some southern Chinese ancestry that mixed with local hunter-gatherers in Shandong. The genetic heritage of the Shandong Longshan people was found to be most closely linked to the Dawenkou culture. During the Shang to Zhou Dynasties, there was evidence of genetic admixture of local Longshan populations with migrants from the Central Plain. After the Qin to Han Dynasties, the genetic composition of the region began to resemble that of modern Shandong populations. Our genetic findings suggest that the middle Yellow River Basin farmers played a role in shaping the genetic affinity of neighboring populations in northern China during the Middle to Late Neolithic period. Additionally, our findings indicate that the genetic diversity in the Shandong region during the Zhou Dynasty may be linked with their complex ethnicities., Competing Interests: Declaration of interests The authors declare no competing interests., (Copyright © 2024 Elsevier Inc. All rights reserved.)
Published: 2024
Full Text: View/download PDF

104. Range-limited Heaps' law for functional DNA words in the human genome.

Author: Li W, Almirantis Y, and Provata A
Subjects: Humans, Animals, Linguistics, Protein Domains, Genome, Human, DNA genetics
Abstract: Heaps' or Herdan-Heaps' law is a linguistic law describing the relationship between the vocabulary/dictionary size (type) and word counts (token) as a power-law function. Its existence in genomes under a given definition of DNA words is unclear, partly because the dictionary size in a genome could be much smaller than that in a human language. We define a DNA word as a coding region in a genome that codes for a protein domain. Using human chromosomes and chromosome arms as individual samples, we establish the existence of Heaps' law in the human genome within a limited range. Our definition of words in a genomic or proteomic context is different from other definitions such as over-represented k-mers which are much shorter in length. Although an approximate power-law distribution of protein domain sizes due to gene duplication and the related Zipf's law is well known, their translation to Heaps' law in DNA words is not automatic. Several other animal genomes are shown herein also to exhibit range-limited Heaps' law with our definition of DNA words, though with various exponents. When tokens were randomly sampled and sample sizes reached the maximum level, a deviation from Heaps' law was observed, but a quadratic regression in log-log type-token plot fits the data perfectly. Investigation of type-token plot and its regression coefficients could provide an alternative narrative of reusage and redundancy of protein domains as well as creation of new protein domains from a linguistic perspective., Competing Interests: Declaration of competing interest There is no conflict of interests to declare from the authors., (Copyright © 2024 Elsevier Ltd. All rights reserved.)
Published: 2024
Full Text: View/download PDF
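
Illustrative note on record 104: Heaps' law is commonly written as V = K * N^beta, where N is the token count and V the number of distinct types, so log V is linear in log N. The sketch below estimates K and beta by ordinary least squares on the log-log values. It is a sketch under that assumption, not the authors' procedure. The function name `fit_heaps` and the synthetic data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_heaps(tokens: Sequence[float], types: Sequence[float]) -> Tuple[float, float]:
    """Fit V = K * N**beta by least squares on (log N, log V); returns (K, beta)."""
    xs = [math.log(n) for n in tokens]
    ys = [math.log(v) for v in types]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                         # slope of the log-log line
    k = math.exp(mean_y - beta * mean_x)     # intercept mapped back from log space
    return k, beta


# Synthetic data generated from an exact power law, for illustration only.
tokens = [10, 100, 1000, 10000]
types = [2.0 * n ** 0.5 for n in tokens]
k, beta = fit_heaps(tokens, types)
print(k, beta)
```

A systematic curve in the residuals of this linear fit is the kind of deviation that would motivate adding a quadratic term in log N, as the abstract describes.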

105. TULIPs decorate the three-dimensional genome of PFA ependymoma.

Author: Johnston MJ, Lee JJY, Hu B, Nikolic A, Hasheminasabgorji E, Baguette A, Paik S, Chen H, Kumar S, Chen CCL, Jessa S, Balin P, Fong V, Zwaig M, Michealraj KA, Chen X, Zhang Y, Varadharajan S, Billon P, Juretic N, Daniels C, Rao AN, Giannini C, Thompson EM, Garami M, Hauser P, Pocza T, Ra YS, Cho BK, Kim SK, Wang KC, Lee JY, Grajkowska W, Perek-Polnik M, Agnihotri S, Mack S, Ellezam B, Weil A, Rich J, Bourque G, Chan JA, Yong VW, Lupien M, Ragoussis J, Kleinman C, Majewski J, Blanchette M, Jabado N, Taylor MD, and Gallo M
Subjects: Humans, Infratentorial Neoplasms genetics, Infratentorial Neoplasms pathology, Genome, Human, Infant, Brain Neoplasms genetics, Brain Neoplasms pathology, Child, Male, Female, Ependymoma genetics
Abstract: Posterior fossa group A (PFA) ependymoma is a lethal brain cancer diagnosed in infants and young children. The lack of driver events in the PFA linear genome led us to search its 3D genome for characteristic features. Here, we reconstructed 3D genomes from diverse childhood tumor types and uncovered a global topology in PFA that is highly reminiscent of stem and progenitor cells in a variety of human tissues. A remarkable feature exclusively present in PFA are type B ultra long-range interactions in PFAs (TULIPs), regions separated by great distances along the linear genome that interact with each other in the 3D nuclear space with surprising strength. TULIPs occur in all PFA samples and recur at predictable genomic coordinates, and their formation is induced by expression of EZHIP. The universality of TULIPs across PFA samples suggests a conservation of molecular principles that could be exploited therapeutically., Competing Interests: Declaration of interests The authors declare no competing interests., (Copyright © 2024 The Author(s). Published by Elsevier Inc. All rights reserved.)
Published: 2024
Full Text: View/download PDF

106. Calibration of variant effect predictors on genome-wide data masks heterogeneous performance across genes.

Author: Tejura M, Fayer S, McEwen AE, Flynn J, Starita LM, and Fowler DM
Subjects: Humans, Genome, Human, Mutation, Missense, Genetic Variation, Calibration, Software, Databases, Genetic, Genome-Wide Association Study methods
Abstract: In silico variant effect predictions are available for nearly all missense variants but played a minimal role in clinical variant classification because they were deemed to provide only supporting evidence. Recently, the ClinGen Sequence Variant Interpretation (SVI) Working Group updated recommendations for variant effect prediction use. By analyzing control pathogenic and benign variants across all genes, they were able to compute evidence strength for predictor score intervals with some intervals generating moderate, strong, or even very strong evidence. However, this genome-wide approach could obscure heterogeneous predictor performance in different genes. We quantified the gene-by-gene performance of two top predictors, REVEL and BayesDel, by analyzing control variants in each predictor score interval in 3,668 disease-relevant genes. Approximately 10% of intervals had sufficient control variants for analysis, and ∼70% of these intervals exceeded the maximum number of incorrect predictions implied by the SVI recommendations. These trending discordant intervals arose owing to the divergence of the gene-specific distribution of predictions from the genome-wide distribution, suggesting that gene-specific calibration is needed in many cases. Approximately 22% of ClinVar missense variants of uncertain significance in genes we analyzed (REVEL = 100,629, BayesDel = 71,928) had predictions in trending discordant intervals. Thus, genome-wide calibrations could result in many variants receiving inappropriate evidence strength. To facilitate a review of the SVI's calibrations, we developed a web application enabling visualization of gene-specific predictions and trending concordant and discordant intervals., Competing Interests: Declaration of interests The authors declare no competing interests., (Copyright © 2024 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.)
Published: 2024
Full Text: View/download PDF

107. The need to diversify genomic studies: Insights from Andean highlanders and Amazonians.

Author: Alvim I, Silva-Carvalho C, Mendes de Aquino M, Borda V, Sanchez C, Padilla C, Cáceres O, Rezende-Diniz I, Saraiva-Duarte J, Faria-Costa L, Santolalla ML, Rodrigues-Soares F, Zolini C, Llerena A, O'Connor TD, Gilman RH, Guio H, and Tarazona-Santos E
Subjects: Humans, Genetic Variation, Indians, South American genetics, Genome, Human, South America, Precision Medicine, Genomics
Abstract: More globally diverse perspectives are needed in genomic studies and precision medicine practices on non-Europeans. Here, we illustrate this by discussing the distribution of clinically actionable genetic variants involved in drug response in Andean highlanders and Amazonians, considering their environment, history, genetic structure, and historical biases in the perception of biological diversity of Native Americans., Competing Interests: Declaration of interests The authors declare no competing interests., (Copyright © 2024 Elsevier Inc. All rights reserved.)
Published: 2024
Full Text: View/download PDF

108. Large-scale analysis of whole genome sequencing data from formalin-fixed paraffin-embedded cancer specimens demonstrates preservation of clinical utility.

Author: Basyuni S, Heskin L, Degasperi A, Black D, Koh GCC, Chmelova L, Rinaldi G, Bell S, Grybowicz L, Elgar G, Memari Y, Robbe P, Kingsbury Z, Caldas C, Abraham J, Schuh A, Jones L, Tischkowitz M, Brown MA, Davies HR, and Nik-Zainal S
Subjects: Humans, Genomics methods, Mutation, Genome, Human, Artifacts, Paraffin Embedding methods, Formaldehyde, Neoplasms genetics, Neoplasms pathology, Whole Genome Sequencing methods, Tissue Fixation methods
Abstract: Whole genome sequencing (WGS) provides comprehensive, individualised cancer genomic information. However, routine tumour biopsies are formalin-fixed and paraffin-embedded (FFPE), which damages DNA and has historically limited their use in WGS. Here we analyse FFPE cancer WGS datasets from England's 100,000 Genomes Project, comparing 578 FFPE samples with 11,014 fresh frozen (FF) samples across multiple tumour types. We use an approach that characterises rather than discards artefacts. We identify three artefactual signatures, including one known (SBS57) and two previously uncharacterised (SBS FFPE, ID FFPE), and develop an "FFPEImpact" score that quantifies sample artefacts. Despite inferior sequencing quality, FFPE-derived data identify clinically actionable variants and mutational signatures and permit algorithmic stratification. Matched FF/FFPE validation cohorts show good concordance while acknowledging SBS, ID and copy-number artefacts. While FF-derived WGS data remain the gold standard, FFPE samples can be used for WGS if required, using analytical advancements developed here, potentially democratising whole cancer genomics to many., (© 2024. The Author(s).)
Published: 2024
Full Text: View/download PDF

109. Testing times: disentangling admixture histories in recent and complex demographies using ancient DNA.

Author: Williams MP, Flegontov P, Maier R, and Huber CD
Subjects: Humans, Genetics, Population methods, Gene Flow, Polymorphism, Single Nucleotide, Genome, Human, Evolution, Molecular, DNA, Ancient analysis, Models, Genetic
Abstract: Our knowledge of human evolutionary history has been greatly advanced by paleogenomics. Since the 2020s, the study of ancient DNA has increasingly focused on reconstructing the recent past. However, the accuracy of paleogenomic methods in resolving questions of historical and archaeological importance amidst the increased demographic complexity and decreased genetic differentiation remains an open question. We evaluated the performance and behavior of two commonly used methods, qpAdm and the f3-statistic, on admixture inference under a diversity of demographic models and data conditions. We performed two complementary simulation approaches: first, we explored a wide demographic parameter space under four simple demographic models of varying complexities and configurations using branch-length data from two chromosomes; second, we analyzed a model of Eurasian history composed of 59 populations using whole-genome data modified with ancient DNA conditions such as SNP ascertainment, data missingness, and pseudohaploidization. We observe that population differentiation is the primary factor driving qpAdm performance. Notably, while complex gene flow histories influence which models are classified as plausible, they do not reduce overall performance. Under conditions reflective of the historical period, qpAdm most frequently identifies the true model as plausible among a small candidate set of closely related populations. To increase the utility for resolving fine-scaled hypotheses, we provide a heuristic for further distinguishing between candidate models that incorporates qpAdm model P-values and f3-statistics. Finally, we demonstrate a significant performance increase for qpAdm using whole-genome branch-length f2-statistics, highlighting the potential for improved demographic inference that could be achieved with future advancements in f-statistic estimations., Competing Interests: Conflicts of interest: The authors declare no conflicts of interest., (© The Author(s) 2024. Published by Oxford University Press on behalf of The Genetics Society of America.)
Published: 2024
Full Text: View/download PDF

110. The Genomic and Cultural Diversity of the Inka Qhapaq Hucha Ceremony in Chile and Argentina.

Author: de la Fuente Castro C, Cortés C, Raghavan M, Castillo D, Castro M, Verdugo RA, and Moraga M
Subjects: Humans, Argentina, Chile, Genetic Variation, Cultural Diversity, Ceremonial Behavior, Indians, South American genetics, Genomics, Genome, Human
Abstract: The South American archaeological record has ample evidence of the socio-cultural dynamism of human populations in the past. This has also been supported through the analysis of ancient genomes, by showing evidence of gene flow across the region. While the extent of these signals is yet to be tested, the growing number of ancient genomes allows for more fine-scaled hypotheses to be evaluated. In this study, we assessed the genetic diversity of individuals associated with the Inka ritual, Qhapaq hucha. As part of this ceremony, one or more individuals were buried with Inka and local-style offerings on mountain summits along the Andes, leaving a very distinctive record. Using paleogenomic tools, we analyzed three individuals: two newly generated genomes from El Plomo Mountain (Chile) and El Toro Mountain (Argentina), and a previously published genome from Argentina (Aconcagua Mountain). Our results reveal a complex demographic scenario with each of the individuals showing different genetic affinities. Furthermore, while two individuals showed genetic similarities with present-day and ancient populations from the southern region of the Inka empire, the third individual may have undertaken long-distance movement. The genetic diversity we observed between individuals from similar cultural contexts supports the highly diverse strategies Inka implemented while incorporating new territories. More broadly, this research contributes to our growing understanding of the population dynamics in the Andes by discussing the implications and temporality of population movements in the region., (© The Author(s) 2024. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.)
Published: 2024
Full Text: View/download PDF

111. De Novo Genome Assemblies From Two Indigenous Americans from Arizona Identify New Polymorphisms in Non-Reference Sequences.

Author: Köroğlu Ç, Chen P, Traurig M, Altok S, Bogardus C, and Baier LJ
Subjects: Humans, Arizona, Polymorphism, Single Nucleotide, Whole Genome Sequencing, Indians, North American genetics, Genome, Human
Abstract: There is a collective push to diversify human genetic studies by including underrepresented populations. However, analyzing DNA sequence reads involves the initial step of aligning the reads to the GRCh38/hg38 reference genome which is inadequate for non-European ancestries. In this study, using long-read sequencing technology, we constructed de novo genome assemblies from two indigenous Americans from Arizona (IAZ). Each assembly included ∼17 Mb of DNA sequence not present in hg38 [nonreference sequence (NRS)], which consists mostly of repeat elements. Forty NRSs totaling 240 kb were uniquely anchored to the hg38 primary assembly generating a modified hg38-NRS reference genome. DNA sequence alignment and variant calling were then conducted with whole-genome sequencing (WGS) data from 387 IAZ using both the hg38 and modified hg38-NRS reference maps. Variant calling with the hg38-NRS map identified ∼50,000 single-nucleotide variants present in at least 5% of the WGS samples which were not detected with the hg38 reference map. We also directly assessed the NRSs positioned within genes. Seventeen NRSs anchored to regions including an identical 187 bp NRS found in both de novo assemblies. The NRS is located in HCN2 79 bp downstream of Exon 3 and contains several putative transcriptional regulatory elements. Genotyping of the HCN2-NRS revealed that the insertion is enriched in IAZ (minor allele frequency = 0.45) compared to other reference populations tested. This study shows that inclusion of population-specific NRSs can dramatically change the variant profile in underrepresented ethnic groups and thereby lead to the discovery of previously missed common variations., (Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution 2024.)
Published: 2024
Full Text: View/download PDF

112. Deep5hmC: predicting genome-wide 5-hydroxymethylcytosine landscape via a multimodal deep learning model.

Author: Ma X, Thela SR, Zhao F, Yao B, Wen Z, Jin P, Zhao J, and Chen L
Subjects: Humans, Epigenesis, Genetic, Genome, Human, DNA Methylation, 5-Methylcytosine analogs & derivatives, 5-Methylcytosine metabolism, Deep Learning
Abstract: Motivation: 5-Hydroxymethylcytosine (5hmC), a crucial epigenetic mark with a significant role in regulating tissue-specific gene expression, is essential for understanding the dynamic functions of the human genome. Despite its importance, predicting 5hmC modification across the genome remains a challenging task, especially when considering the complex interplay between DNA sequences and various epigenetic factors such as histone modifications and chromatin accessibility., Results: Using tissue-specific 5hmC sequencing data, we introduce Deep5hmC, a multimodal deep learning framework that integrates both the DNA sequence and epigenetic features such as histone modification and chromatin accessibility to predict genome-wide 5hmC modification. The multimodal design of Deep5hmC demonstrates remarkable improvement in predicting both qualitative and quantitative 5hmC modification compared to unimodal versions of Deep5hmC and state-of-the-art machine learning methods. This improvement is demonstrated through benchmarking on a comprehensive set of 5hmC sequencing data collected at four developmental stages during forebrain organoid development and across 17 human tissues. Compared to DeepSEA and random forest, Deep5hmC achieves close to 4% and 17% improvement of Area Under the Receiver Operating Characteristic (AUROC) across four forebrain developmental stages, and 6% and 27% across 17 human tissues for predicting binary 5hmC modification sites; and 8% and 22% improvement of Spearman correlation coefficient across four forebrain developmental stages, and 17% and 30% across 17 human tissues for predicting continuous 5hmC modification. Notably, Deep5hmC showcases its practical utility by accurately predicting gene expression and identifying differentially hydroxymethylated regions (DhMRs) in a case-control study of Alzheimer's disease (AD). Deep5hmC significantly improves our understanding of tissue-specific gene regulation and facilitates the development of new biomarkers for complex diseases., Availability and Implementation: Deep5hmC is available via https://github.com/lichen-lab/Deep5hmC., (© The Author(s) 2024. Published by Oxford University Press.)
Published: 2024
Full Text: View/download PDF

113. Preventive Human Genome Editing and Enhancement: Candidate Criteria for Governance.

Author: Juengst E, Flatt MA, Conley JM, Davis A, Henderson G, MacKay D, Major R, Walker RL, and Cadigan RJ
Subjects: Humans, Gene Editing ethics, Genome, Human
Abstract: While somatic cell editing to treat disease is widely accepted, the use of human genome editing for "enhancement" remains contested. Scientists and policy-makers routinely cite the prospect of enhancement as a salient ethical challenge for human genome editing research. If preventive genome editing projects are perceived as pursuing human enhancement, they could face heightened barriers to scientific, public, and regulatory approval. This article outlines what we call "preventive strengthening research" (or "PSR") to explore, through this example, how working to strengthen individuals' resistance to disease beyond what biomedicine considers to be the human functional range may be interpreted as pursuing human enhancement. Those involved in developing guidance for PSR will need to navigate the interface between preventive goals and enhancement implications. This article identifies and critiques three of these ideas in the interest of anticipating the wider emergence of PSR and the need for a normative approach for its pursuit. All three "candidate criteria" merit attention, but each also faces challenges that will need to be addressed as further research policy is developed., (© 2024 The Hastings Center.)
Published: 2024
Full Text: View/download PDF

114. GGTyper: genotyping complex structural variants using short-read sequencing data.

Author: Mirus T, Lohmayer R, Döhring C, Halldórsson BV, and Kehr B
Subjects: Humans, Genotype, Genotyping Techniques methods, Algorithms, High-Throughput Nucleotide Sequencing methods, Genomics methods, Genome, Human, Software, Genomic Structural Variation, Sequence Analysis, DNA methods
Abstract: Motivation: Complex structural variants (SVs) are genomic rearrangements that involve multiple segments of DNA. They contribute to human diversity and have been shown to cause Mendelian disease. Nevertheless, our abilities to analyse complex SVs are very limited. As opposed to deletions and other canonical types of SVs, there are no established tools that have explicitly been designed for analysing complex SVs., Results: Here, we describe a new computational approach that we specifically designed for genotyping complex SVs in short-read sequenced genomes. Given a variant description, our approach computes genotype-specific probability distributions for observing aligned read pairs with a wide range of properties. Subsequently, these distributions can be used to efficiently determine the most likely genotype for any set of aligned read pairs observed in a sequenced genome. In addition, we use these distributions to compute a genotyping difficulty for a given variant, which predicts the amount of data needed to achieve a reliable call. Careful evaluation confirms that our approach outperforms other genotypers by making reliable genotype predictions across both simulated and real data. On up to 7829 human genomes, we achieve high concordance with population-genetic assumptions and expected inheritance patterns. On simulated data, we show that precision correlates well with our prediction of genotyping difficulty. This together with low memory and time requirements makes our approach well-suited for application in biomedical studies involving small to very large numbers of short-read sequenced genomes., Availability and Implementation: Source code is available at https://github.com/kehrlab/Complex-SV-Genotyping., (© The Author(s) 2024. Published by Oxford University Press.)
Published: 2024
Full Text: View/download PDF

115. Activating the dark genome to illuminate cancer vaccine targets.

Author: Kwok DW, Okada H, and Costello JF
Subjects: Humans, Genome, Human, Neoplasms genetics, Neoplasms immunology, Cancer Vaccines immunology, Cancer Vaccines genetics, Cancer Vaccines therapeutic use
Published: 2024
Full Text: View/download PDF

116. Targeted enrichment of whole-genome SNPs from highly burned skeletal remains.

Author: Emery MV, Bolhofner K, Spake L, Ghafoor S, Versoza CJ, Rawls EM, Winingear S, Buikstra JE, Loreille O, Fulginiti LC, and Stone AC
Subjects: Humans, Sequence Analysis, DNA, Genome, Human, DNA isolation & purification, DNA Degradation, Necrotic, Male, Female, Polymorphism, Single Nucleotide, High-Throughput Nucleotide Sequencing, Tooth chemistry, Burns genetics, Body Remains, Fires, Bone and Bones chemistry
Abstract: Genetic assessment of highly incinerated and/or degraded human skeletal material is a persistent challenge in forensic DNA analysis, including identifying victims of mass disasters. Few studies have investigated the impact of thermal degradation on whole-genome single-nucleotide polymorphism (SNP) quality and quantity using next-generation sequencing (NGS). We present whole-genome SNP data obtained from the bones and teeth of 27 fire victims using two DNA extraction techniques. Extracts were converted to double-stranded DNA libraries then enriched for whole-genome SNPs using unpublished biotinylated RNA baits and sequenced on an Illumina NextSeq 550 platform. Raw reads were processed using the EAGER (Efficient Ancient Genome Reconstruction) pipeline, and the SNPs filtered and called using FreeBayes and GATK (v. 3.8). Mixed-effects modeling of the data suggests that SNP variability and preservation are predominantly determined by skeletal element and burn category, and not by extraction type. Whole-genome SNP data suggest that long bones, hand and foot bones, and teeth subjected to temperatures <350°C are the most likely sources of higher genomic DNA yields. Furthermore, we observed an inverse correlation between the number of captured SNPs and the extent to which samples were burned, as well as a significant decrease in the total number of SNPs measured for samples subjected to temperatures >350°C. Our data complement previous analyses of burned human remains that compare extraction methods for downstream forensic applications and support the idea of adopting a modified Dabney extraction technique when traditional forensic methods fail to produce DNA yields sufficient for genetic identification., (© 2024 The Authors. Journal of Forensic Sciences published by Wiley Periodicals LLC on behalf of American Academy of Forensic Sciences.)
Published: 2024
Full Text: View/download PDF

117. Harnessing cancer genomes for precision oncology.

Author: Chanock SJ
Subjects: Humans, Medical Oncology methods, Genomics methods, Precision Medicine methods, Neoplasms genetics, Genome, Human
Published: 2024
Full Text: View/download PDF

118. Genome-scale quantification and prediction of pathogenic stop codon readthrough by small molecules.

Author: Toledano I, Supek F, and Lehner B
Subjects: Humans, Genome, Human, Protein Biosynthesis drug effects, Small Molecule Libraries pharmacology, Codon, Nonsense, Codon, Terminator
Abstract: Premature termination codons (PTCs) cause ~10-20% of inherited diseases and are a major mechanism of tumor suppressor gene inactivation in cancer. A general strategy to alleviate the effects of PTCs would be to promote translational readthrough. Nonsense suppression by small molecules has proven effective in diverse disease models, but translation into the clinic is hampered by ineffective readthrough of many PTCs. Here we directly tackle the challenge of defining drug efficacy by quantifying the readthrough of ~5,800 human pathogenic stop codons by eight drugs. We find that different drugs promote the readthrough of complementary subsets of PTCs defined by local sequence context. This allows us to build interpretable models that accurately predict drug-induced readthrough genome-wide, and we validate these models by quantifying endogenous stop codon readthrough. Accurate readthrough quantification and prediction will empower clinical trial design and the development of personalized nonsense suppression therapies., (© 2024. The Author(s).)
Published: 2024
Full Text: View/download PDF

119. A Genomics England haplotype reference panel and imputation of UK Biobank.

Author: Shi S, Rubinacci S, Hu S, Moutsianas L, Stuckey A, Need AC, Palamara PF, Caulfield M, Marchini J, and Myers S
Subjects: Humans, England, Exome Sequencing methods, Genome, Human, Genomics methods, UK Biobank, United Kingdom, White People genetics, Gene Frequency, Genome-Wide Association Study methods, Haplotypes, Polymorphism, Single Nucleotide
Abstract: We built a reference panel with 342 million autosomal variants using 78,195 individuals from the Genomics England (GEL) dataset, achieving a phasing switch error rate of 0.18% for European samples and imputation quality of r² = 0.75 for variants with minor allele frequencies as low as 2 × 10⁻⁴ in white British samples. The GEL-imputed UK Biobank genome-wide association analysis identified 70% of associations found by direct exome sequencing (P < 2.18 × 10⁻¹¹), while extending testing of rare variants to the entire genome. Coding variants dominated the rare-variant genome-wide association results, implying less disruptive effects of rare non-coding variants., (© 2024. The Author(s).)
Published: 2024
Full Text: View/download PDF
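
Note on record 119: an imputation quality of r² is conventionally read as the squared Pearson correlation between imputed allele dosages and directly observed genotypes at a variant. The abstract does not say how the value was computed, so the following is only a minimal sketch under that assumption; the function name `imputation_r2` and the toy genotypes are hypothetical and not taken from the paper.

```python
from math import fsum


def imputation_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes (0/1/2)
    and imputed dosages (floats in [0, 2]) across samples at one variant."""
    n = len(true_genotypes)
    if n != len(imputed_dosages) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")

    mean_x = fsum(true_genotypes) / n
    mean_y = fsum(imputed_dosages) / n

    # Sums of squares and cross-products around the means.
    sxx = fsum((x - mean_x) ** 2 for x in true_genotypes)
    syy = fsum((y - mean_y) ** 2 for y in imputed_dosages)
    sxy = fsum((x - mean_x) * (y - mean_y)
               for x, y in zip(true_genotypes, imputed_dosages))

    if sxx == 0 or syy == 0:
        # A variant with no variation in either vector has no defined correlation.
        raise ValueError("r2 is undefined when either vector is constant")

    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    truth = [0, 1, 2, 0]            # hypothetical sequenced genotypes
    dosage = [0.2, 0.7, 1.9, 0.4]   # hypothetical imputed dosages
    print(round(imputation_r2(truth, dosage), 4))
```

In practice such values are typically averaged over variants within a minor-allele-frequency bin, which is why the abstract quotes r² alongside a frequency threshold.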

120. An adenine base editor variant expands context compatibility.

Author: Xiao YL, Wu Y, and Tang W
Subjects: Humans, Escherichia coli genetics, Escherichia coli Proteins genetics, Escherichia coli Proteins metabolism, Proprotein Convertase 9 genetics, CRISPR-Associated Protein 9 genetics, CRISPR-Associated Protein 9 metabolism, CRISPR-Cas Systems genetics, Genome, Human, Streptococcus pyogenes genetics, Streptococcus pyogenes enzymology, Gene Editing methods, Adenosine Deaminase genetics, Adenosine Deaminase metabolism, Adenine metabolism
Abstract: Adenine base editors (ABEs) are precise gene-editing agents that convert A:T pairs into G:C through a deoxyinosine intermediate. Existing ABEs function most effectively when the target A is in a TA context. Here we evolve the Escherichia coli transfer RNA-specific adenosine deaminase (TadA) to generate TadA8r, which extends potent deoxyadenosine deamination to RA (R = A or G) and is faster in processing GA than TadA8.20 and TadA8e, the two most active TadA variants reported so far. ABE8r, comprising TadA8r and a Streptococcus pyogenes Cas9 nickase, expands the editing window at the protospacer adjacent motif-distal end and outperforms ABE7.10, ABE8.20 and ABE8e in correcting disease-associated G:C-to-A:T transitions in the human genome, with a controlled off-target profile. We show ABE8r-mediated editing of clinically relevant sites that are poorly accessed by existing editors, including sites in PCSK9, whose disruption reduces low-density lipoprotein cholesterol, and ABCA4-p.Gly1961Glu, the most frequent mutation in Stargardt disease., (© 2024. The Author(s), under exclusive licence to Springer Nature America, Inc.)
Published: 2024
Full Text: View/download PDF

121. Identification of whole-genome mutations and structural variations of bile cell-free DNA in cholangiocarcinoma.

Author: Yin L, Duan A, Zhang W, Li B, Zhao T, Xu X, Yang L, Nian B, Lu K, Chen S, Li Z, Liu J, Duan Q, Liu D, Chen H, Cui L, Chang Y, Kuang Y, Zhang D, Wang X, and Zhang Y
Subjects: Humans, Male, Female, Middle Aged, Aged, Whole Genome Sequencing, Genome, Human, Bile chemistry, Bile metabolism, Prognosis, Adult, Cholangiocarcinoma genetics, Bile Duct Neoplasms genetics, Mutation, Cell-Free Nucleic Acids genetics
Abstract: Bile cell-free DNA (cfDNA) has been reported as a promising liquid biopsy tool for cholangiocarcinoma (CCA); however, the whole-genome mutation landscape and structural variants (SVs) of bile cfDNA remain unknown. Here we performed whole-genome sequencing on bile cfDNA and analyzed the correlation between mutation characteristics of bile cfDNA and clinical prognosis. TP53 and KRAS were the most frequently mutated genes, and RTK/RAS, homologous recombination (HR), and HIPPO were the top three pathways containing the most gene mutations. Ten overlapping putative driver genes were found in bile cfDNA and tumor tissue. SVs such as chromothripsis and kataegis were identified. Moreover, the hazard ratio for HR pathway mutations was 15.77 (95% CI: 1.571-158.4), and patients with HR pathway mutations in bile cfDNA exhibited poorer overall survival (P = 0.0049). Our study suggests that bile cfDNA contains genome mutations and SVs, and HR pathway mutations in bile cfDNA can predict poor outcomes of CCA patients., Competing Interests: Declaration of competing interest: All authors affiliated with 3D Medicines Inc. are current or former employees. No potential conflicts of interest were disclosed by the other authors, (Copyright © 2024. Published by Elsevier Inc.)
Published: 2024
Full Text: View/download PDF

122. Analysis of 10,478 cancer genomes identifies candidate driver genes and opportunities for precision oncology.

Author: Kinnersley B, Sud A, Everall A, Cornish AJ, Chubb D, Culliford R, Gruber AJ, Lärkeryd A, Mitsopoulos C, Wedge D, and Houlston R
Subjects: Humans, Genome, Human, Genomics methods, Medical Oncology methods, Neoplasms genetics, Precision Medicine methods, Mutation, Whole Genome Sequencing
Abstract: Tumor genomic profiling is increasingly seen as a prerequisite to guide the treatment of patients with cancer. To explore the value of whole-genome sequencing (WGS) in broadening the scope of cancers potentially amenable to a precision therapy, we analysed whole-genome sequencing data on 10,478 patients spanning 35 cancer types recruited to the UK 100,000 Genomes Project. We identified 330 candidate driver genes, including 74 that are new to any cancer. We estimate that approximately 55% of patients studied harbor at least one clinically relevant mutation, predicting either sensitivity or resistance to certain treatments or clinical trial eligibility. By performing computational chemogenomic analysis of cancer mutations we identify additional targets for compounds that represent attractive candidates for future clinical trials. This study represents one of the most comprehensive efforts thus far to identify cancer driver genes in the real world setting and assess their impact on informing precision oncology., (© 2024. The Author(s).)
Published: 2024
Full Text: View/download PDF

123. Evaluation of genotype imputation using Glimpse tools on low coverage ancient DNA.

Author: Çubukcu H and Kılınç GM
Subjects: Humans, Genetics, Population methods, Software, Genomics methods, DNA, Ancient analysis, Genome, Human, Polymorphism, Single Nucleotide, Genotype
Abstract: Ancient DNA provides a unique frame for directly studying human population genetics in time and space. Still, since most of the ancient genomic data is low coverage, analysis is confronted with a low number of SNPs, genotype uncertainties, and reference bias. Here, for the first time, we benchmark the two distinct versions of Glimpse tools on 120 ancient human genomes from Eurasia, including those largely from previously under-evaluated regions, and compare the performance of genotype imputation with de facto analysis approaches for low coverage genomic data analysis. We further investigate the impact of two distinct reference panels on imputation accuracy for low coverage genomic data. We compute accuracy statistics and perform PCA and f4-statistics to explore the behaviour of genotype imputation on low coverage data regarding (i) two versions of Glimpse, (ii) two reference panels, (iii) four post-imputation filters and coverages, as well as (iv) data type and geographical origin of the samples on the analyses. Our results reveal that even for 0.1X coverage ancient human genomes, genotype imputation using Glimpse-v2 is suitable. Additionally, using the 1000 Genomes merged with Human Genome Diversity Panel improves the accuracy of imputation for the rare variants with low MAF, which might be important not only for ancient genomics but also for modern human genomic studies based on low coverage data and for haplotype-based analysis. Most importantly, we reveal that genotype imputation of low coverage ancient human genomes reduces the genetic affinity of the samples towards the human reference genome. By solving one of the most challenging biases in data analysis, the so-called reference bias, genotype imputation using Glimpse v2 is promising for low coverage ancient human genomic data analysis and for rare-variant-based and haplotype-based analysis., (© 2024. The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.)
Published: 2024
Full Text: View/download PDF
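
Note on record 123: the f4-statistic mentioned in the abstract is commonly defined, for populations A, B, C and D, as the average over SNPs of (pA - pB)(pC - pD), where p denotes the allele frequency in each population. The abstract does not give the population configurations that were tested, so the sketch below only illustrates that general definition; the function name `f4` and the toy frequencies are hypothetical.

```python
def f4(p_a, p_b, p_c, p_d):
    """f4(A, B; C, D): mean over SNPs of (pA - pB) * (pC - pD).

    Each argument is a sequence of allele frequencies, one per SNP,
    with SNPs in the same order across all four populations.
    """
    n = len(p_a)
    if n == 0 or not (n == len(p_b) == len(p_c) == len(p_d)):
        raise ValueError("need four non-empty, equal-length frequency vectors")
    total = sum((a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d))
    return total / n


if __name__ == "__main__":
    # Hypothetical frequencies at two SNPs in four populations.
    print(f4([0.5, 0.2], [0.1, 0.2], [0.3, 0.9], [0.1, 0.4]))
```

A value near zero is consistent with A and B being symmetrically related to C and D; a systematic shift when one of the populations is the reference genome is one way reference bias can show up. Real analyses also estimate a standard error, usually by block jackknife, which this sketch omits.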

124. Fine-mapping across diverse ancestries drives the discovery of putative causal variants underlying human complex traits and diseases.

Author: Yuan K, Longchamps RJ, Pardiñas AF, Yu M, Chen TT, Lin SC, Chen Y, Lam M, Liu R, Xia Y, Guo Z, Shi W, Shen C, Daly MJ, Neale BM, Feng YA, Lin YF, Chen CY, O'Donovan MC, Ge T, and Huang H
Subjects: Humans, Computer Simulation, Gene Frequency, Genetic Predisposition to Disease, Genetic Variation, Genome, Human, Models, Genetic, Multifactorial Inheritance genetics, Schizophrenia genetics, White People genetics, East Asian People genetics, Chromosome Mapping methods, Genome-Wide Association Study methods, Linkage Disequilibrium, Polymorphism, Single Nucleotide, Quantitative Trait Loci
Abstract: Genome-wide association studies (GWAS) of human complex traits or diseases often implicate genetic loci that span hundreds or thousands of genetic variants, many of which have similar statistical significance. While statistical fine-mapping in individuals of European ancestry has made important discoveries, cross-population fine-mapping has the potential to improve power and resolution by capitalizing on the genomic diversity across ancestries. Here we present SuSiEx, an accurate and computationally efficient method for cross-population fine-mapping. SuSiEx integrates data from an arbitrary number of ancestries, explicitly models population-specific allele frequencies and linkage disequilibrium patterns, accounts for multiple causal variants in a genomic region and can be applied to GWAS summary statistics. We comprehensively assessed the performance of SuSiEx using simulations. We further showed that SuSiEx improves the fine-mapping of a range of quantitative traits available in both the UK Biobank and Taiwan Biobank, and improves the fine-mapping of schizophrenia-associated loci by integrating GWAS across East Asian and European ancestries., (© 2024. The Author(s), under exclusive licence to Springer Nature America, Inc.)
Published: 2024
Full Text: View/download PDF

125. A fine-scale genetic map of the Japanese population.

Author: Takayama J, Makino S, Funayama T, Ueki M, Narita A, Murakami K, Orui M, Ishikuro M, Obara T, Kuriyama S, Yamamoto M, and Tamiya G
Subjects: Humans, Alleles, Genetic Linkage, Genetics, Population, Genome, Human, Genome-Wide Association Study, Genotype, Haplotypes, Japan, Pedigree, Polymorphism, Single Nucleotide, Whole Genome Sequencing, Chromosome Mapping, East Asian People genetics, Linkage Disequilibrium, Recombination, Genetic
Abstract: Genetic maps are fundamental resources for linkage and association studies. A fine-scale genetic map can be constructed by inferring historical recombination events from the genome-wide structure of linkage disequilibrium (a non-random association of alleles among loci) by using population-scale sequencing data. We constructed a fine-scale genetic map and identified recombination hotspots from 10,092,551 bi-allelic high-quality autosomal markers segregating among 150 unrelated Japanese individuals whose genotypes were determined by high-coverage (30×) whole-genome sequencing, and the genotype quality was carefully controlled by using their parents' and offspring's genotypes. The pedigree information was also utilized for haplotype phasing. The resulting genome-wide recombination rate profiles were concordant with those of the worldwide population on a broad scale, and the resolution was much improved. We identified 9487 recombination hotspots and confirmed the enrichment of previously known motifs in the hotspots. Moreover, we demonstrated that the Japanese genetic map improved the haplotype phasing and genotype imputation accuracy for the Japanese population. The construction of a population-specific genetic map will help make genetics research more accurate., (© 2024 The Authors. Clinical Genetics published by John Wiley & Sons Ltd.)
Published: 2024
Full Text: View/download PDF

126. Fragments derived from non-coding RNAs: how complex is genome regulation?

Author: Velázquez-Flores M and Ruiz Esparza-Garrido R
Subjects: Humans, Gene Expression Regulation, Nucleic Acid Conformation, MicroRNAs genetics, MicroRNAs metabolism, RNA, Untranslated genetics, Genome, Human
Abstract: The human genome is highly dynamic and only a small fraction of it codes for proteins, but most of the genome is transcribed, highlighting the importance of non-coding RNAs in cellular functions. In addition, non-coding RNA fragments are now known to be generated under particular cellular conditions, and their functions have revealed unexpected mechanisms of action that converge, in some cases, with the biogenic pathways and action machineries of microRNAs or Piwi-interacting RNAs. This led us to ask why the cell produces so many apparently redundant molecules to exert similar functions and regulate apparently convergent processes. However, non-coding RNA fragments can also function similarly to aptamers, with secondary and tertiary conformations determining their functions. In the present work, we reviewed and analyzed the current information about non-coding RNA fragments, describing their structure and biogenic pathways, with special emphasis on their cellular functions., Competing Interests: The authors declare no competing interests.
Published: 2024
Full Text: View/download PDF

127. Identification of DNase I hypersensitive sites in the human genome by multiple sequence descriptors.

Author: Jin YT, Tan Y, Gan ZH, Hao YD, Wang TY, Lin H, and Tang B
Subjects: Humans, Computational Biology methods, Algorithms, Regulatory Sequences, Nucleic Acid genetics, Deoxyribonuclease I metabolism, Deoxyribonuclease I genetics, Deoxyribonuclease I chemistry, Genome, Human, Chromatin genetics, Chromatin metabolism, Chromatin chemistry
Abstract: DNase I hypersensitive sites (DHSs) are chromatin regions highly sensitive to DNase I enzymes. Studying DHSs is crucial for understanding complex transcriptional regulation mechanisms and localizing cis-regulatory elements (CREs). Numerous studies have indicated that disease-related loci are often enriched in DHS regions, underscoring the importance of identifying DHSs. Although wet-lab experiments exist for DHS identification, they are often labor-intensive. Therefore, there is a strong need to develop computational methods for this purpose. In this study, we used experimental data to construct a benchmark dataset. Seven feature extraction methods were employed to capture information about human DHSs. The F-score was applied to filter the features. After comparing the prediction performance of various classification algorithms through five-fold cross-validation, random forest was selected for the final model construction. The model achieved an overall prediction accuracy of 0.859 with an AUC value of 0.837. We hope that this model can assist scholars conducting DNase research in identifying these sites., Competing Interests: Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper., (Copyright © 2024 Elsevier Inc. All rights reserved.)
Published: 2024
Full Text: View/download PDF
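
Note on record 127: the abstract says an F-score was used to filter features but does not define it. One definition often used for feature ranking in sequence-classification work compares the between-class separation of a feature's means with its within-class sample variances. The sketch below assumes that definition; it is not taken from the paper, and the names `f_score` and `rank_features` and the toy data are hypothetical.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """F-score of one feature from its values in positive and negative samples.

    Numerator: squared distances of each class mean from the overall mean.
    Denominator: sum of the two within-class sample variances (n - 1 divisor).
    """
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)
    return between / within if within > 0 else float("inf")


def rank_features(X, y):
    """Return (feature indices sorted by decreasing F-score, list of scores).

    X is a list of feature rows; y holds labels 1 (positive) or 0 (negative).
    Each class needs at least two samples for the variance to be defined.
    """
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(f_score(pos, neg))
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order, scores


if __name__ == "__main__":
    # Hypothetical data: feature 0 separates the classes, feature 1 does not.
    X = [[1.0, 5.0], [1.2, 3.0], [3.0, 4.0], [3.2, 4.5]]
    y = [1, 1, 0, 0]
    order, scores = rank_features(X, y)
    print(order, [round(s, 4) for s in scores])
```

Features would then be kept from the top of this ranking, with the cut-off chosen by cross-validated performance of the downstream classifier.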

128. Short Tandem Repeats in the era of next-generation sequencing: from historical loci to population databases.

Author: Uguen K, Michaud JL, and Génin E
Subjects: Humans, Genetics, Population methods, Genome, Human, Databases, Genetic, Genetic Variation, Microsatellite Repeats, High-Throughput Nucleotide Sequencing methods
Abstract: In this study, we explore the landscape of short tandem repeats (STRs) within the human genome through the lens of evolving technologies to detect genomic variations. STRs, which encompass approximately 3% of our genomic DNA, are crucial for understanding human genetic diversity, disease mechanisms, and evolutionary biology. The advent of high-throughput sequencing methods has revolutionized our ability to accurately map and analyze STRs, highlighting their significance in genetic disorders, forensic science, and population genetics. We review the current available methodologies for STR analysis, the challenges in interpreting STR variations across different populations, and the implications of STRs in medical genetics. Our findings underscore the urgent need for comprehensive STR databases that reflect the genetic diversity of global populations, facilitating the interpretation of STR data in clinical diagnostics, genetic research, and forensic applications. This work sets the stage for future studies aimed at harnessing STR variations to elucidate complex genetic traits and diseases, reinforcing the importance of integrating STRs into genetic research and clinical practice., (© 2024. The Author(s), under exclusive licence to European Society of Human Genetics.)
Published: 2024
Full Text: View/download PDF

129. Molecular pathological classification of endometrial carcinoma and its research progress.

Author: 任文彬, 崔向荣, and 张三元
Abstract: Endometrial carcinoma is a malignant tumor with significant molecular and histological heterogeneity. Different pathological types have different biological behaviors and histological characteristics. Similar to advanced cancers of other histological subtypes, adjuvant radiation therapy is usually used for early-stage endometrioid carcinoma and chemotherapy is usually used for serous endometrial carcinoma. Therefore, the correct classification of subtypes is the key to choosing an appropriate adjuvant therapy regimen. At present, the clinical classification of endometrial cancer still refers to the Bokhman classification and the WHO pathological classification. With the comprehensive promotion of precision medicine and molecular diagnosis and treatment technology, the limitations of traditional endometrial cancer typing methods in the individualized treatment of endometrial cancer, prognosis assessment and related genetic disease screening have become increasingly prominent. An optimized classification method is urgently needed in the clinic to provide an exact theoretical and practical basis. In 2013, The Cancer Genome Atlas (TCGA) research center in the United States confirmed the molecular classification of endometrial cancer by integrating genomic features. Compared with the Bokhman and WHO classifications, this classification achieved a stronger correlation with the prognosis of patients with endometrial cancer, which opened the curtain on the molecular diagnosis and treatment of endometrial cancer. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

130. Three-Dimensional Simulation of Whole-Genome Structuring Through the Transition from Anaphase to Interphase.

Author: Fujishiro S and Sasai M
Subjects: Humans, Computer Simulation, Chromosomes, Human genetics, Mitosis genetics, Anaphase genetics, Interphase genetics, Genome, Human, Chromatin genetics, Chromatin metabolism
Abstract: In order to analyze the three-dimensional genome architecture, it is important to simulate how the genome is structured through the cell cycle progression. In this chapter, we present the usage of our computation codes for simulating how the human genome is formed as the cell transforms from anaphase to interphase. We do not use the global Hi-C data as an input into the genome simulation but represent all chromosomes as linear polymers annotated by the neighboring region contact index (NCI), which classifies the A/B type of each local chromatin region. The simulated mitotic chromosomes heterogeneously expand upon entry to the G1 phase, which induces phase separation of A and B chromatin regions, establishing chromosome territories, compartments, and lamina and nucleolus associations in the interphase nucleus. When the appropriate one-dimensional chromosomal annotation is possible, using the protocol of this chapter, one can quantitatively simulate the three-dimensional genome structure and dynamics of human cells of interest., (© 2025. The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature.)
Published: 2025
Full Text: View/download PDF

131. Meta-Analytic Operation of Threshold-independent Filtering (MOTiF) reveals sub-threshold genomic robustness in trisomy: The Jörmungandr Effect.

Author: Siegelmann R and Siegelmann HT
Subjects: Humans, Genome, Human, Genomics methods, Trisomy genetics
Abstract: Trisomy, a form of aneuploidy wherein the cell possesses an additional copy of a specific chromosome, exhibits a high correlation with cancer. Studies from across different hosts, cell-lines, and labs into the cellular effects induced by aneuploidy have conflicted, ranging from small, chaotic global changes to large instances of either overexpression or underexpression throughout the trisomic chromosome. We ascertained that conflicting findings may be correct but miss the overarching ground truth due to injudicious use of thresholds. To correct this deficiency, we introduce the Meta-analytic Operation of Threshold-independent Filtering (MOTiF) method, which begins by providing a panoramic view of all thresholds, transforms the data to eliminate the effects accounted for by known mechanisms, and then reconstructs an explanation of the mechanisms that underlie the difference between the baseline and the uncharacterized effects observed. As a proof of concept, we applied MOTiF to human colonic epithelial cells, discovering a uniform decrease in gene expression levels throughout the genome, which, while significant, is beneath most common thresholds. Using Hi-C data we identified the structural correlate, wherein the physical genomic architecture condenses, compactifying in a uniform, genome-wide manner. This effect, which we dub the Jörmungandr Effect, is likely a robustness mechanism counteracting the addition of a chromosome. We were able to break down the gene expression alterations into three overlapping mechanisms: the raw chromosome content, the genomic compartmentalization, and the global structural condensation. While further studies must be conducted to corroborate the hypothesized Jörmungandr Effect, MOTiF presents a useful meta-analytic tool in the realm of gene expression and beyond., Competing Interests: Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper., (Copyright © 2024 The Authors. Published by Elsevier Inc. All rights reserved.)
Published: 2024
Full Text: View/download PDF

132. Polymorphic pseudogenes in the human genome - a comprehensive assessment.

Author: Lopes-Marques M, Peixoto MJ, Cooper DN, Prata MJ, Azevedo L, and Castro LFC
Subjects: Humans, Alleles, Loss of Function Mutation, Pseudogenes genetics, Genome, Human, Polymorphism, Genetic
Abstract: Background: Over the past decade, variations of the coding portion of the human genome have become increasingly evident. In this study, we focus on polymorphic pseudogenes, a unique and relatively unexplored type of pseudogene whose inactivating mutations have not yet been fixed in the human genome at the global population level. Thus, polymorphic pseudogenes are characterized by the presence in the population of both coding alleles and non-coding alleles originating from Loss-of-Function (LoF) mutations. These alleles can be found both in heterozygosity and in homozygosity in different human populations and thus represent pseudogenes that have not yet been fixed in the population., Results: A methodical cross-population analysis of 232 polymorphic pseudogenes, including 35 new examples, reveals that human olfactory signalling, drug metabolism and immunity are among the systems most impacted by the variable presence of LoF variants at high frequencies. Within this dataset, a total of 179 genes presented polymorphic LoF variants in all analysed populations. Transcriptome and proteome analysis confirmed that although these genes may harbour LoF alleles, when the coding allele is present, the gene remains active and can play a functional role in various metabolic pathways, including drug/xenobiotic metabolism and immunity. The observation that many polymorphic pseudogenes are members of multigene families argues that genetic redundancy may play a key role in compensating for the inactivation of one paralogue., Conclusions: The distribution, expression and integration of cellular/biological networks in relation to human polymorphic pseudogenes provide novel insights into the architecture of the human genome and the dynamics of gene gain and loss with likely functional impact., Competing Interests: Declarations. Ethics approval and consent to participate: Not applicable. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests., (© 2024. The Author(s).)
Published: 2024
Full Text: View/download PDF

133. Methods and applications of genome-wide profiling of DNA damage and rare mutations.

Author: Pfeifer GP and Jin SG
Subjects: Humans, Animals, Genome-Wide Association Study methods, Genome, Human, DNA Damage genetics, Mutation
Abstract: DNA damage is a threat to genome integrity and can be a cause of many human diseases, owing to either changes in the chemical structure of DNA or conversion of the damage into a mutation, that is, a permanent change in DNA sequence. Determining the exact positions of DNA damage and ensuing mutations in the genome is important for identifying mechanisms of disease aetiology when characteristic mutations are prevalent and probably causative in a particular disease. However, this approach is challenging, particularly when levels of DNA damage are low, for example, as a result of chronic exposure to environmental agents or certain endogenous processes, such as the generation of reactive oxygen species. Over the past few years, a comprehensive toolbox of genome-wide methods has been developed for the detection of DNA damage and rare mutations at single-nucleotide resolution in mammalian cells. Here, we review and compare these methods, describe their current applications and discuss future research questions that can now be addressed., Competing Interests: Competing interests: The authors declare no competing interests., (© 2024. Springer Nature Limited.)
Published: 2024
Full Text: View/download PDF

134. Regulatory transposable elements in the encyclopedia of DNA elements.

Author: Du AY, Chobirko JD, Zhuo X, Feschotte C, and Wang T
Subjects: Humans, Animals, Mice, Binding Sites genetics, Genome, Human, Regulatory Sequences, Nucleic Acid genetics, Evolution, Molecular, DNA Transposable Elements genetics, Transcription Factors metabolism, Transcription Factors genetics
Abstract: Transposable elements (TEs) comprise ~50% of our genome, but knowledge of how TEs affect genome evolution remains incomplete. Leveraging ENCODE4 data, we provide the most comprehensive study to date of TE contributions to the regulatory genome. We find 236,181 (~25%) human candidate cis-regulatory elements (cCREs) are TE-derived, with over 90% lineage-specific since the human-mouse split, accounting for 8-36% of lineage-specific cCREs. Except for SINEs, cCRE-associated transcription factor (TF) motifs in TEs are derived from ancestral TE sequence more than expected by chance. We show that TEs may adopt similar regulatory activities of elements near their integration site. Since human-mouse divergence, TEs have contributed 3-56% of TF binding site turnover events across 30 examined TFs. Finally, TE-derived cCREs are similar to non-TE cCREs in terms of MPRA activity and GWAS variant enrichment. Overall, our results substantiate the notion that TEs have played an important role in shaping the human regulatory genome., (© 2024. The Author(s).)
Published: 2024
Full Text: View/download PDF

135. Genome Tunisia Project: paving the way for precision medicine in North Africa.

Author: Hamdi Y, Trabelsi M, Ghedira K, Boujemaa M, Ben Ayed I, Charfeddine C, Souissi A, Rejeb I, Kammoun Rebai W, Hkimi C, Neifar F, Jandoubi N, Mkaouar R, Chaouch M, Bennour A, Kamoun S, Chaker Masmoudi H, Abid N, Mezghani Khemakhem M, Masmoudi S, Saad A, BenJemaa L, BenKahla A, Boubaker S, Mrad R, Kamoun H, Abdelhak S, Gribaa M, Belguith N, Kharrat N, Hmida D, and Rebai A
Subjects: Humans, Tunisia, Genomics methods, Africa, Northern, Precision Medicine methods, Genome, Human
Abstract: Background: Key discoveries and innovations in the field of human genetics have led to the foundation of molecular and personalized medicine. Here, we present the Genome Tunisia Project, a two-phased initiative (2022-2035) which aims to deliver the reference sequence of the Tunisian Genome and to support the implementation of personalized medicine in Tunisia, a North African country that represents a central hub of population admixture and human migration between African, European, and Asian populations. The main goal of this initiative is to develop a healthcare system capable of incorporating omics data for use in routine medical practice, enabling medical doctors to better prevent, diagnose, and treat patients., Methods: A multidisciplinary partnership involving Tunisian experts from different institutions has come together to identify all requirements that would be of high priority for fulfilling the project's goals. One of the most urgent priorities is to determine the reference sequence of the Tunisian Genome. In addition, extensive situation analysis and revision of the education programs, community awareness, appropriate infrastructure including sequencing platforms and biobanking, as well as ethical and regulatory frameworks, have been undertaken towards building sufficient capacity to integrate personalized medicine into the Tunisian healthcare system., Results: In the framework of this project, an ecosystem with all engaged stakeholders has been implemented including healthcare providers, clinicians, researchers, pharmacists, bioinformaticians, industry, policymakers, and advocacy groups. This initiative will also help to reinforce research and innovation capacities in the field of genomics and to strengthen discoverability in the health sector., Conclusions: Genome Tunisia is the first initiative in North Africa that seeks to demonstrate the major impact that can be achieved by Human Genome Projects in low- and middle-income countries to strengthen research and to improve disease management and treatment outcomes, thereby reducing the social and economic burden on healthcare systems. Sharing this experience within the African scientific community is a chance to turn a major challenge into an opportunity for dissemination and outreach. Additional efforts are now being made to advance personalized medicine in patient care by educating consumers and providers, accelerating research and innovation, and supporting necessary changes in policy and regulation., (© 2024. The Author(s).)
Published: 2024
Full Text: View/download PDF

136. Integration of chromosome locations and functional aspects of enhancers and topologically associating domains in knowledge graphs enables versatile queries about gene regulation.

Author: Mulero-Hernández J, Mironov V, Miñarro-Giménez JA, Kuiper M, and Fernández-Breis JT
Subjects: Humans, Transcription Factors metabolism, Transcription Factors genetics, Genome, Human, Gene Ontology, Enhancer Elements, Genetic, Gene Expression Regulation, Databases, Genetic
Abstract: Knowledge about transcription factor binding and regulation, target genes, cis-regulatory modules and topologically associating domains is not only defined by functional associations like biological processes or diseases but also has a determinative genome location aspect. Here, we exploit these location and functional aspects together to develop new strategies to enable advanced data querying. Many databases have been developed to provide information about enhancers, but a schema that allows the standardized representation of data, securing interoperability between resources, has been lacking. In this work, we use knowledge graphs for the standardized representation of enhancers and topologically associating domains, together with data about their target genes, transcription factors, location on the human genome, and functional data about diseases and gene ontology annotations. We used this schema to integrate twenty-five enhancer datasets and two domain datasets, creating the most powerful integrative resource in this field to date. The knowledge graphs have been implemented using the Resource Description Framework and integrated within the open-access BioGateway knowledge network, generating a resource that contains an interoperable set of knowledge graphs (enhancers, TADs, genes, proteins, diseases, GO terms, and interactions between domains). We show how advanced queries, which combine functional and location restrictions, can be used to develop new hypotheses about functional aspects of gene expression regulation., (© The Author(s) 2024. Published by Oxford University Press on behalf of Nucleic Acids Research.)
Published: 2024
Full Text: View/download PDF

137. Differential impact of quiescent non-coding loci on chromatin entropy.

Author: Wu P, Yao M, and Wang W
Subjects: Humans, Machine Learning, Epigenesis, Genetic, Gene Regulatory Networks, CRISPR-Cas Systems, Genetic Loci, Chromatin metabolism, Chromatin genetics, Entropy, Genome, Human
Abstract: Non-coding regions of the human genome are important for functional regulations, but their mechanisms remain elusive. We used machine learning to guide a CRISPR screening on hubs (i.e. non-coding loci forming many 3D contacts) and significantly increased the discovery rate of hubs essential for cell growth. We found no clear genetic or epigenetic differences between essential and nonessential hubs, but we observed that some neighboring hubs in the linear genome have distinct spatial contacts and opposite effects on cell growth. One such pair in an epigenetically quiescent region showed different impacts on gene expression, chromatin accessibility and chromatin organization. We also found that deleting the essential hub altered the genetic network activity and increased the entropy of chromatin accessibility, more severe than that caused by deletion of the nonessential hub, suggesting that they are critical for maintaining an ordered chromatin structure. Our study reveals new insights into the system-level roles of non-coding regions in the human genome., (© The Author(s) 2024. Published by Oxford University Press on behalf of Nucleic Acids Research.)
Published: 2024
Full Text: View/download PDF

138. Interpretable deep residual network uncovers nucleosome positioning and associated features.

Author: Masoudi-Sobhanzadeh Y, Li S, Peng Y, and Panchenko AR
Subjects: Humans, Histones metabolism, Histones genetics, DNA chemistry, DNA genetics, Genome, Human, Deep Learning, Animals, Nucleosomes metabolism, Nucleosomes chemistry, Nucleosomes genetics
Abstract: Nucleosomes represent elementary building units of eukaryotic chromosomes and consist of DNA wrapped around a histone octamer flanked by linker DNA segments. Nucleosomes are central in epigenetic pathways and their genomic positioning is associated with regulation of gene expression, DNA replication, DNA methylation and DNA repair, among other functions. Building on prior discoveries that DNA sequences noticeably affect nucleosome positioning, our objective is to identify nucleosome positions and related features across the entire genome. Here, we introduce an interpretable framework based on the concepts of deep residual networks (NuPoSe). Trained on high-coverage human experimental MNase-seq data, NuPoSe is able to learn sequence and structural patterns associated with nucleosome organization in the human genome. NuPoSe can also be applied to unseen data from different organisms and cell types. Our findings point to 43 informative features, most of which are tri-nucleotides, di-nucleotides and one tetra-nucleotide. Most features are significantly associated with the nucleosomal structural characteristics, namely, periodicity of nucleosomal DNA and its location with respect to a histone octamer. Importantly, we show that features derived from the 27 bp linker DNA flanking nucleosomes contribute up to 10% to the quality of the prediction model. This, along with the comprehensive training sets, deep-learning architecture, and feature selection method, may contribute to NuPoSe's 80-89% classification accuracy on different independent datasets., (© The Author(s) 2024. Published by Oxford University Press on behalf of Nucleic Acids Research.)
Published: 2024
Full Text: View/download PDF

139. G-quadruplexes as pivotal components of cis-regulatory elements in the human genome.

Author: Zhang R, Wang Y, Wang C, Sun X, and Mergny JL
Subjects: Humans, Regulatory Sequences, Nucleic Acid genetics, Promoter Regions, Genetic, Regulatory Elements, Transcriptional genetics, G-Quadruplexes, Genome, Human
Abstract: Background: Cis-regulatory elements (CREs) are crucial for regulating gene expression, and G-quadruplexes (G4s), as prototypal non-canonical DNA structures, may play a role in this regulation. However, the relationship between G4s and CREs, especially with non-promoter-like functional elements, requires further systematic investigation. We aimed to investigate the associations between G4s and human cCREs (candidate CREs) inferred from the Encyclopedia of DNA Elements (ENCODE) data., Results: We found that G4s are prominently enriched in most types of cCREs, especially those with promoter-like signatures (PLS). The co-occurrence of CTCF signals with H3K4me3 or H3K27ac signals strengthens the association between cCREs and G4s. Genetic variants in G4s, particularly within their G-runs, exhibit higher regulatory potential and deleterious effects compared to cCREs. The G-runs within G4s near transcriptional start sites (TSSs) are more evolutionarily constrained compared to G-runs in cCREs, while those far from the TSS are relatively less conserved. The presence of G4s is often linked to a more favorable local chromatin environment for the activation and execution of regulatory function of cCREs, potentially attributable to the formation of G4 secondary structures. Finally, we discovered that G4-associated cCREs exhibit widespread activation in a variety of cancers., Conclusions: Our study suggests that G4s are integral components of human cis-regulatory elements, extending beyond their potential role in promoters. The G4 primary sequences are associated with the localization of CREs, while the G4 structures are linked to the activation of these elements. Therefore, we propose defining G4s as pivotal regulatory elements in the human genome., (© 2024. The Author(s).)
Published: 2024
Full Text: View/download PDF

140. Local Ancestry Inference Based on Population-Specific Single-Nucleotide Polymorphisms-A Study of Admixed Populations in the 1000 Genomes Project.

Author: Fu H and Shi G
Subjects: Humans, Asian People genetics, Black People genetics, Haplotypes, Human Genome Project, White People genetics, Indigenous Peoples genetics, Genetics, Population methods, Genome, Human, Polymorphism, Single Nucleotide
Abstract: Human populations have interacted throughout history, and a considerable portion of modern human populations show evidence of admixture. Local ancestry inference (LAI) is focused on detecting the genetic ancestry of chromosomal segments in admixed individuals and has wide applications. In this work, we proposed a new LAI method based on population-specific single-nucleotide polymorphisms (SNPs) and applied it in the analysis of admixed populations in the 1000 Genomes Project (1KGP). Based on population-specific SNPs in a sliding window, we computed local ancestry information vectors, which are moment estimators of local ancestral proportions, for two haplotypes of an admixed individual and inferred the local ancestral origins. Then we used African (AFR), East Asian (EAS), European (EUR) and South Asian (SAS) populations from the 1KGP and indigenous American (AMR) populations from the Human Genome Diversity Project (HGDP) as reference populations and conducted the proposed LAI analysis on African American populations and American populations in the 1KGP. The results were compared with those obtained by RFMix, G-Nomix and FLARE. We demonstrated that the existence of alleles in a chromosomal region that are specific to a particular reference population and the absence of alleles specific to the other reference populations provide reasonable evidence for determining the ancestral origin of the region. Contemporary AFR, AMR and EUR populations approximate ancestral populations of the admixed populations well, and the results from RFMix, G-Nomix and FLARE largely agree with those from the Ancestral Spectrum Analyzer (ASA), in which the proposed method was implemented. When admixtures are ancient and contemporary reference populations do not satisfactorily approximate ancestral populations, the performances of RFMix, G-Nomix and FLARE deteriorate with increased error rates and fragmented chromosomal segments. In contrast, our method provides fair results.
Published: 2024
Full Text: View/download PDF

141. Novel crossover and recombination hotspots massively spread across primate genomes.

Author: Ohadi M, Arabfard M, Khamse S, Alizadeh S, Vafadar S, Bayat H, Tajeddin N, Maddi AMA, Delbari A, and Khorram Khorshid HR
Subjects: Animals, Humans, Genome, Genome, Human, Mice, Crossing Over, Genetic, Primates genetics, Recombination, Genetic
Abstract: Background: The recombination landscape and subsequent natural selection have vast consequences for evolution and speciation. However, most of the crossover and recombination hotspots are yet to be discovered. We previously reported the relevance of C and G trinucleotide two-repeat units (CG-TTUs) in crossovers and recombination., Methods: On a genome-wide scale, here we mapped all combinations of A and T trinucleotide two-repeat units (AT-TTUs) in human, consisting of AATAAT, ATAATA, ATTATT, TTATTA, TATTAT, and TAATAA. We also compared a number of the colonies formed by the AT-TTUs (distance between consecutive AT-TTUs < 500 bp) in several other primates and mouse., Results: We found that the majority of the AT-TTUs (> 96%) resided in approximately 1.4 million colonies, spread throughout the human genome. In comparison to the CG-TTU colonies, the AT-TTU colonies were significantly more abundant and larger in size. Pure units and overlapping units of the pure units were readily detectable in the same colonies, signifying that the units were the sites of unequal crossover. We discovered dynamic sharedness of several of the colonies across the primate species studied, which mainly reached maximum complexity and size in human., Conclusions: We report novel crossover and recombination hotspots of the finest molecular resolution, massively spread and shared across the genomes of human and several other primates. With respect to crossovers and recombination, these genomes are far more dynamic than previously envisioned., (© 2024. The Author(s).)
Published: 2024
Full Text: View/download PDF
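
Note on record 141: the mapping step described in the abstract (scanning for the six AT-TTU hexamers and grouping units that lie less than 500 bp apart into colonies) can be illustrated with a minimal sketch. This is not the authors' code. The function names, the choice to measure the gap from the end of one unit to the start of the next, and the requirement of at least two units per colony are all assumptions made for illustration.

```python
import re
from typing import List

# The six AT-TTU hexamers listed in the abstract.
AT_TTUS = ("AATAAT", "ATAATA", "ATTATT", "TTATTA", "TATTAT", "TAATAA")

# A lookahead makes each match zero-width, so overlapping units are reported too.
_PATTERN = re.compile("(?=(" + "|".join(AT_TTUS) + "))")


def find_units(seq: str) -> List[int]:
    """Return the 0-based start of every AT-TTU in seq, overlaps included."""
    return [m.start() for m in _PATTERN.finditer(seq.upper())]


def group_colonies(
    starts: List[int],
    max_gap: int = 500,   # threshold taken from the abstract (< 500 bp)
    unit_len: int = 6,    # every AT-TTU is a hexamer
    min_units: int = 2,   # assumption: a colony needs at least two units
) -> List[List[int]]:
    """Group sorted unit starts into colonies.

    Assumption: the distance between consecutive units is measured from the
    end of the previous unit to the start of the next one.
    """
    groups: List[List[int]] = []
    for s in starts:
        if groups and s - (groups[-1][-1] + unit_len) < max_gap:
            groups[-1].append(s)
        else:
            groups.append([s])
    return [g for g in groups if len(g) >= min_units]


if __name__ == "__main__":
    # Two units 10 bp apart, then a third unit 600 bp further on.
    demo = "AATAAT" + "C" * 10 + "ATTATT" + "C" * 600 + "TAATAA"
    units = find_units(demo)
    print(units)
    print(group_colonies(units))
```

In this toy sequence the first two units fall into one colony, while the third is isolated and is dropped by the hypothetical `min_units` filter. A genome-scale run would stream each chromosome rather than hold it in one string.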

142. A dual DNA-binding conjugate that selectively recognizes G-quadruplex structures.

Author: Ooga M, Sahayasheela VJ, Hirose Y, Sasaki D, Hashiya K, Bando T, and Sugiyama H
Subjects: Humans, Genome, Human, Binding Sites, G-Quadruplexes, DNA chemistry
Abstract: G-quadruplex (G4) structures play roles in various biological processes, but the challenge lies in specific targeting. To address this, we synthesized a conjugate capable of recognizing the G4 structure and its proximal duplex. Our conjugate can enable recognition of specific G4s in the human genome to understand and target those structures.
Published: 2024
Full Text: View/download PDF

143. VolcanoSV enables accurate and robust structural variant calling in diploid genomes from single-molecule long read sequencing.

Author: Luo C, Liu YH, and Zhou XM
Subjects: Humans, Genomics methods, High-Throughput Nucleotide Sequencing methods, Sequence Analysis, DNA methods, Software, Haplotypes, Diploidy, Genome, Human, Genomic Structural Variation, Polymorphism, Single Nucleotide
Abstract: Structural variants (SVs) significantly contribute to human genome diversity and play a crucial role in precision medicine. Although advancements in single-molecule long-read sequencing offer a groundbreaking resource for SV detection, identifying SV breakpoints and sequences accurately and robustly remains challenging. We introduce VolcanoSV, an innovative hybrid SV detection pipeline that utilizes both a reference genome and local de novo assembly to generate a phased diploid assembly. VolcanoSV uses phased SNPs and unique k-mer similarity analysis, enabling precise haplotype-resolved SV discovery. VolcanoSV is adept at constructing comprehensive genetic maps encompassing SNPs, small indels, and all types of SVs, making it well-suited for human genomics studies. Our extensive experiments demonstrate that VolcanoSV surpasses state-of-the-art assembly-based tools in the detection of insertion and deletion SVs, exhibiting superior recall, precision, F1 scores, and genotype accuracy across a diverse range of datasets, including low-coverage (10x) datasets. VolcanoSV outperforms assembly-based tools in the identification of complex SVs, including translocations, duplications, and inversions, in both simulated and real cancer data. Moreover, VolcanoSV is robust to various evaluation parameters and accurately identifies breakpoints and SV sequences., (© 2024. The Author(s).)
Published: 2024
Full Text: View/download PDF

144. Evidence for widespread translation of 5' untranslated regions.

Author: Rodriguez JM, Abascal F, Cerdán-Vélez D, Gómez LM, Vázquez J, and Tress ML
Subjects: Humans, Codon, Initiator genetics, Base Composition, Genome, Human, Animals, Open Reading Frames genetics, Conserved Sequence, Peptides genetics, Peptides metabolism, 5' Untranslated Regions, Protein Biosynthesis
Abstract: Ribosome profiling experiments support the translation of a range of novel human open reading frames. By contrast, most peptides from large-scale proteomics experiments derive from just one source, 5' untranslated regions. Across the human genome we find evidence for 192 translated upstream regions, most of which would produce protein isoforms with extended N-terminal ends. Almost all of these N-terminal extensions are from highly abundant genes, which suggests that the novel regions we detect are just the tip of the iceberg. These upstream regions have characteristics that are not typical of coding exons. Their GC-content is remarkably high, even higher than 5' regions in other genes, and a large majority have non-canonical start codons. Although some novel upstream regions have cross-species conservation - five have orthologues in invertebrates for example - the reading frames of two thirds are not conserved beyond simians. These non-conserved regions also have no evidence of purifying selection, which suggests that much of this translation is not functional. In addition, non-conserved upstream regions have significantly more peptides in cancer cell lines than would be expected, a strong indication that an aberrant or noisy translation initiation process may play an important role in translation from upstream regions., (© The Author(s) 2024. Published by Oxford University Press on behalf of Nucleic Acids Research.)
Published: 2024
Full Text: View/download PDF

145. Enhancing recognition and interpretation of functional phenotypic sequences through fine-tuning pre-trained genomic models.

Author: Du D, Zhong F, and Liu L
Subjects: Humans, Models, Genetic, Endogenous Retroviruses genetics, Deep Learning, Genotype, Phenotype, Genomics methods, Genome, Human
Abstract: Background: Decoding human genomic sequences requires comprehensive analysis of DNA sequence functionality. Through computational and experimental approaches, researchers have studied the genotype-phenotype relationship and generated important datasets that help unravel complicated genetic blueprints. Thus, the recently developed artificial intelligence methods can be used to interpret the functions of those DNA sequences., Methods: This study explores the use of deep learning, particularly pre-trained genomic models like DNA_bert_6 and human_gpt2-v1, in interpreting and representing human genome sequences. Initially, we meticulously constructed multiple datasets linking genotypes and phenotypes to fine-tune those models for precise DNA sequence classification. Additionally, we evaluate the influence of sequence length on classification results and analyze the impact of feature extraction in the hidden layers of our model using the HERV dataset. To enhance our understanding of phenotype-specific patterns recognized by the model, we perform enrichment, pathogenicity and conservation analyses of specific motifs in the human endogenous retrovirus (HERV) sequence with high average local representation weight (ALRW) scores., Results: We have constructed multiple genotype-phenotype datasets displaying commendable classification performance in comparison with random genomic sequences, particularly in the HERV dataset, which achieved binary and multi-classification accuracies and F1 values exceeding 0.935 and 0.888, respectively. Notably, the fine-tuning of the HERV dataset not only improved our ability to identify and distinguish diverse information types within DNA sequences but also successfully identified specific motifs associated with neurological disorders and cancers in regions with high ALRW scores. Subsequent analysis of these motifs shed light on the adaptive responses of species to environmental pressures and their co-evolution with pathogens., Conclusions: These findings highlight the potential of pre-trained genomic models in learning DNA sequence representations, particularly when utilizing the HERV dataset, and provide valuable insights for future research endeavors. This study represents an innovative strategy that combines pre-trained genomic model representations with classical methods for analyzing the functionality of genome sequences, thereby promoting cross-fertilization between genomics and artificial intelligence., (© 2024. The Author(s).)
Published: 2024
Full Text: View/download PDF

146. CLEMENT: genomic decomposition and reconstruction of non-tumor subclones.

Author: Chung YS, Kang S, Kim J, Lee S, and Kim S
Subjects: Humans, Neoplasms genetics, Mutation, Genome, Human, Clone Cells, Algorithms, Genomics methods
Abstract: Genome-level clonal decomposition of a single specimen has been widely studied; however, it is mostly limited to cancer research. In this study, we developed a new algorithm CLEMENT, which conducts accurate decomposition and reconstruction of multiple subclones in genome sequencing of non-tumor (normal) samples. CLEMENT employs the Expectation-Maximization (EM) algorithm with optimization strategies specific to non-tumor subclones, including false variant call identification, non-disparate clone fuzzy clustering, and clonal allele fraction confinement. In the simulation and in vitro cell line mixture data, CLEMENT outperformed current cancer decomposition algorithms in estimating the number of clones (root-mean-square-error = 0.58-0.78 versus 1.43-3.34) and in the variant-clone membership agreement (∼85.5% versus 70.1-76.7%). Additional testing on human multi-clonal normal tissue sequencing confirmed the accurate identification of subclones that originated from different cell types. Clone-level analysis, including mutational burden and signatures, provided a new understanding of normal-tissue composition. We expect that CLEMENT will serve as a crucial tool in the currently emerging field of non-tumor genome analysis., (© The Author(s) 2024. Published by Oxford University Press on behalf of Nucleic Acids Research.)
Published: 2024
Full Text: View/download PDF

147. The cytidine deaminase APOBEC3C has unique sequence and genome feature preferences.

Author: Brown GW
Subjects: Humans, Genome, Human, DNA Replication, Proteins genetics, Proteins metabolism, Mutagenesis, Saccharomyces cerevisiae genetics, Minor Histocompatibility Antigens, Cytidine Deaminase genetics, Cytidine Deaminase metabolism, Mutation
Abstract: APOBEC proteins are cytidine deaminases that restrict the replication of viruses and transposable elements. Several members of the APOBEC3 family, APOBEC3A, APOBEC3B, and APOBEC3H-I, can access the nucleus and cause what is thought to be indiscriminate deamination of the genome, resulting in mutagenesis and genome instability. Although APOBEC3C is also present in the nucleus, the full scope of its deamination target preferences is unknown. By expressing human APOBEC3C in a yeast model system, I have defined the APOBEC3C mutation signature, as well as the preferred genome features of APOBEC3C targets. The APOBEC3C mutation signature is distinct from those of the known cancer genome mutators APOBEC3A and APOBEC3B. APOBEC3C produces DNA strand-coordinated mutation clusters, and APOBEC3C mutations are enriched near the transcription start sites of active genes. Surprisingly, APOBEC3C lacks the bias for the lagging strand of DNA replication that is seen for APOBEC3A and APOBEC3B. The unique preferences of APOBEC3C constitute a mutation profile that will be useful in defining sites of APOBEC3C mutagenesis in human genomes., Competing Interests: The author(s) declare no conflicts of interest., (© The Author(s) 2024. Published by Oxford University Press on behalf of The Genetics Society of America.)
Published: 2024
Full Text: View/download PDF

148. Genetic Signatures of Positive Selection in Human Populations Adapted to High Altitude in Papua New Guinea.

Author: González-Buenfil R, Vieyra-Sánchez S, Quinto-Cortés CD, Oppenheimer SJ, Pomat W, Laman M, Cervantes-Hernández MC, Barberena-Jonas C, Auckland K, Allen A, Allen S, Phipps ME, Huerta-Sanchez E, Ioannidis AG, Mentzer AJ, and Moreno-Estrada A
Subjects: Humans, Papua New Guinea, Adaptation, Physiological genetics, Genome, Human, Altitude Sickness genetics, Selection, Genetic, Altitude
Abstract: Papua New Guinea (PNG) hosts distinct environments mainly represented by the ecoregions of the Highlands and Lowlands that display increased altitude and a predominance of pathogens, respectively. Since its initial peopling approximately 50,000 years ago, inhabitants of these ecoregions might have differentially adapted to the environmental pressures exerted by each of them. However, the genetic basis of adaptation in populations from these areas remains understudied. Here, we investigated signals of positive selection in 62 highlanders and 43 lowlanders across 14 locations in the main island of PNG using whole-genome genotype data from the Oceanian Genome Variation Project (OGVP) and searched for signals of positive selection through population differentiation and haplotype-based selection scans. Additionally, we performed archaic ancestry estimation to detect selection signals in highlanders within introgressed regions of the genome. Among highland populations we identified candidate genes representing known biomarkers for mountain sickness (SAA4, SAA1, PRDX1, LDHA) as well as candidate genes of the Notch signaling pathway (PSEN1, NUMB, RBPJ, MAML3), a novel proposed pathway for high altitude adaptation in multiple organisms. We also identified candidate genes involved in oxidative stress, inflammation, and angiogenesis, processes inducible by hypoxia, as well as in components of the eye lens and the immune response. In contrast, candidate genes in the lowlands are mainly related to the immune response (HLA-DQB1, HLA-DQA2, TAAR6, TAAR9, TAAR8, RNASE4, RNASE6, ANG). Moreover, we find two candidate regions to be also enriched with archaic introgressed segments, suggesting that archaic admixture has played a role in the local adaptation of PNG populations., (© The Author(s) 2024. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.)
Published: 2024
Full Text: View/download PDF

149. The Structure of Simple Satellite Variation in the Human Genome and Its Correlation With Centromere Ancestry.

Author: Said I, Barbash DA, and Clark AG
Subjects: Humans, Evolution, Molecular, Genetic Variation, Centromere genetics, Genome, Human, DNA, Satellite genetics
Abstract: Although repetitive DNA forms much of the human genome, its study is challenging due to limitations in assembly and alignment of repetitive short-reads. We have deployed k-Seek, software that detects tandem repeats embedded in single reads, on 2,504 human genomes from the 1,000 Genomes Project to quantify the variation and abundance of simple satellites (repeat units <20 bp). We find that the ancestral monomer of Human Satellite 3 makes up the largest portion of simple satellite content in humans (mean of ∼8 Mb). We discovered ∼50,000 rare tandem repeats that are not detected in the T2T-CHM13v2.0 assembly, including undescribed variants of telomeric and pericentromeric repeats. We find broad homogeneity of the most abundant repeats across populations, except for AG-rich repeats which are more abundant in African individuals. We also find cliques of highly similar AG- and AT-rich satellites that are interspersed and form higher-order structures that covary in copy number across individuals, likely through concerted amplification via unequal exchange. Finally, we use pericentromeric polymorphisms to estimate centromeric genetic relatedness between individuals and find a strong predictive relationship between centromeric lineages and pericentromeric simple satellite abundances. In particular, ancestral monomers of Human Satellite 2 and Human Satellite 3 abundances correlate with clusters of centromeric ancestry on chromosome 16 and chromosome 9, with some clusters structured by population. These results provide new descriptions of the population dynamics that underlie the evolution of simple satellites in humans., (© The Author(s) 2024. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.)
Published: 2024
Full Text: View/download PDF

150. Genome-wide detection of somatic mosaicism at short tandem repeats.

Author: Sehgal A, Ziaei Jam H, Shen A, and Gymrek M
Subjects: Humans, Alleles, Software, Mosaicism, Microsatellite Repeats, Genome, Human, High-Throughput Nucleotide Sequencing methods
Abstract: Motivation: Somatic mosaicism has been implicated in several developmental disorders, cancers, and other diseases. Short tandem repeats (STRs) consist of repeated sequences of 1-6 bp and comprise >1 million loci in the human genome. Somatic mosaicism at STRs is known to play a key role in the pathogenicity of loci implicated in repeat expansion disorders and is highly prevalent in cancers exhibiting microsatellite instability. While a variety of tools have been developed to genotype germline variation at STRs, a method for systematically identifying mosaic STRs is lacking., Results: We introduce prancSTR, a novel method for detecting mosaic STRs from individual high-throughput sequencing datasets. prancSTR is designed to detect loci characterized by a single high-frequency mosaic allele, but can also detect loci with multiple mosaic alleles. Unlike many existing mosaicism detection methods for other variant types, prancSTR does not require a matched control sample as input. We show that prancSTR accurately identifies mosaic STRs in simulated data, demonstrate its feasibility by identifying candidate mosaic STRs in Illumina whole genome sequencing data derived from lymphoblastoid cell lines for individuals sequenced by the 1000 Genomes Project, and evaluate the use of prancSTR on Element and PacBio data. In addition to prancSTR, we present simTR, a novel simulation framework which simulates raw sequencing reads with realistic error profiles at STRs., Availability and Implementation: prancSTR and simTR are freely available at https://github.com/gymrek-lab/trtools. Detailed documentation is available at https://trtools.readthedocs.io/., (© The Author(s) 2024. Published by Oxford University Press.)
Published: 2024
Full Text: View/download PDF
