116 results on '"Makova KD"'
Search Results
2. Genome-wide analysis of common fragile sites: what features determine chromosomal instability in the human genome?
- Author
-
Fungtammasan, A, Walsh, E, Chiaromonte, Francesca, Eckert, KA, and Makova, KD
- Published
- 2012
3. A matter of life and death: how microsatellites emerge in and vanish from the human genome
- Author
-
Kelkar, YD, Eckert, KA, Chiaromonte, Francesca, and Makova, KD
- Published
- 2011
4. Ride the wavelet: a multi-scale analysis of genomic context flanking small insertions and deletions
- Author
-
Kvikstad, EM, Chiaromonte, Francesca, and Makova, KD
- Published
- 2009
5. The genome-wide determinants of microsatellite evolution
- Author
-
Kelkar, Y, Tyekucheva, S, Chiaromonte, Francesca, and Makova, KD
- Published
- 2008
6. A macaque’s-eye view of human insertions and deletions: differences in mechanisms
- Author
-
Kvikstad, EM, Tyekucheva, S, Chiaromonte, Francesca, and Makova, KD
- Published
- 2007
7. Insertions and deletions are male-biased too: a whole-genome analysis in rodents
- Author
-
Makova, KD, Yang, S, and Chiaromonte, Francesca
- Published
- 2004
8. Complete sequencing of ape genomes.
- Author
-
Yoo D, Rhie A, Hebbar P, Antonacci F, Logsdon GA, Solar SJ, Antipov D, Pickett BD, Safonova Y, Montinaro F, Luo Y, Malukiewicz J, Storer JM, Lin J, Sequeira AN, Mangan RJ, Hickey G, Anez GM, Balachandran P, Bankevich A, Beck CR, Biddanda A, Borchers M, Bouffard GG, Brannan E, Brooks SY, Carbone L, Carrel L, Chan AP, Crawford J, Diekhans M, Engelbrecht E, Feschotte C, Formenti G, Garcia GH, de Gennaro L, Gilbert D, Green RE, Guarracino A, Gupta I, Haddad D, Han J, Harris RS, Hartley GA, Harvey WT, Hiller M, Hoekzema K, Houck ML, Jeong H, Kamali K, Kellis M, Kille B, Lee C, Lee Y, Lees W, Lewis AP, Li Q, Loftus M, Loh YHE, Loucks H, Ma J, Mao Y, Martinez JFI, Masterson P, McCoy RC, McGrath B, McKinney S, Meyer BS, Miga KH, Mohanty SK, Munson KM, Pal K, Pennell M, Pevzner PA, Porubsky D, Potapova T, Ringeling FR, Roha JL, Ryder OA, Sacco S, Saha S, Sasaki T, Schatz MC, Schork NJ, Shanks C, Smeds L, Son DR, Steiner C, Sweeten AP, Tassia MG, Thibaud-Nissen F, Torres-González E, Trivedi M, Wei W, Wertz J, Yang M, Zhang P, Zhang S, Zhang Y, Zhang Z, Zhao SA, Zhu Y, Jarvis ED, Gerton JL, Rivas-González I, Paten B, Szpiech ZA, Huber CD, Lenz TL, Konkel MK, Yi SV, Canzar S, Watson CT, Sudmant PH, Molloy E, Garrison E, Lowe CB, Ventura M, O'Neill RJ, Koren S, Makova KD, Phillippy AM, and Eichler EE
- Abstract
We present haplotype-resolved reference genomes and comparative analyses of six ape species, namely: chimpanzee, bonobo, gorilla, Bornean orangutan, Sumatran orangutan, and siamang. We achieve chromosome-level contiguity with unparalleled sequence accuracy (<1 error in 500,000 base pairs), completely sequencing 215 gapless chromosomes telomere-to-telomere. We resolve challenging regions, such as the major histocompatibility complex and immunoglobulin loci, providing more in-depth evolutionary insights. Comparative analyses, including human, allow us to investigate the evolution and diversity of regions previously uncharacterized or incompletely studied without bias from mapping to the human reference. This includes newly minted gene families within lineage-specific segmental duplications, centromeric DNA, acrocentric chromosomes, and subterminal heterochromatin. This resource should serve as a definitive baseline for all future evolutionary studies of humans and our closest living ape relatives., Competing Interests: E.E.E. is a scientific advisory board (SAB) member of Variant Bio, Inc. C.T.W. is a co-founder/CSO of Clareo Biosciences, Inc. W.L. is a co-founder/CIO of Clareo Biosciences, Inc. The other authors declare no competing interests.
- Published
- 2024
9. The complete sequence and comparative analysis of ape sex chromosomes.
- Author
-
Makova KD, Pickett BD, Harris RS, Hartley GA, Cechova M, Pal K, Nurk S, Yoo D, Li Q, Hebbar P, McGrath BC, Antonacci F, Aubel M, Biddanda A, Borchers M, Bornberg-Bauer E, Bouffard GG, Brooks SY, Carbone L, Carrel L, Carroll A, Chang PC, Chin CS, Cook DE, Craig SJC, de Gennaro L, Diekhans M, Dutra A, Garcia GH, Grady PGS, Green RE, Haddad D, Hallast P, Harvey WT, Hickey G, Hillis DA, Hoyt SJ, Jeong H, Kamali K, Pond SLK, LaPolice TM, Lee C, Lewis AP, Loh YE, Masterson P, McGarvey KM, McCoy RC, Medvedev P, Miga KH, Munson KM, Pak E, Paten B, Pinto BJ, Potapova T, Rhie A, Rocha JL, Ryabov F, Ryder OA, Sacco S, Shafin K, Shepelev VA, Slon V, Solar SJ, Storer JM, Sudmant PH, Sweetalana, Sweeten A, Tassia MG, Thibaud-Nissen F, Ventura M, Wilson MA, Young AC, Zeng H, Zhang X, Szpiech ZA, Huber CD, Gerton JL, Yi SV, Schatz MC, Alexandrov IA, Koren S, O'Neill RJ, Eichler EE, and Phillippy AM
- Subjects
- Animals, Female, Male, Gorilla gorilla genetics, Hylobatidae genetics, Pan paniscus genetics, Pan troglodytes genetics, Phylogeny, Pongo abelii genetics, Pongo pygmaeus genetics, Telomere genetics, Evolution, Molecular, DNA Copy Number Variations genetics, Humans, Endangered Species, Reference Standards, Hominidae genetics, Hominidae classification, X Chromosome genetics, Y Chromosome genetics
- Abstract
Apes possess two sex chromosomes: the male-specific Y chromosome and the X chromosome, which is present in both males and females. The Y chromosome is crucial for male reproduction, with deletions being linked to infertility. The X chromosome is vital for reproduction and cognition. Variation in mating patterns and brain function among apes suggests corresponding differences in their sex chromosomes. However, owing to their repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the methodology developed for the telomere-to-telomere (T2T) human genome, we produced gapless assemblies of the X and Y chromosomes for five great apes (bonobo (Pan paniscus), chimpanzee (Pan troglodytes), western lowland gorilla (Gorilla gorilla gorilla), Bornean orangutan (Pongo pygmaeus) and Sumatran orangutan (Pongo abelii)) and a lesser ape (the siamang gibbon (Symphalangus syndactylus)), and untangled the intricacies of their evolution. Compared with the X chromosomes, the ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements, owing to the accumulation of lineage-specific ampliconic regions, palindromes, transposable elements and satellites. Many Y chromosome genes expand in multi-copy families and some evolve under purifying selection. Thus, the Y chromosome exhibits dynamic evolution, whereas the X chromosome is more stable. Mapping short-read sequencing data to these assemblies revealed diversity and selection patterns on sex chromosomes of more than 100 individual great apes. These reference assemblies are expected to inform human evolution and conservation genetics of non-human apes, all of which are endangered species., (© 2024. The Author(s).)
- Published
- 2024
10. Methylation profiles at birth linked to early childhood obesity.
- Author
-
Lariviere D, Craig SJC, Paul IM, Hohman EE, Savage JS, Wright RO, Chiaromonte F, Makova KD, and Reimherr ML
- Subjects
- Humans, Female, Pregnancy, Male, Infant, Newborn, Infant, Fetal Blood metabolism, Placenta metabolism, Body Mass Index, Epigenesis, Genetic, Adult, DNA Methylation, Pediatric Obesity genetics
- Abstract
Childhood obesity represents a significant global health concern and identifying its risk factors is crucial for developing intervention programs. Many "omics" factors associated with the risk of developing obesity have been identified, including genomic, microbiomic, and epigenomic factors. Here, using a sample of 48 infants, we investigated how the methylation profiles in cord blood and placenta at birth were associated with weight outcomes (specifically, conditional weight gain, body mass index, and weight-for-length ratio) at age six months. We characterized genome-wide DNA methylation profiles using the Illumina Infinium MethylationEpic chip, and incorporated information on child and maternal health, and various environmental factors into the analysis. We used regression analysis to identify genes with methylation profiles most predictive of infant weight outcomes, finding a total of 23 relevant genes in cord blood and 10 in placenta. Notably, in cord blood, the methylation profiles of three genes (PLIN4, UBE2F, and PPP1R16B) were associated with all three weight outcomes, which are also associated with weight outcomes in an independent cohort suggesting a strong relationship with weight trajectories in the first six months after birth. Additionally, we developed a Methylation Risk Score (MRS) that could be used to identify children most at risk for developing childhood obesity. While many of the genes identified by our analysis have been associated with weight-related traits (e.g., glucose metabolism, BMI, or hip-to-waist ratio) in previous genome-wide association and variant studies, our analysis implicated several others, whose involvement in the obesity phenotype should be evaluated in future functional investigations.
- Published
- 2024
11. Transcript Isoform Diversity of Y Chromosome Ampliconic Genes of Great Apes Uncovered Using Long Reads and Telomere-to-Telomere Reference Genome Assemblies.
- Author
-
Greshnova A, Pál K, Martinez JFI, Canzar S, and Makova KD
- Abstract
Y chromosomes of great apes harbor Ampliconic Genes (YAGs), multi-copy gene families (BPY2, CDY, DAZ, HSFY, PRY, RBMY, TSPY, VCY, and XKRY) that encode proteins important for spermatogenesis. Previous work assembled YAG transcripts based on their targeted sequencing but not using reference genome assemblies, potentially resulting in an incomplete transcript repertoire. Here we used the recently produced gapless telomere-to-telomere (T2T) Y chromosome assemblies of great ape species (bonobo, chimpanzee, human, gorilla, Bornean orangutan, and Sumatran orangutan) and analyzed RNA data from whole-testis samples for the same species. We generated hybrid transcriptome assemblies by combining targeted long reads (Pacific Biosciences), untargeted long reads (Pacific Biosciences) and untargeted short reads (Illumina) and mapping them to the T2T reference genomes. Compared to the results from the reference-free approach, average transcript length was more than two times higher, and the total number of transcripts decreased three times, improving the quality of the assembled transcriptome. The reference-based transcriptome assemblies allowed us to differentiate transcripts originating from different Y chromosome gene copies and from their non-Y chromosome homologs. We identified two sources of transcriptome diversity: alternative splicing and gene duplication with subsequent diversification of gene copies. For each gene family, we detected transcribed pseudogenes along with protein-coding gene copies. We revealed previously unannotated gene copies of YAGs as compared to currently available NCBI annotations, as well as novel isoforms for annotated gene copies. This analysis paves the way for better understanding Y chromosome gene functions, which is important given their role in spermatogenesis.
- Published
- 2024
12. In vivo detection of DNA secondary structures using permanganate/S1 footprinting with direct adapter ligation and sequencing (PDAL-Seq).
- Author
-
Lahnsteiner A, Craig SJC, Kamali K, Weissensteiner B, McGrath B, Risch A, and Makova KD
- Subjects
- Oxides, Manganese Compounds, Oligonucleotides, DNA chemistry, G-Quadruplexes
- Abstract
DNA secondary structures are essential elements of the genomic landscape, playing a critical role in regulating various cellular processes. These structures refer to G-quadruplexes, cruciforms, Z-DNA or H-DNA structures, amongst others (collectively called 'non-B DNA'), which DNA molecules can adopt beyond the B conformation. DNA secondary structures have significant biological roles, and their landscape is dynamic and can rearrange due to various factors, including changes in cellular conditions, temperature, and DNA-binding proteins. Understanding this dynamic nature is crucial for unraveling their functions in cellular processes. Detecting DNA secondary structures remains a challenge. Conventional methods, such as gel electrophoresis and chemical probing, have limitations in terms of sensitivity and specificity. Emerging techniques, including next-generation sequencing and single-molecule approaches, offer promise but face challenges since these techniques are mostly limited to only one type of secondary structure. Here we describe an updated version of a technique permanganate/S1 nuclease footprinting, which uses potassium permanganate to trap single-stranded DNA regions as found in many non-B structures, in combination with S1 nuclease digest and adapter ligation to detect genome-wide non-B formation. To overcome technical hurdles, we combined this method with direct adapter ligation and sequencing (PDAL-Seq). Furthermore, we established a user-friendly pipeline available on Galaxy to standardize PDAL-Seq data analysis. This optimized method allows the analysis of many types of DNA secondary structures that form in a living cell and will advance our knowledge of their roles in health and disease., (Copyright © 2024. Published by Elsevier Inc.)
- Published
- 2024
13. The Complete Sequence and Comparative Analysis of Ape Sex Chromosomes.
- Author
-
Makova KD, Pickett BD, Harris RS, Hartley GA, Cechova M, Pal K, Nurk S, Yoo D, Li Q, Hebbar P, McGrath BC, Antonacci F, Aubel M, Biddanda A, Borchers M, Bomberg E, Bouffard GG, Brooks SY, Carbone L, Carrel L, Carroll A, Chang PC, Chin CS, Cook DE, Craig SJC, de Gennaro L, Diekhans M, Dutra A, Garcia GH, Grady PGS, Green RE, Haddad D, Hallast P, Harvey WT, Hickey G, Hillis DA, Hoyt SJ, Jeong H, Kamali K, Kosakovsky Pond SL, LaPolice TM, Lee C, Lewis AP, Loh YE, Masterson P, McCoy RC, Medvedev P, Miga KH, Munson KM, Pak E, Paten B, Pinto BJ, Potapova T, Rhie A, Rocha JL, Ryabov F, Ryder OA, Sacco S, Shafin K, Shepelev VA, Slon V, Solar SJ, Storer JM, Sudmant PH, Sweetalana, Sweeten A, Tassia MG, Thibaud-Nissen F, Ventura M, Wilson MA, Young AC, Zeng H, Zhang X, Szpiech ZA, Huber CD, Gerton JL, Yi SV, Schatz MC, Alexandrov IA, Koren S, O'Neill RJ, Eichler E, and Phillippy AM
- Abstract
Apes possess two sex chromosomes-the male-specific Y and the X shared by males and females. The Y chromosome is crucial for male reproduction, with deletions linked to infertility. The X chromosome carries genes vital for reproduction and cognition. Variation in mating patterns and brain function among great apes suggests corresponding differences in their sex chromosome structure and evolution. However, due to their highly repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the state-of-the-art experimental and computational methods developed for the telomere-to-telomere (T2T) human genome, we produced gapless, complete assemblies of the X and Y chromosomes for five great apes (chimpanzee, bonobo, gorilla, Bornean and Sumatran orangutans) and a lesser ape, the siamang gibbon. These assemblies completely resolved ampliconic, palindromic, and satellite sequences, including the entire centromeres, allowing us to untangle the intricacies of ape sex chromosome evolution. We found that, compared to the X, ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements. This divergence on the Y arises from the accumulation of lineage-specific ampliconic regions and palindromes (which are shared more broadly among species on the X) and from the abundance of transposable elements and satellites (which have a lower representation on the X). Our analysis of Y chromosome genes revealed lineage-specific expansions of multi-copy gene families and signatures of purifying selection. In summary, the Y exhibits dynamic evolution, while the X is more stable. Finally, mapping short-read sequencing data from >100 great ape individuals revealed the patterns of diversity and selection on their sex chromosomes, demonstrating the utility of these reference assemblies for studies of great ape evolution. 
These complete sex chromosome assemblies are expected to further inform conservation genetics of nonhuman apes, all of which are endangered species., Competing Interests: EEE is a scientific advisory board (SAB) member of Variant Bio, Inc. RJO is a scientific advisory board (SAB) member of Colossal Biosciences, Inc. CL is a scientific advisory board (SAB) member of Nabsys, Inc. and Genome Insight, Inc.
- Published
- 2023
14. Transcript Isoform Diversity of Ampliconic Genes on the Y Chromosome of Great Apes.
- Author
-
Tomaszkiewicz M, Sahlin K, Medvedev P, and Makova KD
- Subjects
- Animals, Male, Humans, Y Chromosome genetics, Pan troglodytes genetics, Protein Isoforms genetics, Pan paniscus genetics, Hominidae genetics
- Abstract
Y chromosomal ampliconic genes (YAGs) are important for male fertility, as they encode proteins functioning in spermatogenesis. The variation in copy number and expression levels of these multicopy gene families has been studied in great apes; however, the diversity of splicing variants remains unexplored. Here, we deciphered the sequences of polyadenylated transcripts of all nine YAG families (BPY2, CDY, DAZ, HSFY, PRY, RBMY, TSPY, VCY, and XKRY) from testis samples of six great ape species (human, chimpanzee, bonobo, gorilla, Bornean orangutan, and Sumatran orangutan). To achieve this, we enriched YAG transcripts with capture probe hybridization and sequenced them with long (Pacific Biosciences) reads. Our analysis of this data set resulted in several findings. First, we observed evolutionarily conserved alternative splicing patterns for most YAG families except for BPY2 and PRY. Second, our results suggest that BPY2 transcripts and proteins originate from separate genomic regions in bonobo versus human, which is possibly facilitated by acquiring new promoters. Third, our analysis indicates that the PRY gene family, having the highest representation of noncoding transcripts, has been undergoing pseudogenization. Fourth, we have not detected signatures of selection in the five YAG families shared among great apes, even though we identified many species-specific protein-coding transcripts. Fifth, we predicted consensus disorder regions across most gene families and species, which could be used for future investigations of male infertility. Overall, our work illuminates the YAG isoform landscape and provides a genomic resource for future functional studies focusing on infertility phenotypes in humans and critically endangered great apes., (© The Author(s) 2023. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.)
- Published
- 2023
15. The complete sequence of a human Y chromosome.
- Author
-
Rhie A, Nurk S, Cechova M, Hoyt SJ, Taylor DJ, Altemose N, Hook PW, Koren S, Rautiainen M, Alexandrov IA, Allen J, Asri M, Bzikadze AV, Chen NC, Chin CS, Diekhans M, Flicek P, Formenti G, Fungtammasan A, Garcia Giron C, Garrison E, Gershman A, Gerton JL, Grady PGS, Guarracino A, Haggerty L, Halabian R, Hansen NF, Harris R, Hartley GA, Harvey WT, Haukness M, Heinz J, Hourlier T, Hubley RM, Hunt SE, Hwang S, Jain M, Kesharwani RK, Lewis AP, Li H, Logsdon GA, Lucas JK, Makalowski W, Markovic C, Martin FJ, Mc Cartney AM, McCoy RC, McDaniel J, McNulty BM, Medvedev P, Mikheenko A, Munson KM, Murphy TD, Olsen HE, Olson ND, Paulin LF, Porubsky D, Potapova T, Ryabov F, Salzberg SL, Sauria MEG, Sedlazeck FJ, Shafin K, Shepelev VA, Shumate A, Storer JM, Surapaneni L, Taravella Oill AM, Thibaud-Nissen F, Timp W, Tomaszkiewicz M, Vollger MR, Walenz BP, Watwood AC, Weissensteiner MH, Wenger AM, Wilson MA, Zarate S, Zhu Y, Zook JM, Eichler EE, O'Neill RJ, Schatz MC, Miga KH, Makova KD, and Phillippy AM
- Subjects
- Humans, Base Sequence, DNA, Satellite genetics, Genetic Variation genetics, Genetics, Population, Heterochromatin genetics, Multigene Family genetics, Reference Standards, Segmental Duplications, Genomic genetics, Tandem Repeat Sequences genetics, Telomere genetics, Chromosomes, Human, Y genetics, Genomics methods, Genomics standards, Sequence Analysis, DNA standards
- Abstract
The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes., (© 2023. This is a U.S. Government work and not under copyright protection in the US; foreign copyright protection may apply.)
- Published
- 2023
16. Accurate sequencing of DNA motifs able to form alternative (non-B) structures.
- Author
-
Weissensteiner MH, Cremona MA, Guiblet WM, Stoler N, Harris RS, Cechova M, Eckert KA, Chiaromonte F, Huang YF, and Makova KD
- Subjects
- Humans, Nucleotide Motifs, Sequence Analysis, DNA, DNA genetics, Base Composition, High-Throughput Nucleotide Sequencing, DNA, Z-Form, Nanopores
- Abstract
Approximately 13% of the human genome at certain motifs have the potential to form noncanonical (non-B) DNA structures (e.g., G-quadruplexes, cruciforms, and Z-DNA), which regulate many cellular processes but also affect the activity of polymerases and helicases. Because sequencing technologies use these enzymes, they might possess increased errors at non-B structures. To evaluate this, we analyzed error rates, read depth, and base quality of Illumina, Pacific Biosciences (PacBio) HiFi, and Oxford Nanopore Technologies (ONT) sequencing at non-B motifs. All technologies showed altered sequencing success for most non-B motif types, although this could be owing to several factors, including structure formation, biased GC content, and the presence of homopolymers. Single-nucleotide mismatch errors had low biases in HiFi and ONT for all non-B motif types but were increased for G-quadruplexes and Z-DNA in all three technologies. Deletion errors were increased for all non-B types but Z-DNA in Illumina and HiFi, as well as only for G-quadruplexes in ONT. Insertion errors for non-B motifs were highly, moderately, and slightly elevated in Illumina, HiFi, and ONT, respectively. Additionally, we developed a probabilistic approach to determine the number of false positives at non-B motifs depending on sample size and variant frequency, and applied it to publicly available data sets (1000 Genomes, Simons Genome Diversity Project, and gnomAD). We conclude that elevated sequencing errors at non-B DNA motifs should be considered in low-read-depth studies (single-cell, ancient DNA, and pooled-sample population sequencing) and in scoring rare variants. Combining technologies should maximize sequencing accuracy in future studies of non-B DNA., (© 2023 Weissensteiner et al.; Published by Cold Spring Harbor Laboratory Press.)
- Published
- 2023
17. Whole-genome sequence and assembly of the Javan gibbon (Hylobates moloch).
- Author
-
Escalona M, VanCampen J, Maurer NW, Haukness M, Okhovat M, Harris RS, Watwood A, Hartley GA, O'Neill RJ, Medvedev P, Makova KD, Vollmers C, Carbone L, and Green RE
- Subjects
- Animals, Forests, Endangered Species, Indonesia, Hylobates genetics, Genome
- Abstract
The Javan gibbon, Hylobates moloch, is an endangered gibbon species restricted to the forest remnants of western and central Java, Indonesia, and one of the rarest of the Hylobatidae family. Hylobatids consist of 4 genera (Hoolock, Hylobates, Symphalangus, and Nomascus) that are characterized by different numbers of chromosomes, ranging from 38 to 52. The underlying cause of this karyotype plasticity is not entirely understood, at least in part, due to the limited availability of genomic data. Here we present the first scaffold-level assembly for H. moloch using a combination of whole-genome Illumina short reads, 10X Chromium linked reads, PacBio, and Oxford Nanopore long reads and proximity-ligation data. This Hylobates genome represents a valuable new resource for comparative genomics studies in primates., (© The American Genetic Association. 2022.)
- Published
- 2023
18. Noncanonical DNA structures are drivers of genome evolution.
- Author
-
Makova KD and Weissensteiner MH
- Subjects
- Humans, Base Sequence, Genomic Instability genetics, Evolution, Molecular, DNA Transposable Elements genetics, Genomics
- Abstract
In addition to the canonical right-handed double helix, other DNA structures, termed 'non-B DNA', can form in the genomes across the tree of life. Non-B DNA regulates multiple cellular processes, including replication and transcription, yet its presence is associated with elevated mutagenicity and genome instability. These discordant cellular roles fuel the enormous potential of non-B DNA to drive genomic and phenotypic evolution. Here we discuss recent studies establishing non-B DNA structures as novel functional elements subject to natural selection, affecting evolution of transposable elements (TEs), and specifying centromeres. By highlighting the contributions of non-B DNA to repeated evolution and adaptation to changing environments, we conclude that evolutionary analyses should include a perspective of not only DNA sequence, but also its structure., Competing Interests: No interests are declared., (Copyright © 2022 The Authors. Published by Elsevier Ltd. All rights reserved.)
- Published
- 2023
19. Constructing a polygenic risk score for childhood obesity using functional data analysis.
- Author
-
Craig SJC, Kenney AM, Lin J, Paul IM, Birch LL, Savage JS, Marini ME, Chiaromonte F, Reimherr ML, and Makova KD
- Abstract
Obesity is a highly heritable condition that affects increasing numbers of adults and, concerningly, of children. However, only a small fraction of its heritability has been attributed to specific genetic variants. These variants are traditionally ascertained from genome-wide association studies (GWAS), which utilize samples with tens or hundreds of thousands of individuals for whom a single summary measurement (e.g., BMI) is collected. An alternative approach is to focus on a smaller, more deeply characterized sample in conjunction with advanced statistical models that leverage longitudinal phenotypes. Novel functional data analysis (FDA) techniques are used to capitalize on longitudinal growth information from a cohort of children between birth and three years of age. In an ultra-high dimensional setting, hundreds of thousands of single nucleotide polymorphisms (SNPs) are screened, and selected SNPs are used to construct two polygenic risk scores (PRS) for childhood obesity using a weighting approach that incorporates the dynamic and joint nature of SNP effects. These scores are significantly higher in children with (vs. without) rapid infant weight gain-a predictor of obesity later in life. Using two independent cohorts, it is shown that the genetic variants identified in very young children are also informative in older children and in adults, consistent with early childhood obesity being predictive of obesity later in life. In contrast, PRSs based on SNPs identified by adult obesity GWAS are not predictive of weight gain in the cohort of young children. This provides an example of a successful application of FDA to GWAS. This application is complemented with simulations establishing that a deeply characterized sample can be just as, if not more, effective than a comparable study with a cross-sectional response. 
Overall, it is demonstrated that a deep, statistically sophisticated characterization of a longitudinal phenotype can provide increased statistical power to studies with relatively small sample sizes, and it is shown how FDA approaches can be used as an alternative to the traditional GWAS., Competing Interests: None declared.
- Published
- 2023
20. Variation in G-quadruplex sequence and topology differentially impacts human DNA polymerase fidelity.
- Author
-
Stein M, Hile SE, Weissensteiner MH, Lee M, Zhang S, Kejnovský E, Kejnovská I, Makova KD, and Eckert KA
- Subjects
- DNA genetics, DNA Replication, DNA-Directed DNA Polymerase genetics, DNA-Directed DNA Polymerase metabolism, Humans, Vascular Endothelial Growth Factor A genetics, G-Quadruplexes
- Abstract
G-quadruplexes (G4s), a type of non-B DNA, play important roles in a wide range of molecular processes, including replication, transcription, and translation. Genome integrity relies on efficient and accurate DNA synthesis, and is compromised by various stressors, to which non-B DNA structures such as G4s can be particularly vulnerable. However, the impact of G4 structures on DNA polymerase fidelity is largely unknown. Using an in vitro forward mutation assay, we investigated the fidelity of human DNA polymerases delta (δ4, four-subunit), eta (η), and kappa (κ) during synthesis of G4 motifs representing those in the human genome. The motifs differ in sequence, topology, and stability, features that may affect DNA polymerase errors. Polymerase error rate hierarchy (δ4 < κ < η) is largely maintained during G4 synthesis. Importantly, we observed unique polymerase error signatures during synthesis of VEGF G4 motifs, stable G4s which form parallel topologies. These statistically significant errors occurred within, immediately flanking, and encompassing the G4 motif. For pol δ4, the errors were deletions, insertions and complex errors within the G4 or encompassing the G4 motif and surrounding sequence. For pol η, the errors occurred in 3' sequences flanking the G4 motif. For pol κ, the errors were frameshift mutations within G-tracts of the G4. Because these error signatures were not observed during synthesis of an antiparallel G4 and, to a lesser extent, a hybrid G4, we suggest that G4 topology and/or stability could influence polymerase fidelity. Using in silico analyses, we show that most polymerase errors are predicted to have minimal effects on predicted G4 stability. Our results provide a unique view of G4s not previously elucidated, showing that G4 motif heterogeneity differentially influences polymerase fidelity within the motif and flanking sequences. 
Thus, our study advances the understanding of how DNA polymerase errors contribute to G4 mutagenesis., Competing Interests: The authors declare no conflict of interest with this research., (Copyright © 2022 The Authors. Published by Elsevier B.V. All rights reserved.)
- Published
- 2022
21. Exploring the Effects of Mitonuclear Interactions on Mitochondrial DNA Gene Expression in Humans.
- Author
-
Torres-Gonzalez E and Makova KD
- Abstract
Most mitochondrial protein complexes include both nuclear and mitochondrial gene products, which coevolved to work together. This coevolution can be disrupted due to disparity in genetic ancestry between the nuclear and mitochondrial genomes in recently admixed populations. Such mitonuclear DNA discordance might result in phenotypic effects. Several nuclear-encoded proteins regulate expression of mitochondrial DNA (mtDNA) genes. We hypothesized that mitonuclear DNA discordance affects expression of genes encoded by mtDNA. To test this, we utilized the data from the GTEx project, which contains expression levels for ∼100 African Americans and >600 European Americans. The varying proportion of African and European ancestry in recently admixed African Americans provides a range of mitonuclear discordance values, which can be correlated with mtDNA gene expression levels (adjusted for age and ischemic time). In contrast, European Americans did not undergo recent admixture. We demonstrated that, for most mtDNA protein-coding genes, expression levels in energetically demanding tissues were lower in African Americans than in European Americans. Furthermore, gene expression levels were lower in individuals with higher mitonuclear discordance, independent of population. Moreover, we found a negative correlation between mtDNA gene expression and mitonuclear discordance. In African Americans, the average value of African ancestry was higher for nuclear-encoded mitochondrial than non-mitochondrial genes, facilitating a match in ancestry with the mtDNA and more optimal interactions. These results represent an example of a phenotypic effect of mitonuclear discordance on human admixed populations, and have potential biomedical applications. Competing Interests: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. (Copyright © 2022 Torres-Gonzalez and Makova.)
- Published
- 2022
22. Advanced age increases frequencies of de novo mitochondrial mutations in macaque oocytes and somatic tissues.
- Author
-
Arbeithuber B, Cremona MA, Hester J, Barrett A, Higgins B, Anthony K, Chiaromonte F, Diaz FJ, and Makova KD
- Subjects
- Animals, DNA, Mitochondrial genetics, DNA, Mitochondrial metabolism, Humans, Macaca mulatta genetics, Aging, Mitochondria genetics, Mutation, Oocytes metabolism
- Abstract
Mutations in mitochondrial DNA (mtDNA) contribute to multiple diseases. However, how new mtDNA mutations arise and accumulate with age remains understudied because of the high error rates of current sequencing technologies. Duplex sequencing reduces error rates by several orders of magnitude via independently tagging and analyzing each of the two template DNA strands. Here, using duplex sequencing, we obtained high-quality mtDNA sequences for somatic tissues (liver and skeletal muscle) and single oocytes of 30 unrelated rhesus macaques, from 1 to 23 y of age. Sequencing single oocytes minimized effects of natural selection on germline mutations. In total, we identified 17,637 tissue-specific de novo mutations. Their frequency increased ∼3.5-fold in liver and ∼2.8-fold in muscle over the ∼20 y assessed. Mutation frequency in oocytes increased ∼2.5-fold until the age of 9 y, but did not increase after that, suggesting that oocytes of older animals maintain the quality of their mtDNA. We found the light-strand origin of replication (OriL) to be a hotspot for mutation accumulation with aging in liver. Indeed, the 33-nucleotide-long OriL harbored 12 variant hotspots, 10 of which likely disrupt its hairpin structure and affect replication efficiency. Moreover, in somatic tissues, protein-coding variants were subject to positive selection (potentially mitigating toxic effects of mitochondrial activity), the strength of which increased with the number of macaques harboring variants. Our work illuminates the origins and accumulation of somatic and germline mtDNA mutations with aging in primates and has implications for delayed reproduction in modern human societies.
- Published
- 2022
23. INSIGHT responsive parenting educational intervention for firstborns is associated with growth of second-born siblings.
- Author
-
Savage JS, Hochgraf AK, Loken E, Marini ME, Craig SJC, Makova KD, Birch LL, and Paul IM
- Subjects
- Child, Female, Humans, Infant, Mothers psychology, Obesity, Parturition, Pregnancy, Parenting, Siblings psychology
- Abstract
Objective: The aim of this study was to test whether the Intervention Nurses Start Infants Growing on Healthy Trajectories (INSIGHT) responsive parenting (RP) intervention, delivered to parents of firstborn children, is associated with the BMI of first- and second-born siblings during infancy. Methods: Participants included 117 firstborn infants enrolled in a randomized controlled trial and their second-born siblings enrolled in an observation-only ancillary study. The RP curriculum for firstborn children included guidance on feeding, sleep, interactive play, and emotion regulation. The control curriculum focused on safety. Anthropometrics were measured in both siblings at ages 3, 16, 28, and 52 weeks. Growth curve models for BMI by child age were fit. Results: Second-born children were delivered 2.5 (SD 0.9) years after firstborns. Firstborn and second-born children whose parents received the RP intervention with their first child had BMI that was 0.44 kg/m² (95% CI: -0.82 to 0.06) and 0.36 kg/m² (95% CI: -0.75 to 0.03) lower than controls, respectively. Linear and quadratic growth rates for BMI for firstborn and second-born cohorts were similar, but second-born children had a greater average BMI at 1 year of age (difference = -0.33 [95% CI: -0.52 to -0.15]). Conclusions: An RP educational intervention for obesity prevention delivered to parents of firstborns appears to spill over to second-born siblings. (© 2021 The Obesity Society.)
- Published
- 2022
24. Metabolomic profiling of stool of two-year old children from the INSIGHT study reveals links between butyrate and child weight outcomes.
- Author
-
Nandy D, Craig SJC, Cai J, Tian Y, Paul IM, Savage JS, Marini ME, Hohman EE, Reimherr ML, Patterson AD, Makova KD, and Chiaromonte F
- Subjects
- Body Mass Index, Butyrates, Child, Child, Preschool, Feces, Female, Humans, Mothers, Pregnancy, Gastrointestinal Microbiome, Pediatric Obesity epidemiology
- Abstract
Background: Metabolomic analysis is commonly used to understand the biological underpinning of diseases such as obesity. However, our knowledge of gut metabolites related to weight outcomes in young children is currently limited. Objectives: To (1) explore the relationships between metabolites and child weight outcomes, (2) determine the potential effect of covariates (e.g., child's diet, maternal health/habits during pregnancy, etc.) in the relationship between metabolites and child weight outcomes, and (3) explore the relationship between selected gut metabolites and gut microbiota abundance. Methods: Using ¹H-NMR, we quantified 30 metabolites from stool samples of 170 two-year-old children. To identify metabolites and covariates associated with children's weight outcomes (BMI [weight/height²], BMI z-score [BMI adjusted for age and sex], and growth index [weight/height]), we analysed the ¹H-NMR data, along with 20 covariates recorded on children and mothers, using LASSO and best subset selection regression techniques. Previously characterized microbiota community information from the same stool samples was used to determine associations between selected gut metabolites and gut microbiota. Results: At age 2 years, stool butyrate concentration had a significant positive association with child BMI (p-value = 3.58 × 10⁻⁴), BMI z-score (p-value = 3.47 × 10⁻⁴), and growth index (p-value = 7.73 × 10⁻⁴). Covariates such as maternal smoking during pregnancy are important to consider. Butyrate concentration was positively associated with the abundance of the bacterial genus Faecalibacterium (p-value = 9.61 × 10⁻³). Conclusions: Stool butyrate concentration is positively associated with increased child weight outcomes and should be investigated further as a factor affecting childhood obesity. (© 2021 The Authors. Pediatric Obesity published by John Wiley & Sons Ltd on behalf of World Obesity Federation.)
- Published
- 2022
25. Associations between stool micro-transcriptome, gut microbiota, and infant growth.
- Author
-
Carney MC, Zhan X, Rangnekar A, Chroneos MZ, Craig SJC, Makova KD, Paul IM, and Hicks SD
- Subjects
- Female, Follow-Up Studies, Gastrointestinal Microbiome physiology, Gene Expression Profiling methods, Gene Expression Profiling statistics & numerical data, Growth and Development genetics, Humans, Infant, Male, Pennsylvania, Feces microbiology, Gastrointestinal Microbiome genetics, Growth and Development physiology
- Abstract
Rapid infant growth increases the risk for adult obesity. The gut microbiome is associated with early weight status; however, no study has examined how interactions between microbial and host ribonucleic acid (RNA) expression influence infant growth. We hypothesized that dynamics in infant stool micro-ribonucleic acids (miRNAs) would be associated with both microbial activity and infant growth via putative metabolic targets. Stool was collected twice from 30 full-term infants, at 1 month and again between 6 and 12 months. Stool RNA were measured with high-throughput sequencing and aligned to human and microbial databases. Infant growth was measured by weight-for-length z-score at birth and 12 months. Increased RNA transcriptional activity of Clostridia (R = 0.55; Adj p = 3.7E-2) and Burkholderia (R = -0.820, Adj p = 2.62E-3) were associated with infant growth. Of the 25 human RNAs associated with growth, 16 were miRNAs. The miRNAs demonstrated significant target enrichment (Adj p < 0.05) for four metabolic pathways. There were four associations between growth-related miRNAs and growth-related phyla. We have shown that longitudinal trends in gut microbiota activity and human miRNA levels are associated with infant growth and the metabolic targets of miRNAs suggest these molecules may regulate the biosynthetic landscape of the gut and influence microbial activity.
- Published
- 2021
26. Selection and thermostability suggest G-quadruplexes are novel functional elements of the human genome.
- Author
-
Guiblet WM, DeGiorgio M, Cheng X, Chiaromonte F, Eckert KA, Huang YF, and Makova KD
- Abstract
Approximately 1% of the human genome has the ability to fold into G-quadruplexes (G4s), noncanonical strand-specific DNA structures forming at G-rich motifs. G4s regulate several key cellular processes (e.g., transcription) and have been hypothesized to participate in others (e.g., firing of replication origins). Moreover, G4s differ in their thermostability, and this may affect their function. Yet, G4s may also hinder replication, transcription, and translation and may increase genome instability and mutation rates. Therefore, depending on their genomic location, thermostability, and functionality, G4 loci might evolve under different selective pressures, which has never been investigated. Here we conducted the first genome-wide analysis of G4 distribution, thermostability, and selection. We found an overrepresentation, high thermostability, and purifying selection for G4s within genic components in which they are expected to be functional: promoters, CpG islands, and 5' and 3' UTRs. A similar pattern was observed for G4s within replication origins, enhancers, eQTLs, and TAD boundary regions, strongly suggesting their functionality. In contrast, G4s on the nontranscribed strand of exons were underrepresented, were unstable, and evolved neutrally. In general, G4s on the nontranscribed strand of genic components had lower density and were less stable than those on the transcribed strand, suggesting that the former are avoided at the RNA level. Across the genome, purifying selection was stronger at stable G4s. Our results suggest that purifying selection preserves the sequences of functional G4s, whereas nonfunctional G4s are too costly to be tolerated in the genome. Thus, G4s are emerging as fundamental, functional genomic elements. (© 2021 Guiblet et al.; Published by Cold Spring Harbor Laboratory Press.)
- Published
- 2021
27. Towards complete and error-free genome assemblies of all vertebrate species.
- Author
-
Rhie A, McCarthy SA, Fedrigo O, Damas J, Formenti G, Koren S, Uliano-Silva M, Chow W, Fungtammasan A, Kim J, Lee C, Ko BJ, Chaisson M, Gedman GL, Cantin LJ, Thibaud-Nissen F, Haggerty L, Bista I, Smith M, Haase B, Mountcastle J, Winkler S, Paez S, Howard J, Vernes SC, Lama TM, Grutzner F, Warren WC, Balakrishnan CN, Burt D, George JM, Biegler MT, Iorns D, Digby A, Eason D, Robertson B, Edwards T, Wilkinson M, Turner G, Meyer A, Kautt AF, Franchini P, Detrich HW 3rd, Svardal H, Wagner M, Naylor GJP, Pippel M, Malinsky M, Mooney M, Simbirsky M, Hannigan BT, Pesout T, Houck M, Misuraca A, Kingan SB, Hall R, Kronenberg Z, Sović I, Dunn C, Ning Z, Hastie A, Lee J, Selvaraj S, Green RE, Putnam NH, Gut I, Ghurye J, Garrison E, Sims Y, Collins J, Pelan S, Torrance J, Tracey A, Wood J, Dagnew RE, Guan D, London SE, Clayton DF, Mello CV, Friedrich SR, Lovell PV, Osipova E, Al-Ajli FO, Secomandi S, Kim H, Theofanopoulou C, Hiller M, Zhou Y, Harris RS, Makova KD, Medvedev P, Hoffman J, Masterson P, Clark K, Martin F, Howe K, Flicek P, Walenz BP, Kwak W, Clawson H, Diekhans M, Nassar L, Paten B, Kraus RHS, Crawford AJ, Gilbert MTP, Zhang G, Venkatesh B, Murphy RW, Koepfli KP, Shapiro B, Johnson WE, Di Palma F, Marques-Bonet T, Teeling EC, Warnow T, Graves JM, Ryder OA, Haussler D, O'Brien SJ, Korlach J, Lewin HA, Howe K, Myers EW, Durbin R, Phillippy AM, and Jarvis ED
- Subjects
- Animals, Birds, Gene Library, Genome Size, Genome, Mitochondrial, Haplotypes, High-Throughput Nucleotide Sequencing, Molecular Sequence Annotation, Sequence Alignment, Sequence Analysis, DNA, Sex Chromosomes genetics, Genome, Genomics methods, Vertebrates genetics
- Abstract
High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species¹⁻⁴. To address this issue, the international Genome 10K (G10K) consortium⁵,⁶ has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.
- Published
- 2021
28. Non-B DNA: a major contributor to small- and large-scale variation in nucleotide substitution frequencies across the genome.
- Author
-
Guiblet WM, Cremona MA, Harris RS, Chen D, Eckert KA, Chiaromonte F, Huang YF, and Makova KD
- Subjects
- Animals, Genetic Loci, Humans, Mutation Rate, Polymorphism, Single Nucleotide, Pongo pygmaeus, DNA chemistry, Genetic Variation, Genome, Human
- Abstract
Approximately 13% of the human genome can fold into non-canonical (non-B) DNA structures (e.g. G-quadruplexes, Z-DNA, etc.), which have been implicated in vital cellular processes. Non-B DNA also hinders replication, increasing errors and facilitating mutagenesis, yet its contribution to genome-wide variation in mutation rates remains unexplored. Here, we conducted a comprehensive analysis of nucleotide substitution frequencies at non-B DNA loci within noncoding, non-repetitive genome regions, their ±2 kb flanking regions, and 1-Megabase windows, using human-orangutan divergence and human single-nucleotide polymorphisms. Functional data analysis at single-base resolution demonstrated that substitution frequencies are usually elevated at non-B DNA, with patterns specific to each non-B DNA type. Mirror, direct and inverted repeats have higher substitution frequencies in spacers than in repeat arms, whereas G-quadruplexes, particularly stable ones, have higher substitution frequencies in loops than in stems. Several non-B DNA types also affect substitution frequencies in their flanking regions. Finally, non-B DNA explains more variation than any other predictor in multiple regression models for diversity or divergence at 1-Megabase scale. Thus, non-B DNA substantially contributes to variation in substitution frequencies at small and large scales. Our results highlight the role of non-B DNA in germline mutagenesis with implications for evolution and genetic diseases. (© The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.)
- Published
- 2021
29. Human L1 Transposition Dynamics Unraveled with Functional Data Analysis.
- Author
-
Chen D, Cremona MA, Qi Z, Mitra RD, Chiaromonte F, and Makova KD
- Subjects
- Humans, Mutagenesis, Insertional, Biological Evolution, DNA Transposable Elements, Genome, Human, Long Interspersed Nucleotide Elements, Models, Genetic
- Abstract
Long INterspersed Elements-1 (L1s) constitute >17% of the human genome and still actively transpose in it. Characterizing L1 transposition across the genome is critical for understanding genome evolution and somatic mutations. However, to date, L1 insertion and fixation patterns have not been studied comprehensively. To fill this gap, we investigated three genome-wide data sets of L1s that integrated at different evolutionary times: 17,037 de novo L1s (from an L1 insertion cell-line experiment conducted in-house), and 1,212 polymorphic and 1,205 human-specific L1s (from public databases). We characterized 49 genomic features (proxying chromatin accessibility, transcriptional activity, replication, recombination, etc.) in the ±50 kb flanks of these elements. These features were contrasted between the three L1 data sets and L1-free regions using state-of-the-art Functional Data Analysis statistical methods, which treat high-resolution data as mathematical functions. Our results indicate that de novo, polymorphic, and human-specific L1s are surrounded by different genomic features acting at specific locations and scales. This led to an integrative model of L1 transposition, according to which L1s preferentially integrate into open-chromatin regions enriched in non-B DNA motifs, whereas they are fixed in regions largely free of purifying selection, that is, regions depleted of genes and of the most conserved noncoding elements. Intriguingly, our results suggest that L1 insertions modify the local genomic landscape by extending CpG methylation and increasing mononucleotide microsatellite density. Altogether, our findings substantially facilitate understanding of L1 integration and fixation preferences, pave the way for uncovering their role in aging and cancer, and inform their use as mutagenesis tools in genetic studies. (© The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.)
- Published
- 2020
30. Dynamic evolution of great ape Y chromosomes.
- Author
-
Cechova M, Vegesna R, Tomaszkiewicz M, Harris RS, Chen D, Rangavittal S, Medvedev P, and Makova KD
- Subjects
- Animals, Biological Evolution, Evolution, Molecular, Gene Conversion, Gorilla gorilla genetics, Humans, Pan paniscus genetics, Pan troglodytes genetics, Pongo genetics, Sequence Analysis, DNA, Hominidae genetics, Y Chromosome genetics
- Abstract
The mammalian male-specific Y chromosome plays a critical role in sex determination and male fertility. However, because of its repetitive and haploid nature, it is frequently absent from genome assemblies and remains enigmatic. The Y chromosomes of great apes represent a particular puzzle: their gene content is more similar between human and gorilla than between human and chimpanzee, even though human and chimpanzee share a more recent common ancestor. To solve this puzzle, here we constructed a dataset including Ys from all extant great ape genera. We generated assemblies of bonobo and orangutan Ys from short and long sequencing reads and aligned them with the publicly available human, chimpanzee, and gorilla Y assemblies. Analyzing this dataset, we found that the genus Pan, which includes chimpanzee and bonobo, experienced accelerated substitution rates. Pan also exhibited elevated gene death rates. These observations are consistent with high levels of sperm competition in Pan. Furthermore, we inferred that the great ape common ancestor already possessed multicopy sequences homologous to most human and chimpanzee palindromes. Nonetheless, each species also acquired distinct ampliconic sequences. We also detected increased chromatin contacts between and within palindromes (from Hi-C data), likely facilitating gene conversion and structural rearrangements. Our results highlight the dynamic mode of Y chromosome evolution and open avenues for studies of male-specific dispersal in endangered great ape species. Competing Interests: The authors declare no competing interest. (Copyright © 2020 the Author(s). Published by PNAS.)
- Published
- 2020
31. Age-related accumulation of de novo mitochondrial mutations in mammalian oocytes and somatic tissues.
- Author
-
Arbeithuber B, Hester J, Cremona MA, Stoler N, Zaidi A, Higgins B, Anthony K, Chiaromonte F, Diaz FJ, and Makova KD
- Subjects
- Animals, DNA Mutational Analysis, DNA, Mitochondrial genetics, Female, Gene Frequency genetics, Genetic Drift, Germ Cells metabolism, Inheritance Patterns genetics, Logistic Models, Male, Mice, Models, Genetic, Mutation Rate, Nucleotides genetics, Pedigree, Aging genetics, Mammals genetics, Mitochondria genetics, Mutation genetics, Oocytes metabolism, Organ Specificity genetics
- Abstract
Mutations create genetic variation for other evolutionary forces to operate on and cause numerous genetic diseases. Nevertheless, how de novo mutations arise remains poorly understood. Progress in the area is hindered by the fact that error rates of conventional sequencing technologies (1 in 100 or 1,000 base pairs) are several orders of magnitude higher than de novo mutation rates (1 in 10,000,000 or 100,000,000 base pairs per generation). Moreover, previous analyses of germline de novo mutations examined pedigrees (and not germ cells) and thus were likely affected by selection. Here, we applied highly accurate duplex sequencing to detect low-frequency, de novo mutations in mitochondrial DNA (mtDNA) directly from oocytes and from somatic tissues (brain and muscle) of 36 mice from two independent pedigrees. We found mtDNA mutation frequencies 2- to 3-fold higher in 10-month-old than in 1-month-old mice, demonstrating mutation accumulation during the period of only 9 mo. Mutation frequencies and patterns differed between germline and somatic tissues and among mtDNA regions, suggestive of distinct mutagenesis mechanisms. Additionally, we discovered a more pronounced genetic drift of mitochondrial genetic variants in the germline of older versus younger mice, arguing for mtDNA turnover during oocyte meiotic arrest. Our study deciphered for the first time the intricacies of germline de novo mutagenesis using duplex sequencing directly in oocytes, which provided unprecedented resolution and minimized selection effects present in pedigree studies. Moreover, our work provides important information about the origins and accumulation of mutations with aging/maturation and has implications for delayed reproduction in modern human societies. Furthermore, the duplex sequencing method we optimized for single cells opens avenues for investigating low-frequency mutations in other studies. Competing Interests: The authors have declared that no competing interests exist.
- Published
- 2020
32. Ampliconic Genes on the Great Ape Y Chromosomes: Rapid Evolution of Copy Number but Conservation of Expression Levels.
- Author
-
Vegesna R, Tomaszkiewicz M, Ryder OA, Campos-Sánchez R, Medvedev P, DeGiorgio M, and Makova KD
- Subjects
- Animals, Hominidae metabolism, Male, Multigene Family, Biological Evolution, Gene Dosage, Gene Expression, Hominidae genetics, Y Chromosome
- Abstract
Multicopy ampliconic gene families on the Y chromosome play an important role in spermatogenesis. Thus, studying their genetic variation in endangered great ape species is critical. We estimated the sizes (copy number) of nine Y ampliconic gene families in population samples of chimpanzee, bonobo, and orangutan with droplet digital polymerase chain reaction, combined these estimates with published data for human and gorilla, and produced genome-wide testis gene expression data for great apes. Analyzing this comprehensive data set within an evolutionary framework, we first found high inter- and intraspecific variation in gene family size, with larger families exhibiting higher variation as compared with smaller families, a pattern consistent with random genetic drift. Second, for four gene families, we observed significant interspecific size differences, sometimes even between the sister species chimpanzee and bonobo. Third, despite substantial variation in copy number, Y ampliconic gene families' expression levels did not differ significantly among species, suggesting dosage regulation. Fourth, for three gene families, size was positively correlated with gene expression levels across species, suggesting that, given sufficient evolutionary time, copy number influences gene expression. Our results indicate high variability in size but conservation in gene expression levels in Y ampliconic gene families, significantly advancing our understanding of Y-chromosome evolution in great apes. (© The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.)
- Published
- 2020
33. Family reunion via error correction: an efficient analysis of duplex sequencing data.
- Author
-
Stoler N, Arbeithuber B, Povysil G, Heinzl M, Salazar R, Makova KD, Tiemann-Boege I, and Nekrutenko A
- Subjects
- Algorithms, DNA chemistry, DNA metabolism, Humans, Sequence Alignment, Sequence Analysis, DNA, User-Computer Interface
- Abstract
Background: Duplex sequencing is the most accurate approach for identification of sequence variants present at very low frequencies. Its power comes from pooling together multiple descendants of both strands of original DNA molecules, which allows distinguishing true nucleotide substitutions from PCR amplification and sequencing artifacts. This strategy comes at a cost: sequencing the same molecule multiple times increases dynamic range but significantly diminishes coverage, making whole genome duplex sequencing prohibitively expensive. Furthermore, every duplex experiment produces a substantial proportion of singleton reads that cannot be used in the analysis and are thrown away. Results: In this paper we demonstrate that a significant fraction of these reads contains PCR or sequencing errors within duplex tags. Correction of such errors allows "reuniting" these reads with their respective families, increasing the output of the method and making it more cost effective. Conclusions: We combine an error correction strategy with a number of algorithmic improvements in a new version of the duplex analysis software, Du Novo 2.0. It is written in Python, C, AWK, and Bash. It is open source and readily available through Galaxy, Bioconda, and GitHub: https://github.com/galaxyproject/dunovo.
- Published
- 2020
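The tag-correction idea in the abstract above can be illustrated with a minimal sketch. This is not Du Novo's implementation: the function names, the Hamming-distance threshold, and the toy tags are all assumptions made for illustration. A tag seen only once is reassigned to a larger family only when exactly one such family lies within the allowed number of mismatches.

```python
from collections import Counter


def hamming(a, b):
    """Count mismatching positions between two equal-length tags."""
    return sum(x != y for x, y in zip(a, b))


def reunite_singletons(tags, max_dist=1):
    """Map each observed tag to a corrected tag (hypothetical helper).

    A singleton tag is merged into a multi-read family only when exactly
    one family lies within max_dist mismatches; ambiguous or distant
    singletons are left unchanged.
    """
    counts = Counter(tags)
    families = [t for t, n in counts.items() if n > 1]
    corrected = {}
    for tag, n in counts.items():
        if n > 1:
            corrected[tag] = tag
            continue
        near = [f for f in families
                if len(f) == len(tag) and hamming(tag, f) <= max_dist]
        corrected[tag] = near[0] if len(near) == 1 else tag
    return corrected


# Toy data: two real families plus two singleton tags (made-up sequences).
reads = ["ACGTAC"] * 3 + ["TTGCAA"] * 2 + ["ACGTAA", "GGGGGG"]
mapping = reunite_singletons(reads)
family_sizes = Counter(mapping[t] for t in reads)
```

In this toy input, the singleton `ACGTAA` differs from the `ACGTAC` family at one position and rejoins it, while `GGGGGG` has no close family and remains a singleton.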
34. Pronounced somatic bottleneck in mitochondrial DNA of human hair.
- Author
-
Barrett A, Arbeithuber B, Zaidi A, Wilton P, Paul IM, Nielsen R, and Makova KD
- Subjects
- Adolescent, Adult, Aged, High-Throughput Nucleotide Sequencing, Humans, Middle Aged, Pennsylvania, Young Adult, DNA, Mitochondrial genetics, Gene Frequency, Hair chemistry
- Abstract
Heteroplasmy is the presence of variable mitochondrial DNA (mtDNA) within the same individual. The dynamics of heteroplasmy allele frequency among tissues of the human body is not well understood. Here, we measured allele frequency at heteroplasmic sites in two to eight hairs from each of 11 humans using next-generation sequencing. We observed a high variance in heteroplasmic allele frequency among separate hairs from the same individual, much higher than that for blood and cheek tissues. Our population genetic modelling estimated the somatic bottleneck during embryonic follicle development of separate hairs to be only 11.06 (95% confidence interval 0.6-34.0) mtDNA segregating units. This bottleneck is much more drastic than somatic bottlenecks for blood and cheek tissues (136 and 458 units, respectively), as well as more drastic than, or comparable to, the germline bottleneck (equal to 25-32 or 7-10 units, depending on the study). We demonstrated that hair undergoes additional genetic drift before and after the divergence of mtDNA lineages of individual hair follicles. Additionally, we showed a positive correlation between donor's age and variance in heteroplasmy allele frequency in hair. These findings have important implications for forensics and for our understanding of mtDNA dynamics in the human body. This article is part of the theme issue 'Linking the mitochondrial genotype to phenotype: a complex endeavour'.
- Published
- 2020
35. Bottleneck and selection in the germline and maternal age influence transmission of mitochondrial DNA in human pedigrees.
- Author
-
Zaidi AA, Wilton PR, Su MS, Paul IM, Arbeithuber B, Anthony K, Nekrutenko A, Nielsen R, and Makova KD
- Subjects
- Adolescent, Adult, Aged, Aged, 80 and over, Child, Child, Preschool, Female, Genetics, Population, Human Genetics, Humans, Male, Middle Aged, Mitochondria genetics, Pedigree, Young Adult, DNA, Mitochondrial genetics, Germ Cells cytology, Maternal Age, Mitochondrial Diseases genetics
- Abstract
Heteroplasmy, the presence of multiple mitochondrial DNA (mtDNA) haplotypes in an individual, can lead to numerous mitochondrial diseases. The presentation of such diseases depends on the frequency of the heteroplasmic variant in tissues, which, in turn, depends on the dynamics of mtDNA transmissions during germline and somatic development. Thus, understanding and predicting these dynamics between generations and within individuals is medically relevant. Here, we study patterns of heteroplasmy in 2 tissues from each of 345 humans in 96 multigenerational families, each with at least 2 siblings (a total of 249 mother-child transmissions). This experimental design has allowed us to estimate the timing of mtDNA mutations, drift, and selection with unprecedented precision. Our results are remarkably concordant between 2 complementary population-genetic approaches. We find evidence for a severe germline bottleneck (7-10 mtDNA segregating units) that occurs independently in different oocyte lineages from the same mother, while somatic bottlenecks are less severe. We demonstrate that divergence between mother and offspring increases with the mother's age at childbirth, likely due to continued drift of heteroplasmy frequencies in oocytes under meiotic arrest. We show that this period is also accompanied by mutation accumulation leading to more de novo mutations in children born to older mothers. We show that heteroplasmic variants at intermediate frequencies can segregate for many generations in the human population, despite the strong germline bottleneck. We show that selection acts during germline development to keep the frequency of putatively deleterious variants from rising. Our findings have important applications for clinical genetics and genetic counseling. Competing Interests: The authors declare no competing interest. (Copyright © 2019 the Author(s). Published by PNAS.)
- Published
- 2019
36. Noise-cancelling repeat finder: uncovering tandem repeats in error-prone long-read sequencing data.
- Author
-
Harris RS, Cechova M, and Makova KD
- Subjects
- Genome, Human, Humans, Nanopores, Sequence Analysis, DNA, Software, Tandem Repeat Sequences, High-Throughput Nucleotide Sequencing
- Abstract
Summary: Tandem DNA repeats can be sequenced with long-read technologies, but cannot be accurately deciphered due to the lack of computational tools taking high error rates of these technologies into account. Here we introduce Noise-Cancelling Repeat Finder (NCRF) to uncover putative tandem repeats of specified motifs in noisy long reads produced by Pacific Biosciences and Oxford Nanopore sequencers. Using simulations, we validated the use of NCRF to locate tandem repeats with motifs of various lengths and demonstrated its superior performance as compared to two alternative tools. Using real human whole-genome sequencing data, NCRF identified long arrays of the (AATGG)n repeat involved in heat shock stress response., Availability and Implementation: NCRF is implemented in C, supported by several python scripts, and is available in bioconda and at https://github.com/makovalab-psu/NoiseCancellingRepeatFinder., Supplementary Information: Supplementary data are available at Bioinformatics online., (© The Author(s) 2019. Published by Oxford University Press.)
- Published
- 2019
37. High Satellite Repeat Turnover in Great Apes Studied with Short- and Long-Read Technologies.
- Author
-
Cechova M, Harris RS, Tomaszkiewicz M, Arbeithuber B, Chiaromonte F, and Makova KD
- Abstract
Satellite repeats are a structural component of centromeres and telomeres, and in some instances, their divergence is known to drive speciation. Due to their highly repetitive nature, satellite sequences have been understudied and underrepresented in genome assemblies. To investigate their turnover in great apes, we studied satellite repeats of unit sizes up to 50 bp in human, chimpanzee, bonobo, gorilla, and Sumatran and Bornean orangutans, using unassembled short and long sequencing reads. The density of satellite repeats, as identified from accurate short reads (Illumina), varied greatly among great ape genomes. These were dominated by a handful of abundant repeated motifs, frequently shared among species, which formed two groups: 1) the (AATGG)n repeat (critical for heat shock response) and its derivatives; and 2) subtelomeric 32-mers involved in telomeric metabolism. Using the densities of abundant repeats, individuals could be classified into species. However, clustering did not reproduce the accepted species phylogeny, suggesting rapid repeat evolution. Several abundant repeats were enriched in males versus females; using Y chromosome assemblies or Fluorescent In Situ Hybridization, we validated their location on the Y. Finally, applying a novel computational tool, we identified many satellite repeats completely embedded within long Oxford Nanopore and Pacific Biosciences reads. Such repeats were up to 59 kb in length and consisted of perfect repeats interspersed with other similar sequences. Our results based on sequencing reads generated with three different technologies provide the first detailed characterization of great ape satellite repeats, and open new avenues for exploring their functions., (© The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.)
- Published
- 2019
38. Dosage regulation, and variation in gene expression and copy number of human Y chromosome ampliconic genes.
- Author
-
Vegesna R, Tomaszkiewicz M, Medvedev P, and Makova KD
- Subjects
- Animals, Chromosomes, Human, Y physiology, DNA Copy Number Variations genetics, Databases, Genetic, Dosage Compensation, Genetic genetics, Dosage Compensation, Genetic physiology, Epigenesis, Genetic genetics, Gene Dosage genetics, Gene Expression genetics, Gene Expression Regulation genetics, Genes, Y-Linked physiology, Heat Shock Transcription Factors genetics, Heat Shock Transcription Factors metabolism, Humans, Male, Multigene Family genetics, Testis metabolism, Chromosomes, Human, Y genetics, Genes, Y-Linked genetics, Sequence Analysis, DNA methods
- Abstract
The Y chromosome harbors nine multi-copy ampliconic gene families expressed exclusively in testis. The gene copies within each family are >99% identical to each other, which poses a major challenge in evaluating their copy number. Recent studies demonstrated high variation in Y ampliconic gene copy number among humans. However, how this variation affects expression levels in human testis remains understudied. Here we developed a novel computational tool, Ampliconic Copy Number Estimator (AmpliCoNE), that utilizes read sequencing depth information to estimate Y ampliconic gene copy number per family. We applied this tool to whole-genome sequencing data of 149 men with matched testis expression data whose samples are part of the Genotype-Tissue Expression (GTEx) project. We found that the Y ampliconic gene families with low copy number in humans were deleted or pseudogenized in non-human great apes, suggesting relaxation of functional constraints. Among the Y ampliconic gene families, higher copy number leads to higher expression. Within the Y ampliconic gene families, copy number does not influence gene expression; rather, a high tolerance for variation in gene expression was observed in testis of presumably healthy men. No differences in gene expression levels were found among major Y haplogroups. Age positively correlated with expression levels of the HSFY and PRY gene families in the African subhaplogroup E1b, but not in the European subhaplogroups R1b and I1. We also found that expression of five Y ampliconic gene families is coordinated with that of their non-Y (i.e. X or autosomal) homologs. Indeed, five ampliconic gene families had consistently lower expression levels when compared to their non-Y homologs, suggesting dosage regulation, while the HSFY family had higher expression levels than its X homolog and thus lacked dosage regulation., Competing Interests: The authors have declared that no competing interests exist.
- Published
- 2019
39. Functional data analysis for computational biology.
- Author
-
Cremona MA, Xu H, Makova KD, Reimherr M, Chiaromonte F, and Madrigal P
- Abstract
Supplementary Information: Supplementary data are available at Bioinformatics online., (© The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.)
- Published
- 2019
40. DiscoverY: a classifier for identifying Y chromosome sequences in male assemblies.
- Author
-
Rangavittal S, Stopa N, Tomaszkiewicz M, Sahlin K, Makova KD, and Medvedev P
- Subjects
- Female, Haploidy, Humans, Male, Sequence Analysis, DNA economics, Time Factors, Chromosomes, Human, Y genetics, Sequence Analysis, DNA methods
- Abstract
Background: Although the Y chromosome plays an important role in male sex determination and fertility, it is currently understudied due to its haploid and repetitive nature. Methods to isolate Y-specific contigs from a whole-genome assembly broadly fall into two categories. The first involves retrieving Y-contigs using proportion sharing with a female, but such a strategy is prone to false positives in the absence of a high-quality, complete female reference. A second strategy uses the ratio of depth of coverage from male and female reads to select Y-contigs, but such a method requires high-depth sequencing of a female and cannot utilize existing female references., Results: We develop a k-mer based method called DiscoverY, which combines proportion sharing with female with depth of coverage from male reads to classify contigs as Y-chromosomal. We evaluate the performance of DiscoverY on human and gorilla genomes, across different sequencing platforms including Illumina, 10X, and PacBio. In the cases where the male and female data are of high quality, DiscoverY has a high precision and recall and outperforms existing methods. For cases when a high quality female reference is not available, we quantify the effect of using draft reference or even just raw sequencing reads from a female., Conclusion: DiscoverY is an effective method to isolate Y-specific contigs from a whole-genome assembly. However, regions homologous to the X chromosome remain difficult to detect.
- Published
- 2019
41. Investigating mitonuclear interactions in human admixed populations.
- Author
-
Zaidi AA and Makova KD
- Subjects
- Caribbean Region, Colombia, DNA Copy Number Variations, Humans, Mexico, Peru, Puerto Rico, United States, Cell Nucleus genetics, DNA, Mitochondrial genetics, Genetic Variation
- Abstract
To function properly, mitochondria utilize products of 37 mitochondrial and >1,000 nuclear genes, which should be compatible with each other. Discordance between mitochondrial and nuclear genetic ancestry could contribute to phenotypic variation in admixed populations. Here, we explored potential mitonuclear incompatibility in six admixed human populations from the Americas: African Americans, African Caribbeans, Colombians, Mexicans, Peruvians and Puerto Ricans. By comparing nuclear versus mitochondrial ancestry in these populations, we first show that mitochondrial DNA (mtDNA) copy number decreases with increasing discordance between nuclear and mtDNA ancestry. The direction of this effect is consistent across mtDNA haplogroups of different geographic origins. This observation indicates suboptimal regulation of mtDNA replication when its components are encoded by nuclear and mtDNA genes with different ancestry. Second, while most populations analysed exhibit no such trend, in African Americans and Puerto Ricans, we find a significant enrichment of ancestry at nuclear-encoded mitochondrial genes towards the source populations contributing the most prevalent mtDNA haplogroups (African and Native American, respectively). This possibly reflects compensatory effects of selection in recovering mitonuclear interactions optimized in the source populations. Our results provide evidence of mitonuclear interactions in human admixed populations and we discuss their implications for human health and disease.
- Published
- 2019
42. Long-read sequencing technology indicates genome-wide effects of non-B DNA on polymerization speed and error rate.
- Author
-
Guiblet WM, Cremona MA, Cechova M, Harris RS, Kejnovská I, Kejnovsky E, Eckert K, Chiaromonte F, and Makova KD
- Subjects
- DNA Replication, G-Quadruplexes, Humans, Kinetics, Mutation, Nucleotide Motifs, Reproducibility of Results, DNA chemistry, Genomics methods, Genomics standards, High-Throughput Nucleotide Sequencing methods, High-Throughput Nucleotide Sequencing standards, Nucleic Acid Conformation, Sequence Analysis, DNA methods
- Abstract
DNA conformation may deviate from the classical B-form in ∼13% of the human genome. Non-B DNA regulates many cellular processes; however, its effects on DNA polymerization speed and accuracy have not been investigated genome-wide. Such an inquiry is critical for understanding neurological diseases and cancer genome instability. Here, we present the first simultaneous examination of DNA polymerization kinetics and errors in the human genome sequenced with Single-Molecule Real-Time (SMRT) technology. We show that polymerization speed differs between non-B and B-DNA: It decelerates at G-quadruplexes and fluctuates periodically at disease-causing tandem repeats. Analyzing polymerization kinetics profiles, we predict and validate experimentally non-B DNA formation for a novel motif. We demonstrate that several non-B motifs affect sequencing errors (e.g., G-quadruplexes increase error rates), and that sequencing errors are positively associated with polymerase slowdown. Finally, we show that highly divergent G4 motifs have pronounced polymerization slowdown and high sequencing error rates, suggesting similar mechanisms for sequencing errors and germline mutations., (© 2018 Guiblet et al.; Published by Cold Spring Harbor Laboratory Press.)
- Published
- 2018
43. Correcting palindromes in long reads after whole-genome amplification.
- Author
-
Warris S, Schijlen E, van de Geest H, Vegesna R, Hesselink T, Te Lintel Hekkert B, Sanchez Perez G, Medvedev P, Makova KD, and de Ridder D
- Subjects
- Algorithms, Animals, DNA genetics, Research Design, Arabidopsis genetics, DNA analysis, Gorilla gorilla genetics, Nucleotides genetics, Sequence Analysis, DNA methods, Whole Genome Sequencing methods, Y Chromosome genetics
- Abstract
Background: Next-generation sequencing requires sufficient DNA to be available. If limited, whole-genome amplification is applied to generate additional amounts of DNA. Such amplification often results in many chimeric DNA fragments, in particular artificial palindromic sequences, which limit the usefulness of long sequencing reads., Results: Here, we present Pacasus, a tool for correcting such errors. Two datasets show that it markedly improves read mapping and de novo assembly, yielding results similar to those that would be obtained with non-amplified DNA., Conclusions: With Pacasus, long-read technologies become available for sequencing targets with very small amounts of DNA, such as single cells or even single chromosomes.
- Published
- 2018
44. Deciphering highly similar multigene family transcripts from Iso-Seq data with IsoCon.
- Author
-
Sahlin K, Tomaszkiewicz M, Makova KD, and Medvedev P
- Subjects
- Aged, Computer Simulation, Exons genetics, Fragile X Mental Retardation Protein genetics, Gene Dosage, Humans, Male, Middle Aged, Protein Isoforms genetics, Protein Isoforms metabolism, RNA Splicing genetics, RNA, Messenger metabolism, Reproducibility of Results, Testis metabolism, Algorithms, Multigene Family, RNA, Messenger genetics, Sequence Analysis, RNA methods
- Abstract
A significant portion of genes in vertebrate genomes belongs to multigene families, with each family containing several gene copies whose presence/absence, as well as isoform structure, can be highly variable across individuals. Existing de novo techniques for assaying the sequences of such highly-similar gene families fall short of reconstructing end-to-end transcripts with nucleotide-level precision or assigning alternatively spliced transcripts to their respective gene copies. We present IsoCon, a high-precision method using long PacBio Iso-Seq reads to tackle this challenge. We apply IsoCon to nine Y chromosome ampliconic gene families and show that it outperforms existing methods on both experimental and simulated data. IsoCon has allowed us to detect an unprecedented number of novel isoforms and has opened the door for unraveling the structure of many multigene families and gaining a deeper understanding of genome evolution and human diseases.
- Published
- 2018
45. Child Weight Gain Trajectories Linked To Oral Microbiota Composition.
- Author
-
Craig SJC, Blankenberg D, Parodi ACL, Paul IM, Birch LL, Savage JS, Marini ME, Stokes JL, Nekrutenko A, Reimherr M, Chiaromonte F, and Makova KD
- Subjects
- Bacteria genetics, Bacteria isolation & purification, Body-Weight Trajectory, Child, Preschool, DNA, Ribosomal genetics, Female, Growth Charts, Humans, Male, Microbiota, Phylogeny, Bacteria classification, Gastrointestinal Tract microbiology, Mouth microbiology, RNA, Ribosomal, 16S genetics, Sequence Analysis, DNA methods, Weight Gain
- Abstract
Gut and oral microbiota perturbations have been observed in obese adults and adolescents; less is known about their influence on weight gain in young children. Here we analyzed the gut and oral microbiota of 226 two-year-olds with 16S rRNA gene sequencing. Weight and length were measured at seven time points and used to identify children with rapid infant weight gain (a strong risk factor for childhood obesity), and to derive growth curves with innovative Functional Data Analysis (FDA) techniques. We showed that growth curves were associated negatively with diversity, and positively with the Firmicutes-to-Bacteroidetes ratio, of the oral microbiota. We also demonstrated an association between the gut microbiota and child growth, even after controlling for the effect of diet on the microbiota. Lastly, we identified several bacterial genera that were associated with child growth patterns. These results suggest that by the age of two, the oral microbiota of children with rapid infant weight gain may have already begun to establish patterns often seen in obese adults. They also suggest that the gut microbiota at age two, while strongly influenced by diet, does not harbor obesity signatures many researchers identified in later life stages.
- Published
- 2018
46. IWTomics: testing high-resolution sequence-based 'Omics' data at multiple locations and scales.
- Author
-
Cremona MA, Pini A, Cumbo F, Makova KD, Chiaromonte F, and Vantini S
- Subjects
- Genome, Sequence Analysis, Workflow, Databases, Factual, Genomics methods, Software
- Abstract
Summary: With increased generation of high-resolution sequence-based 'Omics' data, detecting statistically significant effects at different genomic locations and scales has become key to addressing several scientific questions. IWTomics is an R/Bioconductor package (integrated in Galaxy) that, exploiting sophisticated Functional Data Analysis techniques (i.e. statistical techniques that deal with the analysis of curves), allows users to pre-process, visualize and test these data at multiple locations and scales. The package provides a friendly, flexible and complete workflow that can be employed in many genomic and epigenomic applications., Availability and Implementation: IWTomics is freely available at the Bioconductor website (http://bioconductor.org/packages/IWTomics) and on the main Galaxy instance (https://usegalaxy.org/)., Supplementary Information: Supplementary data are available at Bioinformatics online.
- Published
- 2018
47. High Levels of Copy Number Variation of Ampliconic Genes across Major Human Y Haplogroups.
- Author
-
Ye D, Zaidi AA, Tomaszkiewicz M, Anthony K, Liebowitz C, DeGiorgio M, Shriver MD, and Makova KD
- Subjects
- Evolution, Molecular, Genome, Human, Haplotypes, Humans, Male, Multigene Family, Phenotype, Body Height, Chromosomes, Human, Y, DNA Copy Number Variations, Gene Amplification, Masculinity
- Abstract
Because of its highly repetitive nature, the human male-specific Y chromosome remains understudied. It is important to investigate variation on the Y chromosome to understand its evolution and contribution to phenotypic variation, including infertility. Approximately 20% of the human Y chromosome consists of ampliconic regions which include nine multi-copy gene families. These gene families are expressed exclusively in testes and usually implicated in spermatogenesis. Here, to gain a better understanding of the role of the Y chromosome in human evolution and in determining sexually dimorphic traits, we studied ampliconic gene copy number variation in 100 males representing ten major Y haplogroups world-wide. Copy number was estimated with droplet digital PCR. In contrast to low nucleotide diversity observed on the Y in previous studies, here we show that ampliconic gene copy number diversity is very high. A total of 98 copy-number-based haplotypes were observed among 100 individuals, and haplotypes were sometimes shared by males from very different haplogroups, suggesting homoplasies. The resulting haplotypes did not cluster according to major Y haplogroups. Overall, only two gene families (RBMY and TSPY) showed significant differences in copy number among major Y haplogroups, and the haplogroup of a male could not be predicted based on his ampliconic gene copy numbers. Finally, we did not find significant correlations either between copy number variation and individual's height, or between the former and facial masculinity/femininity. Our results suggest rapid evolution of ampliconic gene copy numbers on the human Y, and we discuss its causes.
- Published
- 2018
48. RecoverY: k-mer-based read classification for Y-chromosome-specific sequencing and assembly.
- Author
-
Rangavittal S, Harris RS, Cechova M, Tomaszkiewicz M, Chikhi R, Makova KD, and Medvedev P
- Subjects
- Algorithms, Animals, Chromosomes, Mammalian, Genomics methods, Gorilla gorilla genetics, Humans, Male, Mammals, High-Throughput Nucleotide Sequencing methods, Sequence Analysis, DNA methods, Software, Y Chromosome
- Abstract
Motivation: The haploid mammalian Y chromosome is usually under-represented in genome assemblies due to high repeat content and low depth due to its haploid nature. One strategy to ameliorate the low coverage of Y sequences is to experimentally enrich Y-specific material before assembly. As the enrichment process is imperfect, algorithms are needed to identify putative Y-specific reads prior to downstream assembly. A strategy that uses k-mer abundances to identify such reads was used to assemble the gorilla Y. However, the strategy required the manual setting of key parameters, a time-consuming process leading to sub-optimal assemblies., Results: We develop a method, RecoverY, that selects Y-specific reads by automatically choosing the abundance level at which a k-mer is deemed to originate from the Y. This algorithm uses prior knowledge about the Y chromosome of a related species or known Y transcript sequences. We evaluate RecoverY on both simulated and real data, for human and gorilla, and investigate its robustness to important parameters. We show that RecoverY leads to a vastly superior assembly compared to alternate strategies of filtering the reads or contigs. Compared to the preliminary strategy used by Tomaszkiewicz et al., we achieve a 33% improvement in assembly size and a 20% improvement in the NG50, demonstrating the power of automatic parameter selection., Availability and Implementation: Our tool RecoverY is freely available at https://github.com/makovalab-psu/RecoverY., Contact: kmakova@bx.psu.edu or pashadag@cse.psu.edu., Supplementary Information: Supplementary data are available at Bioinformatics online.
- Published
- 2018
49. Deep-Coverage MPS Analysis of Heteroplasmic Variants within the mtGenome Allows for Frequent Differentiation of Maternal Relatives.
- Author
-
Holland MM, Makova KD, and McElhoe JA
- Abstract
Distinguishing between maternal relatives through mitochondrial (mt) DNA sequence analysis has been a longstanding desire of the forensic community. Using a deep-coverage, massively parallel sequencing (DCMPS) approach, we studied the pattern of mtDNA heteroplasmy across the mtgenomes of 39 mother-child pairs of European descent (haplogroups H, J, K, R, T, U, and X). Both shared and differentiating heteroplasmy were observed on a frequent basis in these closely related maternal relatives, with the minor variant often presented as 2-10% of the sequencing reads. A total of 17 pairs exhibited differentiating heteroplasmy (44%), with the majority of sites (76%, 16 of 21) occurring in the coding region, further illustrating the value of conducting sequence analysis on the entire mtgenome. A number of the sites of differentiating heteroplasmy resulted in non-synonymous changes in protein sequence (5 of 21) and in changes in transfer or ribosomal RNA sequences (5 of 21), highlighting the potentially deleterious nature of these heteroplasmic states. Shared heteroplasmy was observed in 12 of the 39 mother-child pairs (31%), with no duplicate sites of either differentiating or shared heteroplasmy observed; a single nucleotide position (16093) was duplicated between the data sets. Finally, rates of heteroplasmy in blood and buccal cells were compared, as it is known that rates can vary across tissue types, with similar observations in the current study. Our data support the view that differentiating heteroplasmy across the mtgenome can be used to frequently distinguish maternal relatives, and could be of interest to both the medical genetics and forensic communities., Competing Interests: The authors declare no conflict of interest.
- Published
- 2018
50. Elevated mitochondrial genome variation after 50 generations of radiation exposure in a wild rodent.
- Author
-
Baker RJ, Dickins B, Wickliffe JK, Khan FAA, Gaschak S, Makova KD, and Phillips CD
- Abstract
Currently, the effects of chronic, continuous low dose environmental irradiation on the mitochondrial genome of resident small mammals are unknown. Using the bank vole (Myodes glareolus) as a model system, we tested the hypothesis that approximately 50 generations of exposure to the Chernobyl environment has significantly altered genetic diversity of the mitochondrial genome. Using deep sequencing, we compared mitochondrial genomes from 131 individuals from reference sites with radioactive contamination comparable to that present in northern Ukraine before the 26 April 1986 meltdown, to populations where substantial fallout was deposited following the nuclear accident. Population genetic variables revealed significant differences among populations from contaminated and uncontaminated localities. Therefore, we rejected the null hypothesis of no significant genetic effect from 50 generations of exposure to the environment created by the Chernobyl meltdown. Samples from contaminated localities exhibited significantly higher numbers of haplotypes and polymorphic loci, elevated genetic diversity, and a significantly higher average number of substitutions per site across mitochondrial gene regions. Observed genetic variation was dominated by synonymous mutations, which may indicate a history of purifying selection against nonsynonymous or insertion/deletion mutations. These significant differences were not attributable to sample size artifacts. The observed increase in mitochondrial genomic diversity in voles from radioactive sites is consistent with the possibility that chronic, continuous irradiation resulting from the Chernobyl disaster has produced an accelerated mutation rate in this species over the last 25 years. Our results, being the first to demonstrate this phenomenon in a wild mammalian species, are important for understanding genetic consequences of exposure to low-dose radiation sources.
- Published
- 2017