Unravelling reference bias in ancient DNA datasets.
- Source :
- Bioinformatics (Oxford, England) [Bioinformatics] 2024 Jul 01; Vol. 40 (7).
- Publication Year :
- 2024
Abstract
- Motivation: The alignment of sequencing reads is a critical step in the characterization of ancient genomes. However, reference bias and spurious mappings pose a significant challenge, particularly as cutting-edge wet lab methods generate datasets that push the boundaries of alignment tools. Reference bias occurs when reference alleles are favoured over alternative alleles during mapping, whereas spurious mappings stem from either contamination or when endogenous reads fail to align to their correct position. Previous work has shown that these phenomena are correlated with read length but a more thorough investigation of reference bias and spurious mappings for ancient DNA has been lacking. Here, we use a range of empirical and simulated palaeogenomic datasets to investigate the impacts of mapping tools, quality thresholds, and reference genome on mismatch rates across read lengths.
- Results: For these analyses, we introduce AMBER, a new bioinformatics tool for assessing the quality of ancient DNA mapping directly from BAM-files and informing on reference bias, read length cut-offs and reference selection. AMBER rapidly and simultaneously computes the sequence read mapping bias in the form of the mismatch rates per read length, cytosine deamination profiles at both CpG and non-CpG sites, fragment length distributions, and genomic breadth and depth of coverage. Using AMBER, we find that mapping algorithms and quality threshold choices dictate reference bias and rates of spurious alignment at different read lengths in a predictable manner, suggesting that optimized mapping parameters for each read length will be a key step in alleviating reference bias and spurious mappings.
- Availability and Implementation: AMBER is available for noncommercial use on GitHub (https://github.com/tvandervalk/AMBER.git). Scripts used to generate and analyse simulated datasets are available on GitHub (https://github.com/sdolenz/refbias_scripts).
- (© The Author(s) 2024. Published by Oxford University Press.)
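The "mismatch rates per read length" statistic named in the abstract can be illustrated with a minimal sketch. This is not AMBER's implementation: the function name `mismatch_rate_by_length` and the `min_mapq` parameter are hypothetical, the input is assumed to be SAM text lines (for example, a BAM file converted with `samtools view -h`), and the `NM` edit-distance tag is used as a proxy for mismatches, which also counts inserted and deleted bases.

```python
from collections import defaultdict


def mismatch_rate_by_length(sam_lines, min_mapq=0):
    """Return {read_length: edit distance per sequenced base} from SAM text lines.

    Illustrative only: uses the NM tag (edit distance, including indels) as a
    proxy for mismatches, and the full SEQ length (soft clips included) as the
    read length.
    """
    mismatches = defaultdict(int)
    bases = defaultdict(int)
    for line in sam_lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        flag, mapq, seq = int(fields[1]), int(fields[4]), fields[9]
        # Skip unmapped reads (flag 0x4), low-quality mappings, and absent SEQ.
        if flag & 0x4 or mapq < min_mapq or seq == "*":
            continue
        # Find the NM:i:<n> optional field; skip reads that lack it.
        nm = None
        for tag in fields[11:]:
            if tag.startswith("NM:i:"):
                nm = int(tag[5:])
                break
        if nm is None:
            continue
        length = len(seq)
        mismatches[length] += nm
        bases[length] += length
    return {length: mismatches[length] / bases[length] for length in sorted(bases)}


# Made-up records for illustration; not data from the study.
example = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t30\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:1",
    "r2\t0\tchr1\t200\t30\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0",
    "r3\t0\tchr1\t300\t10\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\tNM:i:2",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
rates_all = mismatch_rate_by_length(example)
rates_q20 = mismatch_rate_by_length(example, min_mapq=20)
```

Comparing `rates_all` with `rates_q20` shows, on toy data, how a mapping-quality threshold changes which read lengths contribute to the profile, which is the kind of threshold effect the abstract describes.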
Details
- Language :
- English
- ISSN :
- 1367-4811
- Volume :
- 40
- Issue :
- 7
- Database :
- MEDLINE
- Journal :
- Bioinformatics (Oxford, England)
- Publication Type :
- Academic Journal
- Accession number :
- 38960861
- Full Text :
- https://doi.org/10.1093/bioinformatics/btae436