Misconceptions on Missing Data in RAD-seq Phylogenetics with a Deep-scale Example from Flowering Plants
- Source :
- Systematic Biology: syw092
- Publication Year :
- 2016
- Publisher :
- Oxford University Press (OUP), 2016.
Abstract
- Restriction-site associated DNA (RAD) sequencing and related methods rely on the conservation of enzyme recognition sites to isolate homologous DNA fragments for sequencing, with the consequence that mutations disrupting these sites lead to missing information. There is thus a clear expectation for how missing data should be distributed, with fewer loci recovered between more distantly related samples. This observation has led to a related expectation: that RAD-seq data are insufficiently informative for resolving deeper scale phylogenetic relationships. Here we investigate the relationship between missing information among samples at the tips of a tree and information at edges within it. We re-analyze and review the distribution of missing data across ten RAD-seq data sets and carry out simulations to determine expected patterns of missing information. We also present new empirical results for the angiosperm clade Viburnum (Adoxaceae, with a crown age >50 Ma) for which we examine phylogenetic information at different depths in the tree and with varied sequencing effort. The total number of loci, the proportion that are shared, and phylogenetic informativeness varied dramatically across the examined RAD-seq data sets. Insufficient or uneven sequencing coverage accounted for similar proportions of missing data as dropout from mutation-disruption. Simulations reveal that mutation-disruption, which results in phylogenetically distributed missing data, can be distinguished from the more stochastic patterns of missing data caused by low sequencing coverage. In Viburnum, doubling sequencing coverage nearly doubled the number of parsimony informative sites, and increased by >10X the number of loci with data shared across >40 taxa. Our analysis leads to a set of practical recommendations for maximizing phylogenetic information in RAD-seq studies. 
[hierarchical redundancy; phylogenetic informativeness; quartet informativeness; Restriction-site associated DNA (RAD) sequencing; sequencing coverage; Viburnum.]
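The contrast the abstract draws between the two sources of missing data can be illustrated with a toy simulation. The sketch below is not the authors' simulation procedure. It assumes a hypothetical four-taxon tree ((A,B),(C,D)), independent loci, one equal loss probability per branch for mutation-disruption, and one equal per-sample miss probability for low coverage. All function and variable names are invented for illustration.

```python
import random

# Hypothetical fixed tree ((A,B),(C,D)): for each tip, the branches
# on its path from the root (internal branch first, then terminal branch).
PATHS = {
    "A": ["AB", "A"],
    "B": ["AB", "B"],
    "C": ["CD", "C"],
    "D": ["CD", "D"],
}
BRANCHES = ["AB", "CD", "A", "B", "C", "D"]


def simulate_disruption(n_loci, p_loss, rng):
    """Mutation-disruption: a locus is lost in a tip if any branch on its
    root-to-tip path carries a mutation in the recognition site, so a loss
    on an internal branch is inherited by every descendant tip."""
    loci = []
    for _ in range(n_loci):
        hit = {b for b in BRANCHES if rng.random() < p_loss}
        loci.append({t for t, path in PATHS.items() if not hit.intersection(path)})
    return loci


def simulate_low_coverage(n_loci, p_miss, rng):
    """Low coverage: each sample misses each locus independently,
    regardless of where the sample sits in the tree."""
    return [{t for t in PATHS if rng.random() >= p_miss} for _ in range(n_loci)]


def shared_fraction(loci, x, y):
    """Fraction of loci recovered in both sample x and sample y."""
    return sum(1 for present in loci if x in present and y in present) / len(loci)


if __name__ == "__main__":
    rng = random.Random(2016)
    disruption = simulate_disruption(20000, 0.2, rng)
    coverage = simulate_low_coverage(20000, 0.2, rng)
    for label, loci in (("mutation-disruption", disruption), ("low coverage", coverage)):
        sisters = shared_fraction(loci, "A", "B")
        distant = shared_fraction(loci, "A", "C")
        print(f"{label}: sisters share {sisters:.3f}, distant pair shares {distant:.3f}")
```

Under these assumptions, sister samples share more loci than distantly related samples only in the mutation-disruption model, because their paths from the root overlap. In the low-coverage model the shared fraction does not depend on relatedness, which is the sense in which that pattern is more stochastic.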
- Subjects :
- 0106 biological sciences
0301 basic medicine
Base Sequence
Phylogenetic tree
biology
Sequence Analysis, DNA
Missing data
biology.organism_classification
Models, Biological
010603 evolutionary biology
01 natural sciences
Magnoliopsida
03 medical and health sciences
Tree (data structure)
030104 developmental biology
Taxon
Viburnum
Phylogenetics
Evolutionary biology
Genetics
Computer Simulation
Adoxaceae
Clade
Phylogeny
Ecology, Evolution, Behavior and Systematics
Details
- ISSN :
- 1076-836X and 1063-5157
- Database :
- OpenAIRE
- Journal :
- Systematic Biology
- Accession number :
- edsair.doi.dedup.....1d61c762760b5e6da59cffb17839268c
- Full Text :
- https://doi.org/10.1093/sysbio/syw092