Extending approximate Bayesian computation with supervised machine learning to infer demographic history from genetic polymorphisms using DIYABC Random Forest
- Source :
- Molecular Ecology Resources, Wiley/Blackwell, 2021, 21 (8), pp.2598-2613. ⟨10.1111/1755-0998.13413⟩
- Publication Year :
- 2021
Abstract
- Simulation-based methods such as Approximate Bayesian Computation (ABC) are well suited to the analysis of complex scenarios describing the genetic history of populations and species. In this context, supervised machine learning (SML) methods provide attractive statistical solutions for conducting efficient inferences about scenario choice and parameter estimation. The Random Forest methodology (RF) is a powerful ensemble of SML algorithms used for classification or regression problems. RF allows inferences to be conducted at a low computational cost, without preliminary selection of the relevant components of the ABC summary statistics, and without the need to derive ABC tolerance levels. We have implemented a set of RF algorithms to process inferences using simulated datasets generated from an extended version of the population genetic simulator implemented in DIYABC v2.1.0. The resulting computer package, named DIYABC Random Forest v1.0, integrates two functionalities into a user-friendly interface: the simulation under custom evolutionary scenarios of different types of molecular data (microsatellites, DNA sequences or SNPs), and RF treatments including statistical tools to evaluate the power and accuracy of inferences. We illustrate the functionalities of DIYABC Random Forest v1.0 for both scenario choice and parameter estimation through the analysis of pseudo-observed and real datasets corresponding to pool-sequencing and individual-sequencing SNP datasets. Because of the properties inherent to the implemented RF methods and the large feature vector (including various summary statistics and their linear combinations) available for SNP data, DIYABC Random Forest v1.0 can efficiently contribute to the analysis of large SNP datasets to make inferences about complex population genetic histories.
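The scenario-choice idea described in the abstract can be illustrated with a minimal sketch: simulate summary statistics under each candidate scenario, train a bagged ensemble of randomized trees on that simulated table, then let the trees vote on the observed statistics. Everything in the block is an assumption made for illustration and is not taken from DIYABC Random Forest: the toy `simulate` function stands in for a population genetic simulator, the names `fit_stump` and `abc_rf_choose` are hypothetical, and the trees are limited to a single split (stumps) to keep the example short, which is a simplification of a full random forest.

```python
import random
from collections import Counter

def simulate(scenario, rng):
    """Toy stand-in for a genetic simulator: returns 3 summary statistics.
    Statistics 0 and 1 depend on the scenario; statistic 2 is pure noise."""
    shift = 4.0 * scenario
    return [rng.gauss(shift, 1.0), rng.gauss(shift, 1.0), rng.gauss(0.0, 1.0)]

def gini(labels):
    """Gini impurity of a list of scenario labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, rng, n_try=2):
    """One-split tree: best threshold among n_try randomly chosen statistics."""
    # Fallback: a constant predictor, used only if no valid split exists.
    best = (float("inf"), 0, float("inf"), majority(y), majority(y))
    for f in rng.sample(range(len(X[0])), n_try):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not right:  # threshold at the maximum value: nothing to split
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    return best[1:]

def predict_stump(stump, x):
    f, t, left_label, right_label = stump
    return left_label if x[f] <= t else right_label

def abc_rf_choose(observed, n_sim=100, n_trees=25, seed=1):
    """Return (most voted scenario, share of trees voting for it)."""
    rng = random.Random(seed)
    # Simulated table: summary statistics labelled by the generating scenario.
    y = [s for s in (0, 1) for _ in range(n_sim)]
    X = [simulate(s, rng) for s in y]
    votes = Counter()
    for _ in range(n_trees):
        # Each tree is trained on a bootstrap resample of the simulated table.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx], rng)
        votes[predict_stump(stump, observed)] += 1
    best, count = votes.most_common(1)[0]
    return best, count / n_trees

# Usage: classify a hypothetical vector of observed summary statistics.
scenario, vote_share = abc_rf_choose([4.2, 3.7, -0.3])
print(scenario, vote_share)
```

Note that no statistic is pre-selected and no tolerance level is set: the noise statistic is simply ignored by the splits. The share of votes is only a rough indicator of support and should not be read as a calibrated posterior probability.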
- Subjects :
- 0106 biological sciences
0301 basic medicine
[SDV]Life Sciences [q-bio]
Feature vector
Population
SNP
Context (language use)
Biology
Machine learning
010603 evolutionary biology
01 natural sciences
Polymorphism, Single Nucleotide
Set (abstract data type)
03 medical and health sciences
approximate Bayesian computation
Genetics
Computer Simulation
supervised machine learning
Ecology, Evolution, Behavior and Systematics
Selection (genetic algorithm)
Demography
Estimation theory
Special Issue
RESOURCE ARTICLES
pool‐sequencing
0402 animal and dairy science
population genetics
Bayes Theorem
04 agricultural and veterinary sciences
demographic history
040201 dairy & animal science
Molecular and Statistical Advances
Random forest
030104 developmental biology
Genetics, Population
model or scenario selection
Artificial intelligence
parameter estimation
Algorithms
Biotechnology
Details
- ISSN :
- 1755-0998 and 1755-098X
- Volume :
- 21
- Issue :
- 8
- Database :
- OpenAIRE
- Journal :
- Molecular Ecology Resources
- Accession number :
- edsair.doi.dedup.....3378596a26ec6a79c81585a179c94dfa