Flora Jay, Willy Rodríguez, Frédéric Austerlitz, Simon Boitard, Stefano Mona, Génétique Animale et Biologie Intégrative (GABI), Institut National de la Recherche Agronomique (INRA)-AgroParisTech, Biologie Intégrative des Populations, École pratique des hautes études (EPHE)-Centre National de la Recherche Scientifique (CNRS), Institut de Mathématiques de Toulouse UMR5219 (IMT), Institut National des Sciences Appliquées - Toulouse (INSA Toulouse), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université Toulouse 1 Capitole (UT1)-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS), Eco-Anthropologie et Ethnobiologie (EAE), Centre National de la Recherche Scientifique (CNRS)-Université Paris Diderot - Paris 7 (UPD7)-Muséum national d'Histoire naturelle (MNHN), Laboratoire de Recherche en Informatique (LRI), Université Paris-Sud - Paris 11 (UP11)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS), Génétique Physiologie et Systèmes d'Elevage (GenPhySE ), École nationale supérieure agronomique de Toulouse [ENSAT]-Institut National de la Recherche Agronomique (INRA)-Ecole Nationale Vétérinaire de Toulouse (ENVT), Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées, Institut de Systématique, Evolution, Biodiversité (ISYEB ), Muséum national d'Histoire naturelle (MNHN)-Université Pierre et Marie Curie - Paris 6 (UPMC)-Centre National de la Recherche Scientifique (CNRS)-École pratique des hautes études (EPHE), Institut National de la Recherche Agronomique (INRA)-Ecole Nationale Vétérinaire de Toulouse (ENVT), Université de Toulouse (UT)-Université de Toulouse (UT)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université de Toulouse (UT)-Université de Toulouse (UT)-École nationale supérieure agronomique de Toulouse (ENSAT), Université de Toulouse (UT)-Université de Toulouse (UT), Muséum national d'Histoire naturelle (MNHN)-Université Pierre et Marie Curie - Paris 6 (UPMC)-École Pratique des Hautes Études (EPHE), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS), Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT)-Université de Toulouse (UT)-Institut National des Sciences Appliquées - Toulouse (INSA Toulouse), Institut National des Sciences Appliquées (INSA)-Université de Toulouse (UT)-Institut National des Sciences Appliquées (INSA)-Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS), Muséum national d'Histoire naturelle (MNHN)-Université Paris Diderot - Paris 7 (UPD7)-Centre National de la Recherche Scientifique (CNRS), Muséum national d'Histoire naturelle (MNHN)-École pratique des hautes études (EPHE), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Université Pierre et Marie Curie - Paris 6 (UPMC)-Centre National de la Recherche Scientifique (CNRS), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS), Centre National de la Recherche Scientifique (CNRS)-Muséum national d'Histoire naturelle (MNHN)-Université Paris Diderot - Paris 7 (UPD7), AgroParisTech-Institut National de la Recherche Agronomique (INRA), Agence Nationale de la Recherche [ANR-10-GENM-0014, ANR-12-BSV7-0012], PEPS Bio-Maths-Info from Universite de Toulouse, Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-École nationale supérieure agronomique de Toulouse [ENSAT], Muséum national d'Histoire naturelle (MNHN)-Université Pierre et Marie Curie - Paris 6 (UPMC)-École pratique des hautes études (EPHE)-Centre National de la Recherche Scientifique (CNRS), and Boitard, Simon
Inferring the ancestral dynamics of effective population size is a long-standing question in population genetics, which can now be tackled much more accurately thanks to the massive genomic data available in many species. Several promising methods that take advantage of whole-genome sequences have been recently developed in this context. However, they can only be applied to rather small samples, which limits their ability to estimate recent population size history. Besides, they can be very sensitive to sequencing or phasing errors. Here we introduce a new approximate Bayesian computation approach named PopSizeABC that allows estimating the evolution of the effective population size through time, using a large sample of complete genomes. This sample is summarized using the folded allele frequency spectrum and the average zygotic linkage disequilibrium at different bins of physical distance, two classes of statistics that are widely used in population genetics and can be easily computed from unphased and unpolarized SNP data. Our approach provides accurate estimations of past population sizes, from the very first generations before present back to the expected time to the most recent common ancestor of the sample, as shown by simulations under a wide range of demographic scenarios. When applied to samples of 15 or 25 complete genomes in four cattle breeds (Angus, Fleckvieh, Holstein and Jersey), PopSizeABC revealed a series of population declines, related to historical events such as domestication or modern breed creation. We further highlight that our approach is robust to sequencing errors, provided summary statistics are computed from SNPs with common alleles., Author Summary Molecular data sampled from extant individuals contains considerable information about their demographic history. In particular, one classical question in population genetics is to reconstruct past population size changes from such data. Relating these changes to various climatic, geological or anthropogenic events allows characterizing the main factors driving genetic diversity and can have major outcomes for conservation. Until recently, mostly very simple histories, including one or two population size changes, could be estimated from genetic data. This has changed with the sequencing of entire genomes in many species, and several methods allow now inferring complex histories consisting of several tens of population size changes. However, analyzing entire genomes, while accounting for recombination, remains a statistical and numerical challenge. These methods, therefore, can only be applied to small samples with a few diploid genomes. We overcome this limitation by using an approximate estimation approach, where observed genomes are summarized using a small number of statistics related to allele frequencies and linkage disequilibrium. In contrast to previous approaches, we show that our method allows us to reconstruct also the most recent part (the last 100 generations) of the population size history. As an illustration, we apply it to large samples of whole-genome sequences in four cattle breeds.