Sofia G Seabra, Pieter J K Libin, Kristof Theys, Anna Zhukova, Barney I Potter, Hanna Nebenzahl-Guimaraes, Alexander E Gorbalenya, Igor A Sidorov, Victor Pimentel, Marta Pingarilho, Ana T R de Vasconcelos, Simon Dellicour, Ricardo Khouri, Olivier Gascuel, Anne-Mieke Vandamme, Guy Baele, Lize Cuypers, Ana B Abecasis
Universidade Nova de Lisboa = NOVA University Lisbon (NOVA), Vrije Universiteit Brussel (VUB), Rega Institute for Medical Research [Leuven, België], Catholic University of Leuven - Katholieke Universiteit Leuven (KU Leuven), Hasselt University (UHasselt), Bioinformatique évolutive - Evolutionary Bioinformatics, Institut Pasteur [Paris] (IP)-Centre National de la Recherche Scientifique (CNRS), Hub Bioinformatique et Biostatistique - Bioinformatics and Biostatistics HUB, Institut Pasteur [Paris] (IP)-Université Paris Cité (UPCité), Leiden University Medical Center (LUMC), Universiteit Leiden, Lomonosov Moscow State University (MSU), Laboratorio Nacional de Computação Cientifica [Rio de Janeiro] (LNCC / MCT), Université libre de Bruxelles (ULB), Instituto Gonçalo Moniz / Gonçalo Moniz Research Centre - Fiocruz Bahia [Salvador, Brésil] (IGM), Fundação Oswaldo Cruz / Oswaldo Cruz Foundation (FIOCRUZ), Réseau International des Instituts Pasteur (RIIP)-Réseau International des Instituts Pasteur (RIIP), Institut de Systématique, Evolution, Biodiversité (ISYEB), Muséum national d'Histoire naturelle (MNHN)-École Pratique des Hautes Études (EPHE), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS)-Université des Antilles (UA), University Hospitals Leuven [Leuven]
This research was supported in part by the European Union’s Horizon 2020 research and innovation program ZIKAlliance (Agreement No 734548) and by Fundação para a Ciência e Tecnologia (FCT) through funds GHTM-UID/04413/2020. S.G.S. was funded by FCT, Portugal, through contrato-programa 1567 (CEECINST/00102/2018). K.T. was supported by a Fonds Wetenschappelijk Onderzoek post-doctoral grant. P.L. was supported by a doctoral (1S31916N) and post-doctoral grant (#1242021N) provided by the Fonds Wetenschappelijk Onderzoek and was also supported by funding from the Flemish Government under the ‘Onderzoeksprogramma Artificiele Intelligentie (AI) Vlaanderen’ programme. P.L. also received funding from the research council of the Vrije Universiteit Brussel (OZR-VUB) via grant number OZR3863BOF. S.D. acknowledges funding from the Fonds de la Recherche Scientifique (FNRS, Belgium). S.D. and G.B. acknowledge support from the Research Foundation - Flanders (Fonds voor Wetenschappelijk Onderzoek - Vlaanderen, G098321N). B.P. and G.B. acknowledge support from the Interne Fondsen KU Leuven/Internal Funds KU Leuven (Grant No. C14/18/094). G.B. also acknowledges support from the Research Foundation - Flanders (‘Fonds voor Wetenschappelijk Onderzoek - Vlaanderen,’ G0E1420N).
European Project: 734548, ZIKAlliance (2016), and Informatics and Applied Informatics
The Zika virus (ZIKV) disease caused a public health emergency of international concern that started in February 2016. The overall number of ZIKV-related cases increased until November 2016, after which it declined sharply. While the evaluation of the potential risk and impact of future arbovirus epidemics remains challenging, intensified surveillance efforts along with a scale-up of ZIKV whole-genome sequencing provide an opportunity to understand the patterns of genetic diversity, evolution, and spread of ZIKV. However, a classification system that reflects the true extent of ZIKV genetic variation is lacking. Our objective was to characterize ZIKV genetic diversity and phylodynamics, identify genomic footprints of differentiation patterns, and propose a dynamic classification system that reflects its divergence levels. We analysed a curated dataset of 762 publicly available sequences spanning the full-length coding region of ZIKV from across its geographical span and collected between 1947 and 2021. The definition of genetic groups was based on comprehensive evolutionary dynamics analyses, which included recombination and phylogenetic analyses, within- and between-group pairwise genetic distances comparison, detection of selective pressure, and clustering analyses. Evidence for potential recombination events was detected in a few sequences. However, we argue that these events are likely due to sequencing errors as proposed in previous studies. There was evidence of strong purifying selection, widespread across the genome, as also detected for other arboviruses. A total of 50 sites showed evidence of positive selection, and for a few of these sites, there was amino acid (AA) differentiation between genetic clusters. Two main genetic clusters were defined, ZA and ZB, which correspond to the already characterized ‘African’ and ‘Asian’ genotypes, respectively. 
Within ZB, two subgroups, ZB.1 and ZB.2, represent the Asiatic and the American (and Oceania) lineages, respectively. ZB.1 is further subdivided into ZB.1.0 (a basal Malaysia sequence sampled in the 1960s and a recent Indian sequence), ZB.1.1 (South-Eastern Asia, Southern Asia, and Micronesia sequences), and ZB.1.2 (very similar sequences from the outbreak in Singapore). ZB.2 is subdivided into ZB.2.0 (basal American sequences and the sequences from French Polynesia, the putative origin of South America introduction), ZB.2.1 (Central America), and ZB.2.2 (Caribbean and North America). This classification system does not use geographical references and is flexible to accommodate potential future lineages. It will be a helpful tool for studies that involve analyses of ZIKV genomic variation and its association with pathogenicity and serve as a starting point for the public health surveillance and response to on-going and future epidemics and to outbreaks that lead to the emergence of new variants.