1. A molecular barcode and online tool to identify and map imported infection with Plasmodium vivax
- Author
- Nguyen Hoang Chau, François Nosten, Alberto Tobón-Castaño, Alistair Miles, Ric N. Price, Alyssa E. Barry, Nicholas J. White, Matthew J. Grigg, Ishag Adam, Lidia Madeline Montenegro, Yaobao Liu, Bridget E. Barber, Marcelo U. Ferreira, A G Rahim, Leily Trianty, Hidayat Trimarsanto, Rintis Noviyanti, Tatiana M. Lopera-Mesa, Kamala Thriemer, Dominic P. Kwiatkowski, Sisay Getachew, Kanlaya Sriprawat, Ashenafi Assefa, Sónia Gonçalves, Ivo Mueller, Wasif A. Khan, Abraham Aseffa, E Sutanto, Jutta Marfurt, Victoria Simpson, Roberto Amato, Sarah Auburn, Mohammad Shafiul Alam, Yaghoob Hamedi, Zuleima Pava, Olivo Miotto, Richard D. Pearson, S Wangchuck, Timothy William, Tran Tinh Hien, Benedikt Ley, Qi Gao, Nicholas M. Anstey, Diego F. Echeverry, Eleanor Drury, and Beyene Petros
- Subjects
0303 health sciences, education.field_of_study, biology, 030231 tropical medicine, Plasmodium vivax, Population, Decision tree, Single-nucleotide polymorphism, Computational biology, biology.organism_classification, Missing data, Barcode, Matthews correlation coefficient, 3. Good health, law.invention, 03 medical and health sciences, 0302 clinical medicine, law, education, Genotyping, 030304 developmental biology
- Abstract
Imported cases present a considerable challenge to the elimination of malaria. Traditionally, patient travel history has been used to identify imported cases, but the long-latency liver stages confound this approach in Plasmodium vivax. Molecular tools to identify and map imported cases offer a more robust approach that can be combined with drug resistance and other surveillance markers in high-throughput, population-based genotyping frameworks. Using a machine learning approach incorporating hierarchical FST (HFST) and decision tree (DT) analysis applied to 831 P. vivax genomes from 20 countries, we identified a 28-single nucleotide polymorphism (SNP) barcode with high capacity to predict the country of origin. The Matthews correlation coefficient (MCC), which measures the quality of the classifications and ranges from −1 (total disagreement) to 1 (perfect prediction), exceeded 0.9 in 15 countries in cross-validation evaluations. When combined with an existing 37-SNP P. vivax barcode, the 65-SNP panel exhibits MCC scores exceeding 0.9 in 17 countries with up to 30% missing data. As a secondary objective, several genes were identified with moderate MCC scores (median MCC ranging from 0.54 to 0.68), making them amenable as markers for rapid testing using low-throughput genotyping approaches. A likelihood-based classifier framework was established that supports analysis of missing data and polyclonal infections. To facilitate investigator-led analyses, the likelihood framework is provided as a web-based, open-access platform (vivaxGEN-geo) to support the analysis and interpretation of data produced at either the 28-SNP core or the full 65-SNP barcode. These tools can be used by malaria control programs to identify the main reservoirs of infection so that resources can be focused where they are needed most.
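The two quantitative ideas in the abstract, a likelihood-based classifier that tolerates missing calls and a per-country MCC, can be illustrated with a minimal Python sketch. This is not the vivaxGEN-geo implementation: the allele frequencies, the country names, and the function names (`log_likelihoods`, `classify`, `mcc_one_vs_rest`) are hypothetical, SNPs are assumed to be biallelic and independent, and polyclonal infections are not handled.

```python
import math

# Hypothetical alternate-allele frequencies at three biallelic SNPs.
# Illustrative values only; they are not taken from the study.
ALT_FREQ = {
    "Country A": [0.95, 0.10, 0.80],
    "Country B": [0.05, 0.90, 0.20],
}


def log_likelihoods(barcode, alt_freq=ALT_FREQ, eps=1e-3):
    """Log-likelihood of a barcode under each country's allele frequencies.

    Barcode entries: 1 = alternate allele, 0 = reference allele,
    None = missing call. Missing calls are skipped, so they add no evidence.
    """
    scores = {}
    for country, freqs in alt_freq.items():
        total = 0.0
        for call, p in zip(barcode, freqs):
            if call is None:
                continue
            p = min(max(p, eps), 1 - eps)  # clamp so log() stays finite
            total += math.log(p if call == 1 else 1 - p)
        scores[country] = total
    return scores


def classify(barcode, alt_freq=ALT_FREQ):
    """Return the country with the highest log-likelihood."""
    scores = log_likelihoods(barcode, alt_freq)
    return max(scores, key=scores.get)


def mcc_one_vs_rest(true_labels, predicted_labels, country):
    """MCC with `country` as the positive class and all others as negative."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == country and pred == country:
            tp += 1
        elif truth != country and pred == country:
            fp += 1
        elif truth == country and pred != country:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Under these assumed frequencies, a barcode with one missing call such as `[1, None, 1]` is still assigned from the two observed SNPs, and `mcc_one_vs_rest` returns 1 for a perfect set of predictions and −1 when every prediction is wrong.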
- Published
- 2019