pong: fast analysis and visualization of latent clusters in population genetic data
- Authors
Aaron A. Behr, Katherine Z. Liu, Sohini Ramachandran, Gracie Liu-Fang, and Priyanka Nakka
- Subjects
0106 biological sciences; 0301 basic medicine; Statistics and Probability; Theoretical computer science; Protein Conformation; Computer science; Population; Inference; 010603 evolutionary biology; 01 natural sciences; Biochemistry; Latent Dirichlet allocation; 03 medical and health sciences; Text mining; Population Groups; Computer Graphics; Cluster Analysis; Data Mining; Humans; Molecular Biology; 030304 developmental biology; 0303 health sciences; Genetics and Population Analysis; Original Papers; Pipeline (software); Computer Science Applications; Visualization; Computational Mathematics; Genetics, Population; 030104 developmental biology; Computational Theory and Mathematics; Programming Languages; Algorithms; Software
- Abstract
Motivation: A series of methods in population genetics use multilocus genotype data to assign individuals membership in latent clusters. These methods belong to a broad class of mixed-membership models, such as latent Dirichlet allocation used to analyze text corpora. Inference from mixed-membership models can produce different output matrices when repeatedly applied to the same inputs, and the number of latent clusters is a parameter that is often varied in the analysis pipeline. For these reasons, quantifying, visualizing, and annotating the output from mixed-membership models are bottlenecks for investigators across multiple disciplines, from ecology to text data mining.

Results: We introduce pong, a network-graphical approach for analyzing and visualizing membership in latent clusters with a native D3.js interactive visualization. pong leverages efficient algorithms for solving the Assignment Problem to dramatically reduce runtime while increasing accuracy compared to other methods that process output from mixed-membership models. We apply pong to 225,705 unlinked genome-wide single-nucleotide variants from 2,426 unrelated individuals in the 1000 Genomes Project, and identify previously overlooked aspects of global human population structure. We show that pong outpaces current solutions by more than an order of magnitude in runtime while providing a customizable and interactive visualization of population structure that is more accurate than those produced by current tools.

Availability: pong is freely available and can be installed using the Python package management system pip. pong's source code is available at https://github.com/abehr/pong.

Contact: aaron_behr@alumni.brown.edu, sramachandran@brown.edu
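The label-alignment problem described above (repeated runs returning the same clusters under different column orders) can be illustrated with a minimal sketch of assignment-based alignment. Several assumptions here are not taken from the paper: each run is an N×K membership matrix with individuals as rows and clusters as columns; two runs with the same K are compared using an L1 distance between cluster columns (a hypothetical choice, since pong's own similarity measure may differ); and `hungarian`, `column_distance`, and `align_runs` are illustrative names, not part of pong's API. The solver is the classic O(K³) Hungarian method, shown as one example of an efficient Assignment Problem algorithm.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def hungarian(cost: Matrix) -> List[int]:
    """Minimum-cost assignment for a square cost matrix, O(K^3).

    Returns `assignment` where assignment[r] is the column matched to row r.
    Potential-based Hungarian algorithm; rows and columns are 1-indexed
    internally, and index 0 serves as a sentinel.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j] = row matched to column j (0 means free)
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so at least one new zero reduced-cost edge appears.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:   # reached a free column: augmenting path found
                break
        # Flip matched/unmatched edges along the augmenting path.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def column_distance(q1: Matrix, q2: Matrix, a: int, b: int) -> float:
    # Hypothetical dissimilarity: L1 distance between cluster a of run 1
    # and cluster b of run 2, summed over individuals.
    return sum(abs(r1[a] - r2[b]) for r1, r2 in zip(q1, q2))


def align_runs(q_ref: Matrix, q_run: Matrix) -> List[List[float]]:
    """Reorder q_run's columns so its clusters line up with q_ref's (same K)."""
    k = len(q_ref[0])
    cost = [[column_distance(q_ref, q_run, a, b) for b in range(k)]
            for a in range(k)]
    perm = hungarian(cost)  # perm[a] = column of q_run matched to cluster a
    return [[row[perm[a]] for a in range(k)] for row in q_run]


if __name__ == "__main__":
    q_ref = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
    q_run = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]  # same solution, labels swapped
    print(align_runs(q_ref, q_run))  # [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
```

Once the K×K cost matrix is built (O(N·K²) work in this sketch), the assignment step depends only on K, not on the number of individuals, which plausibly helps such alignment scale to large samples. Aligning runs that use different numbers of clusters would need additional handling not shown here.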
- Published
- 2016