1. No major flaws in 'Identification of individuals by trait prediction using whole-genome sequencing data'
- Author
Christoph Lippert, Riccardo Sabatini, M. Cyrus Maher, Eun Yong Kang, Seunghak Lee, Okan Arikan, Alena Harley, Axel Bernal, Peter Garst, Victor Lavrenko, Ken Yocum, Theodore M. Wong, Mingfu Zhu, Wen-Yun Yang, Chris Chang, Barry Hicks, Smriti Ramakrishnan, Haibao Tang, Chao Xie, Suzanne Brewerton, Yaron Turpaz, Amalio Telenti, Rhonda K. Roby, Franz Och, and J. Craig Venter
- Subjects
Genetics, Whole genome sequencing, Genomics, Sample (statistics), Biology, Machine learning, Genome, Range (mathematics), Identification (information), Cardiovascular and Metabolic Diseases, Trait, Identifiability, Artificial intelligence
- Abstract
In a recently published PNAS article, we studied the identifiability of genomic samples using machine learning methods [Lippert et al., 2017]. In a response, Erlich [2017] argued that our work contained major flaws. The main technical critique in Erlich [2017] builds on a simulation experiment showing that our proposed algorithm, which uses only a genomic sample for identification, performed no better than a strategy that uses demographic variables. Below, we show why this comparison is misleading and discuss in detail the key criticisms of our analyses raised in Erlich [2017] and in the media. Furthermore, not only faces but also a wide range of phenotypes and demographic variables may be derived from DNA. In this light, the main contribution of Lippert et al. [2017] is an algorithm that identifies the genomes of individuals by combining multiple DNA-based predictive models for a myriad of traits.
- Published
- 2017