Legacy Data Confound Genomics Studies

Authors :
Changhoon Kim
Koichiro Higasa
Yoichiro Kamatani
Luke Anderson-Trocmé
Mathieu Bourgey
Jeong-Sun Seo
Simon Gravel
Fumihiko Matsuda
Rick Farouni
Source :
Molecular Biology and Evolution. 37:2-10
Publication Year :
2019
Publisher :
Oxford University Press (OUP), 2019.

Abstract

Recent reports have identified differences in the mutational spectra across human populations. Although some of these reports have been replicated in other cohorts, most have been reported only in the 1000 Genomes Project (1kGP) data. While investigating an intriguing putative population stratification within the Japanese population, we identified a previously unreported batch effect leading to spurious mutation calls in the 1kGP data and to the apparent population stratification. Because the 1kGP data are used extensively, we find that the batch effects also lead to incorrect imputation by leading imputation servers and a small number of suspicious GWAS associations. Lower quality data from the early phases of the 1kGP thus continue to contaminate modern studies in hidden ways. It may be time to retire or upgrade such legacy sequencing data.

Details

ISSN :
1537-1719 and 0737-4038
Volume :
37
Database :
OpenAIRE
Journal :
Molecular Biology and Evolution
Accession number :
edsair.doi.dedup.....fa6567cc105edb3f7109814fc5319d1d
Full Text :
https://doi.org/10.1093/molbev/msz201