Yang, Meng, Zhang, Chuwen, Wang, Xiaoji, Liu, Xingmin, Li, Shisen, Huang, Jianye, Feng, Zhimin, Sun, Xiaohui, Chen, Fang, Yang, Shuang, Ni, Ming, Li, Lin, Cao, Yanan, and Mu, Feng
The statistical power of genome-wide association studies (GWASs) depends on the effective sample size. However, privacy and security concerns over individual-level genotype data pose great challenges for cross-institutional cooperation. Full-process cryptographic solutions are in demand but not yet available, especially for the essential principal-component analysis (PCA) step. Here, we present TrustGWAS, a complete solution for secure, large-scale GWAS that reproduces the gold-standard results of PLINK without compromising privacy. It supports the basic PLINK steps, including quality control (QC), linkage disequilibrium (LD) pruning, PCA, the chi-square test, the Cochran-Armitage trend test, covariate-supported logistic and linear regression, and their sequential combinations. TrustGWAS leverages pseudorandom number perturbations for PCA and a multiparty scheme of multi-key homomorphic encryption for all other modules. TrustGWAS can evaluate 100,000 individuals with 1 million variants and complete the QC-LD-PCA-regression workflow within 50 h. We further identify gene loci associated with fasting blood glucose, consistent with the findings of the ChinaMAP project.
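For readers unfamiliar with the association tests listed above, the following is a minimal plaintext sketch of the Cochran-Armitage trend test for a single variant. It is not the TrustGWAS protocol and involves no encryption; it only illustrates the statistic that a secure pipeline would need to reproduce. The function name `cochran_armitage_trend` and the additive weights (0, 1, 2) are assumptions made for this illustration.

```python
import math


def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Plaintext Cochran-Armitage trend test on a 2x3 genotype table.

    cases / controls: counts per genotype (0, 1, 2 copies of the tested allele).
    weights: per-genotype scores; (0, 1, 2) is the usual additive coding
    (an assumption of this sketch).
    Returns (chi2, p_value) for a chi-square statistic with 1 degree of freedom.
    """
    R = sum(cases)        # total number of cases
    S = sum(controls)     # total number of controls
    N = R + S
    col = [r + s for r, s in zip(cases, controls)]  # genotype column totals

    # Trend statistic T = sum_i t_i * (r_i * S - s_i * R)
    T = sum(t * (r * S - s * R) for t, r, s in zip(weights, cases, controls))

    # Variance of T under the null hypothesis of no association
    square_term = sum(t * t * c * (N - c) for t, c in zip(weights, col))
    cross_term = sum(
        weights[i] * weights[j] * col[i] * col[j]
        for i in range(len(col))
        for j in range(i + 1, len(col))
    )
    var_T = (R * S / N) * (square_term - 2 * cross_term)
    if var_T <= 0:
        # Degenerate table (e.g. a monomorphic variant): no evidence of trend
        return 0.0, 1.0

    chi2 = T * T / var_T
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Toy genotype counts, invented purely for illustration
    chi2, p = cochran_armitage_trend(cases=[10, 20, 30], controls=[30, 20, 10])
    print(f"chi2 = {chi2:.3f}, p = {p:.3g}")
```

The test needs only the six genotype counts per variant rather than individual-level genotypes, which is one reason statistics of this kind lend themselves to computation over aggregated or encrypted inputs.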