1. Efficient and effective control of confounding in eQTL mapping studies through joint differential expression and Mendelian randomization analyses
- Author
- Huanhuan Zhu, Qinke Peng, Xiang Zhou, Yue Fan, and Yanyi Song
- Subjects
Statistics and Probability ,Computer science ,Quantitative Trait Loci ,Gene Expression ,Genome-wide association study ,Computational biology ,Quantitative trait locus ,Polymorphism, Single Nucleotide ,Biochemistry ,Correlation ,03 medical and health sciences ,0302 clinical medicine ,Gene expression ,Mendelian randomization ,Molecular Biology ,Gene ,030304 developmental biology ,0303 health sciences ,Gene Expression Profiling ,Confounding ,Mendelian Randomization Analysis ,Original Papers ,Expression (mathematics) ,Computer Science Applications ,Computational Mathematics ,Computational Theory and Mathematics ,Expression quantitative trait loci ,Software ,030217 neurology & neurosurgery ,Genome-Wide Association Study
- Abstract
Motivation: Identifying cis-acting genetic variants associated with gene expression levels, an analysis commonly referred to as expression quantitative trait loci (eQTL) mapping, is an important first step toward understanding the genetic determinants of gene expression variation. Successful eQTL mapping requires effective control of confounding factors. A common method for controlling confounding effects in eQTL mapping studies is probabilistic estimation of expression residuals (PEER) analysis. PEER analysis extracts PEER factors to serve as surrogates for confounding factors, which are then included in the subsequent eQTL mapping analysis. However, it is computationally challenging to determine the optimal number of PEER factors to use for eQTL mapping. In particular, the standard approach examines one candidate number of PEER factors at a time and chooses the number that optimizes eQTL discovery. Unfortunately, this standard approach involves multiple repetitive eQTL mapping procedures that are computationally expensive, restricting its use in the large-scale eQTL mapping studies that are being collected today.
Results: Here, we present a simple and computationally scalable alternative, Effect size Correlation for COnfounding determination (ECCO), to determine the optimal number of PEER factors to use in eQTL mapping studies. Instead of performing repetitive eQTL mapping, ECCO jointly applies differential expression analysis and Mendelian randomization analysis, leading to substantial computational savings. In simulations and real data applications, we show that ECCO identifies a similar number of PEER factors required for eQTL mapping analysis as the standard approach but is two orders of magnitude faster.
The computational scalability of ECCO allows for optimized eQTL discovery across 48 GTEx tissues for the first time, yielding an overall 5.89% power gain in the number of eQTL-harboring genes (eGenes) discovered compared with the previous GTEx recommendation, which does not attempt to determine a tissue-specific optimal number of PEER factors.
Availability and implementation: Our method is implemented in the ECCO software, which, along with its GTEx mapping results, is freely available at www.xzlab.org/software.html. All R scripts used in this study are also available at this site.
Supplementary information: Supplementary data are available at Bioinformatics online.
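Illustrative sketch (not part of the original record, and not the ECCO software): the abstract describes the standard approach as trying one number of factors at a time, rerunning eQTL mapping, and keeping the number that yields the most discoveries. The Python sketch below shows that grid search on simulated data, under several stated assumptions: principal components stand in for PEER factors, each gene has a single cis-SNP, and a fixed |t| cutoff replaces a proper multiple-testing procedure. All function names and parameters are hypothetical.

```python
import numpy as np

def simulate(n_samples=200, n_genes=300, n_confounders=3, n_egenes=100, seed=0):
    """Toy data: one cis-SNP per gene, hidden confounders shared across genes."""
    rng = np.random.default_rng(seed)
    geno = rng.binomial(2, 0.3, size=(n_samples, n_genes)).astype(float)
    effects = np.zeros(n_genes)
    effects[:n_egenes] = 0.5                      # true eQTL effect sizes
    hidden = rng.normal(size=(n_samples, n_confounders))
    loadings = rng.normal(size=(n_confounders, n_genes))
    noise = rng.normal(size=(n_samples, n_genes))
    expr = geno * effects + hidden @ loadings + noise
    return expr, geno

def top_factors(expr, k):
    """Top-k principal components of expression (assumed stand-in for PEER factors)."""
    centered = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]

def residualize(y, covariates):
    """Remove the least-squares fit of an intercept plus covariates from each column of y."""
    design = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta

def count_egenes(expr, geno, k, t_threshold=4.0):
    """Count genes whose cis-SNP passes a fixed |t| cutoff after adjusting for k factors."""
    factors = top_factors(expr, k)
    e = residualize(expr, factors)
    g = residualize(geno, factors)
    # Partial correlation between each gene and its own SNP.
    r = (e * g).sum(axis=0) / np.sqrt((e ** 2).sum(axis=0) * (g ** 2).sum(axis=0))
    dof = expr.shape[0] - k - 2
    t = r * np.sqrt(dof / (1.0 - r ** 2))
    return int((np.abs(t) > t_threshold).sum())

def choose_num_factors(expr, geno, candidates):
    """Grid search: one full association scan per candidate number of factors."""
    counts = {k: count_egenes(expr, geno, k) for k in candidates}
    best = max(counts, key=counts.get)
    return best, counts

expr, geno = simulate(seed=0)
best_k, counts = choose_num_factors(expr, geno, candidates=[0, 1, 3, 5, 10])
print(best_k, counts)
```

Each candidate value triggers a complete association scan, which is the repeated cost the abstract says ECCO avoids by jointly applying differential expression and Mendelian randomization analyses instead.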
- Published
- 2020