Julian M. Hess, Hailei Zhang, Timothy J. Sullivan, Paz Polak, Noam Shoresh, François Aguet, Ayelet V. Segrè, Dimitri Livitz, Xiao Li, Nicholas J. Haradhvala, Daniel Rosebrock, Gad Getz, Chip Stewart, Keren Yizhak, Eila Arich-Landkof, Jaegil Kim, and Kristin G. Ardlie
Cancer genome studies have contributed significantly to the discovery of somatic mutations and processes that drive cancer. However, how these mutations accumulate in normal cells and contribute to early cancer development remains poorly understood. Recent studies have addressed this question by examining normal blood and a small number of skin samples, discovering both driver mutations and mutational processes observed in cancer. Here we extend these studies by analyzing RNA from ~7,000 samples across 30 normal tissues from ~600 individuals, compared to their germline DNA, collected as part of the GTEx project. To accomplish this goal, we first developed a new pipeline, termed RNA-MuTect, for calling somatic mutations directly from RNA-seq samples and their matched-normal DNA. RNA-MuTect includes multiple filtering steps designed specifically for analyzing RNA-seq data. We first validated RNA-MuTect by analyzing TCGA samples for which both DNA and RNA data are available. Comparing the set of mutations detected by RNA-MuTect to those identified in the DNA, we show that sensitivity is high whenever there is sufficient coverage to detect the mutations in RNA. Most importantly, RNA-MuTect has a very low false-positive rate, with specificity >90%. We further demonstrate that most of the known driver genes in this cohort can be discovered using the mutations detected in the RNA data. Moreover, using the RNA data, we can detect the same mutational processes as identified in the DNA, including UV, aging, smoking and others. To study clonal expansion in normal tissues and investigate whether known cancer-related genes and processes can be identified, we applied RNA-MuTect to the GTEx dataset. As expected, multiple variants were detected across tissues, with skin, lung and esophagus having the highest number of somatic mutations. Overall, several types of cancer-related events were detected.
Specifically: (1) we found 15 hotspot mutations in 6 different genes, including TP53, KRAS and PIK3CA; (2) mutated genes in 8 different tissues were enriched for Cancer Gene Census genes; (3) known cancer genes were found to be under positive selection in different tissues; (4) known cancer-related mutational signatures were captured in normal tissues; and (5) cases of allelic imbalance were detected in various tissues. This study is the first to analyze a large number of samples across many tissues to explore the fundamental question of cancer initiation. Many cancer-related processes and events were discovered across different tissues, laying the foundation for studying the earliest stages of cancer development.
Note: This abstract was not presented at the meeting.
Citation Format: Keren Yizhak, Jaegil Kim, Francois Aguet, Julian Hess, Hailei Zhang, Eila Arich-Landkof, Noam Shoresh, Ayelet Segre, Chip Stewart, Daniel Rosebrock, Dimitri Livitz, Nicholas Haradhvala, Paz Polak, Tim Sullivan, Xiao Li, Kristin Ardlie, Gad Getz. Identifying cancer-related processes in normal tissues via RNA-seq [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr LB-231. doi:10.1158/1538-7445.AM2017-LB-231