1. Long-read proteogenomics to connect disease-associated sQTLs to the protein isoform effectors of disease.
- Author
- Abood, Abdullah, Mesner, Larry D., Jeffery, Erin D., Murali, Mayank, Lehe, Micah D., Saquing, Jamie, Farber, Charles R., and Sheynkman, Gloria M.
- Subjects
- *ALTERNATIVE RNA splicing, *LOCUS (Genetics), *GENOME-wide association studies, *BONE density, *RNA sequencing
- Abstract
A major fraction of loci identified by genome-wide association studies (GWASs) mediate alternative splicing, but mechanistic interpretation is hindered by the technical limitations of short-read RNA sequencing (RNA-seq), which cannot directly link splicing events to full-length protein isoforms. Long-read RNA-seq is a powerful tool to characterize transcript isoforms and, more recently, to infer protein isoform existence. Here, we present an approach that integrates information from GWASs, splicing quantitative trait loci (sQTLs), and PacBio long-read RNA-seq in a disease-relevant model to infer the effects of sQTLs on the ultimate protein isoform products they encode. We demonstrate the utility of our approach using bone mineral density (BMD) GWAS data. We identified 1,863 sQTLs from the Genotype-Tissue Expression (GTEx) project in 732 protein-coding genes that colocalized with BMD associations (H4PP ≥ 0.75). We generated PacBio Iso-Seq data (N = ∼22 million full-length reads) on human osteoblasts, identifying 68,326 protein-coding isoforms, of which 17,375 (25%) were unannotated. By casting the sQTLs onto protein isoforms, we connected 809 sQTLs to 2,029 protein isoforms from 441 genes expressed in osteoblasts. Overall, we found 74 sQTLs that influenced isoforms likely impacted by nonsense-mediated decay and 190 that potentially resulted in the expression of unannotated protein isoforms. Finally, we functionally validated colocalizing sQTLs in TPM2, in which siRNA-mediated knockdown in osteoblasts revealed two TPM2 isoforms with opposing effects on mineralization, whereas knockdown of the entire gene had no effect. Our approach should generalize across diverse clinical traits and provide insights into protein isoform activities modulated by GWAS loci.
Many GWAS loci are associated with alternative splicing, but the identities and functions of most protein isoform effectors are unknown. We demonstrate how the integration of splicing QTLs (sQTLs) and PacBio long-read RNA-seq data enables the prediction, characterization, and functional prioritization of protein isoforms associated with complex human disease. [ABSTRACT FROM AUTHOR]
- Published
- 2024