Insights into protein-RNA complexes from computational analyses of iCLIP experiments
- Author
- Haberman, N., Ule, J., and Luscombe, N.
- Subjects
- 612.8
- Abstract
RNA-binding proteins (RBPs) are the primary regulators of all aspects of post-transcriptional gene regulation. To understand how RBPs perform their function, it is important to identify their binding sites. Recently, new techniques have been developed that employ high-throughput sequencing to study protein-RNA interactions in vivo, including the individual-nucleotide resolution UV crosslinking and immunoprecipitation (iCLIP). iCLIP identifies sites of protein-RNA crosslinking with nucleotide resolution in a transcriptome-wide manner. It is composed of over 60 steps, which can be modified, but it is not clear how variations in the method affect the assignment of RNA binding sites. This is even more pertinent given that several variants of iCLIP have been developed. A central question of my research is how to correctly assign binding sites to RBPs using the data produced by iCLIP and similar techniques. I first focused on the technical analyses and solutions for the iCLIP method. I examined cDNA deletions and crosslink-associated motifs to show that the starts of cDNAs are appropriate to assign the crosslink sites in all variants of CLIP, including iCLIP, eCLIP and irCLIP. I also showed that the non-coinciding cDNA-starts are caused by technical conditions in the iCLIP protocol that may lead to sequence constraints at cDNA-ends in the final cDNA library. I also demonstrated the importance of fully optimizing the RNase and purification conditions in iCLIP to avoid these cDNA-end constraints. Next, I developed CLIPo, a computational framework that assesses various features of iCLIP data to provide quality control standards that reveal how technical variations between experiments affect the specificity of assigned binding sites. I used CLIPo to compare multiple PTBP1 experiments produced by iCLIP, eCLIP and irCLIP, to reveal major effects of sequence constraints at cDNA-ends or starts, cDNA length distribution and non-specific contaminants.
Moreover, I assessed how the variations between these methods influence the mechanistic conclusions. Thus, CLIPo presents the quality control standards for transcriptome-wide assignment of protein-RNA binding sites. I continued with analyses of RBP complexes by using data from spliceosome-iCLIP. This method simultaneously detects crosslink sites of small nuclear ribonucleoproteins (snRNPs) and auxiliary splicing factors on pre-mRNAs. I demonstrated that the high resolution of spliceosome-iCLIP allows for distinction between multiple proximal RNA binding sites, which can be valuable for transcriptome-wide studies of large ribonucleoprotein complexes. Moreover, I showed that spliceosome-iCLIP can experimentally identify over 50,000 human branch points. In summary, I detected technical biases in iCLIP data and demonstrated how such biases can be avoided, so that cDNA-starts appropriately assign the RNA binding sites. CLIPo analysis proved to be a useful quality control tool that evaluates data specificity across different methods, and I applied it to iCLIP, irCLIP and ENCODE eCLIP datasets. I presented how spliceosome-iCLIP data can be used to study the splicing machinery on pre-mRNAs and how to use constrained cDNAs from spliceosome-iCLIP data to identify branch points on a genome-wide scale. Taken together, these studies provide new insights into the field of RNA biology and can be used for future studies of iCLIP and related methods.
- Published
- 2017