1. Computing the Probability of RNA Hairpin and Multiloop Formation.
- Author
Ding, Yang; Lorenz, William A.; Dotu, Ivan; Senter, Evan; Clote, Peter
- Subjects
*HAIRPIN (Genetics), *MOLECULAR structure of nucleic acids, *GENETIC transcription, *BOLTZMANN'S equation, *SUPPORT vector machines, *FAST Fourier transforms
- Abstract
We describe four novel algorithms that compute the Boltzmann partition function under global structural constraints: respectively, the number of hairpins, the number of multiloops, the maximum order (or depth) of multiloops, and the simultaneous number of hairpins and multiloops. Given an RNA sequence of length n and a user-specified integer 0 ≤ K ≤ n, each of the first three algorithms computes the partition functions Z(k) for each 0 ≤ k ≤ K in time O(K²n³) and space O(Kn²), while the fourth computes the partition functions Z(m, h) for 0 ≤ m ≤ M multiloops and 0 ≤ h ≤ H hairpins, with run time O(M²H²n³) and space O(MHn²). In addition, programs such as the hairpin algorithm (resp. the hairpin-and-multiloop algorithm) sample from the low-energy ensemble of structures having h hairpins (resp. m multiloops and h hairpins), for given h, m. Moreover, by using the fast Fourier transform (FFT), two of the algorithms have been improved to run in time O(n⁴) and space O(n²), although this improvement is not possible for one of the others. We present two applications of the novel algorithms. First, we show that for many Rfam families of RNA, structures obtained by sampling are more accurate than the minimum free-energy structure; for instance, sensitivity improves by almost 24% for transfer RNA, while for certain ribozyme families, there is an improvement of around 5%. Second, we show that the probabilities p(k) = Z(k)/Z of forming k hairpins (resp. multiloops) provide discriminating novel features for a support vector machine or relevance vector machine binary classifier for Rfam families of RNA. Our data suggest that multiloop order does not provide any significant discriminatory power over that of hairpin and multiloop number, and since these probabilities can be efficiently computed using the FFT, hairpin and multiloop formation probabilities could be added to other features in existing noncoding RNA gene finders. Our programs, written in C/C++, are publicly available online. [ABSTRACT FROM AUTHOR]
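The abstract does not say how the FFT is applied, so the following is only a minimal sketch of one common way such a speed-up can work, under stated assumptions. It treats the values Z(k) as coefficients of a polynomial Z(x) = Σ Z(k)·x^k, evaluates that polynomial at complex roots of unity, recovers the coefficients by an inverse discrete Fourier transform, and then forms p(k) = Z(k)/Z, taking Z to be the sum of all Z(k). The function `evaluate_toy_polynomial` and its `weights` are hypothetical stand-ins for a real partition-function computation, and a naive inverse DFT is used in place of a true FFT to keep the example free of dependencies.

```python
import cmath

def evaluate_toy_polynomial(x, weights):
    """Hypothetical black box: returns Z(x) = prod_i (1 + w_i * x).

    A real implementation would instead run a partition-function
    recursion with x as a formal weight; this toy product is an assumption.
    """
    result = 1.0 + 0.0j
    for w in weights:
        result *= 1.0 + w * x
    return result

def coefficients_by_interpolation(evaluate, degree):
    """Recover coefficients c_0..c_degree from values at roots of unity."""
    n = degree + 1
    # Evaluate the polynomial at the n complex n-th roots of unity.
    values = [evaluate(cmath.exp(2j * cmath.pi * j / n)) for j in range(n)]
    coeffs = []
    for k in range(n):
        # Naive inverse DFT: c_k = (1/n) * sum_j y_j * exp(-2*pi*i*j*k/n).
        total = sum(values[j] * cmath.exp(-2j * cmath.pi * j * k / n)
                    for j in range(n))
        coeffs.append((total / n).real)
    return coeffs

def probabilities(coeffs):
    """p(k) = Z(k) / Z, with Z assumed to be the sum over all k of Z(k)."""
    total = sum(coeffs)
    return [c / total for c in coeffs]

weights = [0.5, 2.0, 1.5]  # illustrative values only, not real energies
z = coefficients_by_interpolation(
    lambda x: evaluate_toy_polynomial(x, weights), len(weights))
p = probabilities(z)
```

Here `z` plays the role of the list Z(0), ..., Z(K) and `p` the corresponding probabilities. In practice the naive inverse transform would be replaced by a library FFT routine.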
- Published
2014