1. Grammar-based compression approach to extraction of common rules among multiple trees of glycans and RNAs
- Author
-
Morihiro Hayashida, Jaewook Hwang, Tatsuya Akutsu, Yang Zhao, and Yue Cao
- Subjects
Theoretical computer science ,Computer science ,media_common.quotation_subject ,Bisection-type tree grammar ,Biochemistry ,Rule-based machine translation ,Polysaccharides ,Structural Biology ,Humans ,Molecular Biology ,Integer programming ,media_common ,Biological data ,Basis (linear algebra) ,Grammar ,Applied Mathematics ,Computational Biology ,Data Compression ,Grammar-based compression ,Glycan ,RNA secondary structure ,Computer Science Applications ,Tree (data structure) ,Tree structure ,RNA ,Algorithms ,Research Article - Abstract
Background Many tree structures are found in nature and organisms. Such trees are believed to be constructed on the basis of certain rules. We have previously developed grammar-based compression methods for ordered and unordered single trees, based on bisection-type tree grammars. Here, these methods find construction rules for one single tree. On the other hand, specified construction rules can be utilized to generate multiple similar trees. Results Therefore, in this paper, we develop novel methods to discover common rules for the construction of multiple distinct trees, by improving and extending the previous methods using integer programming. We apply our proposed methods to several sets of glycans and RNA secondary structures, which play important roles in cellular systems, and can be regarded as tree structures. The results suggest that our method can be successfully applied to determining the minimum grammar and several common rules among glycans and RNAs. Conclusions We propose integer programming-based methods MinSEOTGMul and MinSEUTGMul for the determination of the minimum grammars constructing multiple ordered and unordered trees, respectively. The proposed methods can provide clues for the determination of hierarchical structures contained in tree-structured biological data, beyond the extraction of frequent patterns. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0558-4) contains supplementary material, which is available to authorized users.
- Published
- 2015