1. FexRNA: Exploratory Data Analysis and Feature Selection of Non-Coding RNA
- Author
Yi-Ping Phoebe Chen, Annette McGrath, and Noorul Amin
- Subjects
RNA, Untranslated; Sequence Analysis, RNA; Computer science; Applied Mathematics; Feature extraction; Univariate; FASTA format; Computational Biology; Feature selection; Machine Learning; Set (abstract data type); Exploratory data analysis; Identification (information); Genetics; Data mining; Databases, Nucleic Acid; Algorithms; Software; Selection (genetic algorithm); Biotechnology
- Abstract
Non-coding RNA (ncRNA) is involved in many biological processes and diseases in all species. Many ncRNA datasets exist, and they provide a sequence-based representation of the data that best suits biomedical purposes. However, for ncRNA identification and analysis, statistical learning methods require numerical features that are hidden in the data. Extracting these hidden features, analysing them, and using a suitable set of them is crucial to the performance of any statistical learning method. Furthermore, a wealth of sequence-intrinsic features has been proposed for ncRNA identification, so a systematic review and selection of these features is warranted. First, FASTA-format sequence datasets representing many ncRNA types across a number of species are generated from RNAcentral. Next, a features dataset consisting of the 17 most frequently reported sequence-intrinsic features is created for each FASTA dataset. The features datasets are available from the FexRNA platform developed as part of this work. In addition, the features datasets are explored and analysed in terms of statistical information and univariate and bivariate analysis. For feature selection (FS), a two-fold hierarchical FS framework based on majority voting and correlation is proposed and evaluated. The FexRNA platform thus provides a useful resource for information about ncRNA features datasets, feature analysis, and feature selection.
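The abstract does not specify how the two stages of the FS framework are implemented, so the following is only a minimal sketch of one plausible reading: stage one keeps features chosen by a majority of selectors, and stage two drops features that are highly correlated with one already kept. The function names, the feature names (`length`, `gc_content`, `au_content`, `entropy`), the toy values, the greedy ordering, and the 0.9 threshold are all assumptions made for illustration and are not taken from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treated as uncorrelated (assumption)
    return cov / sqrt(var_x * var_y)


def majority_vote(selections):
    """Stage 1: keep features picked by more than half of the selectors."""
    counts = {}
    for chosen in selections:
        for name in chosen:
            counts[name] = counts.get(name, 0) + 1
    needed = len(selections) / 2
    kept = [name for name, count in counts.items() if count > needed]
    # Most-voted first; ties broken alphabetically so the order is deterministic.
    return sorted(kept, key=lambda name: (-counts[name], name))


def drop_correlated(columns, candidates, threshold=0.9):
    """Stage 2: greedily skip features with |r| above threshold to a kept one."""
    kept = []
    for name in candidates:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def select_features(columns, selections, threshold=0.9):
    """Run the vote first, then the correlation filter on the survivors."""
    return drop_correlated(columns, majority_vote(selections), threshold)


# Hypothetical toy data: one list of values per feature, one value per sequence.
columns = {
    "length": [100, 200, 300, 400],
    "gc_content": [0.40, 0.55, 0.45, 0.60],
    "au_content": [0.60, 0.45, 0.55, 0.40],  # mirrors gc_content exactly
    "entropy": [1.9, 1.7, 1.8, 1.6],
}
# Hypothetical output of three independent selectors (their "votes").
selections = [
    {"length", "gc_content", "au_content"},
    {"gc_content", "entropy"},
    {"length", "gc_content", "au_content"},
]

selected = select_features(columns, selections)
print(selected)
```

With this toy data, `entropy` is removed by the vote and `au_content` by the correlation filter, leaving `gc_content` and `length`. A real pipeline would take its votes from actual selectors and might rank the candidates differently before filtering.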
- Published
- 2021