1. ABEILLE: a novel method for ABerrant Expression Identification empLoying machine LEarning from RNA-sequencing data.
- Author
Labory, Justine; Bideau, Gwendal Le; Pratella, David; Yao, Jean-Elisée; Saadi, Samira Ait-El-Mkadem; Bannwarth, Sylvie; El-Hami, Loubna; Paquis-Fluckinger, Véronique; and Bottini, Silvia
- Subjects
MACHINE learning, INTERNET servers, RNA sequencing, GENE expression, DECISION trees, SOURCE code
- Abstract
Motivation: Current advances in omics technologies are paving the way for the diagnosis of rare diseases by offering a complementary assay to identify the responsible gene. The use of transcriptomic data to identify aberrant gene expression (AGE) has been shown to reveal potential pathogenic events. However, popular approaches for AGE identification are limited by their reliance on statistical tests, which require choosing an arbitrary cut-off for significance assessment and need several replicates, which are not always available in clinical contexts.
Results: We therefore developed ABerrant Expression Identification empLoying machine LEarning from sequencing data (ABEILLE), a variational autoencoder (VAE)-based method for identifying AGEs from RNA-seq data without the need for replicates or a control group. ABEILLE combines a VAE, which can model any data without specific assumptions about their distribution, with a decision tree that classifies genes as AGE or non-AGE. An anomaly score is associated with each gene to stratify AGEs by the severity of aberration. We tested ABEILLE on a semi-synthetic dataset and an experimental dataset, demonstrating the importance of the flexibility of the VAE configuration for identifying potential pathogenic candidates.
Availability and implementation: ABEILLE source code is freely available at https://github.com/UCA-MSI/ABEILLE.
Supplementary information: Supplementary data are available at Bioinformatics online. [ABSTRACT FROM AUTHOR]
- Published
- 2022