1. Artificial Intelligence Technique for Gene Expression by Tumor RNA-Seq Data: A Novel Optimized Deep Learning Approach
- Author
-
Nour Eldeen M. Khalifa, Mohamed Hamed N. Taha, Dalia Ezzat Ali, Adam Slowik, and Aboul Ella Hassanien
- Subjects
Cancer ,RNA sequence ,deep convolutional neural network ,gene expression data ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Cancer is one of the most feared and aggressive diseases in the world and is responsible for more than 9 million deaths universally. Staging cancer early increases the chances of recovery. One staging technique is RNA sequence analysis. Recent advances in the efficiency and accuracy of artificial intelligence techniques and optimization algorithms have facilitated the analysis of human genomics. This paper introduces a novel optimized deep learning approach based on binary particle swarm optimization with decision tree (BPSO-DT) and convolutional neural network (CNN) to classify different types of cancer based on tumor RNA sequence (RNA-Seq) gene expression data. The cancer types that will be investigated in this research are kidney renal clear cell carcinoma (KIRC), breast invasive carcinoma (BRCA), lung squamous cell carcinoma (LUSC), lung adenocarcinoma (LUAD) and uterine corpus endometrial carcinoma (UCEC). The proposed approach consists of three phases. The first phase is preprocessing, which at first optimize the high-dimensional RNA-seq to select only optimal features using BPSO-DT and then, converts the optimized RNA-Seq to 2D images. The second phase is augmentation, which increases the original dataset of 2086 samples to be 5 times larger. The selection of the augmentations techniques was based achieving the least impact on manipulating the features of the images. This phase helps to overcome the overfitting problem and trains the model to achieve better accuracy. The third phase is deep CNN architecture. In this phase, an architecture of two main convolutional layers for featured extraction and two fully connected layers is introduced to classify the 5 different types of cancer according to the availability of images on the dataset. The results and the performance metrics such as recall, precision and F1 score show that the proposed approach achieved an overall testing accuracy of 96.90%. The comparative results are introduced, and the proposed method outperforms those in related works in terms of testing accuracy for 5 classes of cancer. Moreover, the proposed approach is less complex and consume less memory.
- Published
- 2020
- Full Text
- View/download PDF