DeepGene: an advanced cancer type classifier based on deep learning and somatic point mutations.

Authors :
Yuchen Yuan
Yi Shi
Changyang Li
Jinman Kim
Weidong Cai
Zeguang Han
David Dagan Feng
Source :
BMC Bioinformatics. 2016 Suppl 17, Vol. 17, p243-256. 14p. 2 Diagrams, 7 Charts, 6 Graphs.
Publication Year :
2016

Abstract

Background: With the development of DNA sequencing technology, large amounts of sequencing data have become available in recent years, providing unprecedented opportunities for advanced association studies between somatic point mutations and cancer types/subtypes, which may contribute to more accurate somatic point mutation based cancer classification (SMCC). However, in existing SMCC methods, high data sparsity, small sample sizes, and the use of simple linear classifiers are major obstacles to improving classification performance.

Results: To address these obstacles, we propose DeepGene, an advanced deep neural network (DNN) based classifier that consists of three steps: first, clustered gene filtering (CGF) concentrates the gene data by mutation occurrence frequency, filtering out the majority of irrelevant genes; second, indexed sparsity reduction (ISR) converts the gene data into the indexes of its non-zero elements, thereby significantly suppressing the impact of data sparsity; finally, the data after CGF and ISR are fed into a DNN classifier, which extracts high-level features for accurate classification. Experimental results on our curated TCGA-DeepGene dataset, a reformulated subset of the TCGA dataset containing 12 selected types of cancer, show that CGF, ISR and DNN all contribute to improving the overall classification performance. We further compare DeepGene with three widely adopted classifiers and demonstrate that DeepGene achieves at least a 24% improvement in testing accuracy.

Conclusions: Based on deep learning and somatic point mutation data, we devise DeepGene, an advanced cancer type classifier, which addresses the obstacles in existing SMCC studies. Experiments indicate that DeepGene outperforms three widely adopted existing classifiers, which is mainly attributed to its deep learning module, which is able to extract high-level features linking combinatorial somatic point mutations to cancer types. [ABSTRACT FROM AUTHOR]
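
The two preprocessing steps named in the abstract can be illustrated with a minimal sketch. It assumes the input is a binary sample-by-gene mutation matrix, which the abstract does not state. The function names `frequency_filter` and `indexed_sparsity_reduction`, the parameters `keep` and `length`, the 1-based indexing, and the zero padding are all hypothetical choices made for illustration. In particular, `frequency_filter` is a plain top-k frequency cut standing in for CGF, whose clustering details are not given here, and this is not the authors' implementation.

```python
import numpy as np

def frequency_filter(X, keep):
    """Keep the `keep` most frequently mutated gene columns.

    Simplified stand-in for CGF: it ranks genes by mutation count only
    and does no clustering.
    """
    freq = X.sum(axis=0)                          # mutation count per gene
    top = np.argsort(-freq, kind="stable")[:keep] # most frequent genes first
    top = np.sort(top)                            # restore original gene order
    return X[:, top], top

def indexed_sparsity_reduction(X, length):
    """Replace each binary row by the 1-based indexes of its non-zero
    entries, truncated or zero-padded to a fixed `length` (assumed scheme)."""
    out = np.zeros((X.shape[0], length), dtype=np.int64)
    for i, row in enumerate(X):
        idx = np.flatnonzero(row)[:length] + 1    # 1-based so 0 can mean "padding"
        out[i, :idx.size] = idx
    return out

# Toy data: 3 samples x 5 genes, 1 = gene carries a somatic point mutation.
X = np.array([[0, 1, 0, 0, 1],
              [1, 0, 0, 0, 0],
              [0, 1, 1, 0, 1]])

X_filtered, kept_genes = frequency_filter(X, keep=3)
X_indexed = indexed_sparsity_reduction(X_filtered, length=2)
```

In this sketch the dense, mostly-zero rows become short fixed-length index vectors, which is the kind of input that could then be passed to a neural network classifier.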

Details

Language :
English
ISSN :
1471-2105
Volume :
17
Database :
Academic Search Index
Journal :
BMC Bioinformatics
Publication Type :
Academic Journal
Accession number :
128029930
Full Text :
https://doi.org/10.1186/s12859-016-1334-9