Network-based Isoform Quantification with RNA-Seq Data for Cancer Transcriptome Analysis
- Source :
- PLoS Computational Biology, Vol 11, Iss 12, p e1004465 (2015)
- Publication Year :
- 2014
Abstract
- High-throughput mRNA sequencing (RNA-Seq) is widely used for transcript quantification of gene isoforms. Since RNA-Seq data alone is often not sufficient to accurately identify the read origins from the isoforms for quantification, we propose using protein domain-domain interactions as prior knowledge for integrative analysis with RNA-Seq data. We introduce a Network-based method for RNA-Seq-based Transcript Quantification (Net-RSTQ) to integrate a protein domain-domain interaction network with short read alignments for transcript abundance estimation. Based on our observation that the abundances of isoforms that are neighbors in the domain-domain interaction network are positively correlated, Net-RSTQ models the expression of the neighboring transcripts as Dirichlet priors on the likelihood of the observed read alignments against the transcripts in one gene. The transcript abundances of all the genes are then jointly estimated with alternating optimization of multiple EM problems. In simulation, Net-RSTQ effectively improved isoform transcript quantification when isoform co-expressions correlate with their interactions. qRT-PCR results on 25 multi-isoform genes in a stem cell line, an ovarian cancer cell line, and a breast cancer cell line also showed that Net-RSTQ estimated more consistent isoform proportions with RNA-Seq data. In the experiments on the RNA-Seq data in The Cancer Genome Atlas (TCGA), the transcript abundances estimated by Net-RSTQ are more informative for patient sample classification of ovarian cancer, breast cancer and lung cancer. All experimental results collectively support that Net-RSTQ is a promising approach for isoform quantification. The Net-RSTQ toolbox is available at http://compbio.cs.umn.edu/Net-RSTQ/.
Author Summary
- New sequencing technologies for transcriptome-wide profiling of RNAs have greatly promoted the interest in isoform-based functional characterization of a cellular system. Elucidation of gene expression at isoform resolution could reveal new molecular mechanisms such as gene regulation and alternative splicing, and potentially better molecular signals for phenotype prediction. However, it could be overly optimistic to derive the proportions of the isoforms of a gene solely from short read alignments. Inherently, systematic sampling biases from RNA library preparation and ambiguity of read origins in overlapping isoforms make such estimates unreliable. The work in this paper examines the possibility of using protein domain-domain interactions as prior knowledge in isoform transcript quantification. We first made the observation that protein domain-domain interactions positively correlate with isoform co-expressions in TCGA data and then designed a probabilistic EM approach to integrate domain-domain interactions with short read alignments for estimation of isoform proportions. Validated by qRT-PCR experiments on three cell lines, simulations and classifications of TCGA patient samples in several cancer types, Net-RSTQ proves to be a useful tool for isoform-based analysis in functional genomics and systems biology.
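The abstract describes placing Dirichlet priors on the read-alignment likelihood and solving the result with EM. The sketch below illustrates that general idea for a single gene as a textbook MAP-EM update. It is not the Net-RSTQ implementation. The function name, the `read_likelihoods` layout, and the way `alpha` is chosen are assumptions made for illustration. In particular, the record does not say how Net-RSTQ derives the prior from neighboring isoforms, so `alpha` is simply passed in as pseudo-counts.

```python
from typing import List, Sequence


def map_em_isoform_proportions(
    read_likelihoods: Sequence[Sequence[float]],
    alpha: Sequence[float],
    n_iter: int = 200,
) -> List[float]:
    """MAP-EM estimate of isoform proportions for one gene (illustrative).

    read_likelihoods[r][k] is P(read r | isoform k), 0.0 if incompatible.
    alpha[k] >= 1 is a Dirichlet pseudo-count for isoform k. Here it is a
    hypothetical stand-in for a prior built from neighboring isoforms.
    """
    n_iso = len(alpha)
    if any(a < 1.0 for a in alpha):
        raise ValueError("this sketch assumes alpha[k] >= 1")
    p = [1.0 / n_iso] * n_iso  # start from uniform proportions
    for _ in range(n_iter):
        # E-step: expected number of reads assigned to each isoform.
        counts = [0.0] * n_iso
        for row in read_likelihoods:
            weights = [p[k] * row[k] for k in range(n_iso)]
            total = sum(weights)
            if total == 0.0:
                continue  # read is compatible with no isoform; skip it
            for k in range(n_iso):
                counts[k] += weights[k] / total
        # M-step: mode of the Dirichlet posterior over proportions.
        post = [counts[k] + alpha[k] - 1.0 for k in range(n_iso)]
        norm = sum(post)
        if norm == 0.0:
            break  # no information from reads or prior; keep current p
        p = [v / norm for v in post]
    return p


# Toy gene with two isoforms: 3 reads unique to isoform 0,
# 1 read unique to isoform 1, and 6 reads compatible with both.
reads = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 1 + [[1.0, 1.0]] * 6

flat_estimate = map_em_isoform_proportions(reads, alpha=[1.0, 1.0])
prior_estimate = map_em_isoform_proportions(reads, alpha=[1.0, 5.0])
```

With the flat prior the estimate converges to roughly 0.75 for the first isoform, which is the plain maximum-likelihood answer. With four extra pseudo-counts on the second isoform it converges to roughly 0.375, showing how a prior can redistribute the ambiguous reads. Jointly estimating all genes, as the abstract describes, would alternate such per-gene updates while refreshing each gene's prior from its neighbors' current estimates. That outer loop is omitted here.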
- Subjects :
- FOS: Computer and information sciences
RNA-Seq
Mathematical Sciences
Machine Learning (cs.LG)
Computational Engineering, Finance, and Science (cs.CE)
Transcriptome
0302 clinical medicine
Neoplasms
Gene expression
Protein Interaction Mapping
Protein Isoforms
2.1 Biological and endogenous factors
RNA, Neoplasm
Aetiology
Biology (General)
Computer Science - Computational Engineering, Finance, and Science
Cancer
Genetics
0303 health sciences
Ecology
High-Throughput Nucleotide Sequencing
Biological Sciences
Neoplasm Proteins
Ovarian Cancer
Computational Theory and Mathematics
030220 oncology & carcinogenesis
Modeling and Simulation
Algorithms
Research Article
Signal Transduction
Biotechnology
Gene isoform
Bioinformatics
QH301-705.5
Computer Science - Artificial Intelligence
Molecular Sequence Data
Biology
03 medical and health sciences
Cellular and Molecular Neuroscience
Rare Diseases
Interaction network
Information and Computing Sciences
Breast Cancer
medicine
Humans
Molecular Biology
Gene
Ecology, Evolution, Behavior and Systematics
030304 developmental biology
Base Sequence
Human Genome
medicine.disease
Stem Cell Research
Computer Science - Learning
Artificial Intelligence (cs.AI)
MRNA Sequencing
RNA
Neoplasm
Software
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- PLoS Computational Biology, Vol 11, Iss 12, p e1004465 (2015)
- Accession number :
- edsair.doi.dedup.....4af1e74cb803fdcf46de1a9c2620206a