Inference of the Human Polyadenylation Code
- Source :
- Bioinformatics
- Publication Year :
- 2017
- Publisher :
- Cold Spring Harbor Laboratory, 2017.
- Abstract :
- Processing of transcripts at the 3’-end involves cleavage at a polyadenylation site followed by the addition of a poly(A)-tail. By selecting which polyadenylation site is cleaved, alternative polyadenylation enables genes to produce transcript isoforms with different 3’-ends. To facilitate the identification and treatment of disease-causing mutations that affect polyadenylation and to understand the underlying regulatory processes, a computational model that can accurately predict polyadenylation patterns based on genomic features is desirable. Previous work has focused on identifying candidate polyadenylation sites and classifying sites which may be tissue-specific. What is lacking is a predictive model of the underlying mechanism of site selection, competition, and processing efficiency in a tissue-specific manner. We develop a deep learning model that trains on 3’-end sequencing data and predicts tissue-specific site selection among competing polyadenylation sites in the 3’ untranslated region of the human genome. Two neural network architectures are evaluated: one built on hand-engineered features, and another that learns directly from the genomic sequence. The hand-engineered features include polyadenylation signals, cis-regulatory elements, n-mer counts, nucleosome occupancy, and RNA-binding protein motifs. The direct-from-sequence model is inferred without prior knowledge of polyadenylation, based on a convolutional neural network trained with genomic sequences surrounding each polyadenylation site as input. Both models are trained using the TensorFlow library. The proposed polyadenylation code can predict site selection among competing polyadenylation sites in different tissues. Importantly, it does so without relying on evolutionary conservation. The model can distinguish pathogenic from benign variants that appear near annotated polyadenylation sites in ClinVar and inspect the genome to find candidate polyadenylation sites. We also provide an analysis of how different features affect the model’s performance.
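The direct-from-sequence idea in the abstract can be illustrated with a toy example. The following is a minimal sketch, not the authors' model: it one-hot encodes the sequence around each candidate site, scans it with a single convolution filter, and applies a softmax across the competing sites to obtain relative selection probabilities. The filter here is hand-set to the canonical AATAAA polyadenylation signal purely for illustration (a trained network would learn many filters from 3’-end sequencing data), and the function names, the `PAS_KERNEL` constant, and the two example sequences are all hypothetical.

```python
import math

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T); unknown bases become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv_max_score(encoded, kernel):
    """Slide one convolution filter along the sequence and return the
    maximum activation (a single filter followed by global max pooling)."""
    width = len(kernel)
    best = -math.inf
    for start in range(len(encoded) - width + 1):
        activation = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(4)
        )
        best = max(best, activation)
    return best


def softmax(scores):
    """Turn per-site scores into probabilities that sum to 1 across competing sites."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Assumption: a hand-set filter that rewards the canonical AATAAA signal.
# In a trained convolutional network these weights would be learned.
PAS_KERNEL = one_hot("AATAAA")


def site_selection(site_sequences):
    """Score each candidate site's surrounding sequence, then normalize across sites."""
    scores = [conv_max_score(one_hot(s), PAS_KERNEL) for s in site_sequences]
    return softmax(scores)


# Two made-up competing sites in the same 3' UTR; only the first
# contains an exact AATAAA signal.
sites = ["GGCTAATAAAGCTTCA", "GGCTACTGAAGCTTCA"]
probs = site_selection(sites)
print(probs)
```

In this sketch the site whose flanking sequence contains the exact signal receives the larger share of the probability mass. A tissue-specific version would, under the same assumptions, produce one such distribution per tissue.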
- Subjects :
- 0301 basic medicine
Statistics and Probability
Untranslated region
Polyadenylation
Genomics
Computational biology
Biology
Biochemistry
Genome
Conserved sequence
03 medical and health sciences
0302 clinical medicine
Humans
Structural motif
3' Untranslated Regions
Molecular Biology
Gene
030304 developmental biology
Genetics
Regulation of gene expression
0303 health sciences
business.industry
Genome, Human
Three prime untranslated region
Deep learning
Genome Analysis
Original Papers
Computer Science Applications
Computational Mathematics
030104 developmental biology
Gene Expression Regulation
Computational Theory and Mathematics
Human genome
Artificial intelligence
business
Poly A
030217 neurology & neurosurgery
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- Bioinformatics
- Accession number :
- edsair.doi.dedup.....7377e01b488cd07a67c3b40764fd8dac
- Full Text :
- https://doi.org/10.1101/130591