Combining DGE and RNA-sequencing data to identify new polyA+ non-coding transcripts in the human genome
- Source :
- Nucleic Acids Research, Oxford University Press, 2014, 42 (5), pp.2820-2832. ⟨10.1093/nar/gkt1300⟩
- Publication Year :
- 2014
- Publisher :
- HAL CCSD, 2014.
Abstract
- Recent sequencing technologies that allow massive parallel production of short reads are the method of choice for transcriptome analysis. Particularly, digital gene expression (DGE) technologies produce a large dynamic range of expression data by generating short tag signatures for each cell transcript. These tags can be mapped back to a reference genome to identify new transcribed regions that can be further covered by RNA-sequencing (RNA-Seq) reads. Here, we applied an integrated bioinformatics approach that combines DGE tags, RNA-Seq, tiling array expression data and species-comparison to explore new transcriptional regions and their specific biological features, particularly tissue expression or conservation. We analysed tags from a large DGE data set (designated as ‘TranscriRef’). We then annotated 750 000 tags that were uniquely mapped to the human genome according to Ensembl. We retained transcripts originating from both DNA strands and categorized tags corresponding to protein-coding genes, antisense, intronic- or intergenic-transcribed regions and computed their overlap with annotated non-coding transcripts. Using this bioinformatics approach, we identified ∼34 000 novel transcribed regions located outside the boundaries of known protein-coding genes. As demonstrated using sequencing data from human pluripotent stem cells for biological validation, the method could be easily applied for the selection of tissue-specific candidate transcripts. DigitagCT is available at http://cractools.gforge.inria.fr/softwares/digitagct.
- Subjects :
- RNA, Untranslated
Transcription, Genetic
Computational biology
Biology
Genome
Cell Line
03 medical and health sciences
0302 clinical medicine
Genetics
Humans
Ensembl
Gene
[INFO.INFO-BI] Computer Science [cs]/Bioinformatics [q-bio.QM]
030304 developmental biology
0303 health sciences
Tiling array
Genome, Human
Sequence Analysis, RNA
Gene Expression Profiling
Intron
Molecular Sequence Annotation
Human genome
Poly A
Software
030217 neurology & neurosurgery
Reference genome
Details
- Language :
- English
- ISSN :
- 0305-1048 and 1362-4962
- Database :
- OpenAIRE
- Journal :
- Nucleic Acids Research, Oxford University Press, 2014, 42 (5), pp.2820-2832. ⟨10.1093/nar/gkt1300⟩
- Accession number :
- edsair.doi.dedup.....42e83f93d739d03f1a3366ea37073295
- Full Text :
- https://doi.org/10.1093/nar/gkt1300