
Combining DGE and RNA-sequencing data to identify new polyA+ non-coding transcripts in the human genome

Authors :
Elias Bou Samra
Qiang Bai
Nicolas Philippe
Eric Rivals
John De Vos
Anthony Boureux
Florence Ruffle
Alban Mancheron
Thérèse Commes
Cellules souches normales et cancéreuses
Université Montpellier 1 (UM1)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Montpellier (UM)
Institut de Biologie Computationnelle (IBC)
Institut National de la Recherche Agronomique (INRA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)
Université Montpellier 2 - Sciences et Techniques (UM2)
Méthodes et Algorithmes pour la Bioinformatique (MAB)
Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM)
Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)
Ligue Regionale contre le Cancer Languedoc-Roussillon, GEFLUC-Montpellier, Canceropole Grand Sud Ouest (GSO), Groupe Ouest Est d’Etudes des Leucémies et Autres Maladies du Sang (GOELAMS), CS Université Montpellier 2 (2011), CNRS INS2I [PEPS BFC: 66293]
Institute of Computational Biology, Investissement d’Avenir. Association de Recherche contre le Cancer [PDF20101202345 to N.P.]. Funding for open access charge: CNRS.
Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)
Mancheron, Alban
Source :
Nucleic Acids Research, Oxford University Press, 2014, 42 (5), pp. 2820-2832. ⟨10.1093/nar/gkt1300⟩
Publication Year :
2014
Publisher :
HAL CCSD, 2014.

Abstract

Recent sequencing technologies that allow massive parallel production of short reads are the method of choice for transcriptome analysis. In particular, digital gene expression (DGE) technologies produce expression data with a large dynamic range by generating a short tag signature for each transcript in a cell. These tags can be mapped back to a reference genome to identify new transcribed regions, which can then be further covered by RNA-sequencing (RNA-Seq) reads. Here, we applied an integrated bioinformatics approach that combines DGE tags, RNA-Seq, tiling array expression data and cross-species comparison to explore new transcribed regions and their specific biological features, particularly tissue expression or conservation. We analysed tags from a large DGE data set (designated as ‘TranscriRef’). We then used Ensembl to annotate 750 000 tags that mapped uniquely to the human genome. We retained transcripts originating from both DNA strands, categorized tags as corresponding to protein-coding genes or to antisense, intronic or intergenic transcribed regions, and computed their overlap with annotated non-coding transcripts. Using this approach, we identified ∼34 000 novel transcribed regions located outside the boundaries of known protein-coding genes. As demonstrated for biological validation using sequencing data from human pluripotent stem cells, the method can easily be applied to select tissue-specific candidate transcripts. DigitagCT is available at http://cractools.gforge.inria.fr/softwares/digitagct.
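The unique-mapping filter and strand-aware tag categorization described in the abstract can be illustrated with a minimal Python sketch. It is not DigitagCT's implementation. The Gene and Tag record types, the helper names (keep_unique, overlaps, classify_tag), the 0-based half-open coordinates, the assumption that a tag's strand is the strand of the transcript it came from, and the precedence between categories (exonic sense, then intronic, then antisense, then intergenic) are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


# Hypothetical record types; DigitagCT's own data model is not described here.
@dataclass
class Gene:
    name: str
    chrom: str
    strand: str                      # '+' or '-'
    start: int                       # 0-based, half-open [start, end)
    end: int
    exons: List[Tuple[int, int]] = field(default_factory=list)


@dataclass
class Tag:
    chrom: str
    strand: str                      # assumed: strand of the source transcript
    start: int
    end: int


Location = Tuple[str, str, int]      # (chrom, strand, start)


def keep_unique(hits: Dict[str, List[Location]]) -> Dict[str, Location]:
    """Keep only tag sequences that map to exactly one genomic location."""
    return {seq: locs[0] for seq, locs in hits.items() if len(locs) == 1}


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if the half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def classify_tag(tag: Tag, genes: List[Gene]) -> str:
    """Assign one category using an assumed precedence:
    exonic sense > intronic sense > antisense > intergenic."""
    # Genes whose body overlaps the tag, on either strand.
    hits = [g for g in genes
            if g.chrom == tag.chrom and overlaps(tag.start, tag.end, g.start, g.end)]
    sense = [g for g in hits if g.strand == tag.strand]
    if any(overlaps(tag.start, tag.end, s, e) for g in sense for s, e in g.exons):
        return "exonic_sense"
    if sense:                        # inside a same-strand gene but not in an exon
        return "intronic"
    if hits:                         # overlaps genes only on the opposite strand
        return "antisense"
    return "intergenic"              # outside every annotated gene


if __name__ == "__main__":
    genes = [Gene("G1", "chr1", "+", 1000, 5000, [(1000, 1200), (4000, 5000)])]
    tags = [Tag("chr1", "+", 1100, 1121), Tag("chr1", "+", 2000, 2021),
            Tag("chr1", "-", 2000, 2021), Tag("chr1", "+", 9000, 9021)]
    print([classify_tag(t, genes) for t in tags])
    # ['exonic_sense', 'intronic', 'antisense', 'intergenic']
```

The linear scan over all genes is kept only for readability; at the scale of hundreds of thousands of tags, an interval index (for example an interval tree per chromosome and strand) would be the usual choice, and the same overlap test could be reused against annotated non-coding transcripts.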

Details

Language :
English
ISSN :
0305-1048 and 1362-4962
Database :
OpenAIRE
Journal :
Nucleic Acids Research, Oxford University Press, 2014, 42 (5), pp. 2820-2832. ⟨10.1093/nar/gkt1300⟩
Accession number :
edsair.doi.dedup.....42e83f93d739d03f1a3366ea37073295
Full Text :
https://doi.org/10.1093/nar/gkt1300