Improved annotation of the domestic pig genome through integration of Iso-Seq and RNA-seq data
- Source :
- BMC Genomics, Vol 20, Iss 1, Pp 1-19 (2019), BMC Genomics
- Publication Year :
- 2019
- Publisher :
- BMC, 2019.
Abstract
- Background: Our understanding of the pig transcriptome is limited. RNA transcript diversity among nine tissues was assessed using poly(A) selected single-molecule long-read isoform sequencing (Iso-seq) and Illumina RNA sequencing (RNA-seq) from a single White cross-bred pig.
  Results: Across tissues, a total of 67,746 unique transcripts were observed, including 60.5% predicted protein-coding, 36.2% long non-coding RNA and 3.3% nonsense-mediated decay transcripts. On average, 90% of the splice junctions were supported by RNA-seq within tissue. A large proportion (80%) represented novel transcripts, mostly produced by known protein-coding genes (70%), while 17% corresponded to novel genes. On average, four transcripts per known gene (tpg) were identified; an increase over current EBI (1.9 tpg) and NCBI (2.9 tpg) annotations and closer to the number reported in the human genome (4.2 tpg). Our new pig genome annotation extended more than 6000 known gene borders (5′ end extension, 3′ end extension, or both) compared to EBI or NCBI annotations. We validated a large proportion of these extensions by independent pig poly(A) selected 3′-RNA-seq data, or human FANTOM5 Cap Analysis of Gene Expression data. Further, we detected 10,465 novel genes (81% non-coding) not reported in current pig genome annotations. More than 80% of these novel genes had transcripts detected in more than one tissue. In addition, more than 80% of novel intergenic genes with at least one transcript detected in liver tissue had H3K4me3 or H3K36me3 peaks mapping to their promoter and gene body, respectively, in independent liver chromatin immunoprecipitation data.
  Conclusions: These validated results show significant improvement over current pig genome annotations.
  Electronic supplementary material: The online version of this article (10.1186/s12864-019-5709-y) contains supplementary material, which is available to authorized users.
- Subjects :
- 0106 biological sciences
Chromatin Immunoprecipitation
lcsh:QH426-470
Porcine
lcsh:Biotechnology
Sus scrofa
RNA-Seq
Computational biology
Biology
01 natural sciences
Genome
03 medical and health sciences
lcsh:TP248.13-248.65
Genetics
Animals
Transcriptome sequencing
Gene
030304 developmental biology
PacBio
0303 health sciences
Iso-seq
RNA
Computational Biology
High-Throughput Nucleotide Sequencing
Molecular Sequence Annotation
Genome project
Single molecule long read sequencing
Cap analysis gene expression
Alternative Splicing
lcsh:Genetics
Human genome
DNA microarray
RNA-seq
010606 plant biology & botany
Biotechnology
Research Article
Genome annotation
Details
- Language :
- English
- ISSN :
- 1471-2164
- Volume :
- 20
- Issue :
- 1
- Database :
- OpenAIRE
- Journal :
- BMC Genomics
- Accession number :
- edsair.doi.dedup.....efd441fd1259ca92c952403bf67eeb75