Sequence properties of certain GC rich avian genes, their origins and absence from genome assemblies: case studies
- Source :
- BMC Genomics, Vol 20, Iss 1, Pp 1-16 (2019), BioMed Central, doi:10.1186/s12864-019-6131-1
- Publication Year :
- 2019
- Publisher :
- HAL CCSD, 2019.
Abstract
- Background: A growing number of eukaryotic genomes are being sequenced and assembled. Most are presented as complete models in which missing chromosomal regions are filled with Ns and from which a few chromosomes may be absent. Avian genomes often contain sequences with high GC content, which has been hypothesized to be the cause of many of the sequences missing from these genomes. We investigated the features of these missing sequences to discover why some may not have been integrated into genomic libraries and/or sequenced.
- Results: The sequences of five red jungle fowl cDNA models with high GC content were used as queries to search publicly available datasets of Illumina and PacBio sequencing reads. The matching reads were used to reconstruct the leptin, TNFα, MRPL52, PCP2 and PET100 genes, all of which are absent from the red jungle fowl genome model. These gene sequences displayed elevated GC contents, had introns that were sometimes larger than those of non-avian orthologues, and had non-coding regions containing numerous tandem and inverted repeat sequences with motifs able to assemble into stable G-quadruplexes and intrastrand dyadic structures. Our results suggest that Illumina technology was unable to sequence the non-coding regions of these genes. PacBio technology, by contrast, was able to sequence these regions, but with dramatically lower efficiency than would typically be expected.
- Conclusions: High GC content was not the principal reason why numerous GC-rich regions of avian genomes are missing from genome assembly models. Instead, the presence of tandem repeats containing motifs capable of assembling into very stable secondary structures is likely responsible.
- Subjects :
- 0106 biological sciences
[SDV.OT]Life Sciences [q-bio]/Other [q-bio.OT]
lcsh:QH426-470
Inverted Repeat Sequences
repeats
lcsh:Biotechnology
Sequence assembly
Gallus gallus
Computational biology
Biology
01 natural sciences
Genome
03 medical and health sciences
Tandem repeat
Illumina
lcsh:TP248.13-248.65
G-quadruplex/genome/Illumina/PacBio/repeats
Genetics
Animals
Genomic library
Gene
genome
030304 developmental biology
complementary DNA
PacBio
Base Composition
0303 health sciences
G-quadruplex
genomic sequence
Intron
High-Throughput Nucleotide Sequencing
DNA
Genomics
Sequence Analysis, DNA
Introns
lcsh:Genetics
genome sequencing
Chickens
GC-content
Research Article
010606 plant biology & botany
Biotechnology
Other (Life Sciences)
Details
- Language :
- English
- ISSN :
- 14712164
- Database :
- OpenAIRE
- Journal :
- BMC Genomics
- Accession number :
- edsair.doi.dedup.....31912fdde8ecaa1c148f0703b379eacf
- Full Text :
- https://doi.org/10.1186/s12864-019-6131-1