De novo construction of a 'Gene-space' for diploid plant genome rich in repetitive sequences by an iterative Process of Extraction and Assembly of NGS reads (iPEA protocol) with limited computing resources
- Source :
- BMC Research Notes, 9 (1), pp. 1-9. BioMed Central, 2016. doi:10.1186/s13104-016-1903-z
- Publication Year :
- 2016
Abstract
- Background: The continuing increase in the size and quality of "short read" raw data greatly improves the quality of assemblies produced by bioinformatics tools. However, building a reference genome sequence for most plant species remains a major challenge, because their large number of repeated sequences hampers high-quality whole-genome de novo assembly. Moreover, most SNP identification approaches in plant genetics and breeding consider only the "Gene-space" regions, which include promoter, exon and intron sequences.
- Results: We developed the iPEA protocol to produce a de novo Gene-space assembly by iteratively reconstructing the non-coding sequences flanking Unigene cDNA sequences through the addition of next-generation DNA-seq data. The approach was developed on the large, repeat-rich diploid genome of pea (Pisum sativum L.). The final Gene-space assembly comprised 35,400 contigs (97 Mb), covering 88% of the 40,227 contigs (53.1 Mb) of the PsCam_low-copy Unigene set. Its accuracy was validated using the results of the newly built GenoPea 13.2K SNP Array.
- Conclusion: The iPEA protocol allows a Gene-space to be reconstructed from RNA-seq and DNA-seq data with limited computing resources.
- Electronic supplementary material: The online version of this article (doi:10.1186/s13104-016-1903-z) contains supplementary material, which is available to authorized users.
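The abstract only summarizes the iterative extraction-and-assembly idea. As a rough illustration, and not the authors' iPEA pipeline (which works on real NGS data with dedicated read-handling and assembly software), the toy Python sketch below grows a seed sequence standing in for a Unigene cDNA contig by repeatedly (1) extracting the reads that share a k-mer with the current contig and (2) extending the contig with reads that overlap its ends, until a round adds nothing. All function names, the k-mer bait, the greedy exact-overlap merge used in place of a real assembler, and the parameters `k`, `min_overlap` and `max_rounds` are hypothetical; reverse complements and sequencing errors are ignored.

```python
from collections.abc import Iterable


def kmers(seq: str, k: int) -> set[str]:
    """Return every k-mer of seq (an empty set if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def extract_reads(contig: str, reads: Iterable[str], k: int) -> list[str]:
    """Extraction step: keep the reads sharing at least one k-mer with the contig."""
    bait = kmers(contig, k)
    return [read for read in reads if kmers(read, k) & bait]


def merge(contig: str, read: str, min_overlap: int) -> str:
    """Greedy stand-in for assembly (hypothetical): extend the contig with a read
    that overlaps its right or left end by >= min_overlap bases (min_overlap >= 1),
    using exact matches only."""
    if read in contig:  # read already contained: nothing to add
        return contig
    if contig in read:  # read spans the whole contig: take the read
        return read
    longest = min(len(contig), len(read)) - 1
    # Right end: longest contig suffix equal to a read prefix.
    for olen in range(longest, min_overlap - 1, -1):
        if contig.endswith(read[:olen]):
            return contig + read[olen:]
    # Left end: longest contig prefix equal to a read suffix.
    for olen in range(longest, min_overlap - 1, -1):
        if contig.startswith(read[-olen:]):
            return read[:-olen] + contig
    return contig  # no sufficient end overlap


def iterative_extend(seed: str, reads: list[str], k: int = 5,
                     min_overlap: int = 5, max_rounds: int = 10) -> str:
    """Alternate extraction and extension until a round adds no new bases."""
    contig = seed
    for _ in range(max_rounds):
        extended = contig
        for read in extract_reads(contig, reads, k):
            extended = merge(extended, read, min_overlap)
        if extended == contig:  # converged: no read extended the contig
            break
        contig = extended
    return contig
```

With the five overlapping 8-bp reads tiled every 2 bp across the 16-bp toy sequence `AAAACCCCGGGGTTTT` and the seed `CCCCGGGG`, the first round recruits only the three reads touching the seed and yields `AACCCCGGGGTT`; the second round recruits the remaining reads and recovers the full sequence, and the third round adds nothing, so the loop stops. Because a single shared k-mer is enough to recruit a read, applying this idea to a repeat-rich genome such as pea would likely require masking or filtering repetitive reads, which this sketch does not do.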
- Subjects :
- 0301 basic medicine
Genotyping Techniques
[SDV]Life Sciences [q-bio]
Assembly
Sequence assembly
UniGene
Genomics
Computational biology
Biology
Limited computing resources
Genome
Polymorphism, Single Nucleotide
General Biochemistry, Genetics and Molecular Biology
Next-generation sequencing NGS
03 medical and health sciences
Technical Note
[SDV.BV]Life Sciences [q-bio]/Vegetal Biology
Gene
Sequence (medicine)
Repetitive Sequences, Nucleic Acid
Genetics
Medicine(all)
Contig
Base Sequence
Biochemistry, Genetics and Molecular Biology(all)
Peas
High-Throughput Nucleotide Sequencing
Reproducibility of Results
food and beverages
General Medicine
Diploidy
030104 developmental biology
Gene-space
[SDE]Environmental Sciences
Iterative process
Algorithms
Genome, Plant
SNP array
- Language :
- English
- ISSN :
- 1756-0500
- Database :
- OpenAIRE
- Journal :
- BMC Research Notes (BioMed Central)
- Accession number :
- edsair.doi.dedup.....53051b10575806fdbf038dead5513d20