
De novo construction of a 'Gene-space' for diploid plant genome rich in repetitive sequences by an iterative Process of Extraction and Assembly of NGS reads (iPEA protocol) with limited computing resources

Authors :
Judith Burstin
Christelle Aluome
Dominique Brunel
Susete Alves Carvalho
Marie-Christine Le Paslier
Grégoire Aubert
Biodiversité, Gènes & Communautés (BioGeCo)
Institut National de la Recherche Agronomique (INRA)-Université de Bordeaux (UB)
Agroécologie [Dijon]
Institut National de la Recherche Agronomique (INRA)-Université de Bourgogne (UB)-AgroSup Dijon - Institut National Supérieur des Sciences Agronomiques, de l'Alimentation et de l'Environnement
Etude du Polymorphisme des Génomes Végétaux (EPGV)
Institut National de la Recherche Agronomique (INRA)
Source :
BMC Research Notes, BioMed Central, 2016, 9 (1), pp. 1-9. ⟨10.1186/s13104-016-1903-z⟩
Publication Year :
2016

Abstract

Background: The continuing increase in the size and quality of raw “short read” data significantly improves the quality of the assemblies obtained with various bioinformatics tools. However, building a reference genome sequence for most plant species remains a significant challenge because of the large number of repeated sequences, which are problematic for a high-quality whole-genome de novo assembly. Furthermore, most SNP identification approaches in plant genetics and breeding consider only the “Gene-space” regions, which include the promoter, exon and intron sequences.

Results: We developed the iPEA protocol to produce a de novo Gene-space assembly by iteratively reconstructing the non-coding sequence flanking the Unigene cDNA sequence through the addition of next-generation DNA-seq data. The approach was developed on the large diploid genome of pea (Pisum sativum L.), which is rich in repetitive sequences. The final Gene-space assembly included 35,400 contigs (97 Mb), covering 88 % of the 40,227 contigs (53.1 Mb) of the PsCam_low-copy Unigene set. Its accuracy was validated by the results obtained with the GenoPea 13.2 K SNP Array, which was built from it.

Conclusion: The iPEA protocol allows the reconstruction of a Gene-space from RNA-Seq and DNA-seq data with limited computing resources.

Electronic supplementary material: The online version of this article (doi:10.1186/s13104-016-1903-z) contains supplementary material, which is available to authorized users.

Details

Language :
English
ISSN :
1756-0500
Database :
OpenAIRE
Journal :
BMC Research Notes, BioMed Central, 2016, 9 (1), pp. 1-9. ⟨10.1186/s13104-016-1903-z⟩
Accession number :
edsair.doi.dedup.....53051b10575806fdbf038dead5513d20