A second generation framework for the analysis of microsatellites in expressed sequence tags and the development of EST-SSR markers for a conifer, Cryptomeria japonica
- Source :
- BMC Genomics, Vol 13, Iss 1, p 136 (2012)
- Publication Year :
- 2012
- Publisher :
- BioMed Central, 2012.
Abstract
- Background: Microsatellites, or simple sequence repeats (SSRs), in expressed sequence tags (ESTs) are useful resources for genome analysis because of their abundance, functionality and polymorphism. The advent of commercial second-generation sequencing machines has led to new strategies for developing EST-SSR markers, necessitating a bioinformatic framework that can keep pace with the increasing quality and quantity of the sequence data produced. We describe an open scheme for analyzing ESTs and developing EST-SSR markers from Sanger and pyrosequencing reads of sugi (Cryptomeria japonica).
- Results: We collected 141,097 sequence reads by Sanger sequencing and 1,333,444 by pyrosequencing. After trimming contaminant and low-quality sequences, 118,319 Sanger and 1,201,150 pyrosequencing reads were passed to the MIRA assembler, generating 81,284 contigs that were analysed for SSRs. In total, 4,059 SSRs were found in 3,694 (4.54%) contigs, an SSR frequency lower than that in seven other plant species with gene indices (5.4–21.9%). The average GC content of the SSR-containing contigs was 41.55%, compared with 40.23% for all contigs. Tri-SSRs were the most common SSRs. The most common motif was AT, found in 655 (46.3%) di-SSRs, followed by AAG, found in 342 (25.9%) tri-SSRs. Most tri-SSRs (72.8%) were in coding regions, whereas 55.6% of the di-SSRs were in non-coding regions; the AT motif was most abundant in 3′ untranslated regions. Gene ontology (GO) annotations showed that six GO terms were significantly overrepresented among SSR-containing contigs. Forty-four EST-SSR markers were developed from 192 primer pairs using two pipelines: read2Marker and the newly developed CMiB, which combines several open tools. Markers from the two pipelines did not differ in PCR success rate or polymorphism, but PCR success was significantly affected by the expected PCR product size, and polymorphism by the number of SSR repeats. EST-SSR markers exhibited less polymorphism than genomic SSRs.
- Conclusions: We have created a new open pipeline for developing EST-SSR markers and applied it in a comprehensive analysis of EST-SSRs and EST-SSR markers in C. japonica. The results will be useful in genomic analyses of conifers and other non-model species.
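The SSR census summarized in the Results can be sketched in Python. It covers the share of contigs that carry SSRs, the tally of motif classes such as AT and AAG, and the mean GC content of SSR-containing versus all contigs. As a check on the arithmetic, 3,694 of 81,284 contigs is 4.54%. This is a minimal illustration under stated assumptions, not the CMiB or read2Marker code. The minimum repeat numbers in `MIN_REPEATS` are hypothetical. The helper names `find_ssrs`, `canonical_motif`, `gc_content` and `summarize` are also hypothetical. Finally, "average GC content" is assumed here to mean the mean of per-contig GC percentages. Motifs are pooled by taking the smallest rotation of the motif or its reverse complement, so GAA and TTC are both counted under AAG.

```python
from collections import Counter
from statistics import mean

# Hypothetical minimum repeat numbers per motif length (mono- to hexa-nucleotide).
# These thresholds are illustrative assumptions, not the settings used in the paper.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
_BASES = set("ACGT")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_primitive(motif: str) -> bool:
    """True unless the motif is itself a repeat of a shorter unit (e.g. 'ATAT', 'AA')."""
    n = len(motif)
    return all(motif != motif[:d] * (n // d) for d in range(1, n) if n % d == 0)


def canonical_motif(motif: str) -> str:
    """Smallest rotation of the motif or of its reverse complement, so that
    AAG, AGA, GAA, CTT, TCT and TTC all fall into one class, 'AAG'."""
    revcomp = motif.translate(_COMPLEMENT)[::-1]
    return min(m[i:] + m[:i] for m in (motif, revcomp) for i in range(len(m)))


def find_ssrs(seq: str) -> list[tuple[int, str, int]]:
    """Report perfect SSRs as (0-based start, motif, repeat count).
    Each motif length is scanned separately, so compound SSRs are not merged."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            if set(motif) <= _BASES and is_primitive(motif):
                reps = 1
                # Count consecutive, in-frame copies of the motif.
                while seq[i + reps * k:i + (reps + 1) * k] == motif:
                    reps += 1
                if reps >= min_rep:
                    hits.append((i, motif, reps))
                    i += reps * k  # jump past the whole repeat tract
                    continue
            i += 1
    return hits


def gc_content(seq: str) -> float:
    """GC percentage over unambiguous bases only (N and other codes are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def summarize(contigs: dict[str, str]) -> dict:
    """Share of contigs with at least one SSR, motif-class tally, and mean
    per-contig GC content for all contigs vs. SSR-containing contigs."""
    if not contigs:
        raise ValueError("no contigs supplied")
    motif_counts = Counter()
    gc_all, gc_ssr = [], []
    for seq in contigs.values():
        hits = find_ssrs(seq)
        gc = gc_content(seq)
        gc_all.append(gc)
        if hits:
            gc_ssr.append(gc)
            motif_counts.update(canonical_motif(m) for _, m, _ in hits)
    return {
        "pct_contigs_with_ssr": 100.0 * len(gc_ssr) / len(contigs),
        "motif_counts": motif_counts,
        "mean_gc_all": mean(gc_all),
        "mean_gc_ssr": mean(gc_ssr) if gc_ssr else float("nan"),
    }


if __name__ == "__main__":
    toy_contigs = {
        "c1": "GC" + "AT" * 7 + "GGC",   # one di-SSR, (AT)7
        "c2": "CG" + "AAG" * 5 + "TTG",  # one tri-SSR, found in the GAA frame
        "c3": "ACGTTGCA",                # no SSR
    }
    print(summarize(toy_contigs))
```

On the toy input, two of the three contigs carry an SSR: an (AT)7 tract in c1 and a (GAA)5 tract in c2, which is tallied under AAG. Because each motif length is scanned independently, a run of A's abutting an AT tract is reported once per motif length. Dedicated SSR finders handle such compound SSRs explicitly.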
- Subjects :
- Genetic Markers
0106 biological sciences
lcsh:QH426-470
Cryptomeria
lcsh:Biotechnology
Biology
Genes, Plant
Polymerase Chain Reaction
01 natural sciences
Genome
03 medical and health sciences
symbols.namesake
Genome Size
lcsh:TP248.13-248.65
Genetics
Genomic library
Nucleotide Motifs
3' Untranslated Regions
Gene Library
030304 developmental biology
Expressed Sequence Tags
2. Zero hunger
Sanger sequencing
Base Composition
0303 health sciences
Expressed sequence tag
Polymorphism, Genetic
Contig
Computational Biology
Molecular Sequence Annotation
Sequence Analysis, DNA
lcsh:Genetics
Genetic marker
Linear Models
symbols
Microsatellite
Pyrosequencing
5' Untranslated Regions
Research Article
Microsatellite Repeats
010606 plant biology & botany
Biotechnology
Details
- Language :
- English
- ISSN :
- 1471-2164
- Volume :
- 13
- Issue :
- 1
- Database :
- OpenAIRE
- Journal :
- BMC Genomics
- Accession number :
- edsair.doi.dedup.....173cbd21258fd226ad626174c07bc6f2
- Full Text :
- https://doi.org/10.1186/1471-2164-13-136