Efficiently detecting polymorphisms during the fragment assembly process
- Source :
- ISMB
- Publication Year :
- 2002
- Publisher :
- Oxford University Press (OUP), 2002.
Abstract
- Motivation: Current genomic sequence assemblers assume that the input data is derived from a single, homogeneous source. However, recent whole-genome shotgun sequencing projects have violated this assumption, resulting in input fragments covering the same region of the genome whose sequences differ due to polymorphic variation in the population. While single-nucleotide polymorphisms (SNPs) do not pose a significant problem to state-of-the-art assembly methods, these methods do not handle insertion/deletion (indel) polymorphisms of more than a few bases. Results: This paper describes an efficient method for detecting sequence discrepancies due to polymorphism that avoids resorting to global use of more costly, less stringent affine sequence alignments. Instead, the algorithm uses graph-based methods to determine the small set of fragments involved in each polymorphism and performs more sophisticated alignments only among fragments in that set. Results from the incorporation of this method into the Celera Assembler are reported for the D. melanogaster, H. sapiens, and M. musculus genomes. Availability: The method described herein does not constitute a stand-alone software application, but is laid out in sufficient detail to be implemented as a component of any genomic sequence assembler. Contact: daniel.fasulo@celera.com Keywords: whole-genome assembly; shotgun sequencing; polymorphism.
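The abstract gives only the outline of the approach: a graph step narrows each polymorphism down to a small set of fragments, and costly affine-gap alignment is run only within that set. The Python sketch below illustrates that general idea and is not the paper's algorithm or the Celera Assembler's code. It assumes the overlap graph is an acyclic dictionary of fragment ids, and every name in it (`find_bubble`, `affine_score`, `align_within`, the scoring parameters) is hypothetical.

```python
from collections import deque
from itertools import combinations


def _interior(parent, start, end):
    """Fragments strictly between start and end along one branch."""
    nodes, node = [], parent[end]
    while node != start:
        nodes.append(node)
        node = parent[node]
    return nodes


def find_bubble(graph, start, max_depth=10):
    """Return (end, members) for a bubble opening at start, or None.

    graph maps a fragment id to its successor ids (assumed acyclic).
    A bubble is a place where paths diverge and later reconverge.
    """
    branches = graph.get(start, [])
    if len(branches) < 2:
        return None
    reach = []  # one BFS parent map per outgoing branch
    for branch in branches:
        parent = {branch: start}
        queue = deque([(branch, 1)])
        while queue:
            node, depth = queue.popleft()
            if depth >= max_depth:
                continue
            for nxt in graph.get(node, []):
                if nxt != start and nxt not in parent:
                    parent[nxt] = node
                    queue.append((nxt, depth + 1))
        reach.append(parent)
    common = set(reach[0]).intersection(*reach[1:])
    if not common:
        return None
    # Pick the reconvergence point with the shortest combined branch length.
    end = min(sorted(common),
              key=lambda c: sum(len(_interior(p, start, c)) for p in reach))
    members = set()
    for parent in reach:
        members.update(_interior(parent, start, end))
    return end, members


def affine_score(a, b, match=1, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Global alignment score with affine gaps (Gotoh-style recurrences).

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    neg = float("-inf")
    n, m = len(a), len(b)
    M = [[neg] * (m + 1) for _ in range(n + 1)]  # a[i-1] aligned to b[j-1]
    X = [[neg] * (m + 1) for _ in range(n + 1)]  # a[i-1] aligned to a gap
    Y = [[neg] * (m + 1) for _ in range(n + 1)]  # b[j-1] aligned to a gap
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i-1][j-1], X[i-1][j-1], Y[i-1][j-1]) + s
            X[i][j] = max(M[i-1][j] + gap_open, Y[i-1][j] + gap_open,
                          X[i-1][j] + gap_extend)
            Y[i][j] = max(M[i][j-1] + gap_open, X[i][j-1] + gap_open,
                          Y[i][j-1] + gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])


def align_within(members, seqs):
    """Run the costly alignment only among fragments in the candidate set."""
    return {(x, y): affine_score(seqs[x], seqs[y])
            for x, y in combinations(sorted(members), 2)}
```

In this sketch, the set returned by `find_bubble` would be passed to `align_within` together with a dictionary of fragment sequences, so the quadratic-time affine alignment touches only the handful of fragments around a suspected indel rather than every overlapping pair.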
- Subjects :
- Statistics and Probability
Molecular Sequence Data
Population
Sequence assembly
DNA Fragmentation
Computational biology
Biology
Biochemistry
Genome
Set (abstract data type)
Consensus Sequence
Indel
education
Molecular Biology
Sequence (medicine)
Genetics
Polymorphism, Genetic
Base Sequence
Shotgun sequencing
Gene Expression Profiling
Genetic Variation
Sequence Analysis, DNA
Computer Science Applications
Computational Mathematics
Computational Theory and Mathematics
Graph (abstract data type)
Sequence Alignment
Algorithms
Polymorphism, Restriction Fragment Length
Details
- ISSN :
- 1367-4811 and 1367-4803
- Volume :
- 18
- Database :
- OpenAIRE
- Journal :
- Bioinformatics
- Accession number :
- edsair.doi.dedup.....665945127dd158a7fc06beff46a698be
- Full Text :
- https://doi.org/10.1093/bioinformatics/18.suppl_1.s294