Back to Search Start Over

Reconstructing sequences from shotgun data

Authors :
Paul Cull
Jim L. Holloway
Source :
Sequences II ISBN: 9781461393252
Publication Year :
1993
Publisher :
Springer New York, 1993.

Abstract

One method of sequencing DNA produces a linear list of bases, {A, C, G, T}, for many short overlapping fragments of the original DNA. To find the sequence of the original piece of DNA, the many fragments must be reassembled. While this problem of reassembly is similar to the NP-complete shortest common superstring problem [GMS80], we believe that biologists are actually trying to solve a simpler problem. Biologists assume that short overlaps between substrings are insignificant. Further, they assume that there is a unique string from which substrings could have been produced. We consider a reconstruction problem with these restrictions. We devise algorithms for this problem both when the overlaps must be exact and when there may be errors in the overlaps. Our algorithms are based on Rabin-Karp string matching, and on suffix arrays. We investigate the running times of our algorithms, and show that in expected case they have running times proportional to the length of the reconstructed sequence. We give the timings of some test runs and note that the suffix array algorithms seem to be faster.

Details

ISBN :
978-1-4613-9325-2
ISBNs :
9781461393252
Database :
OpenAIRE
Journal :
Sequences II ISBN: 9781461393252
Accession number :
edsair.doi...........2f593d68f251ea0d544e17d4940a84c7