Back to Search
Start Over
Reconstructing sequences from shotgun data
- Source :
- Sequences II ISBN: 9781461393252
- Publication Year :
- 1993
- Publisher :
- Springer New York, 1993.
-
Abstract
- One method of sequencing DNA produces a linear list of bases, {A, C, G, T}, for many short overlapping fragments of the original DNA. To find the sequence of the original piece of DNA, the many fragments must be reassembled. While this problem of reassembly is similar to the NP-complete shortest common superstring problem [GMS80], we believe that biologists are actually trying to solve a simpler problem. Biologists assume that short overlaps between substrings are insignificant. Further, they assume that there is a unique string from which substrings could have been produced. We consider a reconstruction problem with these restrictions. We devise algorithms for this problem both when the overlaps must be exact and when there may be errors in the overlaps. Our algorithms are based on Rabin-Karp string matching, and on suffix arrays. We investigate the running times of our algorithms, and show that in expected case they have running times proportional to the length of the reconstructed sequence. We give the timings of some test runs and note that the suffix array algorithms seem to be faster.
Details
- ISBN :
- 978-1-4613-9325-2
- ISBNs :
- 9781461393252
- Database :
- OpenAIRE
- Journal :
- Sequences II ISBN: 9781461393252
- Accession number :
- edsair.doi...........2f593d68f251ea0d544e17d4940a84c7