Back to Search Start Over

Approximate matching of regular expressions

Approximate matching of regular expressions

Authors :
Myers, Eugene
Miller, Webb
Source :
Bulletin of Mathematical Biology; January 1989, Vol. 51 Issue: 1 p5-37, 33p
Publication Year :
1989

Abstract

Abstract: Given a sequenceA and regular expressionR, theapproximate regular expression matching problem is to find a sequence matchingR whose optimal alignment withA is the highest scoring of all such sequences. This paper develops an algorithm to solve the problem in timeO(MN), whereM andN are the lengths ofA andR. Thus, the time requirement is asymptotically no worse than for the simpler problem of aligning two fixed sequences. Our method is superior to an earlier algorithm by Wagner and Seiferas in several ways. First, it treats real-valued costs, in addition to integer costs, with no loss of asymptotic efficiency. Second, it requires onlyO(N) space to deliver just the score of the best alignment. Finally, its structure permits implementation techniques that make it extremely fast in practice. We extend the method to accommodate gap penalties, as required for typical applications in molecular biology, and further refine it to search for substrings ofA that strongly align with a sequence inR, as required for typical data base searches. We also show how to deliver an optimal alignment betweenA andR in onlyO(N+logM) space usingO(MN logM) time. Finally, anO(MN(M+N)+N <superscript>2</superscript>logN) time algorithm is presented for alignment scoring schemes where the cost of a gap is an arbitrary increasing function of its length.

Details

Language :
English
ISSN :
00928240 and 15229602
Volume :
51
Issue :
1
Database :
Supplemental Index
Journal :
Bulletin of Mathematical Biology
Publication Type :
Periodical
Accession number :
ejs15359253
Full Text :
https://doi.org/10.1007/BF02458834