Question: Define the knuth-moris-pratt algorithm for string matching. Write a function to implement the consept of the same alghorithm.
0

Subject: Analysis Of Algorithm

Topic: String Matching Algorithm

Difficulty: Medium

aoa(14) • 989 views
 modified 19 months ago  • written 22 months ago by
0

Knuth–Morris–Pratt string searching algorithm The algorithm was conceived in 1970 by Donald Knuth and Vaughan Pratt, and independently by James H. Morris. Knuth–Morris–Pratt string searching algorithm (or KMP algorithm) searches for occurrences of a "word" W within a main "text string" S by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to determine where the next match could begin, thus bypassing re-examination of previously matched characters.

     This algorithm is best known for linear time for exact matching. It based on the comparision from left to right. and at a same time it shifts more than one position. It is having preprocessing approach of Pattern to avoid trivial comparisions and avoids recomputing matches.

1. Knuth, Morris and Pratt discovered first linear time string-matchingv algorithm by analysis of the naive algorithm.
2. It keeps the information that naive approach wasted gathered during the scan of the text. By avoiding this waste of information, it achieves a running time of O(m + n).
3. The implementation of Knuth-Morris-Pratt algorithm is efficient because it minimizes the total number of comparisons of the pattern against the input string. Knuth-Morris-Pratt algorithm uses prefix-function (┌┐): It preprocesses the pattern to find matches of prefixes of the pattern with the pattern itself. It is defined as the size of the largest prefix of P[0..j − 1] that is also a suffix of P[1..j]. It also indicates how much of the last comparison can be reused if it fails. The prefix-function enables avoiding backtracking on the string ‘S’.

Step 1: Initializes the input variables:

         n = Length of the Text .
m = Length of the P att e r n .
u = prefix-function of pattern ( p ) .
q = Number of charaqcters matched .


Step 2: Define the v a r i a b l e : q=0 , the beginning of the match . Step 3: Compare the f i r s t c h a r a ct e r of the p att e r n w it h f i r s t c h a r a ct e r of t e x t . I f match i s not found , s u b s t i t u t e the value of u[ q ] t o q . I f match i s found , then in c rement the value of q by 1. Step 4: Check whether a l l the p att e r n elements a re matched w it h the t e x t elements . I f not , repeat the sea rch p rocess . I f yes , p r i n t the number of s h i f t s taken by the p att e r n . Step 5: l o o k f o r the ne xt match .