The naive approach to do string matching instead of KMP actually relatively simple, but the problem with the naïve approach is time-consuming. Naïve approach can be done using two loops to check each element or character of the pattern with the string itself. Let define the length of the string is M and the length of the pattern is N, then the time complexity of the naive approach is O(M*N) for the worst case.
KMP algorithm can do much better than the naive approach. KMP uses a generated table called failure function/failure table. A failure table is a table that holds an index when if the matching between elements in a given string with the pattern fails. When this happen, failure table will provide where or what index the matching should start over again.
To generate failure table, it is quite simple. We just need to track the match between suffix of the pattern. There is a simulator to understand this process much easier by accessing this link over here: https://people.ok.ubc.ca/ylucet/DS/KnuthMorrisPratt.html
But, if you are willing to calculate it manually, you can refer to this code and understand how the calculation works for the failure table.
j |
0 |
1 |
2 |
P[j] |
A |
A |
B |
F[j] |
0 |
1 |
0 |
P[] is an array of char of pattern.
F[] is a DP (dynamic programming) array to hold failure index.
j = iteration index.
The process to get that failure table can be described as below.
Okay, now we are going to check the match using KMP algorithm. I’ll use index i for text string iteration, and index j for failure table iteration. Blue table is the string, and Orange table is the failure table. Green cells mean match, and Red cells mean mismatch. Here’s the detail.
· Step
1: i = 0, j = 0 (match)
A |
A |
A |
A |
B |
A |
A |
B |
|
|
· Step
2: i = 1, j = 1 (match)
A |
A |
A |
A |
B |
A |
A |
B |
|
|
· Step
3: i = 2, j = 2 (mismatch).
A |
A |
A |
A |
B |
A |
A |
B |
|
|
In this state we will shift the matching position
which is F[j-1] = F[1] = 1. At this state i = 2, and j = 1. So, the table will
turn like this.
A |
A |
A |
A |
B |
|
A |
A |
B |
|
· Step
4: i = 3, j = 2 (mismatch).
A |
A |
A |
A |
B |
|
A |
A |
B |
|
In this state we will shift the matching position
which is F[j-1] = F[1] = 1. At this state i = 2, and j = 1. So, the table will
turn like this.
A |
A |
A |
A |
B |
|
|
A |
A |
B |
· Step
5: i = 4, j = 2 (match)
A |
A |
A |
A |
B |
|
|
A |
A |
B |
Therefore, we can see that pattern AAB exists in string AAAAB. By
far, we can learn that the time complexity for KMP algorithm is the sum of time
to generate the failure table which is let say M, plus the loop through the element
or character in the string let say N. So, the time complexity of KMP
algorithm is O(M+N), which is much much faster than the naive
method which is O(M*N).
No comments:
Post a Comment