The Knuth-Morris-Pratt algorithm is a string matching algorithm that searches for occurrences of a pattern string P
within another string S
. It does so by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to determine where the next match could begin.
KMP uses a pre-generated table called a prefix table. The prefix table allows us to skip certain comparisons.
The values in the prefix table are the length of the longest proper prefix that maches a proper suffix.
A C A C A G T
0 0 1 2 3 0 0
A
AC
ACA
ACAC
A
CA
ACA
CACA
Characters in a string excluding one or more characters at the end.
string = 'AYYYYY'
prefixes = ['A', 'AY', 'AYY', 'AYYY', 'AYYYY']
Characters in a string excluding one or more characters at the beginning.
string = 'LMAOO'
suffixes = ['MAOO', 'AOO', 'OO', 'O']