LPS array (failure function)

int[] computeLPS(String pat) {
  int n = pat.length();
  int[] lps = new int[n];
  int len = 0, i = 1;
  while (i < n) {
    if (pat.charAt(i) == pat.charAt(len)) {
      lps[i++] = ++len;
    } else if (len != 0) {
      len = lps[len - 1];
    } else {
      lps[i++] = 0;
    }
  }
  return lps;
}
Advertisement

Main KMP loop

int i = 0, j = 0;
while (i < text.length()) {
  if (text.charAt(i) == pat.charAt(j)) { i++; j++; }
  if (j == pat.length()) {
    matches.add(i - j);
    j = lps[j - 1];
  } else if (i < text.length() && text.charAt(i) != pat.charAt(j)) {
    if (j != 0) j = lps[j - 1];
    else i++;
  }
}
Advertisement

Why O(N + M)

Each text character examined at most twice. LPS lets us skip forward instead of restart.

LPS array (failure function)

int[] computeLPS(String pat) {
  int n = pat.length();
  int[] lps = new int[n];
  int len = 0, i = 1;
  while (i < n) {
    if (pat.charAt(i) == pat.charAt(len)) {
      lps[i++] = ++len;
    } else if (len != 0) {
      len = lps[len - 1];
    } else {
      lps[i++] = 0;
    }
  }
  return lps;
}

Main KMP loop

int i = 0, j = 0;
while (i < text.length()) {
  if (text.charAt(i) == pat.charAt(j)) { i++; j++; }
  if (j == pat.length()) {
    matches.add(i - j);
    j = lps[j - 1];
  } else if (i < text.length() && text.charAt(i) != pat.charAt(j)) {
    if (j != 0) j = lps[j - 1];
    else i++;
  }
}

Why O(N + M)

Each text character examined at most twice. LPS lets us skip forward instead of restart.

Alternatives

Boyer-Moore: skip more chars, better for large alphabets. Rabin-Karp: hashing for multi-pattern.