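I propose the following (general) solution:
- Compress each word by collapsing every run of identical consecutive letters into a single letter
- Get the dictionary of words to match against
- Pick the dictionary word with the smallest Levenshtein distance to the compressed word
Compression should produce this: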
heelp -> help
help -> help
heeeelp -> help
hhhheeeelllllpppp -> help
heeeklp -> heklp
hlep -> hlep
helper -> helper
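The same compression can also be sketched with one regular expression: `(.)\1+` captures a character followed by one or more copies of itself, and the replacement `$1` keeps only the captured copy. This is an alternative to a character loop, not something the approach requires; the class name `RegexCompress` is hypothetical.

```java
public class RegexCompress {
    // Replaces each run of a repeated character with a single copy of it.
    // "(.)\\1+" captures one character, then matches one or more repeats of it.
    public static String compress(String s) {
        return s.replaceAll("(.)\\1+", "$1");
    }

    public static void main(String[] args) {
        String[] words = {"heelp", "help", "heeeelp", "hhhheeeelllllpppp", "heeeklp", "hlep", "helper"};
        for (String w : words) {
            // Prints each word next to its compressed form, as in the list above.
            System.out.println(w + " -> " + compress(w));
        }
    }
}
```

Like any run-collapsing step, it also shortens legitimate double letters (for example "book" becomes "bok"), which the Levenshtein step then has to absorb.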
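The Levenshtein distance between two words, LD(word1, word2), is the minimum number of single-character edits (insertions, deletions, or substitutions) needed to turn one word into the other. Examples: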
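Written as a recurrence over prefixes, this definition is what the `distance` table in the code fills in. Here $D(i,j)$ (my notation) is the distance between the first $i$ characters of word1 and the first $j$ characters of word2, $a_i$ and $b_j$ are the $i$-th and $j$-th characters, and $[a_i \ne b_j]$ is 1 when they differ and 0 otherwise:

```latex
D(i,0) = i, \qquad D(0,j) = j, \qquad
D(i,j) = \min\begin{cases}
  D(i-1,\,j) + 1 & \text{(delete } a_i\text{)}\\
  D(i,\,j-1) + 1 & \text{(insert } b_j\text{)}\\
  D(i-1,\,j-1) + [a_i \ne b_j] & \text{(substitute or keep)}
\end{cases}
\qquad
\mathrm{LD}(\text{word1}, \text{word2}) = D(|\text{word1}|, |\text{word2}|)
```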
hhhheeeelllllpppp -> help -> LD(help, help) = 0, LD(help, helper) = 2 <- help match
heeeklp -> heklp -> LD(heklp, help) = 1, LD(heklp, helper) = 3 <- help match
hlep -> hlep -> LD(hlep, help) = 2, LD(hlep, helper) = 3 <- help match
helper -> helper -> LD(helper, help) = 2, LD(helper, helper) = 0 <- helper match
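As a worked check of the hlep example, here is the full table for LD(hlep, help). Entry $(i, j)$ is the distance between the first $i$ letters of hlep (rows) and the first $j$ letters of help (columns), with $\varepsilon$ the empty prefix; the bottom-right entry is the distance. The swapped "le"/"el" costs 2 because plain Levenshtein distance has no single "swap adjacent letters" edit, only two substitutions.

```latex
\begin{array}{c|ccccc}
            & \varepsilon & h & e & l & p \\ \hline
\varepsilon & 0 & 1 & 2 & 3 & 4 \\
h           & 1 & 0 & 1 & 2 & 3 \\
l           & 2 & 1 & 1 & 1 & 2 \\
e           & 3 & 2 & 1 & 2 & 2 \\
p           & 4 & 3 & 2 & 2 & \mathbf{2}
\end{array}
```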
Here is my solution:
public class LevenshteinDistance {
    private static int minimum(int a, int b, int c) {
        return Math.min(Math.min(a, b), c);
    }

    // Dynamic-programming Levenshtein distance: distance[i][j] is the number of edits
    // needed to turn the first i chars of lhs into the first j chars of rhs.
    public static int computeLevenshteinDistance(CharSequence lhs, CharSequence rhs) {
        int[][] distance = new int[lhs.length() + 1][rhs.length() + 1];
        for (int i = 0; i <= lhs.length(); i++)
            distance[i][0] = i;  // delete all i characters
        for (int j = 1; j <= rhs.length(); j++)
            distance[0][j] = j;  // insert all j characters
        for (int i = 1; i <= lhs.length(); i++)
            for (int j = 1; j <= rhs.length(); j++)
                distance[i][j] = minimum(
                        distance[i - 1][j] + 1,  // deletion
                        distance[i][j - 1] + 1,  // insertion
                        // substitution (cost 1) or matching character (cost 0)
                        distance[i - 1][j - 1] + ((lhs.charAt(i - 1) == rhs.charAt(j - 1)) ? 0 : 1));
        return distance[lhs.length()][rhs.length()];
    }

    // Collapses every run of identical consecutive characters into one character.
    public static String compress(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Compare primitive chars: != on boxed Character objects compares
            // references, which only happens to work for cached (ASCII) values.
            if (i == 0 || c != s.charAt(i - 1)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] argv) {
        String[] strings = {"heelp", "help", "heeeelp", "hhhheeeelllllpppp", "heeeklp", "hlep", "helper"};
        String[] dict = {"help", "helper"};
        for (String s : strings) {
            String c = compress(s);
            // Reset per input word so a previous word's match can never carry over.
            String match = null;
            int minDistance = Integer.MAX_VALUE;
            for (String d : dict) {
                int distance = computeLevenshteinDistance(c, d);
                System.out.println("compressed: " + c + " dict: " + d + " distance: " + distance);
                if (distance < minDistance) {
                    match = d;
                    minDistance = distance;
                }
            }
            System.out.println(s + " matches " + match);
        }
    }
}
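The `distance` matrix in `computeLevenshteinDistance` keeps all (m+1)×(n+1) cells, but each row depends only on the row before it, so two rows are enough. The sketch below does that and uses `Stream.min` with `Comparator.comparingInt` to pick the closest word, which yields an empty `Optional` for an empty dictionary instead of a null or stale match. The class `CompactMatcher` and its method names are hypothetical, and it assumes the input word has already been compressed.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class CompactMatcher {
    // Levenshtein distance using two rows of the DP table instead of the full matrix.
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;  // row 0: insert j chars
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;  // column 0: delete i chars
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1), prev[j - 1] + cost);
            }
            // The row just computed becomes the "previous" row for the next iteration.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the dictionary word closest to the (already compressed) input, if any.
    public static Optional<String> closestMatch(String compressed, List<String> dict) {
        return dict.stream().min(Comparator.comparingInt(d -> levenshtein(compressed, d)));
    }

    public static void main(String[] args) {
        List<String> dict = List.of("help", "helper");
        for (String c : List.of("help", "heklp", "hlep", "helper")) {
            System.out.println(c + " matches " + closestMatch(c, dict).orElse("(none)"));
        }
    }
}
```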
Here is the output:
compressed: help dict: help distance: 0
compressed: help dict: helper distance: 2
heelp matches help
compressed: help dict: help distance: 0
compressed: help dict: helper distance: 2
help matches help
compressed: help dict: help distance: 0
compressed: help dict: helper distance: 2
heeeelp matches help
compressed: help dict: help distance: 0
compressed: help dict: helper distance: 2
hhhheeeelllllpppp matches help
compressed: heklp dict: help distance: 1
compressed: heklp dict: helper distance: 3
heeeklp matches help
compressed: hlep dict: help distance: 2
compressed: hlep dict: helper distance: 3
hlep matches help
compressed: helper dict: help distance: 2
compressed: helper dict: helper distance: 0
helper matches helper