[Posted]: 2014-08-04 01:58:52
[Question]:
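I searched the web for an implementation of a Levenshtein trie and found this one: Levenshtein Distance Challenge: Causes. I tried to add a piece of code that normalizes the distance by word length. If a word has, for example, 5 letters ('Apple') and I search for 'Aple', the distance is 1 and I accept it as the same word. With a longer word (for example 'circumstances') you should be allowed to make more mistakes. If there are two mistakes in this word, the original code computes a minimum distance of 2 and does not accept it. So I want to use the logarithm: divided by the logarithm of the word length, the distance between 'circumstances' and 'kirkumstances' becomes smaller than 2, and because of the cast to int it becomes 1. That is what I am trying to do.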
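To make the arithmetic concrete, here is a minimal sketch of the normalization described above. The class name and the `normalize` helper are hypothetical and only mirror the expression `(int)(distance / Math.log10(length))` used in the code; they are not part of the original implementation.

```java
final class NormalizedDistanceDemo {
    // Hypothetical helper: divide the raw edit distance by log10 of the
    // word length and truncate towards zero, as the question describes.
    static int normalize(int rawDistance, int wordLength) {
        return (int) (rawDistance / Math.log10(wordLength));
    }

    public static void main(String[] args) {
        // 'circumstances' has 13 letters: 2 / log10(13) is about 1.80
        System.out.println(normalize(2, 13)); // prints 1
        // 'Apple' has 5 letters: 1 / log10(5) is about 1.43
        System.out.println(normalize(1, 5));  // prints 1
        // 5 letters, two mistakes: 2 / log10(5) is about 2.86
        System.out.println(normalize(2, 5));  // prints 2
    }
}
```

One thing this makes visible: for words shorter than 10 characters, `Math.log10` returns a value below 1, so the division makes the distance larger rather than smaller, and for a one-letter word it divides by zero.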
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LevenshteinTrie {
    private int distance = -1;
    private Trie trie = null;

    public LevenshteinTrie(int distance, Set<String> words) {
        this.distance = distance;
        this.trie = new Trie();
        for (String word : words) {
            this.trie.insert(word);
        }
    }

    public Set<String> discoverFriends(String word, boolean normalized) {
        Set<String> results = new HashSet<String>();
        // First row of the edit-distance matrix: cost of building each
        // prefix of the search word from the empty string.
        int[] currentRow = new int[word.length() + 1];
        List<Character> chars = new ArrayList<Character>(word.length());
        for (int i = 0; i < word.length(); i++) {
            chars.add(word.charAt(i));
            currentRow[i] = i;
        }
        currentRow[word.length()] = word.length();
        for (Character c : this.trie.getRoot().getChildren().keySet()) {
            this.traverseTrie(this.trie.getRoot().getChildren().get(c), c, chars, currentRow, results, normalized);
        }
        return results;
    }

    private void traverseTrie(TrieNode node, char letter, List<Character> word, int[] previousRow, Set<String> results, boolean normalized) {
        int size = previousRow.length;
        int[] currentRow = new int[size];
        currentRow[0] = previousRow[0] + 1;
        // Smallest value in this row; used below to decide whether to prune.
        int minimumElement = currentRow[0];
        int insertCost = 0;
        int deleteCost = 0;
        int replaceCost = 0;
        for (int i = 1; i < size; i++) {
            insertCost = currentRow[i - 1] + 1;
            deleteCost = previousRow[i] + 1;
            if (word.get(i - 1) == letter) {
                replaceCost = previousRow[i - 1];
            } else {
                replaceCost = previousRow[i - 1] + 1;
            }
            currentRow[i] = Math.min(Math.min(insertCost, deleteCost), replaceCost);
            if (currentRow[i] < minimumElement) {
                if (normalized) {
                    // My addition: normalize by the length of the search word.
                    minimumElement = (int) (currentRow[i] / (Math.log10(word.size())));
                } else {
                    minimumElement = currentRow[i];
                }
            }
        }
        // Distance between the search word and the prefix ending at this node.
        int tempCurrentRow = currentRow[size - 1];
        if (normalized) {
            tempCurrentRow = (int) (currentRow[size - 1] / (Math.log10(word.size())));
        }
        System.out.println(tempCurrentRow); // debug output, listed below
        if (tempCurrentRow <= this.distance && node.getWord() != null) {
            results.add(node.getWord());
        }
        // Only descend if some cell in this row is still within the limit.
        if (minimumElement <= this.distance) {
            for (Character c : node.getChildren().keySet()) {
                this.traverseTrie(node.getChildren().get(c), c, word, currentRow, results, normalized);
            }
        }
    }
}
public class Trie {
    private TrieNode root = null;

    public Trie() {
        this.root = new TrieNode();
    }

    // Called by LevenshteinTrie but not shown in the post; assumed to be a plain getter.
    public TrieNode getRoot() {
        return this.root;
    }

    public void insert(String word) {
        TrieNode current = this.root;
        if (word.length() == 0) {
            current.setWord(word);
        }
        for (int i = 0; i < word.length(); i++) {
            char letter = word.charAt(i);
            TrieNode child = current.getChild(letter);
            if (child != null) {
                current = child;
            } else {
                current.getChildren().put(letter, new TrieNode());
                current = current.getChild(letter);
            }
            // Mark the node that ends the word with the full word.
            if (i == word.length() - 1) {
                current.setWord(word);
            }
        }
    }
}
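import java.util.HashMap;
import java.util.Map;

public class TrieNode {
    public static final int ALPHABET = 26;
    private String word = null;
    private Map<Character, TrieNode> children = null;

    public TrieNode() {
        this.word = null;
        this.children = new HashMap<Character, TrieNode>(ALPHABET);
    }

    public TrieNode getChild(char letter) {
        if (this.children != null) {
            if (children.containsKey(letter)) {
                return children.get(letter);
            }
        }
        return null;
    }

    // getChildren() and setWord() are called elsewhere but were not shown
    // in the post; they are assumed to be plain accessors.
    public Map<Character, TrieNode> getChildren() {
        return children;
    }

    public String getWord() {
        return word;
    }

    public void setWord(String word) {
        this.word = word;
    }
}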
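Unfortunately, this code does not work correctly. I set the maximum distance to 1. When I now run the program and search for "vdimir putin" (I have "vladimir putin" in my trie), the program does not accept it as a friend. When I print out the temporary calculated distances, they look like this:
tempCurrentRow values with maximum distance = 1: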
11
11
10
10
10
10
11
11
11
11
10
11
11
11
11
11
11
11
10
10
10
10
10
10
10
10
10
10
9
11
11
10
10
10
10
But when I set the maximum distance to 2, the temporary distances change:
tempCurrentRow values with maximum distance = 2:
11
11
11
10
10
10
10
9
9
8
7
6
5
4
3
2
1
11
11
10
10
9
9
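So there must be a serious error in the code. But I don't know where it is or why it happens, or how I have to change the code so that it works the way I want.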
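One possible explanation, offered as a sketch rather than a confirmed diagnosis: inside the loop, `minimumElement` starts out as a raw distance, is overwritten with a normalized value, and is then compared against raw cells again. A raw cell that equals the stored normalized minimum is therefore never picked up, even though its own normalized value would be smaller. The example below isolates that comparison. The class and method names are hypothetical, and the row is hand-computed for the trie prefix "vlad" against the search word "vdimir putin" (12 characters).

```java
final class PruningMinimumDemo {
    static int normalize(int raw, int wordLength) {
        return (int) (raw / Math.log10(wordLength));
    }

    // Mirrors the loop in traverseTrie: raw cells are compared against a
    // minimum that may already hold a normalized value.
    static int mixedMinimum(int[] row, int wordLength) {
        int minimumElement = row[0];
        for (int i = 1; i < row.length; i++) {
            if (row[i] < minimumElement) {
                minimumElement = normalize(row[i], wordLength);
            }
        }
        return minimumElement;
    }

    // Alternative: find the raw minimum first, then normalize it once.
    static int consistentMinimum(int[] row, int wordLength) {
        int rawMinimum = row[0];
        for (int i = 1; i < row.length; i++) {
            rawMinimum = Math.min(rawMinimum, row[i]);
        }
        return normalize(rawMinimum, wordLength);
    }

    public static void main(String[] args) {
        // Hand-computed row for prefix "vlad" vs. "vdimir putin".
        int[] row = {4, 3, 2, 3, 3, 4, 5, 6, 7, 8, 9, 10, 11};
        System.out.println(mixedMinimum(row, 12));      // prints 2
        System.out.println(consistentMinimum(row, 12)); // prints 1
    }
}
```

With a maximum distance of 1, the mixed version yields 2 at this node, so the branch towards "vladimir putin" is cut off after "vlad". With a maximum distance of 2 the branch survives, which would match the second listing counting down to 1. Normalizing the raw minimum once after the loop should keep the branch alive with a maximum distance of 1 as well.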
[Comments]:
Tags: java trie levenshtein-distance