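【Posted】: 2018-10-26 19:21:23
【Question】:
I am trying to implement minimum edit distance with a substitution cost of 2. Below is the code I have so far. It works for strings of equal length but raises an error for strings of unequal length. Please point out where I went wrong.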
def med(source, target):
    # if len(x) > len(y):
    #     print("insode if")
    #     source, target = y, x
    print(len(source), len(target))
    cost = [[0 for inner in range(len(source)+1)] for outer in
            range(len(target)+1)]
    global backtrace
    backtrace = [[0 for inner in range(len(source)+1)] for outer in
                 range(len(target)+1)]
    global SUB
    global INS
    global DEL
    for i in range(0,len(target)+1):
        cost[i][0] = i
    for j in range(0,len(source)+1):
        cost[0][j] = j
    for i in range(1,len(target)+1):
        for j in range(1,len(source)+1):
            if source[i-1]==target[j-1]:
                cost[i][j] = cost[i-1][j-1]
            else:
                deletion = cost[i-1][j]+1
                insertion = cost[i][j-1]+1
                substitution = cost[i-1][j-1]+2
                cost[i][j] = min(insertion,deletion,substitution)
                if cost[i][j] == substitution:
                    backtrace[i][j] = SUB
                elif cost[i][j] == insertion:
                    backtrace[i][j] = INS
                else:
                    backtrace[i][j] = DEL
    return cost[i][j]

med("levenshtein","levels")
The error I get is:
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-26-86bf20ea27c7> in <module>()
     49     return cost[i][j]
     50
---> 51 med("levenshtein","levels")

<ipython-input-26-86bf20ea27c7> in med(source, target)
     31     for i in range(1,len(target)+1):
     32         for j in range(1,len(source)+1):
---> 33             if source[i-1]==target[j-1]:
     34                 cost[i][j] = cost[i-1][j-1]
     35             else:
IndexError: string index out of range
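
One likely cause, judging from the traceback: the table is built with one row per character of target and one column per character of source, so i ranges over target and j over source, yet the comparison `source[i-1]==target[j-1]` indexes them the other way round. That only stays in range when both strings have the same length. Below is a minimal sketch of a corrected version, not a definitive implementation. It assumes insertions and deletions cost 1, leaves out the backtrace table (SUB, INS and DEL are not defined in the posted snippet), and introduces a `sub_cost` parameter that is not in the original code.

```python
def med(source, target, sub_cost=2):
    """Minimum edit distance; insert/delete cost 1, substitution cost sub_cost.

    sub_cost is an illustrative parameter, not part of the original snippet.
    """
    rows = len(target) + 1  # row i    <-> first i characters of target
    cols = len(source) + 1  # column j <-> first j characters of source
    cost = [[0] * cols for _ in range(rows)]

    # Base cases: turning a prefix into the empty string (or back)
    # costs one edit per character.
    for i in range(rows):
        cost[i][0] = i
    for j in range(cols):
        cost[0][j] = j

    for i in range(1, rows):
        for j in range(1, cols):
            # i indexes target and j indexes source, matching the table layout.
            if target[i - 1] == source[j - 1]:
                cost[i][j] = cost[i - 1][j - 1]
            else:
                cost[i][j] = min(
                    cost[i - 1][j] + 1,             # step down one row
                    cost[i][j - 1] + 1,             # step right one column
                    cost[i - 1][j - 1] + sub_cost,  # diagonal: substitute
                )

    # Index the last cell directly instead of relying on the loop variables,
    # which are never assigned when either string is empty.
    return cost[rows - 1][cols - 1]


print(med("levenshtein", "levels"))  # 7
```

With a substitution cost of 2, a substitution is never cheaper than a deletion plus an insertion, so the result equals len(source) + len(target) minus twice the length of the longest common subsequence: here 11 + 6 - 2*5 = 7.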
【Discussion】:
Tags: python nlp edit-distance