【Posted】: 2019-09-25 00:28:39
【Question】:
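I am comparing two data frame columns in Python, with the goal of finding, for each element of the first column, the best match in the second column. The first column contains 19,000 rows, and for each of its strings I need to find the best match in the second column. So each of the 19,000 rows has to be checked 19,000 times, with the constraint that the matched string must be a different one, not the string itself.
I started with a simple comparison, looking up a single string in a list, and that worked. Then I applied it to a list, just to compare them, but since that passes a list where a single string is expected, it raises the error "TypeError: expected string or bytes-like object". Finally, I tried writing a loop, but the error is the same. Is there a way to build a list with the expected results? Maybe there is a better way using another library, but so far I have not found anything. This is the code so far: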
# simple example
from fuzzywuzzy import process

string = "appl"
compare = ["adfad.", "apple", "asple", "tab"]
Ratios = process.extract(string, compare)
print(Ratios)
# [('apple', 89), ('asple', 67), ('tab', 29), ('adfad.', 22)]
highest = process.extractOne(string, compare)
print(highest)
# ('apple', 89)
# data frame
from fuzzywuzzy import process

dataframecolumn = ["appl", "tb"]
compare = ["adfad.", "apple", "asple", "tab"]
Ratios = process.extract(dataframecolumn, compare)
# TypeError: expected string or bytes-like object
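The call above fails because `process.extract` expects a single query string as its first argument, so a list of queries has to be handled one element at a time. A minimal sketch of that per-element pattern follows. It uses the standard library's `difflib.get_close_matches` as a stand-in for fuzzywuzzy, which ranks candidates but does not return scores; the `cutoff` value of 0.5 is an arbitrary assumption.

```python
from difflib import get_close_matches

dataframecolumn = ["appl", "tb"]
compare = ["adfad.", "apple", "asple", "tab"]

# Look up each query string on its own; n=1 keeps only the closest candidate.
closest = {
    query: get_close_matches(query, compare, n=1, cutoff=0.5)
    for query in dataframecolumn
}
print(closest)
```

Each value is a list, which is empty when no candidate reaches the cutoff.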
# expected (but I need a list)
highest = process.extractOne(dataframecolumn[0], compare)
print(highest)
# ('apple', 89)
highest = process.extractOne(dataframecolumn[1], compare)
print(highest)
# ('tab', 80)
# Result expected
results = ["apple, 89", "tab, 80"]
# Error
myl = ["appl", "tb"]
compare = ["adfad.", "apple", "asple", "tab"]
results = []
for x in myl:
    results.append(process.extractOne(myl, compare)[1])
# TypeError: expected string or bytes-like object
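The loop above raises the same error because it passes the whole list `myl` to `extractOne` instead of the loop variable `x`. A sketch of a corrected loop follows. To keep it runnable without third-party packages, `extract_one` is a hypothetical stand-in built on `difflib.SequenceMatcher`; with fuzzywuzzy installed, the corresponding call would be `process.extractOne(x, candidates)`, whose scores may differ. The `best_matches` helper and its `skip_identical` flag are also assumptions, added to cover the case where a column is compared against itself and a string must not match itself.

```python
from difflib import SequenceMatcher


def extract_one(query, choices):
    """Hypothetical stand-in for process.extractOne: (best choice, score 0-100)."""
    scored = [
        (choice, round(100 * SequenceMatcher(None, query, choice).ratio()))
        for choice in choices
    ]
    return max(scored, key=lambda pair: pair[1])


def best_matches(queries, choices, skip_identical=False):
    results = []
    for x in queries:
        # Optionally drop the query itself so a string cannot match itself.
        candidates = [c for c in choices if not (skip_identical and c == x)]
        if not candidates:
            results.append(None)
            continue
        match, score = extract_one(x, candidates)  # pass x, not the whole list
        results.append(f"{match}, {score}")
    return results


myl = ["appl", "tb"]
compare = ["adfad.", "apple", "asple", "tab"]
print(best_matches(myl, compare))

# Comparing a column against itself while excluding identical strings.
column = ["apple", "asple", "tab"]
self_matches = best_matches(column, column, skip_identical=True)
print(self_matches)
```

Note that comparing 19,000 strings against 19,000 candidates means roughly 361 million pairwise comparisons, so a plain loop like this is likely to be slow at that scale.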
【Discussion】:
Tags: python string matching similarity