[Posted]: 2023-04-04 22:19:01
[Question]:
I have the following:
Input df -
fruit uniqueid
apple 1123
appless 321
banana 623
mango 739
mangos 889
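Code -
import pandas as pd
from thefuzz import fuzz  # assumption: the post does not say which package provides fuzz (fuzzywuzzy exposes the same calls)

df.loc[:, 'fruit_copy'] = df['fruit']
## comparing values from one column to each other
compare = pd.MultiIndex.from_product([df['fruit'], df['fruit_copy']]).to_series()

def metrics(tup):
    # tup is a (fruit, fruit_copy) pair taken from the MultiIndex
    return pd.Series([fuzz.ratio(*tup),
                      fuzz.token_sort_ratio(*tup)],
                     ['ratio', 'token'])

compare = compare.apply(metrics)
## only keep higher matches
compare_80 = compare[(compare['ratio'] >= 80) & (compare['token'] >= 80)]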
Current output -
                   ratio  token
apple    apple       100    100
         appless      83     83
appless  apple        83     83
         appless     100    100
banana   banana      100    100
mango    mango       100    100
         mangos       91     91
mangos   mango        91     91
         mangos      100    100
Expected result, first goal -
index1          index2   ratio  token  uniqueid
apple    1123   apple      100    100      1123
                appless     83     83       321
appless  321    apple       83     83      1123
                appless    100    100       321
banana   623    banana     100    100       623
mango    739    mango      100    100       739
                mangos      91     91       889
mangos   889    mango       91     91       739
                mangos     100    100       889
Expected result, second goal -
index1          index2   ratio  token  uniqueid
apple    1123   appless     83     83       321
mango    739    mangos      91     91       889
Can I achieve this by appending uniqueid to the MultiIndex?
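To make the two goals concrete, here is a minimal sketch of one possible way to get both layouts. It is not the code from the post: it swaps the `MultiIndex.from_product` step for a cross join (`DataFrame.merge(how="cross")`, which needs pandas 1.2 or later), so that `uniqueid` travels with each fruit. The helpers `ratio` and `token_sort_ratio` are hypothetical stand-ins built on the standard library's `difflib`, used only so the sketch runs without a fuzzy-matching package; in the real code they would be `fuzz.ratio` and `fuzz.token_sort_ratio`, and the scores may differ slightly. The `pos` column is also an assumption, added to keep each pair once for the second goal.

```python
from difflib import SequenceMatcher

import pandas as pd

df = pd.DataFrame({
    "fruit": ["apple", "appless", "banana", "mango", "mangos"],
    "uniqueid": [1123, 321, 623, 739, 889],
})

def ratio(a, b):
    # Hypothetical stand-in for fuzz.ratio: similarity on a 0-100 scale.
    return round(100 * SequenceMatcher(None, a, b).ratio())

def token_sort_ratio(a, b):
    # Hypothetical stand-in for fuzz.token_sort_ratio: sort the words, then compare.
    return ratio(" ".join(sorted(a.split())), " ".join(sorted(b.split())))

# Remember each row's position so that a pair can later be kept only once.
base = df.assign(pos=range(len(df)))

# Cross join: every fruit against every fruit, each side carrying its uniqueid.
pairs = base.merge(base, how="cross", suffixes=("_1", "_2"))
pairs["ratio"] = [ratio(a, b) for a, b in zip(pairs["fruit_1"], pairs["fruit_2"])]
pairs["token"] = [token_sort_ratio(a, b) for a, b in zip(pairs["fruit_1"], pairs["fruit_2"])]

# Only keep higher matches, as in the original filter.
matches = pairs[(pairs["ratio"] >= 80) & (pairs["token"] >= 80)]
matches = matches.rename(
    columns={"fruit_1": "index1", "fruit_2": "index2", "uniqueid_2": "uniqueid"}
)

index_cols = ["index1", "uniqueid_1", "index2"]
helper_cols = ["pos_1", "pos_2"]

# First goal: all matches, with the left fruit's uniqueid as part of the index.
goal1 = matches.drop(columns=helper_cols).set_index(index_cols)

# Second goal: drop self-matches and mirrored duplicates by keeping pos_1 < pos_2.
goal2 = (
    matches[matches["pos_1"] < matches["pos_2"]]
    .drop(columns=helper_cols)
    .set_index(index_cols)
)

print(goal1)
print(goal2)
```

The cross join keeps everything in ordinary columns until the final `set_index`, which makes it easy to decide afterwards which identifiers belong in the index and which stay as a column.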
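[Comments]:
-
Thanks for the suggestion - added the input df
-
The current output does not match the actual output; please re-run -
Thanks, I re-ran it and updated the post
Tags: python python-3.x pandas multi-index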