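【Posted】: 2021-05-24 18:34:07
【Question】:
I am trying to remove an entire key-value pair from a dictionary when its value is a near-duplicate of another value, judged by string similarity. Example: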
d1 = {1: 'Colins business partner sends millions of dollars to groups which target lives '
         'for gruesome deaths domestically and abroad',
      2: 'Colins business partner sends millions of dollars to groups which target lives',
      3: 'Don t skip leg day y all'}
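In the code above, the strings under keys 1 and 2 are similar, so one of them must be removed. The expected output, with the original IDs left unchanged, is: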
d1 = {1: 'Colins business partner sends millions of dollars to groups which target lives '
         'for gruesome deaths domestically and abroad',
      3: 'Don t skip leg day y all'}
Please help me solve this.
【Comments】:
-
How do you measure similarity?
-
Similarity is based on Jaccard similarity.
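-
One possible approach, offered as a minimal sketch rather than a definitive answer: compute the Jaccard similarity of the word sets of two strings, |A ∩ B| / |A ∪ B|, and keep an entry only if it is not too similar to any entry already kept. The question does not specify the tokenization or the cutoff, so several things here are assumptions: the strings are split into lowercase words on whitespace, the threshold is 0.5, the first entry of each similar group is the one kept, and the helper names `jaccard` and `drop_similar` are made up for illustration.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase word sets of two strings."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty strings are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def drop_similar(d: dict, threshold: float = 0.5) -> dict:
    """Return a new dict without near-duplicate values; original keys are preserved.

    An entry is kept only if its similarity to every already-kept entry
    is below `threshold` (an assumed cutoff, tune it for your data).
    """
    kept = {}
    for key, text in d.items():
        if all(jaccard(text, other) < threshold for other in kept.values()):
            kept[key] = text
    return kept


d1 = {
    1: 'Colins business partner sends millions of dollars to groups which target lives '
       'for gruesome deaths domestically and abroad',
    2: 'Colins business partner sends millions of dollars to groups which target lives',
    3: 'Don t skip leg day y all',
}

result = drop_similar(d1)
print(list(result))  # [1, 3]
```

With this tokenization, entries 1 and 2 share 12 of 18 distinct words (similarity of about 0.67), so entry 2 is dropped, while entry 3 shares no words with entry 1 and is kept. The loop compares each entry against all kept entries, which is quadratic in the worst case. That is fine for small dictionaries, but a large corpus would probably need something like MinHash instead.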
Tags: python dictionary nlp similarity