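【Question title】: How to delete duplicate key-value (string) pairs from a dictionary?
【Posted】: 2021-05-24 18:34:07
【Question】:

I am trying to remove an entire key-value pair from a dictionary when its value duplicates another value, judged by string similarity. Example: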

d1={1:'Colins business partner sends millions of dollars to groups which target lives for gruesome deaths domestically and abroad',
2:'Colins business partner sends millions of dollars to groups which target lives',
3:'Don t skip leg day y all'}

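In the code above, the strings for keys 1 and 2 are similar, so one of them must be removed. The expected output, with the remaining IDs unchanged, is: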

d1={1:'Colins business partner sends millions of dollars to groups which target lives for gruesome deaths domestically and abroad',
3:'Don t skip leg day y all'}

Please help me solve this.

【Comments】:

  • How do you determine similarity?
  • Similarity is based on Jaccard similarity.

Tags: python dictionary nlp similarity


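【Solution 1】:

If by "similarity" you mean that one string is contained in another, and you want to eliminate the shorter string, you can do it with a nested loop as shown below. Note that you should work on a copy of the dictionary, so that you are not deleting from the dictionary you are iterating over.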

d1={1:'Colins business partner sends millions of dollars to groups which target lives for gruesome deaths domestically and abroad',
2:'Colins business partner sends millions of dollars to groups which target lives',
3:'Don t skip leg day y all'}

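d2 = dict(d1)  # shallow copy, so d1 is not modified while iterating over it
for k, sent in d1.items():
    for sentence in d1.values():
        # sent is a proper substring of a longer sentence: drop the shorter entry
        if sent in sentence and len(sent) != len(sentence):
            del d2[k]
            break
print(d2)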
# {1: 'Colins business partner sends millions of dollars to groups which target lives for gruesome deaths domestically and abroad', 3: 'Don t skip leg day y all'}
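
The asker's comment says similarity should be based on Jaccard similarity, while the loop above only tests substring containment. A minimal sketch of a Jaccard-based variant follows. Several things in it are assumptions rather than part of the original answer: the helper names `jaccard` and `drop_similar`, splitting on whitespace after lowercasing, the threshold of 0.5, and the rule of keeping whichever entry comes first in the dictionary.

```python
def jaccard(a, b):
    """Jaccard similarity of the lowercase word sets of two strings."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty strings are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def drop_similar(d, threshold=0.5):
    """Return a new dict that keeps the first entry of each group of similar values.

    The threshold is an assumed cut-off, not a value from the question.
    """
    kept = {}
    for key, text in d.items():
        # Keep this entry only if it is not similar to anything already kept.
        if all(jaccard(text, other) < threshold for other in kept.values()):
            kept[key] = text
    return kept


d1 = {1: 'Colins business partner sends millions of dollars to groups which target lives for gruesome deaths domestically and abroad',
      2: 'Colins business partner sends millions of dollars to groups which target lives',
      3: 'Don t skip leg day y all'}

d2 = drop_similar(d1)
print(d2)
```

With these assumptions, strings 1 and 2 share 12 of 18 distinct words (a similarity of about 0.67), so key 2 is dropped and keys 1 and 3 remain. Because the first entry wins, the longer string survives here only because it appears first; if the shorter one came first you would have to sort the items by length beforehand or pick a different tie-breaking rule.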
