【Question title】: Removing duplicates from a list of tuples based on one of the values
【Posted】: 2016-01-09 11:39:39
【Question】:

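I have a list of tuples of the form (float, string). How can I remove the entries that share the same float value?

The list is sorted by the float in descending order, and I want to preserve that order.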

[(0.10507038451969995,
  'Deadly stampede in Shanghai - Emergency personnel help victims.'),
 (0.078586381821416265,
  'Deadly stampede in Shanghai - Police and medical staff help injured people after the stampede.'),
 (0.072031446647399661, '- Emergency personnel help victims.'),
 (0.072031446647399661, 'Emergency personnel help victims.')]

Look at the last two entries.

【Comments】:

  • Hmm.. why the downvote? Please let me know if this has already been asked elsewhere.

Tags: python string list tuples


【Solution 1】:

You can keep a set of the values seen so far and append a tuple only if its float value is not yet in seen:

>>> lst
[(0.10507038451969995,
 'Deadly stampede in Shanghai - Emergency personnel help victims.'),
 (0.078586381821416265,
 'Deadly stampede in Shanghai - Police and medical staff help injured people after the stampede.'),
 (0.072031446647399661, '- Emergency personnel help victims.'),
 (0.072031446647399661, 'Emergency personnel help victims.')]

>>> seen = set()
>>> result = []
>>> for a, b in lst:
...     if a not in seen:
...         seen.add(a)
...         result.append((a, b))
...
>>> print(result)

[(0.10507038451969995, 'Deadly stampede in Shanghai - Emergency personnel help victims.'), 
 (0.07858638182141627, 'Deadly stampede in Shanghai - Police and medical staff help injured people after the stampede.'),  
 (0.07203144664739966, '- Emergency personnel help victims.')]

Here is another way, using a list comprehension:

>>> seen = set()
>>> [(a, b) for a, b in lst if not (a in seen or seen.add(a))]
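
The comprehension works because set.add returns None: the expression `a in seen or seen.add(a)` is falsy only the first time a value is encountered, and it records that value as a side effect. The same idea can be sketched as a reusable generator. The name unique_by_key and its key parameter are hypothetical and not part of the answer:

```python
def unique_by_key(items, key=lambda item: item[0]):
    """Yield items whose key has not been seen before, preserving input order."""
    seen = set()
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item


lst = [
    (0.10507038451969995, 'Deadly stampede in Shanghai - Emergency personnel help victims.'),
    (0.078586381821416265, 'Deadly stampede in Shanghai - Police and medical staff help injured people after the stampede.'),
    (0.072031446647399661, '- Emergency personnel help victims.'),
    (0.072031446647399661, 'Emergency personnel help victims.'),
]

result = list(unique_by_key(lst))
print(len(result))    # 3
print(result[-1][1])  # - Emergency personnel help victims.
```

Because it does not rely on the input being sorted, this variant also handles duplicates that are not adjacent.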

【Comments】:

    【Solution 2】:

    You can use itertools.groupby, since your values are already sorted. Here is the data:

    >>> lot
    [(0.10507038451969995, 'Deadly stampede in Shanghai - Emergency personnel help victims.'), 
    (0.07858638182141627, 'Deadly stampede in Shanghai - Police and medical staff help injured people after the stampede.'), 
    (0.07203144664739966, '- Emergency personnel help victims.'), 
    (0.07203144664739966, 'Emergency personnel help victims.')]
    

    Demo:

    >>> import itertools
    >>> [next(t) for _, t in itertools.groupby(lot, lambda x: x[0])]
    [(0.10507038451969995,
      'Deadly stampede in Shanghai - Emergency personnel help victims.'),
     (0.07858638182141627,
      'Deadly stampede in Shanghai - Police and medical staff help injured people after the stampede.'),
     (0.07203144664739966, '- Emergency personnel help victims.')]
    

    This gives you the first tuple from each group of equal float values.
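
One caveat: itertools.groupby only groups consecutive items with equal keys, so this approach depends on the list being sorted (or at least on duplicates being adjacent). A small sketch of the difference, using made-up sample data and a hypothetical helper name first_of_each_run:

```python
from itertools import groupby


def first_of_each_run(pairs):
    """Keep the first tuple of every run of consecutive equal first elements."""
    return [next(group) for _, group in groupby(pairs, key=lambda pair: pair[0])]


# Made-up sample data, not taken from the question.
sorted_pairs = [(0.3, 'a'), (0.2, 'b'), (0.2, 'c'), (0.1, 'd')]
unsorted_pairs = [(0.2, 'b'), (0.3, 'a'), (0.2, 'c')]

deduped_sorted = first_of_each_run(sorted_pairs)
deduped_unsorted = first_of_each_run(unsorted_pairs)

print(deduped_sorted)    # [(0.3, 'a'), (0.2, 'b'), (0.1, 'd')]
print(deduped_unsorted)  # [(0.2, 'b'), (0.3, 'a'), (0.2, 'c')]: the repeated 0.2 survives
```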

    【Comments】:

    • You can use itemgetter(0) instead of the lambda
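
The suggestion in this comment might look like the following sketch, where operator.itemgetter(0) replaces the lambda as the grouping key. The variable name deduped is chosen here for illustration:

```python
from itertools import groupby
from operator import itemgetter

lot = [
    (0.10507038451969995, 'Deadly stampede in Shanghai - Emergency personnel help victims.'),
    (0.07858638182141627, 'Deadly stampede in Shanghai - Police and medical staff help injured people after the stampede.'),
    (0.07203144664739966, '- Emergency personnel help victims.'),
    (0.07203144664739966, 'Emergency personnel help victims.'),
]

# itemgetter(0) returns a callable equivalent to lambda x: x[0]
deduped = [next(group) for _, group in groupby(lot, key=itemgetter(0))]
print(len(deduped))  # 3
```
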

    【Solution 3】:

    >>> L = [(0.10507038451969995, 'Deadly stampede in Shanghai - Emergency personnel help victims.'),
    ...  (0.078586381821416265, 'Deadly stampede in Shanghai - Police and medical staff help injured people after the stampede.'),
    ...  (0.072031446647399661, '- Emergency personnel help victims.'),
    ...  (0.072031446647399661, 'Emergency personnel help victims.')]
    
    >>> from collections import OrderedDict
    >>> list(OrderedDict(L).items())
    [(0.10507038451969995, 'Deadly stampede in Shanghai - Emergency personnel help victims.'),
     (0.07858638182141627, 'Deadly stampede in Shanghai - Police and medical staff help injured people after the stampede.'),
     (0.07203144664739966, 'Emergency personnel help victims.')]
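
Note that building an OrderedDict keeps each float at the position of its first occurrence but overwrites the string with the one from the last duplicate, which is why the output above ends with 'Emergency personnel help victims.' and not the '- Emergency ...' entry that the other answers keep. If the first string should win instead, one option is dict.setdefault. This is a sketch, and the names last_wins and first_wins are hypothetical:

```python
from collections import OrderedDict

L = [
    (0.10507038451969995, 'Deadly stampede in Shanghai - Emergency personnel help victims.'),
    (0.078586381821416265, 'Deadly stampede in Shanghai - Police and medical staff help injured people after the stampede.'),
    (0.072031446647399661, '- Emergency personnel help victims.'),
    (0.072031446647399661, 'Emergency personnel help victims.'),
]

# Later duplicates overwrite the stored string (behavior of the answer's one-liner).
last_wins = list(OrderedDict(L).items())

# setdefault only stores a string when the float is not already a key.
first_seen = OrderedDict()
for value, text in L:
    first_seen.setdefault(value, text)
first_wins = list(first_seen.items())

print(last_wins[-1][1])   # Emergency personnel help victims.
print(first_wins[-1][1])  # - Emergency personnel help victims.
```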
    

    【Comments】:
