Removing duplicates from a list of tuples based on one of the values: answers

【Question title】: Removing duplicates from a list of tuples based on one of values
【Posted】: 2016-01-09 11:39:39
【Question】:

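I have a list of tuples of the form (float, string). How can I remove the entries that have a duplicate float value?

The list is sorted by the float in descending order, and I want to preserve that order.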

[(0.10507038451969995,
  'Deadly stampede in Shanghai - Emergency personnel help victims.'),
 (0.078586381821416265,
  'Deadly stampede in Shanghai - Police and medical staff help injured people after the stampede.'),
 (0.072031446647399661, '- Emergency personnel help victims.'),
 (0.072031446647399661, 'Emergency personnel help victims.')]

Look at the last two entries.

【Question comments】:

Hmm.. why the downvote? Please let me know if this has already been asked elsewhere.

Tags: python string list tuples

【Solution 1】:

You can keep a set of the float values seen so far and append a tuple only when its float is not already in seen:

>>> lst
[(0.10507038451969995,
 'Deadly stampede in Shanghai - Emergency personnel help victims.'),
 (0.078586381821416265,
 'Deadly stampede in Shanghai - Police and medical staff help injured people after the stampede.'),
 (0.072031446647399661, '- Emergency personnel help victims.'),
 (0.072031446647399661, 'Emergency personnel help victims.')]

>>> seen = set()
>>> result = []
>>> for a, b in lst:
...     if a not in seen:
...         seen.add(a)
...         result.append((a, b))
...
>>> print(result)

[(0.10507038451969995, 'Deadly stampede in Shanghai - Emergency personnel help victims.'), 
 (0.07858638182141627, 'Deadly stampede in Shanghai - Police and medical staff help injured people after the stampede.'),  
 (0.07203144664739966, '- Emergency personnel help victims.')]

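Here is another way, using a list comprehension. It relies on seen.add(a) returning None, so the value is recorded as a side effect the first time it appears: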

>>> seen = set()
>>> [(a, b) for a, b in lst if not (a in seen or seen.add(a))]
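
Either variant above can be wrapped in a small reusable function. The following is only a minimal sketch: the name dedupe_by_first and the shortened sample data are illustrative and do not come from the original answer.

```python
def dedupe_by_first(pairs):
    """Keep the first tuple seen for each distinct first element, preserving order."""
    seen = set()
    result = []
    for value, text in pairs:
        if value not in seen:
            seen.add(value)
            result.append((value, text))
    return result


# Shortened, illustrative data in the same (float, string) shape as the question.
lst = [
    (0.105, 'A'),
    (0.078, 'B'),
    (0.072, '- C'),
    (0.072, 'C'),
]
print(dedupe_by_first(lst))
# [(0.105, 'A'), (0.078, 'B'), (0.072, '- C')]
```

Unlike the groupby approach, this does not need the duplicates to be adjacent, so it also works on unsorted input.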

【Comments】:

【Solution 2】:

You can use itertools.groupby, since your values are already sorted. Here is the data:

>>> lot
[(0.10507038451969995, 'Deadly stampede in Shanghai - Emergency personnel help victims.'), 
(0.07858638182141627, 'Deadly stampede in Shanghai - Police and medical staff help injured people after the stampede.'), 
(0.07203144664739966, '- Emergency personnel help victims.'), 
(0.07203144664739966, 'Emergency personnel help victims.')]

Demo:

>>> import itertools
>>> [next(t) for _, t in itertools.groupby(lot, lambda x: x[0])]
[(0.10507038451969995,
  'Deadly stampede in Shanghai - Emergency personnel help victims.'),
 (0.07858638182141627,
  'Deadly stampede in Shanghai - Police and medical staff help injured people after the stampede.'),
 (0.07203144664739966, '- Emergency personnel help victims.')]

This gives you the first tuple from each group of equal float values.

【Comments】:

You can use itemgetter(0) instead of the lambda.
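
As a rough sketch of that suggestion, the groupby call can take operator.itemgetter(0) as its key. The shortened sample data and the variable names below are illustrative, not from the original answer.

```python
from itertools import groupby
from operator import itemgetter

# Shortened, illustrative data; equal floats are adjacent because the list is sorted.
lot = [
    (0.105, 'A'),
    (0.078, 'B'),
    (0.072, '- C'),
    (0.072, 'C'),
]

# groupby yields (key, group_iterator); next() takes the first tuple of each group.
first_of_each = [next(group) for _, group in groupby(lot, key=itemgetter(0))]
print(first_of_each)
# [(0.105, 'A'), (0.078, 'B'), (0.072, '- C')]

# groupby only merges *adjacent* equal keys, so unsorted duplicates survive.
unsorted = [(1, 'a'), (2, 'b'), (1, 'c')]
not_deduped = [next(group) for _, group in groupby(unsorted, key=itemgetter(0))]
print(not_deduped)
# [(1, 'a'), (2, 'b'), (1, 'c')]
```

This is why the approach depends on the list already being sorted by the float, as stated in the question.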

【Solution 3】:

>>> L = [(0.10507038451969995, 'Deadly stampede in Shanghai - Emergency personnel help victims.'),
...  (0.078586381821416265, 'Deadly stampede in Shanghai - Police and medical staff help injured people after the stampede.'),
...  (0.072031446647399661, '- Emergency personnel help victims.'),
...  (0.072031446647399661, 'Emergency personnel help victims.')]

>>> from collections import OrderedDict
>>> list(OrderedDict(L).items())
[(0.10507038451969995, 'Deadly stampede in Shanghai - Emergency personnel help victims.'),
 (0.07858638182141627, 'Deadly stampede in Shanghai - Police and medical staff help injured people after the stampede.'),
 (0.07203144664739966, 'Emergency personnel help victims.')]
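
Note that in the output above the surviving entry for the duplicated float is the last string ('Emergency personnel help victims.'), not the first: building an OrderedDict from pairs keeps the position of the first occurrence of a key but overwrites its value with later ones. If the first string should be kept instead, one possible sketch uses setdefault. The shortened data and the variable names are illustrative, not from the original answer.

```python
from collections import OrderedDict

# Shortened, illustrative data in the same (float, string) shape as the question.
L = [
    (0.105, 'A'),
    (0.078, 'B'),
    (0.072, '- C'),
    (0.072, 'C'),
]

# Constructing from pairs: a later duplicate key overwrites the earlier value.
keep_last = list(OrderedDict(L).items())
print(keep_last)
# [(0.105, 'A'), (0.078, 'B'), (0.072, 'C')]

# setdefault only stores a value when the key is not present yet,
# so the first string for each float is the one that survives.
first = OrderedDict()
for value, text in L:
    first.setdefault(value, text)
keep_first = list(first.items())
print(keep_first)
# [(0.105, 'A'), (0.078, 'B'), (0.072, '- C')]
```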

【Comments】: