How to reduce on a list of tuples in Python: answers

【Question title】: How to reduce on a list of tuples in python
【Posted】: 2017-12-13 02:39:58
【Question】:

I have an array, and I want to count how many times each item occurs in it.

I have managed to use the map function to produce a list of tuples.

def mapper(a):
    return (a, 1)

r = list(map(lambda a: mapper(a), arr))

# output example:
# (11817685, 1), (2014036792, 1), (2014047115, 1), (11817685, 1)

I was hoping the reduce function could help me group the counts by the first number (the id) in each tuple. For example:

(11817685, 2), (2014036792, 1), (2014047115, 1)

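I tried:

cnt = reduce(lambda a, b: a + b, r)

and a few other approaches, but none of them worked.

Note: thanks for all the suggestions about other ways to solve the problem, but I am just learning Python and how to implement map-reduce in it. I have simplified my actual business problem to make it easy to understand, so please show me the correct way to do map-reduce.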

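【Comments on the question】:

lambda a: mapper(a)? Why not just pass mapper directly? Also: what is your expected output?
Thanks for the comment. Yes, I could pass mapper in directly; I was experimenting with something else. I have added my expected output.
Do you need r, or is it just an intermediate?
Just an intermediate.
Neither reduce nor map really helps you here. This kind of task is the reason collections.Counter exists (and, for the more specialized case where the input is already sorted, itertools.groupby). The Map/Reduce strategy is meant for situations where many mappers feed many reducers in parallel; blindly applying the same pattern to plain single-threaded code is wasteful (it is wasteful in the Map/Reduce case too, where you simply rely on absurd levels of parallelism to make up for the overhead).

Tags: python python-2.7 mapreduce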

【Solution 1】:

You can use Counter:

from collections import Counter
arr = [11817685, 2014036792, 2014047115, 11817685]
counter = Counter(arr)
print zip(counter.keys(), counter.values())

Edit:

As @ShadowRanger pointed out, Counter has an items() method:

from collections import Counter
arr = [11817685, 2014036792, 2014047115, 11817685]
print Counter(arr).items()
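
The snippets above use the Python 2 print statement, matching the question's python-2.7 tag. As a minimal sketch of the same idea on Python 3 (an assumption, since the question does not target it), the block below builds the list of (id, count) pairs with items() and also shows Counter's most_common method, which returns the pairs ordered by descending count. The names pairs and top are illustrative only.

```python
from collections import Counter

arr = [11817685, 2014036792, 2014047115, 11817685]
counter = Counter(arr)

# items() is a view on Python 3, so wrap it in list() to get the tuples
pairs = list(counter.items())

# most_common(n) returns at most n (id, count) pairs, highest count first;
# called without an argument it returns all of them
top = counter.most_common(1)

print(sorted(pairs))  # sorted only to make the printed order predictable
print(top)
```

Sorting the pairs is not required for correctness; it is there only so the printed result does not depend on insertion order.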

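【Comments】:

Why zip keys and values? There is an items method that does this directly: print counter.items(). There is also a dedicated method, most_common, which returns the results in order of frequency (with an optional limit on the number of results), e.g. print counter.most_common().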

【Solution 2】:

Instead of importing a module, you can do this with a little logic of your own:

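arr = [11817685, 2014036792, 2014047115, 11817685]
track = {}
for intr in arr:
    # first sighting starts the count at 1; later sightings increment it
    if intr not in track:
        track[intr] = 1
    else:
        track[intr] += 1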

Worked example:

There is a common pattern for this kind of list problem.

Suppose you have a list:

a=[(2006,1),(2007,4),(2008,9),(2006,5)]

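and you want to convert it into a dict, using the first element of each tuple as the key and the second element as the value. Something like: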

{2008: [9], 2006: [5], 2007: [4]}

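But there is a catch: some tuples share a key while carrying different values, for example (2006,1) and (2006,5). You want all of those values collected under the single key, so the expected output is: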

{2008: [9], 2006: [1, 5], 2007: [4]}

For this kind of problem we do the following:

First create a new dict, then follow this pattern:

if item[0] not in new_dict:
    new_dict[item[0]]=[item[1]]
else:
    new_dict[item[0]].append(item[1])

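So we first check whether the key is already in the new dict. If it is not, we start a new list for it; if it already exists, we append the duplicate key's value to its list.

Full code: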

a=[(2006,1),(2007,4),(2008,9),(2006,5)]

new_dict={}

for item in a:
    if item[0] not in new_dict:
        new_dict[item[0]]=[item[1]]
    else:
        new_dict[item[0]].append(item[1])

print(new_dict)

Output:

{2008: [9], 2006: [1, 5], 2007: [4]}
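
The same check-then-append pattern can also be written with collections.defaultdict, which creates the empty list on first access. This is a hedged alternative sketch rather than part of the original answer; it does import a standard-library module, which the answer above chose to avoid, and the name grouped is illustrative only.

```python
from collections import defaultdict

a = [(2006, 1), (2007, 4), (2008, 9), (2006, 5)]

# list() is called for every key that is not present yet,
# so the explicit "if key not in dict" branch disappears
grouped = defaultdict(list)
for key, value in a:
    grouped[key].append(value)

new_dict = dict(grouped)  # convert back to a plain dict for printing
print(new_dict)
```

The values under each key keep the order in which they appear in the input list, so 2006 maps to [1, 5].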

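【Comments】:

【Solution 3】:

After writing my answer to a different question, I remembered this post and thought a similar answer would be useful here.

Here is one way to use reduce on the list to get the desired output.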

arr = [11817685, 2014036792, 2014047115, 11817685]

def mapper(a):
    return (a, 1)

def reducer(x, y):
    if isinstance(x, dict):
        ykey, yval = y
        if ykey not in x:
            x[ykey] = yval
        else:
            x[ykey] += yval
        return x
    else:
        xkey, xval = x
        ykey, yval = y
        a = {xkey: xval}
        if ykey in a:
            a[ykey] += yval
        else:
            a[ykey] = yval
        return a

mapred = reduce(reducer, map(mapper, arr))

print mapred.items()

Which prints:

[(2014036792, 1), (2014047115, 1), (11817685, 2)]

See the linked answer for a more detailed explanation.
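
The isinstance branch in the reducer above is needed because, without an initial value, reduce passes the first tuple in as the accumulator. A possible simplification, sketched here as an assumption rather than as the answer's own method, is to give reduce an empty dict as its third argument so the accumulator is a dict from the first call onward. The sketch imports reduce from functools, which Python 3 requires and Python 2.6+ also allows; the parameter names acc and pair are illustrative.

```python
from functools import reduce  # required on Python 3, also valid on Python 2.6+

arr = [11817685, 2014036792, 2014047115, 11817685]

def mapper(a):
    return (a, 1)

def reducer(acc, pair):
    # acc is always the dict of running totals; pair is one (id, count) tuple
    key, count = pair
    acc[key] = acc.get(key, 0) + count
    return acc

# the third argument {} seeds the accumulator, so no type check is needed
mapred = reduce(reducer, map(mapper, arr), {})

print(sorted(mapred.items()))
```

With the initial value in place, an empty arr yields an empty dict and a single-element arr yields a one-entry dict, whereas the version without an initial value raises a TypeError on empty input and returns a bare tuple for a single element.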

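【Comments】:

【Solution 4】:

If you only need cnt, a dict is probably a better fit than a list of tuples (and if you do need that format, just use dict.items).

The collections module offers a useful data structure for this, defaultdict.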

from collections import defaultdict
cnt = defaultdict(int) # create a default dict where the default value is
                       # the result of calling int
for key in arr:
  cnt[key] += 1 # if key is not in cnt, it will put in the default

# cnt_list = list(cnt.items())
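
One behaviour of defaultdict is worth keeping in mind when using it as a counter: reading a missing key with square brackets inserts that key with the default value. The sketch below is a self-contained illustration of the counting loop plus that caveat; the ids 42 and 43 are arbitrary values chosen for the example and do not come from the question.

```python
from collections import defaultdict

arr = [11817685, 2014036792, 2014047115, 11817685]

cnt = defaultdict(int)  # int() returns 0, so missing keys start at 0
for key in arr:
    cnt[key] += 1

cnt_list = sorted(cnt.items())  # list of (id, count) tuples

# caveat: a plain lookup of a missing key creates an entry for it
missing = cnt[42]       # evaluates to 0 and stores 42 -> 0 in cnt
has_42 = 42 in cnt      # True, because the lookup above inserted the key

# .get() reads without inserting anything
safe = cnt.get(43, 0)
has_43 = 43 in cnt      # False
```

If stray zero-count entries would be a problem, using .get() for lookups or converting to a plain dict once counting is finished avoids them.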

【Comments】: