Python 3: does Pool keep the original order of data passed to map? (Answers)

【Question title】: Python 3: does Pool keep the original order of data passed to map?
【Posted】: 2017-05-07 12:53:24
【Question】:

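I have written a little script to distribute a workload across 4 worker processes and to test whether the results stay ordered (with respect to the order of the input):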

from multiprocessing import Pool
import numpy as np
import time
import random


rows = 16
columns = 1000000

vals = np.arange(rows * columns, dtype=np.int32).reshape(rows, columns)

def worker(arr):
    # Sleep for a random amount of time so that the worker
    # processes finish at different moments.
    time.sleep(random.random())
    for idx in np.ndindex(arr.shape):
        arr[idx] += 1
    return arr


if __name__ == "__main__":
    # Create a pool of 4 worker processes (Pool uses processes, not threads).
    with Pool(4) as p:
        # Schedule one worker call for each row of the original data.
        q = p.map(worker, [row for row in vals])

    for idx, row in enumerate(q):
        print("[{:0>2}]: {: >8} - {: >8}".format(idx, row[0], row[-1]))

For me, this always results in:

[00]:        1 -  1000000
[01]:  1000001 -  2000000
[02]:  2000001 -  3000000
[03]:  3000001 -  4000000
[04]:  4000001 -  5000000
[05]:  5000001 -  6000000
[06]:  6000001 -  7000000
[07]:  7000001 -  8000000
[08]:  8000001 -  9000000
[09]:  9000001 - 10000000
[10]: 10000001 - 11000000
[11]: 11000001 - 12000000
[12]: 12000001 - 13000000
[13]: 13000001 - 14000000
[14]: 14000001 - 15000000
[15]: 15000001 - 16000000

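Question: So, does Pool really keep the order of the original input when it stores the results of each worker call in q?

Side note: I am asking because I need an easy way to parallelize work across several workers. In some cases the ordering is irrelevant. In other cases, however, the results (like q) have to be returned in the original order, because I apply an additional reduce function that relies on ordered data.

Performance: On my machine this runs about 4 times faster than the normal execution in a single process (as expected, since I have 4 cores). Additionally, all 4 cores are at 100% usage during the run.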

Tags: python multithreading python-3.x multiprocessing threadpool

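【Solution 1】:

Pool.map results are ordered. If you need the ordering, great; if you don't, Pool.imap_unordered may be a useful optimization.

Note that while the order in which you receive the results from Pool.map is fixed, the order in which they are computed is arbitrary.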
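
The difference between result order and computation order can be observed with a small experiment. The following is only a sketch: timed_identity is a hypothetical helper (not part of the question's code) that makes earlier inputs sleep longer, so later inputs tend to finish first, and records a completion timestamp for each item.

```python
import time
from multiprocessing import Pool


def timed_identity(x):
    # Hypothetical helper: earlier inputs sleep longer, so they tend
    # to finish later than the inputs that follow them.
    time.sleep(0.05 * (4 - x))
    return x, time.time()


if __name__ == "__main__":
    with Pool(4) as pool:
        results = pool.map(timed_identity, [0, 1, 2, 3])

    # Values in the order Pool.map returned them: same as the input order.
    values = [value for value, _ in results]
    # Values sorted by completion time: the order in which they were computed.
    by_finish = [value for value, _ in sorted(results, key=lambda r: r[1])]

    print("returned order:  ", values)
    print("completion order:", by_finish)
```

The returned order should match the input regardless of timing, while the completion order depends on scheduling and may change from run to run.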

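【Comments】:

Why is imap_unordered an "optimization"? Is it faster?
@ascenator: If later outputs are ready first, you can use them without waiting for the earlier outputs to finish.
Is starmap ordered as well?
@Victor'Chris'Cabral: Yes, Pool.starmap results are ordered.
@users235... Where can I read about this? I could not find it in the multiprocessing documentation for starmap. Thanks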
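
On the starmap question: the multiprocessing documentation describes starmap as being like map(), except that each element of the iterable is unpacked as arguments, which suggests it inherits the ordering of map. A minimal sketch to check this empirically, where add is a hypothetical example function:

```python
from multiprocessing import Pool


def add(a, b):
    # Hypothetical two-argument function; starmap unpacks each tuple into (a, b).
    return a + b


if __name__ == "__main__":
    pairs = [(1, 2), (3, 4), (5, 6)]
    with Pool(2) as pool:
        sums = pool.starmap(add, pairs)

    # Each result sits at the same position as the tuple it was computed from.
    print(sums)  # [3, 7, 11]
```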

【Solution 2】:

The documentation calls it a "parallel equivalent of the map() built-in function". Since map is guaranteed to preserve order, multiprocessing.Pool.map makes that guarantee too.
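
One way to convince yourself of this equivalence is to compare Pool.map against the built-in map on the same input. This is a sketch rather than a proof; square is a hypothetical example function, and the chunksize value is an arbitrary choice to show that batching the work does not change the result order.

```python
from multiprocessing import Pool


def square(x):
    # Hypothetical example function.
    return x * x


if __name__ == "__main__":
    data = list(range(10))

    serial = list(map(square, data))
    with Pool(4) as pool:
        # chunksize only controls how inputs are batched for the workers.
        parallel = pool.map(square, data, chunksize=3)

    # Both lists contain the same values at the same positions.
    assert parallel == serial
    print(parallel)
```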

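【Comments】:

Does keeping the order cost a significant amount of computation time?
I doubt it. The advantage of imap_unordered (as pointed out by @user2357112) is that you can start processing each result as soon as it becomes available in the main process. Pool.map returns a complete list, so you have to wait for every result; and even with the lazy but ordered Pool.imap, the n-th result is only handed over once the n-1 results before it are ready.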
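
The trade-off described in this comment can be sketched by consuming results one at a time. In this example, delayed is a hypothetical helper that makes earlier inputs slower; Pool.imap yields results lazily in input order, while Pool.imap_unordered yields them as they complete.

```python
import time
from multiprocessing import Pool


def delayed(x):
    # Hypothetical helper: earlier inputs take longer to finish.
    time.sleep(0.05 * (4 - x))
    return x


if __name__ == "__main__":
    with Pool(4) as pool:
        # Ordered: the first item is the slowest, so nothing is yielded
        # until it is done, even though later items are already finished.
        for value in pool.imap(delayed, range(4)):
            print("imap:", value)

        # Unordered: each result is yielded as soon as it is ready, so the
        # main process can start working on it immediately.
        for value in pool.imap_unordered(delayed, range(4)):
            print("imap_unordered:", value)
```

If a later reduce step depends on input order, the ordered variants (map or imap) are the safer choice; imap_unordered fits cases where each result can be handled independently.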