How to parallelize a for loop in Python using the multiprocessing package? Answer

[Question title]: How can I parallelize a for loop in python using multiprocessing package?
[Posted]: 2019-04-15 20:04:29
[Question]:

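Note: I don't need any communication between the processes/threads. I'm only interested in a completion signal. That is why I'm posting this as a new question: all the other examples I've found involve communication between the workers.

How can I use the multiprocessing package in Python 3 to parallelize the following code (the end goal is to make it run faster):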

a = 123
b = 456
for id in ids:  # len(ids) == 10000
    # executes a binary with CLI flags
    run_binary_with_id(id, a, b)
    # i.e. runs "./hello_world_exec --id id --a a --b b", which takes about 30 seconds on average
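The question does not show how the binary is actually launched. A minimal sketch, assuming the standard library's subprocess.run is an acceptable way to do it; build_command and the exe parameter are hypothetical names introduced here, and the parameter is spelled id_ to avoid shadowing the built-in id:

```python
import subprocess

def build_command(id_, a, b, exe="./hello_world_exec"):
    # Build an argument list instead of a shell string, so no quoting is needed
    return [exe, "--id", str(id_), "--a", str(a), "--b", str(b)]

def run_binary_with_id(id_, a, b, exe="./hello_world_exec"):
    # check=True raises CalledProcessError if the binary exits with a non-zero status
    subprocess.run(build_command(id_, a, b, exe), check=True)

if __name__ == "__main__":
    print(build_command(7, 123, 456))
```

Because the function is defined at module level, it can be pickled and sent to worker processes by any of the multiprocessing approaches discussed here.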

I tried the following:

import multiprocessing as mp

def run_binary_with_id(id, a, b):
    # run_command: helper (not shown) that executes a shell command
    run_command('./hello_world_exec --id {} --a {} --b {}'.format(id, a, b))

if __name__ == '__main__':
    ctx = mp.get_context('spawn')
    q = ctx.Queue()
    a = 123
    b = 456
    ids = range(10000)
    for id in ids:
        p = ctx.Process(target=run_binary_with_id, args=(id, a, b))
        p.start()
    p.join()  # only waits for the last process started
    # The binary was executed len(ids) number of times, do other stuff assuming everything's completed at this point
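In the Process-based attempt, the single p.join() after the loop waits only for the process that was started last, so the others may still be running when execution continues. A minimal sketch of the fix keeps every Process handle and joins them all; work() is a hypothetical placeholder standing in for run_binary_with_id. It still starts one process per id, so it only makes sense for a small number of ids; for 10000 ids a bounded pool of workers is the better fit.

```python
import multiprocessing as mp
import time

def work(task_id, a, b):
    # Hypothetical stand-in for run_binary_with_id; just sleeps briefly
    time.sleep(0.05)

def run_all(ids, a, b):
    ctx = mp.get_context("spawn")
    processes = []
    for task_id in ids:
        p = ctx.Process(target=work, args=(task_id, a, b))
        p.start()
        processes.append(p)  # keep every handle, not only the last one
    for p in processes:
        p.join()  # wait for each process in turn
    return [p.exitcode for p in processes]  # 0 means the process exited normally

if __name__ == "__main__":
    print(run_all(range(4), 123, 456))
```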

or

for id in ids:
    pool.apply_async(run_binary_with_id, (id, a, b))  # pool: a multiprocessing.Pool
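The apply_async attempt submits the tasks but never waits for them to finish. A minimal sketch, with a hypothetical work() placeholder in place of run_binary_with_id: submit everything, then call pool.close() followed by pool.join(), which returns only after every task has completed. That gives a completion signal without any other inter-process communication. Calling get() on each AsyncResult is optional, but it re-raises any exception that occurred in a worker instead of silently losing it.

```python
import multiprocessing as mp

def work(task_id, a, b):
    # Hypothetical placeholder for run_binary_with_id
    return task_id + a + b

def run_all(ids, a, b):
    with mp.Pool() as pool:
        # Each call returns an AsyncResult immediately; the work runs in the background
        results = [pool.apply_async(work, (task_id, a, b)) for task_id in ids]
        pool.close()  # no more tasks will be submitted
        pool.join()   # blocks until every submitted task has finished
        # get() returns each result, or re-raises an exception from the worker
        return [r.get() for r in results]

if __name__ == "__main__":
    print(run_all(range(5), 1, 2))
```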

In a similar question, the answer was the following:

from collections import deque

def consume(iterator):
    # Exhaust the iterator without storing any of its items
    deque(iterator, maxlen=0)
x = pool.imap_unordered(f, ((i, j) for i in range(10000) for j in range(10000)))
consume(x)

I don't understand this at all (why would I need this consume()?).
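A minimal sketch of why consume() is there, with work() as a hypothetical placeholder task. Pool.imap_unordered returns an iterator that yields results as tasks complete; the main process only waits for a task when it pulls that task's result out of the iterator. deque(iterator, maxlen=0) is the "consume" idiom from the itertools recipes in the Python documentation: it pulls every item and keeps none. Pulling every result also re-raises any exception from a worker. This matters with `with mp.Pool() as pool:` because leaving the with-block calls pool.terminate(), which can kill tasks that have not finished yet.

```python
import multiprocessing as mp
from collections import deque

def work(pair):
    # Hypothetical placeholder for the real task; takes one (i, j) tuple
    i, j = pair
    return i * j

def consume(iterator):
    # Pull every item out of the iterator and discard it (maxlen=0 stores nothing)
    deque(iterator, maxlen=0)

def run_all(n):
    with mp.Pool() as pool:
        results = pool.imap_unordered(work, ((i, j) for i in range(n) for j in range(n)))
        # Blocks until every result has arrived; without it, leaving the
        # with-block calls pool.terminate() and may kill tasks still running
        consume(results)
    return "done"

if __name__ == "__main__":
    print(run_all(100))
```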

[Question discussion]:

Tags: python-3.x python-multiprocessing python-multithreading

[Solution 1]:

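Trying to spawn 10,000 processes that all run in parallel will almost certainly overload your system and make it run slower than running the processes sequentially. When the number of processes far exceeds the number of CPUs/cores on your system, the OS has to context-switch between them constantly, and that overhead dominates.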

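You can instead use multiprocessing.Pool to limit the number of worker processes spawned for the tasks. By default, the Pool constructor starts as many worker processes as your system has CPUs, but you can fine-tune that with the processes argument if you like. You can then use its map method to map a sequence of arguments onto a given function and run the calls in parallel. However, map passes only one argument to the function, so you have to use functools.partial to fix the other arguments, which in your case do not change between calls: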

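import multiprocessing as mp
from functools import partial

if __name__ == '__main__':
    # a and b are fixed by keyword; each value from the sequence becomes id
    _run_binary_with_id = partial(run_binary_with_id, a=123, b=456)
    with mp.Pool() as pool:
        # map blocks until every task has finished
        pool.map(_run_binary_with_id, range(10000))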
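As an alternative to functools.partial, Pool.starmap (available since Python 3.3) unpacks each tuple in the input into positional arguments. A minimal sketch comparing the two, using a hypothetical run_binary_with_id placeholder that returns its arguments instead of launching a binary:

```python
import multiprocessing as mp
from functools import partial

def run_binary_with_id(id_, a, b):
    # Hypothetical placeholder: returns its arguments instead of running a binary
    return (id_, a, b)

def with_partial(ids):
    # Bind a and b by keyword; map passes each id as the first positional argument
    f = partial(run_binary_with_id, a=123, b=456)
    with mp.Pool() as pool:
        return pool.map(f, ids)  # blocks until all tasks finish; results keep input order

def with_starmap(ids):
    # starmap calls run_binary_with_id(*args) for each tuple
    with mp.Pool() as pool:
        return pool.starmap(run_binary_with_id, [(i, 123, 456) for i in ids])

if __name__ == "__main__":
    print(with_partial(range(3)))
    print(with_starmap(range(3)))
```

Both block until every call has returned, so the line after either call can safely assume all the work is done. starmap builds the full list of argument tuples up front, while partial keeps the input to a plain sequence of ids.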

[Discussion]:

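That makes sense, but I have two questions: (1) can I simply replace range(10000) with ids in your code? (2) Where do I pass id to run_binary_with_id(id, a, b)?
Oh, I see: id is passed automatically. See, for example, this code: pastebin.com/M2ZEfKsP. Does that make sense?
Yes. Each value in the sequence is passed to the function as its first positional argument, so you don't have to specify the argument's name.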
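A small illustration of that calling convention, using a hypothetical run_binary_with_id that only reports its arguments. The built-in map is used here because, for a single iterable, it calls the function the same way Pool.map does: f(item) for each item. Any sequence works, so range(10000) can be replaced by ids.

```python
from functools import partial

def run_binary_with_id(id_, a, b):
    # Hypothetical placeholder that just reports how it was called
    return f"--id {id_} --a {a} --b {b}"

# a and b are bound by keyword, so the mapped value fills the first positional slot (id_)
f = partial(run_binary_with_id, a=123, b=456)

ids = [10, 20, 30]
print(list(map(f, ids)))
# → ['--id 10 --a 123 --b 456', '--id 20 --a 123 --b 456', '--id 30 --a 123 --b 456']
```

Binding a and b by keyword is what makes this work: partial(run_binary_with_id, 123, 456) would instead fill id_ and a, and each mapped value would land in b.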