【Question title】: Weird behaviour with multiprocessing Pool.map
【Posted】: 2019-06-13 13:21:48
【Question】:

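When calling a method with pool.map, I observe some very strange behaviour. With only one process, the behaviour differs from a plain for loop: the if not self.seeded: block is entered several times, which should not happen. Here are the code and the output: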

import os
from multiprocessing import Pool


class MyClass(object):
    def __init__(self):
        self.seeded = False
        print("Constructor of MyClass called")

    def f(self, i):
        print("f called with", i)
        if not self.seeded:
            print("PID : {}, id(self.seeded) : {}, self.seeded : {}".format(os.getpid(), id(self.seeded), self.seeded))
            self.seeded = True

    def multi_call_pool_map(self):
        with Pool(processes=1) as pool:
            print("multi_call_pool_map with {} processes...".format(pool._processes))
            pool.map(self.f, range(10))

    def multi_call_for_loop(self):
        print("multi_call_for_loop ...")
        list_res = []
        for i in range(10):
            list_res.append(self.f(i))


if __name__ == "__main__":
    MyClass().multi_call_pool_map()

Output:

Constructor of MyClass called
multi_call_pool_map with 1 processes...
f called with 0
PID : 18248, id(self.seeded) : 1864747472, self.seeded : False
f called with 1
f called with 2
f called with 3
PID : 18248, id(self.seeded) : 1864747472, self.seeded : False
f called with 4
f called with 5
f called with 6
PID : 18248, id(self.seeded) : 1864747472, self.seeded : False
f called with 7
f called with 8
f called with 9
PID : 18248, id(self.seeded) : 1864747472, self.seeded : False

And with the for loop:

if __name__ == "__main__":
    MyClass().multi_call_for_loop()

Output:

Constructor of MyClass called
multi_call_for_loop ...
f called with 0
PID : 15840, id(self.seeded) : 1864747472, self.seeded : False
f called with 1
f called with 2
f called with 3
f called with 4
f called with 5
f called with 6
f called with 7
f called with 8
f called with 9

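How can we explain the behaviour of pool.map (the first case)? I don't understand why the if block is entered multiple times, since self.seeded is set to False only in the constructor, and the constructor is called only once. (I am using Python 3.6.8.)

【Comments】: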

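  • This is because of the way Pool chunks your input iterable. For your setup the chunk size will be 3, which yields chunks of [3, 3, 3, 1] items here. You can compute it with calc_chunksize() from my answer here.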
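
The chunk sizes mentioned in the comment above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming the default heuristic found in CPython's Pool.map of that era (split the input into roughly four chunks per worker, rounding up); the helper names default_chunksize and chunk_lengths are hypothetical and are not the calc_chunksize() from the linked answer.

```python
def default_chunksize(n_items, n_workers):
    """Assumed default: aim for about 4 chunks per worker, rounding up."""
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize


def chunk_lengths(n_items, chunksize):
    """Lengths of the consecutive chunks the input is cut into."""
    if chunksize == 0:
        return []
    full, rest = divmod(n_items, chunksize)
    return [chunksize] * full + ([rest] if rest else [])


size = default_chunksize(10, 1)   # 10 items, 1 worker process
lengths = chunk_lengths(10, size)
print(size, lengths)
```

For range(10) and a single worker this gives four chunks, which matches the four times the if block is entered in the question's output.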

Tags: python multiprocessing pool


【Solution 1】:

Running the code and printing self inside f shows that the instance actually changes each time just before the if clause is entered:

    def f(self, i):
        print("f called with", i, "self is", self)
        if not self.seeded:
            print("PID : {}, id(self.seeded) : {}, self.seeded : {}".format(os.getpid(), id(self.seeded), self.seeded))
            self.seeded = True

This outputs:

Constructor of MyClass called
multi_call_pool_map with 1 processes...
f called with 0 self is <__main__.MyClass object at 0x7f30cd592b38>
PID : 22879, id(self.seeded) : 10744096, self.seeded : False
f called with 1 self is <__main__.MyClass object at 0x7f30cd592b38>
f called with 2 self is <__main__.MyClass object at 0x7f30cd592b38>
f called with 3 self is <__main__.MyClass object at 0x7f30cd592b00>
PID : 22879, id(self.seeded) : 10744096, self.seeded : False
f called with 4 self is <__main__.MyClass object at 0x7f30cd592b00>
f called with 5 self is <__main__.MyClass object at 0x7f30cd592b00>
f called with 6 self is <__main__.MyClass object at 0x7f30cd592ac8>
PID : 22879, id(self.seeded) : 10744096, self.seeded : False
f called with 7 self is <__main__.MyClass object at 0x7f30cd592ac8>
f called with 8 self is <__main__.MyClass object at 0x7f30cd592ac8>
f called with 9 self is <__main__.MyClass object at 0x7f30cd592a90>
PID : 22879, id(self.seeded) : 10744096, self.seeded : False

If you add chunksize=10 to .map(), it behaves just like the for loop:

    def multi_call_pool_map(self):
        with Pool(processes=1) as pool:
            print("multi_call_pool_map with {} processes...".format(pool._processes))
            pool.map(self.f, range(10), chunksize=10)

This outputs:

Constructor of MyClass called
multi_call_pool_map with 1 processes...
f called with 0 self is <__main__.MyClass object at 0x7fd175093b00>
PID : 22972, id(self.seeded) : 10744096, self.seeded : False
f called with 1 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 2 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 3 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 4 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 5 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 6 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 7 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 8 self is <__main__.MyClass object at 0x7fd175093b00>
f called with 9 self is <__main__.MyClass object at 0x7fd175093b00>

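The exact reason this happens is a rather complex implementation detail; it has to do with how multiprocessing shares data between the processes of the same pool.

I'm afraid I'm not qualified enough to explain exactly how and why this works internally.

【Discussion】: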

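  • Thanks for the quick reply. The fact that the constructor is called only once even though there are several instances is a bit confusing. And why is id(self.seeded) the same for all the different instances?
  • I tried to find a simple article about this online, but I'm afraid everything I found covers usage rather than the internals.
  • It depends on what you want to do, but with from multiprocessing.dummy import Pool (not the same thing! Search for multiprocessing vs. multithreading) you get the desired behaviour right away.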
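
The multiprocessing.dummy suggestion in the last comment can be sketched as follows. multiprocessing.dummy.Pool is a thread-based pool with the same interface, so the bound method is called on the one original instance instead of on pickled copies. The class Seeder is a hypothetical stand-in for MyClass, and f returns a flag instead of printing so the result is easy to inspect.

```python
from multiprocessing.dummy import Pool  # thread-based pool, same API as Pool


class Seeder:
    """Hypothetical stand-in for MyClass from the question."""

    def __init__(self):
        self.seeded = False

    def f(self, i):
        # True only on the call that finds the instance unseeded.
        first_call = not self.seeded
        self.seeded = True
        return first_call


worker = Seeder()
with Pool(processes=1) as pool:
    flags = pool.map(worker.f, range(10))

print(flags)          # one True followed by nine False values
print(worker.seeded)  # the original instance itself was modified
```

With more than one thread, the check and the assignment in f are not atomic, so a real implementation would probably want a threading.Lock around them. Threads also do not give CPU-bound code the parallelism that separate processes do.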
【Solution 2】:

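When you use Pool.map with an instance method, a copy of the object instance is sent to the worker process with the help of the pickle module. Your results show how map works in chunks: the object instance is reloaded from its pickled form at the start of each chunk. Unpickling does not call __init__.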
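
The effect described above can be imitated in a single process with pickle alone. This is a minimal sketch, not what Pool does internally line by line: it assumes that each chunk carries its own pickled copy of the bound method, and it uses a hypothetical class Seeder in place of MyClass, with a counter to show how often __init__ runs.

```python
import pickle


class Seeder:
    """Hypothetical stand-in for MyClass from the question."""

    init_calls = 0  # class-level counter: how many times __init__ ran

    def __init__(self):
        Seeder.init_calls += 1
        self.seeded = False

    def f(self, i):
        # True only on the call that finds the instance unseeded.
        first_call = not self.seeded
        self.seeded = True
        return first_call


original = Seeder()
payload = pickle.dumps(original.f)  # a bound method is pickled with its instance

# Simulate two chunks reaching a worker: each one is unpickled separately.
chunk_a = pickle.loads(payload)
chunk_b = pickle.loads(payload)

distinct_instances = chunk_a.__self__ is not chunk_b.__self__
# False is a singleton, so both copies refer to the very same object.
same_false_object = chunk_a.__self__.seeded is chunk_b.__self__.seeded

results_a = [chunk_a(i) for i in range(3)]
results_b = [chunk_b(i) for i in range(3)]

print(results_a, results_b)
print(Seeder.init_calls, distinct_instances, same_false_object)
```

Each unpickled copy starts with seeded set to False, so every simulated chunk enters the "first call" branch once, while __init__ has run only for the original object. The singleton check also answers the comment about id(self.seeded): every copy points at the same False object, so the id is identical.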

See https://thelaziestprogrammer.com/python/a-multiprocessing-pool-pickle for more on what happens behind the scenes.

【Discussion】:
