【发布时间】:2018-01-28 18:14:38
【问题描述】:
我有一个非常简单的 python 代码,我想通过并行化它来加速它。然而,无论我做什么,multiprocessing.Pool.map 都没有比标准地图获得任何好处。
我读过其他线程,人们将它与非常小的函数一起使用,这些函数不能很好地并行化并导致过多的开销,但我认为这里不应该是这种情况。
我做错了吗?
这是一个例子
#!/usr/bin/python
import numpy, time
def AddNoise(sample):
#time.sleep(0.001)
return sample + numpy.random.randint(0,9,sample.shape)
#return sample + numpy.ones(sample.shape)
n=100
m=10000
start = time.time()
A = list([ numpy.random.randint(0,9,(n,n)) for i in range(m) ])
print("creating %d numpy arrays of %d x %d took %.2f seconds"%(m,n,n,time.time()-start))
for i in range(3):
start = time.time()
A = list(map(AddNoise, A))
print("adding numpy arrays took %.2f seconds"%(time.time()-start))
for i in range(3):
import multiprocessing
start = time.time()
with multiprocessing.Pool(processes=2) as pool:
A = list(pool.map(AddNoise, A, chunksize=100))
print("adding numpy arrays with multiprocessing Pool took %.2f seconds"%(time.time()-start))
for i in range(3):
import concurrent.futures
start = time.time()
with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
A = list(executor.map(AddNoise, A))
print("adding numpy arrays with concurrent.futures.ProcessPoolExecutor took %.2f seconds"%(time.time()-start))
这会在我的 4 核/8 线程笔记本电脑上产生以下输出,否则会处于空闲状态
$ python test-pool.py
creating 10000 numpy arrays of 100 x 100 took 1.54 seconds
adding numpy arrays took 1.65 seconds
adding numpy arrays took 1.51 seconds
adding numpy arrays took 1.51 seconds
adding numpy arrays with multiprocessing Pool took 1.99 seconds
adding numpy arrays with multiprocessing Pool took 1.98 seconds
adding numpy arrays with multiprocessing Pool took 1.94 seconds
adding numpy arrays with concurrent.futures.ProcessPoolExecutor took 3.32 seconds
adding numpy arrays with concurrent.futures.ProcessPoolExecutor took 3.17 seconds
adding numpy arrays with concurrent.futures.ProcessPoolExecutor took 3.25 seconds
【问题讨论】:
-
如果它可以在 numpy 中完成,那么它可能会回避 GIL,这是使用
multiprocessing的主要原因。 -
但标准的
map调用也不会使用更多的 100% cpu。 -
由于 GIL,标准映射不能使用超过 1 个核心
-
@roganjosh 回答您之前的评论:在我的笔记本上,创建两个进程池的时间约为 4.5 毫秒。
-
@roganjosh,在第一次尝试使用
%timeit后,结果相同:D,我使用了几次%time。
标签: python python-3.x numpy multiprocessing pool