[Posted]: 2018-12-21 02:43:56
[Question]:
I want to load a large dataset using multiprocessing.Pool. Here is the code I'm using:
import pickle
import multiprocessing as mp
from os import listdir
from os.path import join

db_path = db_path  # path to the dataset directory
the_files = listdir(db_path)
fp_dict = {}

def loader(the_hash):
    global fp_dict
    the_file = join(db_path, the_hash)
    with open(the_file, 'rb') as source:
        fp_dict[the_hash] = pickle.load(source)
    print(len(fp_dict))

def parallel(the_func, the_args):
    global fp_dict
    pool = mp.Pool(mp.cpu_count())
    pool.map(the_func, the_args)
    print(len(fp_dict))

parallel(loader, the_files)
Interestingly, the length of fp_dict changes while the code is running. However, once the processes terminate, the length of fp_dict is zero. Why? How can I modify a global variable using multiprocessing.Pool?
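Each Pool worker runs in a separate process with its own copy of fp_dict, so writes made in workers never reach the parent. A common alternative is to have each worker return its result and let pool.map collect them in the parent. The sketch below is a hypothetical, self-contained reworking of the question's code (it creates its own sample pickle files in a temp directory instead of reading from db_path):

```python
import multiprocessing as mp
import os
import pickle
import tempfile

def load_one(path):
    # Worker: read one pickle file and RETURN a (key, value) pair.
    # The return value is sent back to the parent over a pipe.
    with open(path, "rb") as f:
        return os.path.basename(path), pickle.load(f)

def load_all(paths):
    # pool.map gathers every worker's return value in the parent,
    # so the dict is built in the parent's own memory.
    with mp.Pool(min(4, mp.cpu_count())) as pool:
        return dict(pool.map(load_one, paths))

if __name__ == "__main__":
    # Create a few sample pickle files standing in for the real dataset.
    tmp = tempfile.mkdtemp()
    paths = []
    for i in range(3):
        p = os.path.join(tmp, f"item{i}")
        with open(p, "wb") as f:
            pickle.dump(i * 10, f)
        paths.append(p)

    fp_dict = load_all(paths)
    print(len(fp_dict))  # now nonzero in the parent process
```

Another option, if shared mutable state is really needed, is multiprocessing.Manager().dict(), which proxies writes back to a manager process; returning values is usually simpler and faster for a load-once pattern like this.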
[Discussion]: