RuntimeError: python multiprocessing error

【Question title】: RuntimeError: python multiprocessing error
【Posted】: 2021-12-07 20:21:43
【Question description】:

I am trying to use parallel processing in Python with the following code:

import os
import datetime
import numpy as np
import FarimaModule
from  statsmodels.tsa.arima.model import ARIMA
import matplotlib.pyplot as plt
import multiprocessing as mp

# Here I define some variables: p_max,q_max,m_list,wlen,mstep,fs, listFile

def implement(fname,p_max,q_max,m_list,wlen,mstep,fs):
    # It is a really long code

# run the function 'implement' in parallel for different values of the input variable 'fname'
pool = mp.Pool(10)
results = [pool.apply(implement, args=(fname,p_max,q_max,m_list,wlen,mstep,fs)) for fname in listFile]
pool.close()

But it throws the following error:

    An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

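Other people have posted questions about the same error, but I could not apply the solutions given there because it is not clear how to adapt them to my code.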

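【Comments】:

Did you add if __name__ == '__main__': as the error message tells you to? The multiprocessing documentation explains why that line is needed.
More specifically, the last three lines need to be executed only in the main process. The way you have it, every newly started process reads your file and starts another pool of ten processes. The pool-creation code needs to run only once.
Thanks for the suggestion. I could not find the mp.Pool.apply() method explained there, but pool.map() seems to work fine.
@FrankYellin I was adding if __name__ == '__main__': after pool = mp.Pool, which is why it was not working. It works if I put the if __name__ == '__main__': line before it. But now it seems to run sequentially, like an ordinary for loop; it is not parallelized.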

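Tags: python windows multiprocessing

【Solution 1】:

On some systems, multiprocessing has to spawn a new copy of Python and import your module to get at the worker code. Anything at module level is executed again, including the parent code that creates the pool. That would be an infinite recursion, except that Python detects the problem and gives you a handy hint. Following that hint, you would write: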

import os
import datetime
import numpy as np
import FarimaModule
from  statsmodels.tsa.arima.model import ARIMA
import matplotlib.pyplot as plt
import multiprocessing as mp

# Here I define some variables: p_max,q_max,m_list,wlen,mstep,fs, listFile

def implement(fname,p_max,q_max,m_list,wlen,mstep,fs):
    # It is a really long code

if __name__ == "__main__":
    # run the function 'implement' in parallel for different values of the input variable 'fname'
    pool = mp.Pool(10)
    results = [pool.apply(implement, args=(fname,p_max,q_max,m_list,wlen,mstep,fs))
               for fname in listFile]
    pool.close()

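The top-level Python script is always named "__main__". When it is imported by a child process, it is a module and has a different name, so the guarded block is skipped there.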

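pool.apply is probably not the method you want: it waits for the pool worker to finish before returning. map is likely a better choice. It chunks (groups) the input. In your case, with an expensive computation per file, you probably want a small chunk size. starmap is just map with multiple arguments.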

if __name__ == "__main__":
    # run the function 'implement' in parallel for different values of the input variable 'fname'
    with mp.Pool(10) as pool:
        results = pool.starmap(implement, 
            [(fname,p_max,q_max,m_list,wlen,mstep,fs)
                for fname in listFile],
            chunksize=1)
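
For a self-contained way to see the difference, the following sketch times both approaches. It is only an illustration: `implement` here is a hypothetical stand-in that sleeps instead of doing the real computation, and the file names and the `wlen`/`mstep` values are made up.

```python
import multiprocessing as mp
import time


def implement(fname, wlen, mstep):
    # Hypothetical stand-in for the real, expensive function.
    time.sleep(0.2)
    return fname, wlen * mstep


def run_with_apply(pool, tasks):
    # pool.apply blocks until each call returns, so tasks run one at a time.
    return [pool.apply(implement, args=task) for task in tasks]


def run_with_starmap(pool, tasks):
    # starmap submits all tasks and unpacks each tuple into arguments.
    return pool.starmap(implement, tasks, chunksize=1)


if __name__ == "__main__":
    list_file = ["a.dat", "b.dat", "c.dat", "d.dat"]  # made-up file names
    tasks = [(fname, 4, 2) for fname in list_file]
    with mp.Pool(4) as pool:
        for runner in (run_with_apply, run_with_starmap):
            start = time.perf_counter()
            results = runner(pool, tasks)
            elapsed = time.perf_counter() - start
            print(f"{runner.__name__}: {elapsed:.2f}s, {len(results)} results")
```

With four workers, the starmap run would be expected to finish in roughly the time of one task, while the apply loop takes about as long as all four tasks back to back.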

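【Comments】:

This code works, but it does not do parallel processing. It iterates over 'fname' sequentially.
That is apply. You can use apply_async, or use map as in the example I added.
pool.apply uses a single worker at a time. If you want to use several workers, you must either have multiple calls in flight at once (for example with pool.apply_async) or use one of the variants of pool.map().
Thanks, everyone. pool.starmap worked for me.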
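
The apply_async route mentioned in the comments could look roughly like the sketch below. It is not from the original thread: `implement`, `submit_all`, the file names, and the `scale` argument are all hypothetical placeholders.

```python
import multiprocessing as mp


def implement(fname, scale):
    # Hypothetical stand-in for the real per-file computation.
    return fname, len(fname) * scale


def submit_all(pool, file_names, scale):
    # apply_async returns an AsyncResult immediately, so all tasks
    # are queued before any result is awaited.
    pending = [pool.apply_async(implement, args=(fname, scale))
               for fname in file_names]
    # get() blocks until that particular result is ready; collecting in
    # submission order keeps the results aligned with file_names.
    return [result.get() for result in pending]


if __name__ == "__main__":
    with mp.Pool(2) as pool:
        print(submit_all(pool, ["a.dat", "bb.dat"], 3))
```

Compared with starmap, this form is a bit more verbose, but it lets each task carry different arguments or be collected individually.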