Multiprocessing with Python and Windows

【Question title】: Multiprocessing with Python and Windows
【Posted】: 2017-12-01 07:36:14
【Question】:

I have Python code that works with Thread, but I want to switch to Process because, if I have understood correctly, that should give me a speed-up. Here is the code with threads:

threads.append(Thread(target=getId, args=(my_queue, read)))
threads.append(Thread(target=getLatitude, args=(my_queue, read)))

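The functions put their return values into the queue, and after joining the list of threads I can retrieve the results. After changing the code and the import statement, my code now looks like this: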

threads.append(Process(target=getId, args=(my_queue, read)))
threads.append(Process(target=getLatitude, args=(my_queue, read)))

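But it does not execute anything and the queue is empty, whereas with Thread the queue is not empty, so I think the problem is related to Process. I have read answers saying that the Process class does not work on Windows. Is that true, or is there a way to make it work (adding freeze_support() does not help)? If it cannot work, are threads on Windows actually executed in parallel on different cores?

References: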

Python multiprocessing example not working

Python code with multiprocessing does not work on Windows

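Multiprocessing process does not join when putting complex dictionary in return queue (which explains that fork does not exist on Windows)

Edit: some more detail: the code with Process does work on CentOS.

Edit 2: here is a simplified version of my code with processes; it was tested on CentOS.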

import pandas as pd
from multiprocessing import Process, freeze_support
from multiprocessing import Queue

#%% Global variables

datasets = []

latitude = []

def fun(key, job):
    global latitude
    if(key == 'LAT'):
        latitude.append(job)

def getLatitude(out_queue, skip = None):
    latDict = {'LAT' : latitude}
    out_queue.put(latDict)

n = pd.read_csv("my.csv", sep =',', header = None).shape[0]
print("Number of baboon:" + str(n))

read = []

for i in range(0,n):
    threads = []
    my_queue = Queue()
    threads.append(Process(target=getLatitude, args=(my_queue, read)))

    for t in threads:
        freeze_support() # try both with and without this line
        t.start()

    for t in threads:
        t.join()

    while not my_queue.empty():
        try:
            job = my_queue.get()
            key = list(job.keys())
            fun(key[0],job[key[0]])
        except:
            print("END")  

    read.append(i)

【Comments】:

Process does work on Windows, but please show a complete code example that reproduces the problem. See minimal reproducible example for guidance.
Added the code to the question. @MarkTolonen

Tags: python multithreading python-multiprocessing python-multithreading

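【Solution 1】:

According to the documentation, you need to add the following after the function definitions. When Python creates child processes on Windows, each child imports your script, so any code at module (global) level runs again in every child. Put the code that should run only in the main process under this guard: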

if __name__ == '__main__':
    n = pd.read_csv("my.csv", sep =',', header = None).shape[0]
    # etc.

Indent the rest of the code under the if.
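
As a rough illustration of that advice, here is a minimal sketch of what a guarded script could look like. It is not the asker's full program: the names get_latitude and run_once and the sample values are made up for this example, and pandas is left out so the sketch stays self-contained. It also assumes the data is passed to the worker as an argument, because a child started by re-importing the script does not see values that the parent appended to a module-level list at run time.

```python
from multiprocessing import Process, Queue, freeze_support


def get_latitude(out_queue, values):
    # Runs in the child process. The data arrives through `values`
    # (a hypothetical parameter) instead of a module-level global,
    # since a re-imported module starts with its globals freshly initialised.
    out_queue.put({'LAT': list(values)})


def run_once(values):
    """Start one worker, collect its single result, then join it."""
    my_queue = Queue()
    worker = Process(target=get_latitude, args=(my_queue, values))
    worker.start()
    # Read the result before join(): a child that still has data
    # buffered for the queue may not exit until that data is consumed.
    result = my_queue.get()
    worker.join()
    return result


if __name__ == '__main__':
    # Only the main process reaches this block; a child that imports
    # the script skips it, so no process is started recursively.
    freeze_support()  # only relevant for frozen Windows executables
    print(run_once([1.5, 2.5]))
```

Everything that starts processes lives under the guard or inside functions, so importing the module has no side effects. The function definitions stay at module level so that the child can find the target function by name after importing the script.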

【Comments】: