Pool.map() got stuck after running only the first batch

【Question Title】: Pool.map() got stuck after running only first batch
【Posted】: 2021-01-20 19:58:09
【Question Description】:

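I'm relatively new to Python and also new to multithreading and multiprocessing. I've been trying to send thousands of values (about 70,000) in chunks through a web-based API, and I want it to return the data associated with all of those values. The API accepts 50 values at a time, so for now, as a test, I have 100 values that I want to send as 2 chunks of 50 values each. Without multithreading it would take hours to finish the job, so I'm trying to use multithreading to improve performance.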
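The chunking step described above could be sketched as follows. This is only an illustration: the helper name `make_chunks` is hypothetical, and the `;` separator is assumed from the sample output shown in the question, not taken from the API's documentation.

```python
def make_chunks(values, size=50):
    """Join values into ';'-separated strings holding at most `size` values each."""
    return [
        ";".join(str(v) for v in values[i:i + size])
        for i in range(0, len(values), size)
    ]


# 100 made-up values become 2 chunks of 50 values each.
chunks = make_chunks(list(range(100)))
print(len(chunks))  # 2
```

Each resulting string can then be sent as the payload of one request.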

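Problem: The code gets stuck in the pool.map() part after executing only one task (the first row, which is just the header, not even the main values), and I have to restart the notebook kernel. I've heard that multiprocessing shouldn't be used in notebooks, so I wrote the whole code in Spyder and ran it there, but the result is the same. The code is below: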

#create df data frame with
#some codes to get df of 100 values in 
#2 chunks, each chunk contains 50 values.

output:
df =                                                  VAL
0  1166835704;1352357565;544477351;159345951;22...
1  354236462063;54666246046;13452466248...

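import json

import pandas as pd
import requests
from multiprocessing.pool import ThreadPool  # assumed import; the post omits its imports

def get_val(df):
    data = []
    v_list = df  # one chunk of ';'-separated values
    s = requests.Session()
    url = 'https://website/'
    post_fields = {'format': 'json', 'data':v_list}
    r = s.post(url, data=post_fields)  # send the chunk to the API
    d = json.loads(r.text)  # parse the JSON response
    sort = pd.json_normalize(d, ['Results'])  # flatten the 'Results' records into a DataFrame
    return sort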

if __name__ == "__main__":
    pool = ThreadPool(4)  # Make the Pool of workers
    results = pool.map(get_val, df) #Open the df in their own threads
    pool.close() #close the pool and wait for the work to finish 
    pool.join()

Any suggestions would be helpful. Thanks!

【Question Discussion】:

Can you provide a minimal reproducible example?

Tags: python multithreading python-requests multiprocessing threadpool

【Solution 1】:

Can you try the following approach?

with ThreadPool(4) as pool:
    results = pool.map(get_val, df)  # df should be iterable.
print(results)
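One plausible explanation for the hang, offered as a sketch and not as a confirmed diagnosis: iterating over a pandas DataFrame yields its column labels, not its rows, while `len()` counts rows. With the two-row frame from the question, `pool.map` would see a length of 2 but receive only the single item `'VAL'`, which matches the report that only the header was processed. The example below uses a small made-up frame and a hypothetical stand-in worker, `count_values`, in place of `get_val` so that it runs without network access. It maps over the column's values instead of the frame itself.

```python
from multiprocessing.pool import ThreadPool

import pandas as pd

# Made-up stand-in for the two chunks of ';'-separated values in the question.
df = pd.DataFrame({"VAL": ["1;2;3", "4;5;6"]})

# Iterating a DataFrame yields column labels, while len() counts rows.
print(list(df))  # ['VAL']
print(len(df))   # 2


def count_values(chunk):
    """Hypothetical stand-in for get_val: counts the values in one chunk string."""
    return len(chunk.split(";"))


def run_over_chunks(chunks):
    """Run the worker once per chunk string in a pool of 4 threads."""
    with ThreadPool(4) as pool:
        return pool.map(count_values, chunks)


# Pass the column's values so that each task receives one chunk string.
results = run_over_chunks(df["VAL"].tolist())
print(results)  # [3, 3]
```

In the question's code, the equivalent change would be to pass `df['VAL']` (or a plain list of chunk strings) to `pool.map` instead of `df`.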

Also, check whether you can pass a chunk size to the thread pool, since this affects performance.
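As a rough illustration of the chunk size suggestion: `pool.map` accepts an optional `chunksize` argument that controls how many items from the iterable are bundled into each task handed to a worker. It is unrelated to the 50 values the API accepts per request. The worker `square` and the helper `run_with_chunksize` are hypothetical names used only for this sketch.

```python
from multiprocessing.pool import ThreadPool


def square(x):
    """Hypothetical worker standing in for a real per-item task."""
    return x * x


def run_with_chunksize(values, chunksize):
    """Map the worker over values, bundling `chunksize` items into each task."""
    with ThreadPool(4) as pool:
        return pool.map(square, values, chunksize=chunksize)


# Results come back in input order regardless of the chunk size chosen.
print(run_with_chunksize(list(range(10)), chunksize=3))
```

For a small number of slow, network-bound requests, a chunk size of 1 keeps all threads busy; larger values mainly help when there are many cheap tasks.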

【Discussion】: