Logging threads when using ThreadPoolExecutor

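【Question title】: Logging threads when using ThreadPoolExecutor
【Posted】: 2020-07-02 18:28:11
【Question】:

I am using ThreadPoolExecutor from Python's concurrent.futures to parallelize scraping and writing the results to a database. While doing so, I realized that I get no information if one of the threads fails. How can I find out which threads failed and why (that is, with the "normal" traceback)? Below is a minimal working example.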

import logging
logging.basicConfig(format='%(asctime)s  %(message)s', 
    datefmt='%y-%m-%d %H:%M:%S', level=logging.INFO)
from concurrent.futures import ThreadPoolExecutor

def worker_bee(seed):
    # sido is not defined intentionally to break the code
    result = seed + sido
    return result

# uncomment next line, and you will get the usual traceback
# worker_bee(1)

# ThreadPoolExecutor will not provide any traceback
logging.info('submitting all jobs to the queue')
with ThreadPoolExecutor(max_workers=4) as executor:
    for seed in range(0,10):
        executor.submit(worker_bee, seed)
    logging.info('submitted, waiting for threads to finish')

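If I import logging inside worker_bee() and direct messages to the root logger, I can see those messages in the final log. But I only see the log messages I defined myself, not the traceback of where the code actually failed.

【Question comments】:

Tags: python python-multithreading

【Solution 1】:

You can get the "normal traceback" by calling result() on each object returned by executor.submit(); result() re-raises any exception that was raised in the worker. Retrieving the results after all jobs have been submitted also gives the threads time to start executing (and possibly fail).

Here is what I mean: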

from concurrent.futures import ThreadPoolExecutor
import logging

logging.basicConfig(format='%(asctime)s  %(message)s',
                    datefmt='%y-%m-%d %H:%M:%S', level=logging.INFO)

def worker_bee(seed):
    # sido is not defined intentionally to break the code
    result = seed + sido
    return result

logging.info('submitting all jobs to the queue')
with ThreadPoolExecutor(max_workers=4) as executor:
    results = []
    for seed in range(10):
        result = executor.submit(worker_bee, seed)
        results.append(result)
    logging.info('submitted, waiting for threads to finish')

for result in results:
    print(result.result())

Output:

20-03-21 16:21:24  submitting all jobs to the queue
20-03-21 16:21:24  submitted, waiting for threads to finish
Traceback (most recent call last):
  File "logging-threads-when-using-threadpoolexecutor.py", line 24, in <module>
    print(result.result())
  File "C:\Python3\lib\concurrent\futures\_base.py", line 432, in result
    return self.__get_result()
  File "C:\Python3\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
  File "C:\Python3\lib\concurrent\futures\thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "logging-threads-when-using-threadpoolexecutor.py", line 12, in worker_bee
    result = seed + sido
NameError: name 'sido' is not defined
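
Note that result() re-raises the first failure it meets, so the loop above stops at the first broken job. A minimal sketch of one way to record every failure together with the seed that caused it, using as_completed from concurrent.futures. The odd-seed failure in worker_bee and the run_all helper are hypothetical, introduced only for illustration:

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logging.basicConfig(format='%(asctime)s  %(message)s',
                    datefmt='%y-%m-%d %H:%M:%S', level=logging.INFO)

def worker_bee(seed):
    # Hypothetical failure mode: odd seeds raise, even seeds succeed
    if seed % 2:
        raise ValueError(f'bad seed {seed}')
    return seed * 10

def run_all(seeds):
    """Run worker_bee for every seed; return (succeeded, failed) dicts keyed by seed."""
    succeeded, failed = {}, {}
    with ThreadPoolExecutor(max_workers=4) as executor:
        # Map each future back to the seed it was submitted with
        futures = {executor.submit(worker_bee, seed): seed for seed in seeds}
        for future in as_completed(futures):
            seed = futures[future]
            try:
                succeeded[seed] = future.result()
            except Exception as exc:
                # logging.exception appends the traceback of the exception
                # currently being handled, including the worker's frame
                logging.exception('job for seed %s failed', seed)
                failed[seed] = exc
    return succeeded, failed

succeeded, failed = run_all(range(10))
logging.info('%d jobs succeeded, %d failed', len(succeeded), len(failed))
```

Because each exception is caught inside the loop, one failing job does not prevent the remaining results from being collected.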

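【Comments】:

I think print(result.result()) needs to be indented under the with context, because it is only defined there.
@Kareem: No, that isn't necessary, because the with statement doesn't introduce a new scope. But your comment did make me notice something else...
@martineau: Thanks, this is at least something for the development stage! Since the Python docs mention that [Future instances] (docs.python.org/3/library/…) 'should not be created directly except for testing', I am a bit shy about using this in production. I have already seen some undesirable behavior in a production environment: exceptions do not show up in the standard logger, only on the command line. Another issue I can imagine is memory when submitting millions of threads.
I think you are confused. executor.submit() returns a "result" object, not a Future instance, and saving that object and then calling its result() method to actually get the value returned by the thread is a common idiom. Also note that if you want your threads to call the logger, you need to create a Lock for it to prevent multiple threads from using it at the same time.
@martineau: Thanks for the clarification. On a second read of the docs, nothing speaks against saving the future objects to a list when they are created through executor.submit() (which, by definition, returns a Future object). A remark: the print statement is superfluous, since calling the result() method already raises the exception. When filename='mylog.log' is specified in logging.basicConfig(), the exception still shows up only on the command line and not in mylog.log. Do you know how to get it into the log file, or what the mechanism behind it is?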
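
On the last question: an exception that propagates out of the main script is printed to stderr by the interpreter and never passes through the logging module, which is why it does not reach mylog.log. A minimal sketch of one way to route worker failures into a log file, assuming a done-callback is acceptable. The logger name worker_pool, the log_failure helper, and the temporary log location are hypothetical. The standard logging module is documented as thread-safe, so the sketch uses no extra Lock:

```python
import logging
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical log location; a fixed path such as 'mylog.log' works the same way
log_path = os.path.join(tempfile.mkdtemp(), 'mylog.log')

logger = logging.getLogger('worker_pool')  # hypothetical logger name
logger.setLevel(logging.INFO)
handler = logging.FileHandler(log_path)
handler.setFormatter(
    logging.Formatter('%(asctime)s  %(threadName)s  %(message)s'))
logger.addHandler(handler)

def worker_bee(seed):
    # sido is not defined intentionally to break the code
    return seed + sido

def log_failure(future):
    """Done-callback: write the worker's exception and traceback to the log."""
    if future.cancelled():
        return
    exc = future.exception()
    if exc is not None:
        # Passing the exception instance as exc_info attaches its traceback
        logger.error('worker failed', exc_info=exc)

with ThreadPoolExecutor(max_workers=4) as executor:
    for seed in range(3):
        executor.submit(worker_bee, seed).add_done_callback(log_failure)

# Read the file back to confirm the tracebacks were written
with open(log_path) as fh:
    log_text = fh.read()
```

With this arrangement nothing is re-raised in the main thread, so the tracebacks end up in the file instead of on the command line.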