【Question title】: Why am I getting dask warnings when running a pandas operation?
【Posted】: 2018-03-10 07:08:39
【Question】:

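I have a notebook that contains both pandas and dask operations.

When I haven't started a client yet, everything works as expected. But as soon as I start a dask.distributed client, I get warnings in cells that run pandas operations, e.g. pd.read_parquet('my_file')

The number of Nanny lines I get matches the number of workers I started.

Example warnings: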

distributed.core - WARNING - Event loop was unresponsive in Nanny for 1.26s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.core - WARNING - Event loop was unresponsive in Nanny for 1.38s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.core - WARNING - Event loop was unresponsive in Nanny for 1.38s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.core - WARNING - Event loop was unresponsive in Nanny for 1.38s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.core - WARNING - Event loop was unresponsive in Nanny for 1.37s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.core - WARNING - Event loop was unresponsive in Scheduler for 1.37s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
distributed.core - WARNING - Event loop was unresponsive in Nanny for 1.36s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.

I would like to know why this happens, and how to make the warnings stop.

【Question comments】:

    Tags: dask dask-distributed


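    【Solution 1】:

    This warning means that the Dask worker processes are taking a long time to respond. That is bad because an unresponsive worker cannot serve data to other workers, cannot communicate with the scheduler, and so on. It is not normal even while computations are running, because those computations run in separate threads.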
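To make the wording of the warning concrete, here is a minimal sketch of how an event loop can notice that it was unresponsive. It is not distributed's actual implementation; `monitor_lag`, `blocking_task`, and the chosen intervals are hypothetical names and values for illustration. A periodic coroutine records how late each wake-up is, and a deliberately blocking call stalls the loop. In a real cluster the stall would more likely come from another thread holding the GIL, but the symptom seen by the loop is the same.

```python
import asyncio
import time


async def monitor_lag(interval, limit, duration, report):
    """Wake up every `interval` seconds and record wake-ups later than `limit`."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + duration
    last = loop.time()
    while loop.time() < deadline:
        await asyncio.sleep(interval)
        now = loop.time()
        lag = now - last - interval  # how much later than planned we woke up
        if lag > limit:
            report.append(lag)
        last = now


async def blocking_task(seconds):
    """Stand-in for code that stalls the loop: time.sleep never yields to asyncio."""
    await asyncio.sleep(0.05)
    time.sleep(seconds)


async def main():
    stalls = []
    await asyncio.gather(
        monitor_lag(interval=0.02, limit=0.1, duration=0.5, report=stalls),
        blocking_task(0.3),
    )
    return stalls


stalls = asyncio.run(main())
for lag in stalls:
    print(f"Event loop was unresponsive for {lag:.2f}s")
```

The monitor only reports that the loop was late, not which code caused it, which is why the warning text can only list likely causes.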

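    There are two main causes for this problem:

    1. Your tasks run functions that do not release the GIL. This is rare these days (most pandas operations release the GIL), but it can happen. I believe all variants of read_parquet release the GIL.
    2. If this happens only once, and only at startup, then it is a bug that was fixed around distributed.__version__ == '1.21.3'. You may want to upgrade.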
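For the second cause, a small sketch like the following can tell whether the installed package predates the fix. It assumes Python 3.8+ for `importlib.metadata`; `parse_release` and `needs_upgrade` are hypothetical helpers, and the 1.21.3 threshold is only the approximate version mentioned above, not a confirmed boundary.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(text):
    """Turn '1.21.3' into (1, 21, 3); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def needs_upgrade(installed, fixed_in="1.21.3"):
    """True if `installed` sorts before the (approximate) fixed release."""
    return parse_release(installed) < parse_release(fixed_in)


try:
    installed = version("distributed")
except PackageNotFoundError:
    installed = None  # the package is not present in this environment

if installed is None:
    print("distributed is not installed in this environment")
elif needs_upgrade(installed):
    print(f"distributed {installed} predates 1.21.3; upgrading may help")
else:
    print(f"distributed {installed} is at or past 1.21.3")
```

Reading the version from package metadata avoids importing the library, so the check works even when the import itself is what fails.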

    You can also make the warnings go away by increasing the maximum allowed tick time in your ~/.dask/config.yaml file:

    tick-maximum-delay: 10 s
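If editing the config file is not convenient, another option is to raise the level of the logger that emits these messages. This is a sketch using only the standard library; the logger name `distributed.core` is taken from the prefix of the warnings quoted in the question. It only affects the process in which it runs, and it hides the symptom rather than fixing the unresponsiveness. In more recent releases the tick setting appears to live under the `distributed.admin.tick.limit` configuration key rather than `tick-maximum-delay`, so check the configuration reference for your version.

```python
import logging

# Logger name taken from the "distributed.core - WARNING" prefix of the messages.
logger = logging.getLogger("distributed.core")

# Drop WARNING records from this logger; ERROR and above are still emitted.
logger.setLevel(logging.ERROR)
```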
    

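    【Comments】:

    • Regarding point 2: I get 'dask.distributed' has no attribute '__version__'. Which version of dask has this bug?
    • import distributed; print(distributed.__version__)
    • I see, distributed is also available as a standalone package. I was confused because there is also dask.distributed
    • Do you know a good way to find out which function is not releasing the GIL? If this were synchronous code, raising an exception instead of logging a message would do, but since it is asynchronous I cannot tell who is taking up that time.
    • Is there a way to profile how much data is being passed around?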