【Question Title】: Monitor progress of dd.DataFrame.apply
【Posted】: 2018-05-14 01:24:39
【Question Description】:
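How can I monitor the progress of a row-wise Dask DataFrame `apply` operation?
Wrapping it in `ProgressBar()` doesn't seem to do anything: nothing is printed to the console.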
```python
from dask.diagnostics import ProgressBar
with ProgressBar():
    df_calc = ddf.apply(myfunc, axis=1)
```
【Question Discussion】:
Tags:
dataframe
parallel-processing
progress-bar
monitoring
dask
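【Solution 1】:
Dask operations are lazy by default. Computation only happens when you call `compute` or `persist`. Calling `ddf.apply(...)` just builds a task graph, so no work runs inside the `ProgressBar()` block in the question and there is nothing to report.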
```python
import dask.dataframe as dd

df = dd.read_csv(...)  # This lazily builds up a computation
df = df[df.name == 'alice']  # This lazily builds up a computation
result = df.amount.sum()  # This lazily builds up a computation
result = result.compute()  # This triggers actual work
```