Dataflow job with ProfileOptions profile_cpu=True not writing profile files

【Question title】: Dataflow job with ProfileOptions profile_cpu=True not writing profile files
【Posted】: 2021-03-18 15:58:07
【Question】:

I am trying to profile the CPU usage of a Dataflow pipeline job running on the Apache Beam Python 3.7 SDK 2.27.0. I launched the job with the --profile_cpu and profile_location options set, and the Dataflow console confirms that both are set:

[Screenshot: Dataflow Pipeline Options showing that profile_cpu and profile_location are set.]

However, after the job completes, no files have been written to the profile_location GCS bucket.

When I filter the Dataflow logs with jsonPayload.logger:"apache_beam.utils.profiler:profiler.py", I can see the "Start profiling" and "Stop profiling" log entries:

[Screenshot: Logs showing the "Start profiling" and "Stop profiling" messages from the Profiler.]

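But there is no log entry for the "Copying profiler data to:" step, even though profile_location is set in ProfilingOptions and should therefore be set on the Profiler. Any suggestions about what might be going wrong, or confirmation of whether this feature is currently supported, would be very helpful.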

【Comments】:

Could you try running with Dataflow Runner v2? cloud.google.com/dataflow/docs/guides/…
Yes, that worked, thanks!

Tags: apache-beam dataflow

【Solution 1】:

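This was resolved by passing the --experiments=use_runner_v2 flag. It looks like writing the profile files is only supported on Dataflow Runner v2, which has not yet been rolled out as the default runner.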
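
For reference, here is a minimal sketch of how the flags from this thread might be assembled in Python. The helper name `build_profiling_flags`, its parameters, and the bucket path `gs://my-bucket/profiles` are hypothetical and only for illustration. The flag names themselves (`--profile_cpu`, `--profile_location`, `--experiments=use_runner_v2`) are the ones mentioned in the question and answer above.

```python
from typing import List, Optional


def build_profiling_flags(profile_location: str,
                          use_runner_v2: bool = True,
                          extra_flags: Optional[List[str]] = None) -> List[str]:
    """Return command-line style flags for a CPU-profiled Dataflow job.

    Hypothetical helper: it only builds a list of strings and does not
    import or call Apache Beam itself.
    """
    flags = [
        "--runner=DataflowRunner",
        "--profile_cpu",                            # enable CPU profiling
        f"--profile_location={profile_location}",   # where profiles should be written
    ]
    if use_runner_v2:
        # Per the answer above, profile files were only written with Runner v2.
        flags.append("--experiments=use_runner_v2")
    return flags + list(extra_flags or [])


if __name__ == "__main__":
    # gs://my-bucket/profiles is a placeholder path, not a real bucket.
    for flag in build_profiling_flags("gs://my-bucket/profiles"):
        print(flag)
```

The resulting list could then be passed to `apache_beam.options.pipeline_options.PipelineOptions(flags)`, together with whatever project, region, and temp-location flags the job already uses.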
