【Question title】: Dataflow jobs fail with: Shuffle close failed: FAILED_PRECONDITION: Precondition check failed
【Posted】: 2018-10-17 00:58:24
【Question】:

My Dataflow job fails with the following error:

INFO:root:2018-10-15T18:55:37.417Z: JOB_MESSAGE_ERROR: Workflow failed. 
Causes: S17:fold2/Write/WriteImpl/WindowInto(WindowIntoFn)+write instances fold2/Write/WriteImpl/GroupByKey/Reify+write instances fold2/Write/WriteImpl/GroupByKey/Write failed., 
A work item was attempted 4 times without success. 
Each time the worker eventually lost contact with the service. The work item was attempted on: 
  yuri-nine-gag-recommender-10151140-3kmq-harness-mdgd,
  yuri-nine-gag-recommender-10151140-3kmq-harness-mdgd,
  yuri-nine-gag-recommender-10151140-3kmq-harness-41dd,
  yuri-nine-gag-recommender-10151140-3kmq-harness-mdgd

Digging through the logs turns up only one error:

An exception was raised when trying to execute the workitem 6479210647275353150 : 
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 642, in do_work
    work_executor.execute()
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 158, in execute
    op.finish()
  File "dataflow_worker/shuffle_operations.py", line 144, in dataflow_worker.shuffle_operations.ShuffleWriteOperation.finish
    def finish(self):
  File "dataflow_worker/shuffle_operations.py", line 145, in dataflow_worker.shuffle_operations.ShuffleWriteOperation.finish
    with self.scoped_finish_state:
  File "dataflow_worker/shuffle_operations.py", line 147, in dataflow_worker.shuffle_operations.ShuffleWriteOperation.finish
    self.writer.__exit__(None, None, None)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/shuffle.py", line 599, in __exit__
    self.writer.Close()
  File "third_party/windmill/shuffle/python/shuffle_client.pyx", line 202, in shuffle_client.PyShuffleWriter.Close
IOError: Shuffle close failed: FAILED_PRECONDITION: Precondition check failed.

Any ideas?

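【Comments】:

  • Can you provide a job ID? I can look into it.
  • Thanks for the help! I have already found the problem, but unfortunately the Dataflow logs were of no use this time :)
  • @CharlesChen Hi Charles, the problem is back. Could you take a look at one of my jobs?

Tags: google-cloud-dataflow apache-beam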


【Answer 1】:

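I finally solved the problem by removing pieces of code one at a time, printing a lot of logs, and rerunning the job. It turned out that I had a regular expression that crashed on one specific entry. Unfortunately, the Dataflow logs did not help at all.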
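The approach described above (log each entry before it is processed, so the failing one can be identified) might look roughly like the sketch below. This is not the code from the original job: the pattern, the `safe_extract` helper, and the `max_len` limit are all hypothetical, since the post shows neither the regex nor the entry that broke it.

```python
import logging
import re

# Hypothetical pattern; the original post does not show the regex that failed.
PATTERN = re.compile(r"(\d+)-(\w+)")


def safe_extract(record, pattern=PATTERN, max_len=10000):
    """Return the regex groups for `record`, or None if it cannot be parsed."""
    if not isinstance(record, str):
        logging.warning("skipping non-string record: %r", record)
        return None
    # Log a truncated preview *before* matching, so that if the worker dies
    # or hangs, the last logged line points at the offending entry.
    logging.debug("matching record: %r", record[:200])
    # Assumed guard: very long inputs are skipped rather than matched.
    if len(record) > max_len:
        logging.warning("skipping oversized record (%d chars)", len(record))
        return None
    try:
        match = pattern.search(record)
    except Exception:
        # Keep the job alive and record which entry caused the failure.
        logging.exception("regex failed on record: %r", record[:200])
        return None
    return match.groups() if match else None
```

In a Beam pipeline, a function like this could be applied per element (for example through `beam.Map`) and the `None` results filtered out afterwards. Whether that would have prevented the failure reported here is an assumption, because the exact failure mode of the regex is not stated.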

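【Comments】:

  • Thanks. What was the root cause, and what kind of logging would have been most helpful here? Was it an out-of-memory problem?
  • @CharlesChen There was no indication of why the workers failed, but it may have been an out-of-memory problem. I can provide a job ID; maybe you can dig in and find the root cause? From there, perhaps you can see which logs would have been most helpful?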