【Question title】: RuntimeValueProviderError when creating a Google Cloud Dataflow template with Apache Beam Python
【Posted】: 2020-01-27 22:55:43
【Question】:

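I cannot stage a Cloud Dataflow template with Python 3.7. It fails on the one parameterized argument with apache_beam.error.RuntimeValueProviderError: RuntimeValueProvider(option: input, type: str, default_value: 'gs://dataflow-samples/shakespeare/kinglear.txt') not accessible

Staging the template with Python 2.7 works fine.

I have tried running Dataflow jobs with 3.7 and they run fine; only template staging is broken. Is Python 3.7 still not supported in Dataflow templates, or did the staging syntax change in Python 3?

Here is the pipeline piece: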

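import apache_beam as beam
from apache_beam.io import ReadFromText
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

class WordcountOptions(PipelineOptions):
  @classmethod
  def _add_argparse_args(cls, parser):
    # Registered as a ValueProvider so the value can be supplied when the
    # template is executed rather than when it is staged.
    parser.add_value_provider_argument(
      '--input',
      default='gs://dataflow-samples/shakespeare/kinglear.txt',
      help='Path of the file to read from',
      dest="input")

def main(argv=None):
  options = PipelineOptions(flags=argv)
  setup_options = options.view_as(SetupOptions)

  wordcount_options = options.view_as(WordcountOptions)

  with beam.Pipeline(options=setup_options) as p:
    # wordcount_options.input is a ValueProvider, not a plain string.
    lines = p | 'read' >> ReadFromText(wordcount_options.input)

if __name__ == '__main__':
  main()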

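Here is the full repo with the staging script: https://github.com/firemuzzy/dataflow-templates-bug-python3

There is a similar earlier question, but I am not sure how relevant it is: it was asked about Python 2.7, whereas my template stages fine in 2.7 and fails only in 3.7.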

How to create Google Cloud Dataflow Wordcount custom template in Python?

**** Stack trace ****

Traceback (most recent call last):
  File "run_pipeline.py", line 44, in <module>
    main()
  File "run_pipeline.py", line 41, in main
    lines = p | 'read' >> ReadFromText(wordcount_options.input)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/transforms/ptransform.py", line 906, in __ror__
    return self.transform.__ror__(pvalueish, self.label)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/transforms/ptransform.py", line 515, in __ror__
    result = p.apply(self, pvalueish, label)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/pipeline.py", line 490, in apply
    return self.apply(transform, pvalueish)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/pipeline.py", line 525, in apply
    pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/runner.py", line 183, in apply
    return m(transform, input, options)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/runner.py", line 189, in apply_PTransform
    return transform.expand(input)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/textio.py", line 542, in expand
    return pvalue.pipeline | Read(self._source)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/transforms/ptransform.py", line 515, in __ror__
    result = p.apply(self, pvalueish, label)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/pipeline.py", line 525, in apply
    pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/runner.py", line 183, in apply
    return m(transform, input, options)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 1020, in apply_Read
    return self.apply_PTransform(transform, pbegin, options)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/runner.py", line 189, in apply_PTransform
    return transform.expand(input)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/iobase.py", line 863, in expand
    return pbegin | _SDFBoundedSourceWrapper(self.source)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/pvalue.py", line 113, in __or__
    return self.pipeline.apply(ptransform, self)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/pipeline.py", line 525, in apply
    pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/runner.py", line 183, in apply
    return m(transform, input, options)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/runner.py", line 189, in apply_PTransform
    return transform.expand(input)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/iobase.py", line 1543, in expand
    | core.ParDo(self._create_sdf_bounded_source_dofn()))
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/iobase.py", line 1517, in _create_sdf_bounded_source_dofn
    estimated_size = source.estimate_size()
  File "/usr/local/lib/python3.7/site-packages/apache_beam/options/value_provider.py", line 136, in _f
    raise error.RuntimeValueProviderError('%s not accessible' % obj)
apache_beam.error.RuntimeValueProviderError: RuntimeValueProvider(option: input, type: str, default_value: 'gs://dataflow-samples/shakespeare/kinglear.txt') not accessible
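
The last frames of the trace show source.estimate_size() being called while the pipeline graph is still being built, which ends in the "not accessible" error from value_provider.py. The sketch below is a simplified stand-in written for illustration, not Beam's implementation: ToyRuntimeValueProvider, ValueNotAccessibleError, resolve_eagerly and resolve_guarded are all hypothetical names. It only models the idea that a template parameter has no value until runtime options exist, so code that reads it at construction time fails while code that checks accessibility first does not.

```python
class ValueNotAccessibleError(RuntimeError):
    """Hypothetical stand-in for apache_beam.error.RuntimeValueProviderError."""


class ToyRuntimeValueProvider:
    """Simplified model of a parameter that is only resolved at run time."""

    # Shared runtime options; None while the pipeline graph is being built.
    runtime_options = None

    def __init__(self, option_name, default_value=None):
        self.option_name = option_name
        self.default_value = default_value

    def is_accessible(self):
        # The value can be read only once runtime options have been set.
        return type(self).runtime_options is not None

    def get(self):
        if not self.is_accessible():
            raise ValueNotAccessibleError(
                '%s not accessible' % self.option_name)
        value = type(self).runtime_options.get(self.option_name)
        return self.default_value if value is None else value


def resolve_eagerly(provider):
    # Mimics code that reads the parameter during graph construction.
    return provider.get()


def resolve_guarded(provider):
    # Defers the lookup: returns None while the value is not yet known.
    if not provider.is_accessible():
        return None
    return provider.get()
```

In this toy model, resolve_eagerly raises during "staging" (no runtime options set), which mirrors the shape of the failure in the trace; resolve_guarded simply reports that the value is not available yet.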

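【Comments】:

  • Can you show the stack trace?
  • @Pablo I added the stack trace to the post. The linked GitHub repo contains everything, including the stack trace and all the code needed to reproduce the issue.
  • @mlablablab Did you follow any documentation or tutorial?
  • @muscat I followed Google's template instructions cloud.google.com/dataflow/docs/guides/templates/… and have deployed multiple templates with Python 2. However, as soon as I switched to Python 3, staging fails. You can see my simplified example in the linked GitHub repo. Python 2 is not an option because I need libraries that only work in Python 3. Either I am doing something wrong that I have not noticed, or something is wrong with Dataflow templates. Either way, I am stuck.
  • Maybe this started failing a few days ago? I see that requirements.txt does not request a specific version. It may be that templates are broken on Beam 2.18.0. Could you try defining the dependency as apache-beam[gcp]<2.18.0?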

Tags: python python-3.x google-cloud-dataflow apache-beam


【Answer 1】:

Unfortunately, templates appear to be broken on version 2.18.0 of the Apache Beam Python SDK.

For now, the solution is to avoid Beam 2.18.0, so in your requirements/dependencies, define apache-beam[gcp]<2.18.0 or apache-beam[gcp]>2.18.0.
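
As a rough guard before staging, the version check described above could be sketched as follows. This is a minimal illustration, not part of Beam: parse_version and is_affected_beam_version are hypothetical helpers, and the only fact taken from this answer is that 2.18.0 is the release to avoid.

```python
# Release reported as broken for template staging in this answer.
AFFECTED_VERSION = (2, 18, 0)


def parse_version(version):
    """Turn '2.18.0' into (2, 18, 0); stops at the first non-numeric part."""
    parts = []
    for piece in version.split('.'):
        digits = ''
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits == '':
            break
        parts.append(int(digits))
    return tuple(parts)


def is_affected_beam_version(version):
    """Return True only for the 2.18.0 release line (hypothetical helper)."""
    return parse_version(version)[:3] == AFFECTED_VERSION
```

With Beam installed, the check could be fed apache_beam.__version__ and the staging script aborted when it returns True.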

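【Comments】:

  • I have updated the Beam bug I filed to mark it as a 2.18.0 issue: issues.apache.org/jira/browse/BEAM-9218
  • Awesome. Thanks a lot for the detailed report!
  • Changing the version of apache-beam[gcp] to 2.17.0 worked for me. Thanks!
  • The Jira issue above is now fixed. You can upgrade to 2.20.0 instead of downgrading.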