【发布时间】:2020-01-27 22:55:43
【问题描述】:
我无法使用 python 3.7 暂存云数据流模板。它在 apache_beam.error.RuntimeValueProviderError: RuntimeValueProvider(option: input, type: str, default_value: 'gs://dataflow-samples/shakespeare/kinglear.txt') not accessible 的一个参数化参数上失败
使用 python 2.7 暂存模板可以正常工作。
我已尝试使用 3.7 运行数据流作业,它们运行良好。只有模板暂存被破坏。 数据流模板中仍然不支持 python 3.7 还是 python 3 中的暂存语法发生了变化?
这是管道部分
class WordcountOptions(PipelineOptions):
@classmethod
def _add_argparse_args(cls, parser):
parser.add_value_provider_argument(
'--input',
default='gs://dataflow-samples/shakespeare/kinglear.txt',
help='Path of the file to read from',
dest="input")
def main(argv=None):
options = PipelineOptions(flags=argv)
setup_options = options.view_as(SetupOptions)
wordcount_options = options.view_as(WordcountOptions)
with beam.Pipeline(options=setup_options) as p:
lines = p | 'read' >> ReadFromText(wordcount_options.input)
if __name__ == '__main__':
main()
这里是带有暂存脚本https://github.com/firemuzzy/dataflow-templates-bug-python3的完整回购
以前有类似的问题,但不确定它的相关性,因为这是在 python 2.7 中完成的,但我的模板阶段在 2.7 中很好,但在 3.7 中失败
How to create Google Cloud Dataflow Wordcount custom template in Python?
**** 堆栈跟踪 ****
Traceback (most recent call last):
File "run_pipeline.py", line 44, in <module>
main()
File "run_pipeline.py", line 41, in main
lines = p | 'read' >> ReadFromText(wordcount_options.input)
File "/usr/local/lib/python3.7/site-packages/apache_beam/transforms/ptransform.py", line 906, in __ror__
return self.transform.__ror__(pvalueish, self.label)
File "/usr/local/lib/python3.7/site-packages/apache_beam/transforms/ptransform.py", line 515, in __ror__
result = p.apply(self, pvalueish, label)
File "/usr/local/lib/python3.7/site-packages/apache_beam/pipeline.py", line 490, in apply
return self.apply(transform, pvalueish)
File "/usr/local/lib/python3.7/site-packages/apache_beam/pipeline.py", line 525, in apply
pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/runner.py", line 183, in apply
return m(transform, input, options)
File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/runner.py", line 189, in apply_PTransform
return transform.expand(input)
File "/usr/local/lib/python3.7/site-packages/apache_beam/io/textio.py", line 542, in expand
return pvalue.pipeline | Read(self._source)
File "/usr/local/lib/python3.7/site-packages/apache_beam/transforms/ptransform.py", line 515, in __ror__
result = p.apply(self, pvalueish, label)
File "/usr/local/lib/python3.7/site-packages/apache_beam/pipeline.py", line 525, in apply
pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/runner.py", line 183, in apply
return m(transform, input, options)
File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 1020, in apply_Read
return self.apply_PTransform(transform, pbegin, options)
File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/runner.py", line 189, in apply_PTransform
return transform.expand(input)
File "/usr/local/lib/python3.7/site-packages/apache_beam/io/iobase.py", line 863, in expand
return pbegin | _SDFBoundedSourceWrapper(self.source)
File "/usr/local/lib/python3.7/site-packages/apache_beam/pvalue.py", line 113, in __or__
return self.pipeline.apply(ptransform, self)
File "/usr/local/lib/python3.7/site-packages/apache_beam/pipeline.py", line 525, in apply
pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/runner.py", line 183, in apply
return m(transform, input, options)
File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/runner.py", line 189, in apply_PTransform
return transform.expand(input)
File "/usr/local/lib/python3.7/site-packages/apache_beam/io/iobase.py", line 1543, in expand
| core.ParDo(self._create_sdf_bounded_source_dofn()))
File "/usr/local/lib/python3.7/site-packages/apache_beam/io/iobase.py", line 1517, in _create_sdf_bounded_source_dofn
estimated_size = source.estimate_size()
File "/usr/local/lib/python3.7/site-packages/apache_beam/options/value_provider.py", line 136, in _f
raise error.RuntimeValueProviderError('%s not accessible' % obj)
apache_beam.error.RuntimeValueProviderError: RuntimeValueProvider(option: input, type: str, default_value: 'gs://dataflow-samples/shakespeare/kinglear.txt') not accessible
【问题讨论】:
-
你能显示堆栈跟踪吗?
-
@Pablo 我在帖子中添加了堆栈跟踪。链接的 github 存储库包含所有内容,包括堆栈跟踪和重现问题的所有代码。
-
@mlablablab 你有没有遵循任何文档/教程?
-
@muscat 我遵循了谷歌的模板说明cloud.google.com/dataflow/docs/guides/templates/… 并使用 python 2 部署了多个模板。但是,一旦我切换到 python 3 登台失败。您可以在链接的 github 存储库中查看我的简化示例。 Python 2 不是一个选项,因为我需要使用仅在 Python 3 中工作的库。要么我做错了,我真的没有注意到它,要么数据流模板有问题。无论哪种方式,我都很糟糕。
-
也许这几天前就开始失败了?我看到 requirements.txt 没有请求特定版本。可能是 Beam 2.18.0 上的模板被破坏了。您能否尝试将依赖项定义为
apache-beam[gcp]<2.18.0?
标签: python python-3.x google-cloud-dataflow apache-beam