【问题标题】:Dataflow job stucks while reading 35,000th file from google cloud storage从谷歌云存储读取第 35,000 个文件时数据流作业卡住
【发布时间】:2019-05-08 08:12:34
【问题描述】:
class Mp3_to_npyFn(beam.DoFn):
    def process(self, element):
        filename, e = element

        # get mp3 from the storage
        bucket = storage.Client().get_bucket('BUCKET_NAME')
        blob = bucket.get_blob(filename)
        tmp_mp3 = TemporaryFile()
        blob.download_to_file(tmp_mp3)
        tmp_mp3.seek(0) 

        array = do_something(tmp_mp3)
        write_numpy_array(array)
        return something

def run():
    pp = beam.Pipeline(RUNNER,options=opts)
    l = (pp
         | 'Read TSV' >> ReadFromText(INPUT_TSV, skip_header_lines=1) 
         | 'Parse TSV' >> beam.Map(parseTSV) 
         | 'MP3 to npy' >> beam.ParDo(Mp3_to_npyFn())
        )
    job = pp.run()
    job.wait_until_finish()

Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 744, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 423, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "main2_mod.py", line 57, in process
  File "/usr/local/lib/python3.7/site-packages/google/cloud/storage/client.py", line 227, in get_bucket
    bucket.reload(client=self)
  File "/usr/local/lib/python3.7/site-packages/google/cloud/storage/_helpers.py", line 130, in reload
    _target_object=self,
  File "/usr/local/lib/python3.7/site-packages/google/cloud/_http.py", line 293, in api_request
    raise exceptions.from_http_response(response)
google.api_core.exceptions.InternalServerError: 500 GET https://www.googleapis.com/storage/v1/b/my_db?projection=noAcl: Backend Error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
    work_executor.execute()
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 176, in execute
    op.start()
  File "dataflow_worker/native_operations.py", line 38, in dataflow_worker.native_operations.NativeReadOperation.start
  File "dataflow_worker/native_operations.py", line 39, in dataflow_worker.native_operations.NativeReadOperation.start
  File "dataflow_worker/native_operations.py", line 44, in dataflow_worker.native_operations.NativeReadOperation.start
  File "dataflow_worker/native_operations.py", line 54, in dataflow_worker.native_operations.NativeReadOperation.start
  File "apache_beam/runners/worker/operations.py", line 246, in apache_beam.runners.worker.operations.Operation.output
  File "apache_beam/runners/worker/operations.py", line 142, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 560, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 561, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 740, in apache_beam.runners.common.DoFnRunner.receive
  File "apache_beam/runners/common.py", line 746, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 785, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "apache_beam/runners/common.py", line 744, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 422, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "apache_beam/runners/common.py", line 870, in apache_beam.runners.common._OutputProcessor.process_outputs
  File "apache_beam/runners/worker/operations.py", line 142, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 560, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 561, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 740, in apache_beam.runners.common.DoFnRunner.receive
  File "apache_beam/runners/common.py", line 746, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 800, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "/usr/local/lib/python3.7/site-packages/future/utils/__init__.py", line 421, in raise_with_traceback
    raise exc.with_traceback(traceback)
  File "apache_beam/runners/common.py", line 744, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 423, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "main2_mod.py", line 57, in process
  File "/usr/local/lib/python3.7/site-packages/google/cloud/storage/client.py", line 227, in get_bucket
    bucket.reload(client=self)
  File "/usr/local/lib/python3.7/site-packages/google/cloud/storage/_helpers.py", line 130, in reload
    _target_object=self,
  File "/usr/local/lib/python3.7/site-packages/google/cloud/_http.py", line 293, in api_request
    raise exceptions.from_http_response(response)
google.api_core.exceptions.InternalServerError: 500 GET https://www.googleapis.com/storage/v1/b/cochlear_db?projection=noAcl: Backend Error [while running 'MP3 to npy']

tsv 文件包含 0.4M 文件名 (.mp3) 的列表。解析后,它会读取每个 mp3 文件并进行一些处理。当我用 tsv 中的 5 个文件列表进行测试时,它工作正常。但是使用 0.4M 文件进行测试,它停留在 读取第 35,000 个文件,错误 500。似乎重试了很多次,最终失败。

仅供参考,mp3 文件位于“gs://bucket_name/same_subdir/id_string.mp3”中,其中 id 的顺序为 100001,100002,100003。

【问题讨论】:

    标签: google-cloud-storage google-cloud-dataflow


    【解决方案1】:

    我通过在管道中明确提供身份验证凭据解决了这个问题。在我的猜测中,工作人员在失败后重试时会失去权限。

    # get mp3 from the storage
        credentials = compute_engine.Credentials()
        project = <PROJECT_NAME>
    
        client = storage.Client(credentials=credentials, project=project)
        bucket = client.get_bucket(<BUCKET_NAME>)
    

    【讨论】:

      【解决方案2】:

      请使用GcsIO 而不是存储客户端。 请重试您的调用,对于可重试错误,请使用指数级backoff

      【讨论】:

      • 感谢您的建议!我解决了这个问题(见我的回答)。我也会试试你的答案!!
      • 我试过了,但在这种情况下,我在自动缩放时遇到了问题。目前,我的回答效果很好。
      猜你喜欢
      • 1970-01-01
      • 1970-01-01
      • 2019-02-06
      • 1970-01-01
      • 1970-01-01
      • 2022-08-16
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      相关资源
      最近更新 更多