【Question Title】: TensorFlow Object Detection training job fails on Google Cloud
【Posted】: 2017-12-15 05:36:15
【Question】:

My Google Cloud Storage bucket is laid out as follows:

-data
--labels.pbtxt
--train.record
--test.record
-training
--config file
--packages

My local machine keeps the data in the same layout under /tensorflow/models/research/object_detection, with one addition:

-training
--cloud.yml

I run the following command to start the job on Google Cloud ML Engine:

gcloud ml-engine jobs submit training object_detection_0.1 \
    --job-dir=gs://{BUCKET NAME}/training \
    --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
    --module-name object_detection.train \
    --region us-central1 \
    --config /##/##/models/research/object_detection/training \
    -- \
    --train_dir=gs://{BUCKET NAME}/training \
    --pipeline_config_path=gs://{BUCKET NAME}/training/config_file.config

The Google Cloud logs show the following error:

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/root/.local/lib/python2.7/site-packages/object_detection/train.py", line 49, in <module>
    from object_detection import trainer
  File "/root/.local/lib/python2.7/site-packages/object_detection/trainer.py", line 33, in <module>
    from deployment import model_deploy
ImportError: No module named deployment

Replica workers 0, 1, 2, 3: same error.

The replica worker 4 exited with a non-zero status of 1. Termination reason: Error.
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/root/.local/lib/python2.7/site-packages/object_detection/train.py", line 49, in <module>
    from object_detection import trainer
  File "/root/.local/lib/python2.7/site-packages/object_detection/trainer.py", line 33, in <module>
    from deployment import model_deploy
ImportError: No module named deployment

Replica ps 0, 1: same error.

The replica ps 2 exited with a non-zero status of 1. Termination reason: Error.
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/root/.local/lib/python2.7/site-packages/object_detection/train.py", line 49, in <module>
    from object_detection import trainer
  File "/root/.local/lib/python2.7/site-packages/object_detection/trainer.py", line 33, in <module>
    from deployment import model_deploy
ImportError: No module named deployment
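
The last line of each traceback is the important one: the replica cannot locate a top-level module named `deployment`. Below is a minimal sketch of how one might check which top-level modules an environment can locate before submitting a job. It assumes a Python 3 interpreter (the logs above come from Python 2.7), and the helper name `missing_modules` is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located on sys.path.

    Only top-level names (no dots) should be passed in; find_spec() looks
    them up without executing the module's code.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example: in an environment where the object_detection package is installed
# but the `deployment` package is not, the list would contain "deployment".
print(missing_modules(["object_detection", "deployment"]))
```

An empty list means every listed module can be found locally. It says nothing about the remote workers, which only see what the uploaded packages contain.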

【Question Comments】:

    Tags: linux tensorflow google-cloud-platform object-detection-api


    【Answer 1】:

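    I ran into the same problem with the deeplab model. The import seems to refer to this folder: once I placed it where it could be imported correctly, it worked for me.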

    By the way, let me know how you solved it.
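
One way to confirm that the folder really ships with the job is to inspect the source tarballs passed to `--packages`. The sketch below assumes a standard sdist layout (`<name>-<version>/<package>/__init__.py`). It also assumes that the missing package is expected inside `slim/dist/slim-0.1.tar.gz`, since `deployment` sat under `research/slim` in the tensorflow/models repository of that period. The helper name `top_level_packages` is hypothetical.

```python
import tarfile


def top_level_packages(sdist_path):
    """List the top-level Python packages found inside a .tar.gz sdist.

    A directory counts as a package when it holds an __init__.py directly
    under the archive's root folder (<name>-<version>/).
    """
    packages = set()
    with tarfile.open(sdist_path, "r:gz") as archive:
        for member in archive.getnames():
            parts = member.split("/")
            # Expect exactly: <root>/<package>/__init__.py
            if len(parts) == 3 and parts[2] == "__init__.py":
                packages.add(parts[1])
    return sorted(packages)


# Usage (path taken from the question's --packages flag):
#   "deployment" in top_level_packages("slim/dist/slim-0.1.tar.gz")
```

If `deployment` is absent from the result, the workers would have no way to import it. Rebuilding the tarball after placing the folder where the packaging step picks it up would be the thing to try.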

    【Comments】:
