【Question title】: Error converting FaceNet model into ONNX format
【Posted】: 2020-11-23 21:36:36
【Question description】:

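System information

  • OS platform and distribution: Linux Ubuntu 19.10
  • TensorFlow version: 1.15
  • Python version: 3.7

Problem

I downloaded the TensorFlow model for FaceNet from this page, and I am trying to convert it from a .pb file to an .onnx file, but it raises the following error: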

root@xesk-VirtualBox:/home/xesk/Desktop# python -m tf2onnx.convert --saved-model home/xesk/Desktop/2s/20180402-114759/20180402-114759.pb --output model.onnx

    2020-08-03 20:18:05.081538: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
    2020-08-03 20:18:05.081680: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
    2020-08-03 20:18:07,431 - WARNING - '--tag' not specified for saved_model. Using --tag serve
    Traceback (most recent call last):
    File "/usr/lib/python3.8/runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
    File "/usr/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
    File "/usr/local/lib/python3.8/dist-packages/tf2onnx/convert.py", line 171, in <module>
    main()
    File "/usr/local/lib/python3.8/dist-packages/tf2onnx/convert.py", line 131, in main
    graph_def, inputs, outputs = tf_loader.from_saved_model(
    File "/usr/local/lib/python3.8/dist-packages/tf2onnx/tf_loader.py", line 288, in from_saved_model
    _from_saved_model_v2(model_path, input_names, output_names, tag, signatures, concrete_function)
    File "/usr/local/lib/python3.8/dist-packages/tf2onnx/tf_loader.py", line 247, in _from_saved_model_v2
    imported = tf.saved_model.load(model_path, tags=tag) # pylint: disable=no-value-for-parameter
    File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/saved_model/load.py", line 603, in load
    return load_internal(export_dir, tags, options)
    File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/saved_model/load.py", line 614, in load_internal
    loader_impl.parse_saved_model_with_debug_info(export_dir))
    File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/saved_model/loader_impl.py", line 56, in parse_saved_model_with_debug_info
    saved_model = _parse_saved_model(export_dir)
    File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/saved_model/loader_impl.py", line 110, in parse_saved_model
    raise IOError("SavedModel file does not exist at: %s/{%s|%s}" %
    OSError: SavedModel file does not exist at: home/xesk/Desktop/2s/20180402-114759/20180402-114759.pb/{saved_model.pbtxt|saved_model.pb}

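Additional context

I am not running CUDA or anything similar, only a CPU. The downloaded model is 20180402-114759. This is my first time using this tool, and I am fairly new to the AI world, so I may be missing something obvious. I have, of course, checked the path and the command syntax several times. Could it be related to the format of the file I downloaded?

Edit

Following Venkatesh Wadawadagi's answer, I went with option 1. Renaming the .meta file fixed the problem of the script not recognizing it.

The script ran more or less correctly and finished creating the export_dir directory, which contains the export_dir > 0 > variables subfolders. However, they are empty.

The console output is as follows: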

xesk@xesk:~/Desktop/UP2S/ACROMEGALLY/20180402-114759$ python3 ./pb2sm
2020-08-10 16:02:26.128846: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-08-10 16:02:26.129114: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
2020-08-10 16:02:26.129137: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (xesk): /proc/driver/nvidia/version does not exist
2020-08-10 16:02:26.129501: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-08-10 16:02:26.139076: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2592000000 Hz
2020-08-10 16:02:26.139506: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x44018d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-10 16:02:26.139520: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/queue_runner_impl.py:391: QueueRunner.__init__ (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
2020-08-10 16:02:32.681265: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 17676288 exceeds 10% of system memory.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value InceptionResnetV1/Block8/Branch_0/Conv2d_1x1/BatchNorm/beta/Adam
     [[{{node save/SaveV2_1}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./pb2sm", line 17, in <module>
    strip_default_attrs=True)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/saved_model/builder_impl.py", line 595, in add_meta_graph_and_variables
    saver.save(sess, variables_path, write_meta_graph=False, write_state=False)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/saver.py", line 1193, in save
    raise exc
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/saver.py", line 1176, in save
    {self.saver_def.filename_tensor_name: checkpoint_file})
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value InceptionResnetV1/Block8/Branch_0/Conv2d_1x1/BatchNorm/beta/Adam
     [[node save/SaveV2_1 (defined at /usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]

Original stack trace for 'save/SaveV2_1':
  File "./pb2sm", line 17, in <module>
    strip_default_attrs=True)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/saved_model/builder_impl.py", line 589, in add_meta_graph_and_variables
    saver = self._maybe_create_saver(saver)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/saved_model/builder_impl.py", line 227, in _maybe_create_saver
    allow_empty=True)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/saver.py", line 828, in __init__
    self.build()
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/saver.py", line 840, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/saver.py", line 878, in _build
    build_restore=build_restore)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/saver.py", line 499, in _build_internal
    save_tensor = self._AddShardedSaveOps(filename_tensor, per_device)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/saver.py", line 291, in _AddShardedSaveOps
    return self._AddShardedSaveOpsForV2(filename_tensor, per_device)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/saver.py", line 265, in _AddShardedSaveOpsForV2
    sharded_saves.append(self._AddSaveOps(sharded_filename, saveables))
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/saver.py", line 206, in _AddSaveOps
    save = self.save_op(filename_tensor, saveables)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/saver.py", line 122, in save_op
    tensors)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/ops/gen_io_ops.py", line 1946, in save_v2
    name=name)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

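Could I be missing some library that needs to be installed? The error seems to be related to some CUDA implementation, which I don't have. Is that possible?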

【Comments】:

    Tags: python tensorflow onnx facenet


    【Answer 1】:

    The command you are using:

    python -m tf2onnx.convert --saved-model home/xesk/Desktop/2s/20180402-114759/20180402-114759.pb --output model.onnx
    

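    Note that the trained FaceNet model you are using only ships a frozen graph (.pb file) and checkpoints (.ckpt); it does not include the SavedModel that your command is looking for.

    So basically you are passing the path to the frozen graph's .pb file, which is different from a SavedModel's saved_model.pb file (which you don't have). A SavedModel is a directory that contains a variables folder as well as a saved_model.pb file.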
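The distinction can be checked before calling the converter. The following is a minimal sketch, assuming only what the error message in the question shows: the loader looks for `saved_model.pb` or `saved_model.pbtxt` inside the directory passed to `--saved-model`. The helper name `looks_like_saved_model` and the throwaway directory layout are made up for illustration.

```python
import os
import tempfile

# File names the SavedModel loader looks for, as listed in the
# OSError message quoted in the question.
SAVED_MODEL_FILES = ("saved_model.pb", "saved_model.pbtxt")


def looks_like_saved_model(path):
    """Return True if `path` is a directory holding a SavedModel protobuf."""
    if not os.path.isdir(path):
        # A lone frozen-graph .pb file is not a SavedModel directory.
        return False
    return any(
        os.path.isfile(os.path.join(path, name)) for name in SAVED_MODEL_FILES
    )


# Demonstration on throwaway files (hypothetical layout, empty files).
with tempfile.TemporaryDirectory() as tmp:
    frozen_graph = os.path.join(tmp, "20180402-114759.pb")
    open(frozen_graph, "wb").close()

    export_dir = os.path.join(tmp, "export_dir", "0")
    os.makedirs(os.path.join(export_dir, "variables"))
    open(os.path.join(export_dir, "saved_model.pb"), "wb").close()

    frozen_graph_is_saved_model = looks_like_saved_model(frozen_graph)
    export_dir_is_saved_model = looks_like_saved_model(export_dir)

print(frozen_graph_is_saved_model, export_dir_is_saved_model)
```

This only inspects file names; it does not validate the contents of the protobuf.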

    That is the reason for the error:

    OSError: SavedModel file does not exist
    

    Read more about SavedModel here.

    To proceed with the ONNX conversion, you have two options:

    1. Convert the checkpoint to a SavedModel:

    Use the following code for this:

    import os
    import tensorflow as tf
    
    trained_checkpoint_prefix = 'model-20180402-114759.ckpt-275'
    export_dir = os.path.join('export_dir', '0')
    
    graph = tf.Graph()
    with tf.compat.v1.Session(graph=graph) as sess:
        # Restore from checkpoint
        loader = tf.compat.v1.train.import_meta_graph(trained_checkpoint_prefix + '.meta')
        loader.restore(sess, trained_checkpoint_prefix)
    
        # Export checkpoint to SavedModel
        builder = tf.compat.v1.saved_model.builder.SavedModelBuilder(export_dir)
        builder.add_meta_graph_and_variables(sess,
                                             [tf.saved_model.TRAINING, tf.saved_model.SERVING],
                                             strip_default_attrs=True)
        builder.save() 
    

    Note: the .data, .index and .meta files must share the same prefix for this code to work, so rename the .meta file:

    mv model-20180402-114759.meta model-20180402-114759.ckpt-275.meta
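Before running the export script, the prefix requirement can be verified with a few lines of standard-library Python. This is a sketch under assumptions: the helper name `missing_checkpoint_files` is hypothetical, and the `.data-00000-of-00001` shard name in the demonstration is illustrative rather than taken from the question.

```python
import glob
import os
import tempfile


def missing_checkpoint_files(prefix):
    """Return the checkpoint companion files that are missing for `prefix`."""
    missing = []
    for suffix in (".meta", ".index"):
        if not os.path.isfile(prefix + suffix):
            missing.append(prefix + suffix)
    # Variable values live in one or more shards named <prefix>.data-*.
    if not glob.glob(glob.escape(prefix) + ".data-*"):
        missing.append(prefix + ".data-*")
    return missing


# Demonstration with empty placeholder files (illustrative names).
with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "model-20180402-114759.ckpt-275")
    old_meta = os.path.join(tmp, "model-20180402-114759.meta")
    for path in (old_meta, prefix + ".index", prefix + ".data-00000-of-00001"):
        open(path, "wb").close()

    missing_before = missing_checkpoint_files(prefix)
    os.rename(old_meta, prefix + ".meta")  # same effect as the mv command
    missing_after = missing_checkpoint_files(prefix)

print([os.path.basename(p) for p in missing_before], missing_after)
```

An empty list means all three kinds of file are present under the same prefix; it says nothing about whether their contents are valid.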
    

    2. Use the checkpoint files or the frozen-graph .pb directly for the ONNX conversion, for example:

    From the checkpoint format:

    python -m tf2onnx.convert --checkpoint tensorflow-model-meta-file-path --output model.onnx --inputs input0:0,input1:0 --outputs output0:0
    

    From the graphdef/frozen-graph format:

    python -m tf2onnx.convert --graphdef tensorflow-model-graphdef-file --output model.onnx --inputs input0:0,input1:0 --outputs output0:0
    

    If your TensorFlow model is in a format other than SavedModel, you need to provide the inputs and outputs of the model graph.
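If the conversion is driven from a script, the same command line can be assembled as an argument list. This is a small sketch that only uses the flags shown above; the function name `tf2onnx_graphdef_command` is hypothetical, and `input0:0`, `input1:0` and `output0:0` are the placeholder tensor names from the command template, not FaceNet's real ones.

```python
import shlex
import sys


def tf2onnx_graphdef_command(graphdef_path, inputs, outputs,
                             output_path="model.onnx"):
    """Build the argument list for converting a frozen graph with tf2onnx."""
    return [
        sys.executable, "-m", "tf2onnx.convert",
        "--graphdef", graphdef_path,
        "--output", output_path,
        "--inputs", ",".join(inputs),    # comma-separated tensor names
        "--outputs", ",".join(outputs),
    ]


command = tf2onnx_graphdef_command(
    "20180402-114759.pb",
    inputs=["input0:0", "input1:0"],  # placeholders, replace with real names
    outputs=["output0:0"],            # placeholder, replace with real name
)
print(" ".join(shlex.quote(arg) for arg in command))
# To actually run the conversion: subprocess.run(command, check=True)
```

Passing a list instead of a single string avoids shell quoting problems when a path contains spaces.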

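    From this:

    If your model is in checkpoint or graphdef format and you do not know the input and output nodes of the model, you can use the summarize_graph TensorFlow utility. The summarize_graph tool needs to be downloaded and built from source. If you have the option of going to your model provider and obtaining the model in saved model format, then we recommend doing so.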

    【Discussion】:

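    • I had tried option 2 before, but I could not find the model's --inputs and --outputs. For option 1, I created a new environment with Python 3.7 and TensorFlow 1.15, since those are the requirements for tf2onnx to work. With that setup I get this error: ... in loader.restore(sess, trained_checkpoint_prefix) ... ValueError: The passed save_path is not a valid checkpoint: model-20180402-114759
    • Adding the .meta extension to trained_checkpoint_prefix seems to get a bit further, but it still fails in the end: tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file ./model-20180402-114759.meta: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator? [[node save/RestoreV2_356 (defined at /usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
    • I am running the script in the same folder where I have the .pb and .meta files.
    • @Xesk I have edited Option 1 to address your problem. Note the change in the code (trained_checkpoint_prefix) and read the Note section. I suggest you try only option 1 and make sure you do not get errors such as "not a valid file" or "missing file"; if you do, it is always a prefix- or path-related issue. As I mentioned in my answer, to try option 2 you need to know the inputs and outputs, which requires summarize_graph.
    • Thank you for your time and patience. I have edited the question with your update and with the output of running it as you described. It looks like a big step forward, but I am still getting some errors... maybe due to some missing libraries?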