gcloud ml-engine local predict --text-instances fails with "Could not parse" error

【Question title】: gcloud ml-engine local predict --text-instances fails with "Could not parse" error
【Posted】: 2017-09-07 11:22:53
【Question】:

I am trying to get the TensorFlow Boston sample (https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/tutorials/input_fn) running on Google Cloud ML. Training seems to work, but I am struggling with the subsequent prediction.

I have adapted the code to fit tf.contrib.learn.Experiment and learn_runner.run(). It runs both locally and in the cloud via "gcloud ml-engine local train ..." / "gcloud ml-engine jobs submit training ...".
With the trained model I can run estimator.predict(input_fn=predict_input_fn) and get meaningful predictions for the given boston_predict.csv set.
I can create and version the model in the cloud with "gcloud ml-engine models create ..." and "gcloud ml-engine versions create ...".

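BUT

Local prediction via "gcloud ml-engine local predict --model-dir=/export/Servo/XXX --text-instances boston_predict.csv" fails with "InvalidArgumentError (see above for traceback): Could not parse example input <..> (Error code: 2)". See the transcript below. It fails in the same way with a headerless boston_predict.csv.

I have looked through "$ gcloud ml-engine local predict --help", read https://cloud.google.com/ml-engine/docs/how-tos/troubleshooting, and searched Google and Stack Exchange, but could not find a report of my specific error.

I am a newbie, so I have probably made some basic mistake, but I cannot spot it.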

Thanks for any help,

:-)

yarc68000.

------- Environment ---------

(env1) $ gcloud --version
Google Cloud SDK 170.0.0
alpha 2017.03.24
beta 2017.03.24
bq 2.0.25
core 2017.09.01
datalab 20170818
gcloud 
gsutil 4.27

(env1) $ python --version
Python 2.7.13 :: Anaconda 4.3.1 (64-bit)

(env1) $ conda list | grep tensorflow
tensorflow                1.3.0                     <pip>
tensorflow-tensorboard    0.1.6                     <pip>

------------ Execution and error: boston_predict.csv ----------

$ gcloud ml-engine local predict --model-dir=<..>/export/Servo/1504780684 --text-instances 1709boston/boston_predict.csv
<..>
ERROR:root:Exception during running the graph: Could not parse example input, value: 'CRIM,ZN,INDUS,NOX,RM,AGE,DIS,TAX,PTRATIO'
[[Node: ParseExample/ParseExample = ParseExample[Ndense=9, Nsparse=0, Tdense=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], dense_shapes=[[1], [1], [1], [1], [1], [1], [1], [1], [1]], sparse_types=[], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_Placeholder_0_0, ParseExample/ParseExample/names, ParseExample/ParseExample/dense_keys_0, ParseExample/ParseExample/dense_keys_1, ParseExample/ParseExample/dense_keys_2, ParseExample/ParseExample/dense_keys_3, ParseExample/ParseExample/dense_keys_4, ParseExample/ParseExample/dense_keys_5, ParseExample/ParseExample/dense_keys_6, ParseExample/ParseExample/dense_keys_7, ParseExample/ParseExample/dense_keys_8, ParseExample/Const, ParseExample/Const_1, ParseExample/Const_2, ParseExample/Const_3, ParseExample/Const_4, ParseExample/Const_5, ParseExample/Const_6, ParseExample/Const_7, ParseExample/Const_8)]]
<..>

------- Execution and error: headerless boston_predict.csv ------

(Here I used boston_predict.csv with the first line omitted.)

$ gcloud ml-engine local predict --model-dir=<..>/export/Servo/1504780684 --text-instances 1709boston/boston_predict_headerless.csv
<..>
ERROR:root:Exception during running the graph: Could not parse example input, value: '0.03359,75.0,2.95,0.428,7.024,15.8,5.4011,252,18.3'
[[Node: ParseExample/ParseExample = ParseExample[Ndense=9, Nsparse=0, Tdense=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], dense_shapes=[[1], [1], [1], [1], [1], [1], [1], [1], [1]], sparse_types=[], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_Placeholder_0_0, ParseExample/ParseExample/names, ParseExample/ParseExample/dense_keys_0, ParseExample/ParseExample/dense_keys_1, ParseExample/ParseExample/dense_keys_2, ParseExample/ParseExample/dense_keys_3, ParseExample/ParseExample/dense_keys_4, ParseExample/ParseExample/dense_keys_5, ParseExample/ParseExample/dense_keys_6, ParseExample/ParseExample/dense_keys_7, ParseExample/ParseExample/dense_keys_8, ParseExample/Const, ParseExample/Const_1, ParseExample/Const_2, ParseExample/Const_3, ParseExample/Const_4, ParseExample/Const_5, ParseExample/Const_6, ParseExample/Const_7, ParseExample/Const_8)]]
<..>

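【Comments】:

Would you mind sharing your code?
Follow-up question: do you intend to use the model for online prediction? If so, I would suggest using JSON as input.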
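
As a rough illustration of the JSON suggestion above: "gcloud ml-engine local predict" also accepts --json-instances, where the file holds one JSON object per line. The sketch below converts the CSV rows into that layout. It assumes the model was exported with a serving input function that accepts one named float feature per column, which is not the case for the export shown in the question; the helper name csv_to_json_instances is hypothetical, and the column order is taken from the header in the error message.

```python
import csv
import json

# Column order assumed from the header row shown in the error message.
FEATURES = ["CRIM", "ZN", "INDUS", "NOX", "RM", "AGE", "DIS", "TAX", "PTRATIO"]


def csv_to_json_instances(csv_text):
    """Convert CSV text (with or without header) to newline-delimited JSON."""
    lines = []
    for fields in csv.reader(csv_text.splitlines()):
        if not fields or fields == FEATURES:
            continue  # skip blank lines and the header row
        # One JSON object per instance, keyed by feature name.
        instance = {name: float(value) for name, value in zip(FEATURES, fields)}
        lines.append(json.dumps(instance, sort_keys=True))
    return "\n".join(lines)


sample = (
    "CRIM,ZN,INDUS,NOX,RM,AGE,DIS,TAX,PTRATIO\n"
    "0.03359,75.0,2.95,0.428,7.024,15.8,5.4011,252,18.3\n"
)
print(csv_to_json_instances(sample))
```

The printed text could be saved to a file and passed with --json-instances instead of --text-instances, provided the exported model expects these feature names.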

Tags: tensorflow gcloud google-cloud-ml-engine

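【Answer 1】:

There are probably two issues.

First, the graph you are exporting looks as if it expects tf.Example protos as input, i.e. it contains a parse_example(...) op. The Boston sample does not appear to add that op, so I suspect it is part of your modifications.

Before showing the input_fn code you want, we need to talk about the second issue: versioning. Estimators existed in earlier versions of TensorFlow under tensorflow.contrib. In later versions, however, various pieces migrated into tensorflow.estimator, and the APIs changed as they migrated.

CloudML Engine currently (as of 2017-09-07) only supports TF 1.0 and 1.2, so I will provide a solution that works with 1.2. It is based on the census sample. This is the input_fn you need for CSV data, although I generally recommend exporting models that are independent of the input format: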

import tensorflow as tf

# FEATURES is the list of feature names defined in the Boston sample.
# Provides the data types for the various columns.
FEATURE_DEFAULTS=[[0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0], [0.0]]

def predict_input_fn():
  # The export strategy calls this function without arguments, so the
  # placeholder that receives the raw CSV lines is created here.
  csv_row = tf.placeholder(shape=[None], dtype=tf.string)
  # Takes a rank-1 tensor and converts it into a rank-2 tensor.
  # For example, ['csv,line,1', 'csv,line,2', ..] becomes
  # [['csv,line,1'], ['csv,line,2']], which after parsing results in a
  # tuple of tensors: [['csv'], ['csv']], [['line'], ['line']], [[1], [2]]
  row_columns = tf.expand_dims(csv_row, -1)
  columns = tf.decode_csv(row_columns, record_defaults=FEATURE_DEFAULTS)
  features = dict(zip(FEATURES, columns))

  return tf.contrib.learn.InputFnOps(features, None, {'csv_row': csv_row})
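
To make the shape comment in the function above concrete, here is a plain-Python analogue of the parsing step. It is only a sketch: it uses the standard csv module rather than tf.decode_csv, the helper name parse_rows is hypothetical, and the column order is assumed from the header shown in the question's error message.

```python
import csv

# Column order assumed from the header row shown in the error message.
FEATURES = ["CRIM", "ZN", "INDUS", "NOX", "RM", "AGE", "DIS", "TAX", "PTRATIO"]


def parse_rows(rows):
    """Turn a batch of CSV lines into {feature name: column of [value] cells}."""
    columns = {name: [] for name in FEATURES}
    for fields in csv.reader(rows):
        if len(fields) != len(FEATURES):
            raise ValueError(
                "expected %d fields, got %d" % (len(FEATURES), len(fields)))
        for name, field in zip(FEATURES, fields):
            # Each cell is wrapped in a list, mirroring the rank-2 shape
            # [batch_size, 1] described in the comment above.
            columns[name].append([float(field)])
    return columns


batch = ["0.03359,75.0,2.95,0.428,7.024,15.8,5.4011,252,18.3"]
features = parse_rows(batch)
print(features["CRIM"])  # [[0.03359]]
print(features["TAX"])   # [[252.0]]
```

A header line such as 'CRIM,ZN,...' raises a ValueError in this sketch because it cannot be converted to float. I would expect tf.decode_csv with numeric defaults to reject it in a similar way, so the file passed to --text-instances should probably be the headerless one.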

You will need an export strategy like this:

from tensorflow.contrib.learn.python.learn.utils import saved_model_export_utils

saved_model_export_utils.make_export_strategy(
    predict_input_fn,
    exports_to_keep=1,
    default_output_alternative_key=None,
)

You pass this, as a list of size 1, to the constructor of tf.contrib.learn.Experiment (its export_strategies argument).

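【Comments】:

Awesome. That works. I am still digesting why. For training I used tf.estimator.inputs.pandas_input_fn() (from the Boston sample), which works for the input. More abstractly, was my problem that this call creates placeholders tied to that particular pandas DataFrame, and that for prediction I need a different input placeholder that is independent of the training input? If that makes any sense. If you have a URL that explains this part better, I would appreciate it.
Unfortunately, I do not know of a good place to get that information. I have made a note to make sure it gets covered in the documentation, although that may take some time. To summarize, TF's architecture requires you to build two completely separate graphs for training and prediction. tf.estimator hides most of the details (if you write a custom estimator, you will notice that you have to handle the mode argument when building the graph), except for one thing: the input_fn. During training, the input usually comes from files, via queues and so on. During prediction, it is simply fed in.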