How to quantize the inputs and outputs of an optimized tflite model

[Question title]: How to quantize inputs and outputs of optimized tflite model
[Posted]: 2019-07-02 15:59:59
[Question]:

I generate a quantized tflite model with the following code:

import tensorflow as tf

def representative_dataset_gen():
  for _ in range(num_calibration_steps):
    # Get sample input data as a numpy array in a method of your choosing.
    yield [input]

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
tflite_quant_model = converter.convert()

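However, according to the post-training quantization documentation:

The resulting model will be fully quantized but still take float input and output for convenience.

To compile a tflite model for the Google Coral Edge TPU, I need the input and output to be quantized as well.

In the model, I see that the first network layer converts the float input to input_uint8 and the last layer converts output_uint8 to the float output. How do I edit the tflite model to get rid of the first and last float layers?

I know that I can set the input and output type to uint8 during conversion, but this is not compatible with any optimizations. The only remaining option is fake quantization, which results in a bad model.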

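[Comments on the question]:

If you want a fully quantized network (uint8 inputs), then you have to use the tflite converter differently: either via dummy_quantisation, or by exporting a network trained with quantization-aware training (including ranges) and converting that. Post-training quantization takes fp32 inputs and either dequantizes and uses fp32 kernels or quantizes dynamically (see the tf page below). "To further improve latency, hybrid operators dynamically quantize activations to 8-bits and perform computations with 8-bit weights and activations."
Actually, you are right. Even when a calibration dataset is used and the input ranges are captured, the tflite model produced by post-training quantization still has fp32 inputs and outputs. Only with quantization-aware training or dummy quantization can you extract a fully quantized network (with u8 inputs and outputs).
@KonstantinosMonachopoulos Are you sure? It looks like you can do full integer quantization (including input/output) without quantization-aware training. I think it can be done in a pure post-training scenario; see the accepted answer here and the documentation here.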

Tags: python tensorflow-lite google-coral

[Solution 1]:

You can avoid the float-to-int8 and int8-to-float "quant/dequant" ops by setting inference_input_type and inference_output_type (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/python/lite.py#L460-L476) to int8.

[Comments]:

Thank you very much. The root of my problem was that I had set inference_type to uint8 instead of inference_input_type.
In my case, my Keras model only has a uint8 input layer but is otherwise not quantized (e.g. float32). This is to ensure fast import of RGB files without a type conversion on the CPU. However, I get the error tensorflow/lite/toco/tooling_util.cc:2258] Check failed: array.data_type == array.final_data_type Array "input_1" has mis-matching actual and final data types (data_type=uint8, final_data_type=float)... This seems to indicate that it will not accept uint8 as the input dtype unless my model is fully quantized.

[Solution 2]:

import tensorflow as tf

# saved_model_dir and representative_dataset are assumed to be defined,
# e.g. like saved_model_dir and representative_dataset_gen in the question.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# The three lines below quantize the model's input and output.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()
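
Editor's note, not part of the original answer: once the input and output tensors are uint8, the float-to-integer conversion that the first and last layers used to perform becomes the caller's job. The sketch below illustrates the affine mapping TFLite uses for 8-bit tensors, real_value = (quantized_value - zero_point) * scale, in plain Python. The `scale` and `zero_point` values and the helper names `quantize` and `dequantize` are made up for illustration; for a real model the parameters would normally be read from the `quantization` entry of `interpreter.get_input_details()` and `interpreter.get_output_details()`.

```python
def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """Map floats to uint8 codes: q = round(x / scale) + zero_point, clamped to [qmin, qmax]."""
    return [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Map uint8 codes back to floats: x = (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in q_values]


# Hypothetical parameters for an input normalized to roughly [-1, 1).
scale, zero_point = 1.0 / 128, 128

samples = [-1.0, 0.0, 0.5, 2.0]  # 2.0 lies outside the representable range
q = quantize(samples, scale, zero_point)
print(q)                                  # [0, 128, 192, 255]
print(dequantize(q, scale, zero_point))   # [-1.0, 0.0, 0.5, 0.9921875]
```

The out-of-range value saturates at 255, which is why the calibration data should cover the range of inputs seen at inference time. Python's `round` rounds exact ties to the nearest even integer, which may differ from the runtime's rounding in those tie cases, so treat this as an approximation of the behavior rather than a bit-exact reference.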

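[Comments]:

While this code may solve the question, including an explanation of how and why it solves the problem would really help to improve the quality of your post, and would probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please edit your answer to add an explanation and indicate which limitations and assumptions apply.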