Get a fully quantized TfLite model, with int8 input and output

【Question title】: Get fully quantized TfLite model, also with in- and output on int8
【Posted】: 2020-12-27 14:39:40
【Question】:

I am using Tensorflow 1.15.3 to quantize a Keras h5 model (TF 1.13; keras_vggface model) so that I can run it on an NPU. The code I used for the conversion is:

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model_file(saved_model_dir + modelname)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8  # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8
tflite_quant_model = converter.convert()

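The quantized model I get looks fine at first glance. The layers take int8 input, the filters are int8, the biases are int32, and the outputs are int8.

However, the model has a quantize layer right after the input layer, and the input layer itself is float32 [see image below]. But it seems the NPU also needs the input to be int8.

Is there a way to fully quantize the model without the conversion layer, so that the input is int8 as well?

As you can see above, I used: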

 converter.inference_input_type = tf.int8
 converter.inference_output_type = tf.int8
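
For context, the quantize layer in question only applies an affine mapping from float to int8, so once the model really takes int8 input the same mapping has to be done on the host before feeding data in. Below is a minimal sketch in plain Python, assuming the usual TFLite scheme `q = round(x / scale) + zero_point`. The `scale` and `zero_point` values are made up for illustration; the real ones come from the converted model's input tensor.

```python
def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Affine quantization: q = round(x / scale) + zero_point, clamped to int8."""
    out = []
    for x in values:
        # Note: Python's round() rounds exact halves to even; a runtime
        # may treat exact halves differently.
        q = int(round(x / scale)) + zero_point
        out.append(max(qmin, min(qmax, q)))
    return out


def dequantize(qvalues, scale, zero_point):
    """Inverse mapping: x = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in qvalues]


# Hypothetical parameters: pixel range [0, 255] mapped onto int8.
scale, zero_point = 1.0, -128
pixels = [0.0, 127.0, 255.0]

q = quantize(pixels, scale, zero_point)
print(q)                                  # [-128, -1, 127]
print(dequantize(q, scale, zero_point))   # [0.0, 127.0, 255.0]
```

The same `dequantize` step applies to an int8 output tensor if float scores are needed afterwards.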

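Edit

Solution from user dtlam

Although the model still does not run with Google NNAPI, the solution for quantizing the model with int8 input and int8 output, using either TF 1.15.3 or TF 2.2.0, is the following, thanks to delan: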

...
import cv2
import numpy as np
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model_file(saved_model_dir + modelname)

def representative_dataset_gen():
  # Calibration samples for the converter; here the same image is fed 10 times.
  for _ in range(10):
    pfad = 'pathtoimage/000001.jpg'
    img = cv2.imread(pfad)
    img = np.expand_dims(img, 0).astype(np.float32)
    # Get sample input data as a numpy array in a method of your choosing.
    yield [img]

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.experimental_new_converter = True

converter.target_spec.supported_types = [tf.int8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
quantized_tflite_model = converter.convert()

# Write the model to a file named after the TF major version in use.
if tf.__version__.startswith('1.'):
    with open("test153.tflite", "wb") as f:
        f.write(quantized_tflite_model)
if tf.__version__.startswith('2.'):
    with open("test220.tflite", 'wb') as f:
        f.write(quantized_tflite_model)
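
One way to confirm that the converted file really has int8 input and output (rather than a float32 input followed by a quantize op) is to load it with `tf.lite.Interpreter` and look at the tensor details. This is only a sketch: the helper names `all_int8` and `check_model_io` are made up here, and it assumes TensorFlow is installed and the `.tflite` file exists.

```python
def all_int8(details):
    """True if every tensor-detail dict reports an int8 dtype."""
    return all(
        getattr(d["dtype"], "__name__", str(d["dtype"])) == "int8"
        for d in details
    )


def check_model_io(model_path):
    """Print input/output tensor info and report whether all of it is int8."""
    # Imported lazily so all_int8 can be used without TensorFlow.
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inputs = interpreter.get_input_details()
    outputs = interpreter.get_output_details()
    for d in inputs + outputs:
        # "quantization" is the (scale, zero_point) pair of the tensor.
        print(d["name"], d["dtype"], d["quantization"])
    return all_int8(inputs) and all_int8(outputs)


# Example, using the file name written by the conversion code:
# check_model_io("test153.tflite")
```

If the input still shows up as float32 here, the converter did not apply `inference_input_type` and the quantize op is still part of the graph.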

【Question comments】:

Tags: tensorflow tensorflow-lite quantization

【Solution 1】:

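If you apply post-training quantization, you have to make sure your representative dataset is not in float32. Also, if you want to be certain of getting a quantized model with int8 or uint8 input/output, you should consider using quantization-aware training. This also gives you better quantization results.

I also tried quantizing your model from the image and code you gave me, and it does end up quantized after all.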

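【Comments】:

Ahhhhhhhhhhhhhhhh, good to know. I will try this as soon as possible. I'll use quantization-aware training as a last resort :-) The beauty of this model is that it is already well pre-trained. Can you also use quantization-aware training during fine-tuning? That is, just retrain the model with quantization-aware training on a smaller int8 set? Thank you very much, and I wish you health and happiness.
Yes, QAT helps the model perform better at low bit widths ;)). From what I can see of your model, if you use QAT from tf2.0, the quantized model cannot have uint8 or int8 input.
Here they clearly state that you should use tf1.x: coral.ai/docs/edgetpu/faq
Thank you very much. Good to know that TF2 only supports float32 input. I did not know that (I am not using Google's TPU, so I missed this documentation, cheers).
Good to hear. I will most likely come back to that offer ;-) I'll talk to you soon. Cheers.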