Converted TensorFlow model to TFLite outputs an int8 that I cannot dequantize

【Question title】: Converted TensorFlow model to TFLite outputs an int8 that I cannot dequantize
【Posted】: 2021-02-12 18:19:09
【Question】:

I have quantized a TensorFlow model using the following class:

import tensorflow as tf


class QuantModel():

    def __init__(self, model=tf.keras.Model, data=[]):
        '''
        1. Accepts a keras model, long term will allow saved model and other formats
        2. Accepts a numpy or tensor data of the format such that indexing such as
        data[0] will return one input in the correct format to be fed forward through the
        network
        '''
        self.data = data
        self.model = model

    def quant_model_int8(self):
        '''Quantizes the model to int8. Custom ops can be allowed for
        Logmelspectrogram operations (might cause mixed quantization).'''
        converter = tf.lite.TFLiteConverter.from_keras_model(self.model)
        converter.representative_dataset = self.representative_data_gen
        converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
        converter.inference_input_type = tf.int8  # or tf.uint8
        converter.inference_output_type = tf.int8  # or tf.uint8
        #converter.allow_custom_ops=True
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
        tflite_model_quant = converter.convert()
        # Write the converted model to disk and close the file handle.
        with open("converted_model2.tflite", 'wb') as f:
            f.write(tflite_model_quant)
        return tflite_model_quant

    def convert_tflite_no_quant(self):
        '''Returns a tflite model with no quantization i.e. weights and variable data all
        in float32'''
        converter = tf.lite.TFLiteConverter.from_keras_model(self.model)
        tflite_model = converter.convert()
        with open("converted_model.tflite", 'wb') as f:
            f.write(tflite_model)
        return tflite_model

    def representative_data_gen(self):
        # Model has only one input so each data point has one element.
        yield [self.data]

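I can quantize the model successfully, but its input and output are int8, because those are the types the converter offers for inference_input_type and inference_output_type after quantization.

To run the model, I use tf.quantization.quantize to convert the input data to the qint8 format and feed it through the network. As expected, I get an int8 output.

I want to convert the output back to float32 and inspect it. For this I am using tf.quantization.dequantize. However, this only works on quantized types such as tf.qint8, not on the plain int8 array that the interpreter returns.

How should I handle this? Has anyone run into a similar problem?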

import numpy as np
import tensorflow as tf

# Load the TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="converted_model2.tflite")
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
data_arr = np.load('Data_Mel.npy')
print(data_arr.shape)
sample = data_arr[0]
print(sample.shape)
minn = sample.min()
maxx = sample.max()
print(minn, maxx)

# Quantize the sample to qint8 using its own min/max range.
(sample, sample_1, sample_2) = tf.quantization.quantize(data_arr[0], minn, maxx, tf.qint8)
print(sample.shape)

# Test the model on the quantized sample.
input_data = sample
interpreter.set_tensor(input_details[0]['index'], input_data)

interpreter.invoke()

# The function `get_tensor()` returns a copy of the tensor data.
# Use `tensor()` in order to get a pointer to the tensor.
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data.dtype)
output_data = tf.quantization.dequantize(output_data, minn, maxx)
print(output_data)

【Comments on the question】:

Tags: tensorflow2.0 tensorflow-lite

【Answer 1】:

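I think you can simply remove the converter.inference_input_type = tf.int8 and converter.inference_output_type = tf.int8 flags and treat the output model as a float model. Here are some details:

The "optimizations" flag in the converter quantizes the float model to int8. By default, it adds a [Quant] op at the beginning of the quantized model and a [Dequant] op at the end: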

(float) ->[Quant] -> (int8) -> [op1] -> (int8) -> [op...] -> (int8) -> [Dequant] -> (float)

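So you do not need to change any driver logic, because the model as a whole still has a float interface while the [op]s inside it are quantized.

The additional flags converter.inference_input_type = tf.int8 and converter.inference_output_type = tf.int8 let you remove the [Quant] and [Dequant] ops, so the quantized model looks like this: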

(int8) -> [op1] -> (int8) -> [op...] -> (int8)

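This form is intended for deployment on certain hardware/workflows. Since you are adding [Quant] and [Dequant] manually, a quantized model with a float interface is probably a better fit for your case.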
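
As a rough sketch of the float-interface variant described in this answer, the conversion could look something like the following. It assumes a Keras model and a representative dataset generator like the ones in the question; the function name `convert_int8_with_float_io` is made up for illustration and is not part of TensorFlow.

```python
import tensorflow as tf


def convert_int8_with_float_io(model, representative_data_gen):
    """Quantize a Keras model to int8 internally while keeping float32 I/O.

    `model` and `representative_data_gen` are assumed to be the same kind of
    objects the question passes to its own converter.
    """
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_data_gen
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    # inference_input_type and inference_output_type are deliberately left
    # unset, so they keep their float32 default and the converter keeps the
    # [Quant] and [Dequant] ops at the model boundaries.
    return converter.convert()
```

With a model converted this way, the interpreter should accept float32 input in `set_tensor()` and return float32 from `get_tensor()`, so no manual quantize or dequantize step is needed in the driver code.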

【Comments】:

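Hi. Thanks for the info. I am aware of this; the purpose of the quantization is deployment on hardware. But before that I want to evaluate the model's performance on test data, which is why I am doing the above, i.e. running the model's predictions in int8, since that is how I will deploy it.
I see. If you really want to test the int8 model, you can do the float/int8 conversion with simple Python code instead of using the heavyweight TF ops.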
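
A minimal sketch of what that "simple Python code" might look like, using NumPy. It assumes the usual affine scheme, x ≈ scale * (q - zero_point), with the (scale, zero_point) pair read from the `'quantization'` entry of the interpreter's input and output details. The helper names (`quantize_to_int8`, `dequantize_from_int8`, `run_int8_model`) are invented here for illustration, and the example values at the bottom are made up rather than taken from a real model.

```python
import numpy as np


def quantize_to_int8(x, scale, zero_point):
    """Map float values to int8 using q = round(x / scale) + zero_point."""
    q = np.round(np.asarray(x, dtype=np.float32) / scale) + zero_point
    # Saturate to the int8 range before casting.
    return np.clip(q, -128, 127).astype(np.int8)


def dequantize_from_int8(q, scale, zero_point):
    """Map int8 values back to float using x = scale * (q - zero_point)."""
    return scale * (np.asarray(q, dtype=np.float32) - zero_point)


def run_int8_model(interpreter, sample):
    """Feed one float sample through an int8-in/int8-out TFLite interpreter."""
    input_details = interpreter.get_input_details()[0]
    output_details = interpreter.get_output_details()[0]

    # Each tensor carries its own (scale, zero_point) pair.
    in_scale, in_zero_point = input_details["quantization"]
    out_scale, out_zero_point = output_details["quantization"]

    # Assumption: `sample` already has the input tensor's shape,
    # batch dimension included.
    interpreter.set_tensor(
        input_details["index"],
        quantize_to_int8(sample, in_scale, in_zero_point),
    )
    interpreter.invoke()
    raw_output = interpreter.get_tensor(output_details["index"])
    return dequantize_from_int8(raw_output, out_scale, out_zero_point)


# Round trip with made-up quantization parameters.
scale, zero_point = 0.5, -3
q = quantize_to_int8([-1.0, 0.0, 2.0], scale, zero_point)
x = dequantize_from_int8(q, scale, zero_point)
```

The key difference from the code in the question is that the scale and zero point come from the model itself rather than from the min and max of a single sample, so the int8 values should mean the same thing to the driver code and to the converted model.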