Applying a convnet classifier trained on tiles to a large image

【Question title】: Applying a convnet classifier trained on tiles to a large image
【Posted】: 2021-01-03 09:50:03
【Question description】:

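My task is to find a particular letter in pictures of documents. Using classical computer vision, I segmented the image into characters. I then used a neural network trained on 25×25 pixel character images to classify them into the letter I want and everything else. With that I can reconstruct the positions of these characters.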

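Now I want to apply the convnet directly to the whole image, so that I don't have to rely on the classical segmentation. The network is a deep neural network made of 2D convolutions, 2D max-pooling layers, and a dense classifier. It looks like this: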

Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_61 (Conv2D)           (None, 23, 23, 32)        320       
_________________________________________________________________
max_pooling2d_50 (MaxPooling (None, 11, 11, 32)        0         
_________________________________________________________________
conv2d_62 (Conv2D)           (None, 9, 9, 64)          18496     
_________________________________________________________________
max_pooling2d_51 (MaxPooling (None, 4, 4, 64)          0         
_________________________________________________________________
flatten_46 (Flatten)         (None, 1024)              0         
_________________________________________________________________
dropout_5 (Dropout)          (None, 1024)              0         
_________________________________________________________________
dense_89 (Dense)             (None, 1)                 1025      
=================================================================
Total params: 19,841
Trainable params: 19,841
Non-trainable params: 0
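
The parameter counts in the summary can be reproduced by hand. The sketch below assumes 3×3 kernels and a single-channel input. The summary does not state either, but both are consistent with the 25 → 23 shrinkage and the 320 parameters of the first layer. The helper names are made up for illustration.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a 2D convolution."""
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(inputs, units):
    """Weights plus one bias per unit for a dense layer."""
    return (inputs + 1) * units


# Assumed: 3x3 kernels and grayscale input (not stated in the summary itself).
conv1 = conv2d_params(kernel=3, in_channels=1, filters=32)
conv2 = conv2d_params(kernel=3, in_channels=32, filters=64)
dense = dense_params(inputs=4 * 4 * 64, units=1)
total = conv1 + conv2 + dense

print(conv1, conv2, dense, total)  # 320 18496 1025 19841
```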

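I know that I can apply the convolutional part to the whole image using the trained filters. That would give me the responses to those filters as a tensor with larger spatial dimensions. But to classify, I need the dense classifier, which was trained on a fixed amount of spatial information. Feeding in images of a different size would break that.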
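
To make the size mismatch concrete, here is a small sketch that traces the spatial size through the convolution and pooling stages. It assumes 3×3 "valid" convolutions and 2×2 pooling, which match the shapes in the summary. The 100-pixel image size is only an illustrative choice.

```python
def feature_map_size(size, kernel=3, pool=2):
    """Spatial size after conv -> pool -> conv -> pool (valid conv, stride 1)."""
    for _ in range(2):
        size = size - kernel + 1  # 'valid' convolution shrinks by kernel - 1
        size = size // pool       # non-overlapping max pooling, rounded down
    return size


tile = feature_map_size(25)     # 25 -> 23 -> 11 -> 9 -> 4
large = feature_map_size(100)   # 100 -> 98 -> 49 -> 47 -> 23

flat_tile = tile * tile * 64      # what the dense layer was trained on
flat_large = large * large * 64   # what a 100x100 image would produce

print(flat_tile, flat_large)  # 1024 33856
```

The dense layer has exactly 1024 input weights, so the 33856 features from the larger image cannot be fed to it directly.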

My best idea so far is to cut the image into tiles and feed each fixed-size tile to the classifier. This seems to be the answer to another question.
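
As a rough sketch of that tiling idea, the function below lists the top-left corners of every 25×25 tile that fits inside an image. The function name, the 100×75 image size, and the stride values are assumptions chosen for illustration.

```python
def tile_origins(height, width, tile=25, stride=25):
    """Top-left (row, col) of every tile that fits fully inside the image."""
    rows = range(0, height - tile + 1, stride)
    cols = range(0, width - tile + 1, stride)
    return [(r, c) for r in rows for c in cols]


# Non-overlapping tiles on a hypothetical 100x75 image.
origins = tile_origins(100, 75)
# One tile per pixel offset: far more tiles, but no letter falls between them.
dense_origins = tile_origins(100, 75, stride=1)

print(len(origins), len(dense_origins))  # 12 3876
```

Each origin `(r, c)` corresponds to the crop `image[r:r + 25, c:c + 25]`, which has the size the classifier expects. With non-overlapping tiles a letter can straddle a tile border, which is why a smaller stride may be needed.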

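Is there a better way, one that applies the trained filters to the whole image and then does some kind of local classification with the trained classifier?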

【Question discussion】:

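Follow-up: Have you tried the proposed solution? Did it work for you, or did you run into problems? Some feedback would be much appreciated so that future readers can improve or correct the answer (no pressure, of course!). Thanks!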

Tags: python tensorflow keras deep-learning computer-vision

【Solution 1】:

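As a solution, I suggest using the tf.image.extract_patches function to extract patches from the image and applying your trained classifier to each patch. This has a couple of benefits: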

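You get a dense response map, which you can process further to determine exactly where the letter appears in the whole image.
Since this is a built-in TensorFlow op, you can simplify the pipeline by implementing and running all of it as a single Keras model, and so take advantage of batching as well as accelerated CPU/GPU processing.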
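
One simple way to process such a response map further is to keep only the locations whose score clears a threshold. This is a minimal sketch, not part of the original answer. It assumes the classifier emits a score between 0 and 1 per location, and the 0.5 threshold and the toy map are made up for illustration.

```python
def find_hits(response_map, threshold=0.5):
    """Return (row, col, score) for every location scoring at or above the threshold."""
    return [
        (r, c, score)
        for r, row in enumerate(response_map)
        for c, score in enumerate(row)
        if score >= threshold
    ]


# Toy 3x4 response map standing in for the classifier output on one image.
toy_map = [
    [0.1, 0.2, 0.1, 0.0],
    [0.0, 0.9, 0.3, 0.1],
    [0.1, 0.2, 0.1, 0.7],
]
hits = find_hits(toy_map)

print(hits)  # [(1, 1, 0.9), (2, 3, 0.7)]
```

In practice neighbouring locations tend to fire together for the same letter, so some form of local-maximum selection would probably be needed on top of the plain threshold.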

Here is a sketch of the solution:

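import tensorflow as tf
from tensorflow.keras.layers import Input, Lambda, Reshape, TimeDistributed
from tensorflow.keras.models import Model

# Assumed to be defined already: `img_rows`, `img_cols` (size of the whole images),
# `letter_classifier` (the trained 25x25 model) and `my_images` (a batch of images).
whole_images = Input(shape=(img_rows, img_cols, 1))
# Wrapping the op in a Lambda layer keeps it usable inside a Keras functional model,
# including Keras versions that reject raw TensorFlow ops on symbolic inputs.
patches = Lambda(lambda x: tf.image.extract_patches(
    x,
    sizes=[1, 25, 25, 1],
    strides=[1, 1, 1, 1],  # you can choose to increase the stride if you don't want a dense classification map
    rates=[1, 1, 1, 1],
    padding='SAME'
))(whole_images)
# `patches` has a shape of `(batch_size, num_row_locs, num_col_locs, 25*25)`.
# So we reshape it so that we can apply the classifier to each patch independently.
reshaped_patches = Reshape((-1, 25, 25, 1))(patches)
dense_map = TimeDistributed(letter_classifier)(reshaped_patches)
# Reshape it back to the grid of patch locations. `Reshape` needs static integers,
# so read them from the static shape instead of `tf.shape(patches)`.
num_row_locs, num_col_locs = patches.shape[1], patches.shape[2]
dense_map = Reshape((num_row_locs, num_col_locs))(dense_map)

# Construct the model
image_classifier = Model(whole_images, dense_map)

# Use it on the real images
output = image_classifier(my_images)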
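
The size of the resulting map depends on the stride and the padding mode. The helper below is a hypothetical illustration of the usual TensorFlow output-size rules for `'SAME'` and `'VALID'` padding with a dilation rate of 1. It is a sketch for reasoning about shapes, not a TensorFlow API.

```python
import math


def num_patch_locations(size, patch=25, stride=1, padding="SAME"):
    """Number of patch positions along one image axis (dilation rate 1)."""
    if padding == "SAME":
        # Patches near the border are padded with zeros so every position counts.
        return math.ceil(size / stride)
    if padding == "VALID":
        # Only patches that lie fully inside the image are extracted.
        return math.ceil((size - patch + 1) / stride)
    raise ValueError(f"unknown padding: {padding!r}")


# Along a hypothetical axis of 100 pixels:
print(num_patch_locations(100))                               # 100
print(num_patch_locations(100, stride=5))                     # 20
print(num_patch_locations(100, padding="VALID"))              # 76
print(num_patch_locations(100, stride=5, padding="VALID"))    # 16
```

With `'SAME'` and a stride of 1 the map has one score per input pixel, which makes it easy to map a response back to an image position. Border patches contain zero padding that the classifier did not see during training, so scores near the edges may be less reliable.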

【Discussion】: