如果您有一组图像(例如视频帧),并且它们在时间上相互关联,
您可以遍历这些帧,将它们一个一个地传入 conv2d 卷积层。
请查看我的 git 仓库中这个用于视频动作识别的示例:
class Net(nn.Module):
    """CNN + LSTM video-action classifier.

    A (mostly frozen) pretrained VGG19 convolutional backbone extracts a
    feature vector from every frame; an LSTM then runs over the frame
    sequence and a small fully-connected head produces the class logits.
    """

    def __init__(self):
        super(Net, self).__init__()
        num_classes = 1
        dr_rate = 0.2
        pretrained = True
        rnn_hidden_size = 30
        rnn_num_layers = 2
        # Take only the convolutional part of a pretrained VGG19 and
        # fine-tune just its last few layers.
        baseModel = models.vgg19(pretrained=pretrained).features
        # Freeze children 0..27; leave the remaining ones trainable.
        for i, child in enumerate(baseModel.children()):
            requires_grad = i >= 28
            for param in child.parameters():
                param.requires_grad = requires_grad
        # Flattened size of the VGG19 feature map for one frame.
        # 12800 = 512 * 5 * 5 — presumably assumes ~160x160 input frames;
        # TODO confirm against the actual frame size used in training.
        num_features = 12800
        self.baseModel = baseModel
        self.dropout = nn.Dropout(dr_rate)
        self.rnn = nn.LSTM(num_features, rnn_hidden_size, rnn_num_layers,
                           batch_first=True)
        # Use rnn_hidden_size instead of a hard-coded 30 so the head stays
        # consistent with the LSTM if the hidden size is ever changed.
        self.fc2 = nn.Linear(rnn_hidden_size, 256)
        self.fc3 = nn.Linear(256, num_classes)

    def forward(self, x):
        """Map a frame batch of shape (batch, time, C, H, W) to logits.

        Returns a tensor of shape (batch, num_classes).
        """
        batch_size, time_steps, C, H, W = x.size()
        # Fold time into the batch so the conv layers see ordinary images:
        # (batch_size * time_steps, C, H, W).
        x = x.contiguous().view(batch_size * time_steps, C, H, W)
        x = self.baseModel(x)
        # Flatten each frame's feature map to a vector.
        x = x.view(x.size(0), -1)
        # Restore the sequence shape (batch, time, features) for the LSTM.
        x = x.contiguous().view(batch_size, time_steps, x.size(-1))
        x, (hn, cn) = self.rnn(x)
        # Use only the last time step's output, not the full sequence.
        x = F.relu(self.fc2(x[:, -1, :]))
        x = self.dropout(x)
        x = self.fc3(x)
        return x
主要思想是:在这个代码块中,我们把每一帧(图像)分别送入卷积网络,然后重塑其输出,再把它馈送到新的网络(LSTM)中。
# Fold the time dimension into the batch: (batch_size * timesteps, C, H, W),
# so each frame is processed as an ordinary image.
x = x.contiguous().view(batch_size * time_steps, C, H, W)
# Feed every frame through the pre-trained conv model.
x = self.baseModel(x)
# Flatten each frame's feature map into a single vector.
x = x.view(x.size(0), -1)
# Restore the sequence shape (batch_size, timesteps, output_size).
x = x.contiguous().view(batch_size , time_steps , x.size(-1)) # this x is now ready to be fed into the LSTM layer