【Question Title】: How to pass several images through one SeparableConv2D?
【Posted】: 2020-12-28 04:37:31
【Question Description】:

I have several images as input. I know I could use a 3D convolution layer, but I don't want to. Instead, I want to find patterns within each 2-dimensional image.

What I mean is that every image should pass through the same SeparableConv2D, like this:

# this code raises an error ValueError: Input 0 of layer <name> is 
# incompatible with the layer: expected ndim=4, found ndim=5.
model = Sequential([
    Input(shape=(16, 128, 128, 1)),
    SeparableConv2D(32, 3),
    GlobalAvgPool3D(),
])

I know that I can use Conv3D here to act like a Conv2D:

model = Sequential([
    Input(shape=(16, 128, 128, 1)),
    Conv3D(32, [1, 3, 3]),
    GlobalAvgPool3D(),
])

But I really do need SeparableConv2D.

Maybe I can do this with a custom layer or in some other way? I can't even imagine a solution.


P.S. Each input consists of several images.

【Question Discussion】:

    Tags: python tensorflow keras computer-vision conv-neural-network


    【Solution 1】:

    If you have images such as video frames that are connected to each other in some way, you can loop over a Conv2D and pass them through it one by one.

    See this example from my git repo for video action recognition:

    import torch.nn as nn
    import torch.nn.functional as F
    from torchvision import models

    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            num_classes = 1
            dr_rate = 0.2
            pretrained = True
            rnn_hidden_size = 30
            rnn_num_layers = 2
            # get a pretrained VGG19 model (take only the conv layers and fine-tune them)
            baseModel = models.vgg19(pretrained=pretrained).features
            i = 0
            for child in baseModel.children():
                if i < 28:
                    for param in child.parameters():
                        param.requires_grad = False
                else:
                    for param in child.parameters():
                        param.requires_grad = True
                i +=1
    
            num_features = 12800
            self.baseModel = baseModel
            self.dropout= nn.Dropout(dr_rate)
            self.rnn = nn.LSTM(num_features, rnn_hidden_size, rnn_num_layers , batch_first=True)
            self.fc2 = nn.Linear(30, 256)
            self.fc3 = nn.Linear(256, num_classes)
        def forward(self, x):
            batch_size, time_steps, C, H, W = x.size()
            # reshape input  to be (batch_size * timesteps, input_size)
            x = x.contiguous().view(batch_size * time_steps, C, H, W)
            x = self.baseModel(x)
            x = x.view(x.size(0), -1)
            #make output as  ( samples, timesteps, output_size)
            x = x.contiguous().view(batch_size , time_steps , x.size(-1))
            x , (hn, cn) = self.rnn(x)
            x = F.relu(self.fc2(x[:, -1, :]))  # use only the last LSTM output, not the full sequence
            x = self.dropout(x)
            x = self.fc3(x)
            return x 
    

    The main idea is that in this block we feed each frame or image to the conv network, then reshape its output so it can be fed into the next network:

    # reshape input  to be (batch_size * timesteps, input_size)
     x = x.contiguous().view(batch_size * time_steps, C, H, W)
     # feed to the pre-trained conv model
     x = self.baseModel(x)
     # flatten the output
     x = x.view(x.size(0), -1)
     # make the new correct shape (batch_size , timesteps , output_size)
     x = x.contiguous().view(batch_size , time_steps , x.size(-1))  # this x is now ready to be fed into the lstm layer
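
    The snippet above is PyTorch, while the question uses Keras. The same merge-then-restore trick can be sketched directly in TensorFlow (a rough sketch, not part of the original answer; shapes assume the question's 16 frames of 128×128×1 and an illustrative batch size of 2):

```python
import tensorflow as tf

# Hypothetical 5D batch: (batch, time, rows, cols, channels)
x = tf.random.normal((2, 16, 128, 128, 1))
b, t = 2, 16

# Merge batch and time so a 2D conv sees ordinary 4D input
x4d = tf.reshape(x, (b * t, 128, 128, 1))
y = tf.keras.layers.SeparableConv2D(32, 3)(x4d)  # (32, 126, 126, 32)

# Restore the time axis for a downstream sequence model (e.g. an LSTM)
y = tf.reshape(y, (b, t, 126, 126, 32))
print(y.shape)  # (2, 16, 126, 126, 32)
```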
    

    【Discussion】:

      【Solution 2】:

      Just to make sure: your input shape should be 4D, because SeparableConv2D expects a 4D tensor of shape (batch_size, channels, rows, cols) if data_format='channels_first', or a 4D tensor of shape (batch_size, rows, cols, channels) if data_format='channels_last'.

      Working example:

      import tensorflow as tf
      input_shape = (16, 128, 128, 1)
      x = tf.random.normal(input_shape)
      y = tf.keras.layers.SeparableConv2D( 2, 3, activation='relu', input_shape=input_shape[1:])(x)
      print(y.shape)
      

      Output:

      (16, 126, 126, 2)
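
      If you do want to keep the 5D input from the question while still using a genuine SeparableConv2D, one common option is to wrap it in a TimeDistributed layer, which applies the same shared layer to every frame (a hedged sketch using tf.keras, not part of the original answer):

```python
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, SeparableConv2D, TimeDistributed, GlobalAvgPool3D

# TimeDistributed applies one shared SeparableConv2D to each of the 16 frames,
# so the 5D input no longer triggers the ndim=4 error from the question.
model = Sequential([
    Input(shape=(16, 128, 128, 1)),           # (frames, rows, cols, channels)
    TimeDistributed(SeparableConv2D(32, 3)),  # per-frame 2D separable conv
    GlobalAvgPool3D(),
])
print(model.output_shape)  # (None, 32)
```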
      

      【Discussion】:
