[Title]: Neural Nets for Image + Numeric Data
[Posted]: 2018-08-15 09:34:17
[Question]:

I have a situation where the input is an image plus a set of (3) numeric fields, and the output is an image mask. I'm not sure how to do this in Keras...

My architecture looks something like the attachment. I know CNN and Dense architectures on their own; I'm just not sure how to feed each input into its respective network and perform the concat operation. Also, suggestions for a better architecture would be great!

Please advise, preferably with sample code. Thanks in advance, Utpal.

[Comments]:

  • After you apply fully connected layers to the lower branch's input, any spatial information is lost (distances between points make no difference to a dense layer). So applying conv/deconv afterwards, which is sensitive to the input's spatial structure, makes little sense to me. Could you describe the problem you are trying to solve?
  • Say you are given a face plus age and gender, and you need to locate whether the face has any birthmarks (not just any kind of mark)... The output would be a 2-D image in which birthmarks are white (1 or 255) and the rest of the image is black (0)...
  • If producing a yes/no (0/255) image mask is enough, this is an image segmentation problem. As a starting point you can try the Keras implementations of several models from this repo - github.com/divamgupta/image-segmentation-keras
  • Thank you, but here the numeric data matters; a CNN (VGG) alone may not be enough...
  • Just wondering where to place the numeric data (alongside the image)

Tags: keras conv-neural-network convolution


[Solution 1]:

I can suggest trying a U-net model for this problem. A typical U-net is several conv and max-pooling layers followed by several conv and upsampling layers:

In the current problem you can mix in the non-spatial data (the image annotations) in the middle of the network:

Starting from a pretrained VGG-16 may also be a good idea (see vgg.load_weights(VGG_Weights_path) below).
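Before the full model, the core two-input pattern can be sketched minimally. All sizes here are illustrative assumptions (a 64x64 image, the 3 numeric fields, channels_last ordering for portability), not the architecture of the full answer:

```python
import numpy as np
from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, concatenate
from keras.models import Model

# Image branch: a tiny CNN (sizes are placeholders, not a recommendation)
img_in = Input(shape=(64, 64, 3))
x = Conv2D(16, (3, 3), activation='relu', padding='same')(img_in)
x = MaxPooling2D((2, 2))(x)
x = Flatten()(x)

# Numeric branch: the 3 scalar fields through a small Dense layer
num_in = Input(shape=(3,))
y = Dense(16, activation='relu')(num_in)

# Concatenate the two feature vectors and predict a flattened 64x64 binary mask
z = concatenate([x, y])
out = Dense(64 * 64, activation='sigmoid')(z)

model = Model(inputs=[img_in, num_in], outputs=out)
model.compile(optimizer='adam', loss='binary_crossentropy')

# Two inputs -> pass a list of arrays, same sample order in both
imgs = np.random.rand(4, 64, 64, 3).astype('float32')
nums = np.random.rand(4, 3).astype('float32')
preds = model.predict([imgs, nums])
print(preds.shape)  # (4, 4096)
```

The key point is the functional API: each branch is built from its own `Input`, `concatenate` merges them, and `Model` takes a list of inputs.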

See the code below (based on Divam Gupta's repo):

from keras.models import Model
from keras.layers import (Input, Conv2D, MaxPooling2D, UpSampling2D, ZeroPadding2D,
                          BatchNormalization, Dense, Flatten, Reshape, Permute,
                          Activation, concatenate)

IMAGE_ORDERING = 'channels_first'  # all tensors below are (channels, height, width)
VGG_Weights_path = 'vgg16_weights_th_dim_ordering_th_kernels.h5'  # see the URL below


def VGGUnet(n_classes, input_height=416, input_width=608, data_length=128, vgg_level=3):
    assert input_height % 32 == 0
    assert input_width % 32 == 0

    # https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_th_dim_ordering_th_kernels.h5
    img_input = Input(shape=(3, input_height, input_width))
    data_input = Input(shape=(data_length,))

    x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1', data_format=IMAGE_ORDERING)(img_input)
    x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2', data_format=IMAGE_ORDERING)(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool', data_format=IMAGE_ORDERING)(x)
    f1 = x
    # Block 2
    x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1', data_format=IMAGE_ORDERING)(x)
    x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2', data_format=IMAGE_ORDERING)(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool', data_format=IMAGE_ORDERING)(x)
    f2 = x

    # Block 3
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1', data_format=IMAGE_ORDERING)(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv2', data_format=IMAGE_ORDERING)(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv3', data_format=IMAGE_ORDERING)(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool', data_format=IMAGE_ORDERING)(x)
    f3 = x

    # Block 4
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv1', data_format=IMAGE_ORDERING)(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv2', data_format=IMAGE_ORDERING)(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv3', data_format=IMAGE_ORDERING)(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool', data_format=IMAGE_ORDERING)(x)
    f4 = x

    # Block 5
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1', data_format=IMAGE_ORDERING)(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2', data_format=IMAGE_ORDERING)(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3', data_format=IMAGE_ORDERING)(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool', data_format=IMAGE_ORDERING)(x)
    f5 = x

    # Classifier head: needed only so the pretrained VGG-16 weights can be loaded
    x = Flatten(name='flatten')(x)
    x = Dense(4096, activation='relu', name='fc1')(x)
    x = Dense(4096, activation='relu', name='fc2')(x)
    x = Dense(1000, activation='softmax', name='predictions')(x)

    vgg = Model(img_input, x)
    vgg.load_weights(VGG_Weights_path)

    levels = [f1, f2, f3, f4, f5]  # encoder feature maps (skip connections)

    # Several dense layers for image annotation processing
    data_layer = Dense(1024, activation='relu', name='data1')(data_input)
    data_layer = Dense(input_height * input_width // 256, activation='relu', name='data2')(data_layer)
    data_layer = Reshape((1, input_height // 16, input_width // 16))(data_layer)  # // keeps sizes integral and matches f4's spatial dims

    # Mix image annotations here
    o = (concatenate([f4, data_layer], axis=1))

    o = (ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING))(o)
    o = (Conv2D(512, (3, 3), padding='valid', data_format=IMAGE_ORDERING))(o)
    o = (BatchNormalization())(o)

    o = (UpSampling2D((2, 2), data_format=IMAGE_ORDERING))(o)
    o = (concatenate([o, f3], axis=1))
    o = (ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING))(o)
    o = (Conv2D(256, (3, 3), padding='valid', data_format=IMAGE_ORDERING))(o)
    o = (BatchNormalization())(o)

    o = (UpSampling2D((2, 2), data_format=IMAGE_ORDERING))(o)
    o = (concatenate([o, f2], axis=1))
    o = (ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING))(o)
    o = (Conv2D(128, (3, 3), padding='valid', data_format=IMAGE_ORDERING))(o)
    o = (BatchNormalization())(o)

    o = (UpSampling2D((2, 2), data_format=IMAGE_ORDERING))(o)
    o = (concatenate([o, f1], axis=1))
    o = (ZeroPadding2D((1, 1), data_format=IMAGE_ORDERING))(o)
    o = (Conv2D(64, (3, 3), padding='valid', data_format=IMAGE_ORDERING))(o)
    o = (BatchNormalization())(o)

    o = Conv2D(n_classes, (3, 3), padding='same', data_format=IMAGE_ORDERING)(o)
    o_shape = Model(img_input, o).output_shape
    output_height = o_shape[2]
    output_width = o_shape[3]

    o = (Reshape((n_classes, output_height * output_width)))(o)
    o = (Permute((2, 1)))(o)
    o = (Activation('softmax'))(o)
    model = Model([img_input, data_input], o)
    model.outputWidth = output_width
    model.outputHeight = output_height

    return model

To train and evaluate a Keras model with multiple inputs, prepare a separate array for each input layer, here image_train and annotation_train (keeping the same sample order along the first axis), and call:

model.fit([image_train, annotation_train], result_segmentation_train, batch_size=..., epochs=...)

test_loss, test_acc = model.evaluate([image_test, annotation_test], result_segmentation_test)
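For shape orientation, dummy arrays matching the call above might look like this. The sizes are illustrative; note that the decoder above outputs masks at half the input resolution:

```python
import numpy as np

# Illustrative sizes matching the VGGUnet defaults above
n_samples, n_classes = 4, 2
input_height, input_width, data_length = 416, 608, 128
output_height, output_width = input_height // 2, input_width // 2

# One row per sample, same order across all arrays (random data, shapes only)
image_train = np.random.rand(n_samples, 3, input_height, input_width).astype('float32')
annotation_train = np.random.rand(n_samples, data_length).astype('float32')

# Targets match the model output: (samples, height*width, n_classes), one-hot per pixel
masks = np.random.randint(0, n_classes, size=(n_samples, output_height * output_width))
result_segmentation_train = np.eye(n_classes)[masks].astype('float32')

print(result_segmentation_train.shape)  # (4, 63232, 2)
```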

Good luck!

[Comments]:
