Caffe: predict multiple labels at once (in Python)

【Question title】: Caffe: predict multiple labels at once (in Python)
【Posted】: 2016-04-04 00:25:20
【Question】:

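In caffe, I would like to be able to predict multiple labels at once, for example keyboard arrow keys: two keys can be pressed at the same time. I am trying to use a convolutional neural network to drive a virtual F1 racing car in the game TM Nation Forever. I plan to collect and shape my training data soon, and I wonder whether I am doing it right.

I think this post could serve as a good example of how to do this kind of classification in Python, since I have not found any satisfying example of how to do it.

Could someone verify whether this way of collecting and representing the data for the neural network would work as I intend:

HDF5 Python code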

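import h5py
import numpy as np

# X, S and the key-state arrays f, b, l, r are assumed to be NumPy arrays
# collected beforehand, with the shapes described below.
comp_kwargs = {'compression': 'gzip', 'compression_opts': 1}

# The file handle is named h5 so that it does not shadow the forward-label array f.
with h5py.File(train_filename, 'w') as h5:
    h5.create_dataset('data_img', data=X, **comp_kwargs)
    # np.float64 is what the np.float_ alias referred to (the alias was removed in NumPy 2.0).
    h5.create_dataset('data_speed', data=S.astype(np.float64), **comp_kwargs)

    h5.create_dataset('label_forward', data=f.astype(np.int_), **comp_kwargs)
    h5.create_dataset('label_backward', data=b.astype(np.int_), **comp_kwargs)
    h5.create_dataset('label_left', data=l.astype(np.int_), **comp_kwargs)
    h5.create_dataset('label_right', data=r.astype(np.int_), **comp_kwargs)

# Caffe's HDF5Data layer reads a text file that lists one HDF5 path per line.
with open(train_filename_list_txt, 'w') as list_file:
    list_file.write(train_filename + '\n')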

Information about the HDF5 data shapes

Inputs:

data_img: 
-> number N x channel K x height H x width W

data_speed:
-> number N  x  1 float number (from 0.0 to 1.0)

Outputs:

Note: I use numpy's 'int_' so that the labels are treated as classes to be classified.

label_forward:
-> number N  x  1 integer number (0 or 1)

label_backward:
-> number N  x  1 integer number (0 or 1)

label_left:
-> number N  x  1 integer number (0 or 1)

label_right:
-> number N  x  1 integer number (0 or 1)
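The shapes listed above can be checked before anything is written to HDF5. Below is a minimal NumPy-only sketch, not the asker's actual collection code: the helper `make_arrays`, the `(N, 4)` key-state layout, and the float32 image dtype are all assumptions made for illustration.

```python
import numpy as np

KEY_NAMES = ("forward", "backward", "left", "right")  # assumed column order


def make_arrays(frames, speeds, keys):
    """Shape raw samples into the arrays described in the question.

    frames: (N, K, H, W) images
    speeds: N floats in [0.0, 1.0]
    keys:   (N, 4) key states, columns ordered as KEY_NAMES (hypothetical layout)
    """
    X = np.asarray(frames, dtype=np.float32)
    S = np.asarray(speeds, dtype=np.float64).reshape(-1, 1)
    keys = np.asarray(keys).astype(np.int_)
    # One N x 1 integer column per key, matching label_forward ... label_right.
    labels = {name: keys[:, i].reshape(-1, 1) for i, name in enumerate(KEY_NAMES)}
    if X.ndim != 4 or S.shape != (X.shape[0], 1) or keys.shape != (X.shape[0], 4):
        raise ValueError("inputs do not share the same number of samples N")
    return X, S, labels


N = 5
X, S, labels = make_arrays(
    np.zeros((N, 3, 8, 8)),       # dummy images
    np.linspace(0.0, 1.0, N),     # dummy speeds
    [[1, 0, 1, 0]] * N,           # forward and left pressed together
)
print(X.shape, S.shape, labels["left"].shape)
```

Because every key has its own column, a sample with two keys pressed at once is simply a row with two ones.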

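Convolutional neural network architecture

I have put some semi-relevant comments in the code here; I would also appreciate any input on the network architecture if it could improve performance :)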

import numpy as np

import caffe
from caffe import layers as L
from caffe import params as P

def cnn(hdf5, batch_size):
    n = caffe.NetSpec()
    n.data_img, n.data_speed, n.label_forward, n.label_backward, n.label_left, n.label_right = (
        L.HDF5Data(batch_size=batch_size, source=hdf5, ntop=6)
    )

    n.conv1 = L.Convolution(n.data_img, kernel_size=7, num_output=32, weight_filler=dict(type='xavier'))
    n.pool1 = L.Pooling(n.conv1, kernel_size=3, stride=2, pool=P.Pooling.MAX)
    n.drop1 = L.Dropout(n.pool1, in_place=True)
    n.relu1 = L.ReLU(n.drop1, in_place=True)

    n.conv2 = L.Convolution(n.relu1, kernel_size=5, num_output=42, weight_filler=dict(type='xavier'))
    n.pool2 = L.Pooling(n.conv2, kernel_size=3, stride=2, pool=P.Pooling.MAX)
    n.drop2 = L.Dropout(n.pool2, in_place=True)
    n.relu2 = L.ReLU(n.drop2, in_place=True)

    n.conv3 = L.Convolution(n.relu2, kernel_size=5, num_output=50, weight_filler=dict(type='xavier'))
    n.pool3 = L.Pooling(n.conv3, kernel_size=3, stride=2, pool=P.Pooling.MAX)
    n.drop3 = L.Dropout(n.pool3, in_place=True)
    n.relu3 = L.ReLU(n.drop3, in_place=True)

    n.conv4 = L.Convolution(n.relu3, kernel_size=3, num_output=64, weight_filler=dict(type='xavier'))
    n.pool4 = L.Pooling(n.conv4, kernel_size=3, stride=2, pool=P.Pooling.AVE)
    # Data of shape `batch_size*64*3*3` out of this layer (if dropout ignored), 
    # for a total of `batch_size*576` neurons.
    # Would you recommend to downsize this `3*3` feature map to `2*2`
    # or even `1*1` and to remove dropout at this level?
    n.drop4 = L.Dropout(n.pool4, in_place=True)
    n.relu4 = L.ReLU(n.drop4, in_place=True)

    n.join_speed = L.Concat(n.relu4, n.data_speed, in_place=True)
    # Note that I might be wrong on how the parameters are passed to the concat layer 
    n.ip1 = L.InnerProduct(n.join_speed, num_output=512, weight_filler=dict(type='xavier'))
    n.sig1 = L.Sigmoid(n.ip1, in_place=True)

    n.ip_f = L.InnerProduct(n.sig1, num_output=2, weight_filler=dict(type='xavier'))
    n.accuracy_f = L.Accuracy(n.ip_f, n.label_forward)
    n.loss_f = L.SoftmaxWithLoss(n.ip_f, n.label_forward)

    n.ip_b = L.InnerProduct(n.sig1, num_output=2, weight_filler=dict(type='xavier'))
    n.accuracy_b = L.Accuracy(n.ip_b, n.label_backward)
    n.loss_b = L.SoftmaxWithLoss(n.ip_b, n.label_backward)

    n.ip_l = L.InnerProduct(n.sig1, num_output=2, weight_filler=dict(type='xavier'))
    n.accuracy_l = L.Accuracy(n.ip_l, n.label_left)
    n.loss_l = L.SoftmaxWithLoss(n.ip_l, n.label_left)

    n.ip_r = L.InnerProduct(n.sig1, num_output=2, weight_filler=dict(type='xavier'))
    n.accuracy_r = L.Accuracy(n.ip_r, n.label_right)
    n.loss_r = L.SoftmaxWithLoss(n.ip_r, n.label_right)

    return n.to_proto()

with open('cnn_train.prototxt', 'w') as f:
    f.write(str(
        cnn(train_filename_list_txt, 100)
    ))
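The network above ends in four independent two-class heads (ip_f, ip_b, ip_l, ip_r), each with its own SoftmaxWithLoss. At prediction time, one plausible way to turn their outputs into key presses is to take the argmax of each head separately. The sketch below does this in plain NumPy; the function `decode_keys`, the score values, and the convention that class 1 means "pressed" are assumptions for illustration, standing in for scores read from a trained net.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def decode_keys(head_scores):
    """Map each head's (N, 2) raw scores to an N-vector of 0/1 key states.

    Class 1 is assumed to mean "key pressed", matching the 0/1 labels.
    """
    return {name: softmax(s).argmax(axis=1) for name, s in head_scores.items()}


# Made-up scores for two samples, one (N, 2) array per head.
scores = {
    "forward": np.array([[0.1, 2.0], [1.5, -0.3]]),
    "backward": np.array([[3.0, 0.2], [2.0, 0.1]]),
    "left": np.array([[0.0, 1.0], [0.4, 0.2]]),
    "right": np.array([[1.0, 0.0], [0.1, 0.9]]),
}
pressed = decode_keys(scores)
print(pressed)
```

Since each head is decoded on its own, nothing prevents two keys from being predicted together, which is the behaviour the question asks for.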

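In addition, I would like only one of the left arrow key or the right arrow key to be pressed at a time. Considering that I will be using some SoftmaxWithLossLayer: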

label_right:
-> number N  x  1 integer number (0 for left or 1 for right)
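A single steering label like this makes left and right mutually exclusive by construction, because a softmax head picks exactly one class. Below is a minimal sketch of folding the two binary labels into one. It assumes a third class for "neither key pressed", which the question does not mention; the names `merge_steering` and `STEER_NONE` are hypothetical.

```python
import numpy as np

STEER_LEFT, STEER_RIGHT = 0, 1   # as proposed in the question
STEER_NONE = 2                   # assumption: extra class for "no steering"


def merge_steering(left, right):
    """Fold N x 1 left/right key states into one N x 1 class label."""
    left = np.asarray(left).astype(bool).ravel()
    right = np.asarray(right).astype(bool).ravel()
    if np.any(left & right):
        raise ValueError("left and right are both pressed in at least one sample")
    steer = np.full(left.shape, STEER_NONE, dtype=np.int_)
    steer[left] = STEER_LEFT
    steer[right] = STEER_RIGHT
    return steer.reshape(-1, 1)


steer = merge_steering([1, 0, 0], [0, 1, 0])
print(steer.ravel())
```

With three classes, the matching InnerProduct head would need num_output=3 instead of 2.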

【Question comments】:

Tags: python neural-network classification hdf5 caffe

【Solution 1】:

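In the end, what I did for this task was correct, except that the concat layer would probably not work because the blobs being concatenated have different shapes. I tested the approach on the cifar-100 dataset, which has both coarse and fine labels, and it worked well.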
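The shape mismatch can be reproduced outside Caffe. Going by the comment in the question's code, the feature map before the concat is batch_size x 64 x 3 x 3, while data_speed is N x 1. Caffe's Concat layer needs its inputs to agree on every axis except the concatenation axis. The NumPy sketch below is only an analogue of that rule, with made-up zero arrays. It is not what the answer actually ran.

```python
import numpy as np

N = 4
features = np.zeros((N, 64, 3, 3))   # stand-in for the pooled feature map
speed = np.zeros((N, 1))             # stand-in for data_speed

# Joining a 4-axis array with a 2-axis array fails, as the answer suspects.
try:
    np.concatenate([features, speed], axis=1)
    mismatch = False
except ValueError:
    mismatch = True

# Flattening the feature map to N x 576 first makes the shapes compatible.
flat = features.reshape(N, -1)
joined = np.concatenate([flat, speed], axis=1)
print(mismatch, flat.shape, joined.shape)
```

In the network definition, one possible fix along the same lines would be a Flatten layer on the feature map before the Concat. This is a suggestion under the shapes stated in the question, not something the answer reports having tested.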

【Comments】: