Optimizing the Architecture of a CNN Using Keras in Python3: Answers

【Question title】: Optimizing the Architecture of a CNN Using Keras in Python3
【Posted】: 2022-03-12 19:10:35
【Question description】:

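I am trying to raise my CNN's validation accuracy from 76% (currently) to over 90%. All the information about my CNN's performance and configuration is shown below.

Essentially, I want my CNN to distinguish between two classes of mel-spectrograms:

Class 1 and Class 2 (images not shown)

Here is the plot of accuracy vs. epochs (image not shown):

Here is the plot of loss vs. epochs (image not shown):

Finally, here is the model architecture configuration: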

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),activation='relu',input_shape=(3, 640, 480)))
model.add(Conv2D(64, (3, 3), activation='relu', dim_ordering="th"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))

Here are my calls to model.compile() and model.fit():

model.compile(loss=keras.losses.categorical_crossentropy,
          optimizer=keras.optimizers.SGD(lr=0.001),
          metrics=['accuracy'])
print("Compiled model")

history = model.fit(X_train, Y_train,
      batch_size=8,
      epochs=50,
      verbose=1,
      validation_data=(X_test, Y_test))

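How can I change my CNN configuration to improve the validation accuracy score?

Things I have tried:

Decreasing the learning rate to prevent sporadic fluctuations in accuracy.
Decreasing batch_size from 64 to 8.
Increasing the number of epochs to 50 (though I am not sure whether that is enough).

Any help would be greatly appreciated!

Update #1: I increased the number of epochs to 200, and after letting the program run overnight I got a validation accuracy of about 76.31%.

The plots of accuracy vs. epochs and loss vs. epochs are posted below (images not shown).

What other specific changes can I make to my model architecture to improve accuracy?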

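【Comments】:

How many samples do you use for training and validation? Sometimes it is best to give your model a good initialization. If you have enough data to train a deeper model, you should try fine-tuning this model: music-auto_tagging
I am training on 993 images and testing on 243 images.
@Eric How would I use the music_tagging CNN, given that it is trained on the Million Song Dataset (and is therefore meant for a completely different purpose)?
You just have to change the output layer and then set the layers you do not want to train as non-trainable. If you don't know how to do that, I will answer your question with some code ;)
I have some code that might help you: music genre recognition

Tags: python-3.x tensorflow deep-learning keras spectrogram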

【Solution 1】:

First, get music_tagger_cnn.py and put it in your project path. After that, you can build the model:

from music_tagger_cnn import *
input_tensor = Input(shape=(1, 18, 119))
model =MusicTaggerCNN(input_tensor=input_tensor, include_top=False, weights='msd')

You can change the input tensor to whatever dimensions you need. I usually use the Theano dim ordering with TensorFlow as the backend, which is why I set:

from keras import backend as K
K.set_image_dim_ordering('th')

When using the Theano dim ordering, bear in mind that you have to change the order of the sample dimensions:

X_train = X_train.transpose(0, 3, 2, 1)
X_val = X_val.transpose(0, 3, 2, 1)
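
To see what the transpose above does to the array layout, the axis permutation can be traced on the shape alone. The sketch below is an illustration, not part of the original answer; `permute_shape` is a hypothetical helper that mirrors how `numpy.ndarray.transpose` reorders axes. It assumes the samples start in channels-last order `(samples, height, width, channels)`. Under that assumption `(0, 3, 2, 1)` also swaps height and width, while `(0, 3, 1, 2)` keeps them in place. Data that is already channels-first, such as the `(993, 3, 640, 480)` array reported in the comments, would not need this step.

```python
def permute_shape(shape, axes):
    """Return the shape an array would have after ndarray.transpose(*axes)."""
    if sorted(axes) != list(range(len(shape))):
        raise ValueError("axes must be a permutation of range(len(shape))")
    return tuple(shape[axis] for axis in axes)


# Assumed channels-last batch: (samples, height, width, channels).
channels_last = (993, 640, 480, 3)

# The permutation used in the answer: channels move to axis 1,
# but height and width also trade places.
swapped = permute_shape(channels_last, (0, 3, 2, 1))

# A permutation that moves channels to axis 1 and keeps height before width.
channels_first = permute_shape(channels_last, (0, 3, 1, 2))

print(swapped)         # (993, 3, 480, 640)
print(channels_first)  # (993, 3, 640, 480)
```

Whichever permutation is used, the `Input(shape=...)` passed to the model has to match the per-sample shape that comes out of it.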

After that, you have to freeze the layers you do not want to update:

for layer in model.layers: 
     layer.trainable = False

Now you can set up your own output, for example:

last_layer = model.get_layer('pool3').output
out = Flatten()(last_layer)
out = Dense(128, activation='relu', name='fc2')(out)
out = Dropout(0.5)(out)
out = Dense(n_classes, activation='softmax', name='fc3')(out)
model = Model(input=model.input, output=out)

After that you should be able to train it simply by doing:

sgd = SGD(lr=0.01, momentum=0, decay=0.002, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
history = model.fit(X_train, labels_train,
                          validation_data=(X_val, labels_val), nb_epoch=100, batch_size=5)

Note that the labels should be one-hot encoded.
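
As an illustration of that note, one-hot encoding turns each integer class label into a vector with a single 1. The helper below is a minimal pure-Python sketch with a hypothetical name (`one_hot`); it assumes the labels are integers in `0 .. n_classes - 1`, and the example labels are made up. In practice Keras ships a `to_categorical` utility for the same job.

```python
def one_hot(labels, n_classes):
    """Encode integer labels as one-hot rows (lists of 0.0 / 1.0)."""
    encoded = []
    for label in labels:
        if not 0 <= label < n_classes:
            raise ValueError("label %r is outside 0..%d" % (label, n_classes - 1))
        row = [0.0] * n_classes
        row[label] = 1.0
        encoded.append(row)
    return encoded


labels_train_int = [0, 1, 1, 0]  # made-up example labels, two classes
labels_train_one_hot = one_hot(labels_train_int, n_classes=2)
print(labels_train_one_hot)
# [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
```

With two classes, each row has two entries, which matches a final `Dense(n_classes, activation='softmax')` layer trained with `categorical_crossentropy`.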

Hope this helps!

Update: posting the code so that these lines can be debugged and the crash prevented.

input_tensor = Input(shape=(3, 640, 480))
model = MusicTaggerCNN(input_tensor=input_tensor, include_top=False, weights='msd')

for layer in model.layers: 
     layer.trainable = False


last_layer = model.get_layer('pool3').output
out = Flatten()(last_layer)
out = Dense(128, activation='relu', name='fc2')(out)
out = Dropout(0.5)(out)
out = Dense(n_classes, activation='softmax', name='fc3')(out)
model = Model(input=model.input, output=out)

sgd = SGD(lr=0.01, momentum=0, decay=0.002, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
history = model.fit(X_train, labels_train,
                          validation_data=(X_test, Y_test), nb_epoch=100, batch_size=5)

Edit #2

# -*- coding: utf-8 -*-
'''MusicTaggerCNN model for Keras.

# Reference:

- [Automatic tagging using deep convolutional neural networks](https://arxiv.org/abs/1606.00298)
- [Music-auto_tagging-keras](https://github.com/keunwoochoi/music-auto_tagging-keras)

'''
from __future__ import print_function
from __future__ import absolute_import

from keras import backend as K
from keras.layers import Input, Dense, Dropout, Flatten
from keras.models import Model
from keras.layers.convolutional import Convolution2D
from keras.layers.convolutional import MaxPooling2D, ZeroPadding2D
from keras.layers.normalization import BatchNormalization
from keras.layers.advanced_activations import ELU
from keras.utils.data_utils import get_file

TH_WEIGHTS_PATH = 'https://github.com/keunwoochoi/music-auto_tagging-keras/blob/master/data/music_tagger_cnn_weights_theano.h5'
TF_WEIGHTS_PATH = 'https://github.com/keunwoochoi/music-auto_tagging-keras/blob/master/data/music_tagger_cnn_weights_tensorflow.h5'


def MusicTaggerCNN(weights='msd', input_tensor=None,
                   include_top=True):
    '''Instantiate the MusicTaggerCNN architecture,
    optionally loading weights pre-trained
    on Million Song Dataset. Note that when using TensorFlow,
    for best performance you should set
    `image_dim_ordering="tf"` in your Keras config
    at ~/.keras/keras.json.

    The model and the weights are compatible with both
    TensorFlow and Theano. The dimension ordering
    convention used by the model is the one
    specified in your Keras config file.

    For preparing mel-spectrogram input, see
    `audio_conv_utils.py` in [applications](https://github.com/fchollet/keras/tree/master/keras/applications).
    You will need to install [Librosa](http://librosa.github.io/librosa/)
    to use it.

    # Arguments
        weights: one of `None` (random initialization)
            or "msd" (pre-training on Million Song Dataset).
        input_tensor: optional Keras tensor (i.e. output of `layers.Input()`)
            to use as image input for the model.
        include_top: whether to include the 1 fully-connected
            layer (output layer) at the top of the network.
            If False, the network outputs 256-dim features.


    # Returns
        A Keras model instance.
    '''
    if weights not in {'msd', None}:
        raise ValueError('The `weights` argument should be either '
                         '`None` (random initialization) or `msd` '
                         '(pre-training on Million Song Dataset).')

    # Determine proper input shape
    if K.image_dim_ordering() == 'th':
        input_shape = (3, 640, 480)  # channels first
    else:
        input_shape = (640, 480, 3)  # channels last

    if input_tensor is None:
        melgram_input = Input(shape=input_shape)
    else:
        if not K.is_keras_tensor(input_tensor):
            melgram_input = Input(tensor=input_tensor, shape=input_shape)
        else:
            melgram_input = input_tensor

    # Determine input axis
    if K.image_dim_ordering() == 'th':
        channel_axis = 1
        freq_axis = 2
        time_axis = 3
    else:
        channel_axis = 3
        freq_axis = 1
        time_axis = 2

    # Input block
    x = BatchNormalization(axis=freq_axis, name='bn_0_freq')(melgram_input)

    # Conv block 1
    x = Convolution2D(64, 3, 3, border_mode='same', name='conv1')(x)
    x = BatchNormalization(axis=channel_axis, mode=0, name='bn1')(x)
    x = ELU()(x)
    x = MaxPooling2D(pool_size=(2, 4), name='pool1')(x)

    # Conv block 2
    x = Convolution2D(128, 3, 3, border_mode='same', name='conv2')(x)
    x = BatchNormalization(axis=channel_axis, mode=0, name='bn2')(x)
    x = ELU()(x)
    x = MaxPooling2D(pool_size=(2, 4), name='pool2')(x)

    # Conv block 3
    x = Convolution2D(128, 3, 3, border_mode='same', name='conv3')(x)
    x = BatchNormalization(axis=channel_axis, mode=0, name='bn3')(x)
    x = ELU()(x)
    x = MaxPooling2D(pool_size=(2, 4), name='pool3')(x)



    # Output
    x = Flatten()(x)
    if include_top:
        x = Dense(50, activation='sigmoid', name='output')(x)

    # Create model
    model = Model(melgram_input, x)
    if weights is None:
        return model
    else:
        # Load the pre-trained weights (only the 'th' dim ordering is accepted here)
        if K.image_dim_ordering() == 'tf':
            raise RuntimeError("Please set image_dim_ordering == 'th'."
                               "You can set it at ~/.keras/keras.json")
        model.load_weights('data/music_tagger_cnn_weights_%s.h5' % K._BACKEND,
                           by_name=True)
        return model

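Edit #3

I tried the Keras example that uses MusicTaggerCRNN as a melgram feature extractor. Then I trained a simple NN with 2 Dense layers and a binary output. The samples in my example do not apply to your case, but it is also a binary classifier. I used keras==1.2.2 and tensorflow-gpu==1.0.0, and it worked for me.

The code is as follows: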

from keras.applications.music_tagger_crnn import MusicTaggerCRNN
from keras.applications.music_tagger_crnn import preprocess_input, decode_predictions
import numpy as np
from keras.layers import Input, Dense, Dropout, Flatten
from keras.models import Model
from keras.optimizers import SGD


model = MusicTaggerCRNN(weights='msd', include_top=False)
#Samples simulation
audio_paths_train = ['data/genres/blues/blues.00000.au','data/genres/classical/classical.00000.au','data/genres/classical/classical.00002.au', 'data/genres/blues/blues.00003.au']
audio_paths_test = ['data/genres/blues/blues.00001.au', 'data/genres/classical/classical.00001.au', 'data/genres/blues/blues.00002.au', 'data/genres/classical/classical.00003.au']
labels_train = [0,1,1,0]
labels_test = [0, 1, 0, 1]
melgrams_train = [preprocess_input(audio_path) for audio_path in audio_paths_train]
melgrams_test = [preprocess_input(audio_path) for audio_path in audio_paths_test]
feats_train = [model.predict(np.expand_dims(melgram, axis=0)) for melgram in melgrams_train]
feats_test = [model.predict(np.expand_dims(melgram, axis=0)) for melgram in melgrams_test]
feats_train = np.array(feats_train)
feats_test = np.array(feats_test)

_input = Input(shape=(1,32))
x = Flatten(name='flatten')(_input)
x = Dense(128, activation='relu', name='fc6')(x)
x = Dense(64, activation='relu', name='fc7')(x)
x = Dense(1, activation='sigmoid', name='fc8')(x)  # one output unit: sigmoid, since softmax over a single unit is always 1
class_model = Model(_input, x)

sgd = SGD(lr=0.01, momentum=0, decay=0.02, nesterov=True)
class_model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
history = class_model.fit(feats_train, labels_train, validation_data=(feats_test, labels_test), nb_epoch=100, batch_size=5, class_weight='auto')
print(history.history['acc'])

# Final evaluation of the model
scores = class_model.evaluate(feats_test, labels_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1] * 100))
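
One detail worth checking in a classifier head like the one above: when the output layer has a single unit and the loss is `binary_crossentropy`, the usual activation is a sigmoid, because a softmax normalizes over the units of the layer and over one unit it always returns 1. The snippet below is a small stand-alone illustration in plain Python; the `softmax` and `sigmoid` helpers are written here for the demonstration and are not the Keras implementations.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def sigmoid(logit):
    """Logistic function, mapping one logit to a probability."""
    return 1.0 / (1.0 + math.exp(-logit))


for logit in (-3.0, 0.0, 3.0):
    # Softmax over a single unit ignores the logit entirely,
    # while the sigmoid output still depends on it.
    print(logit, softmax([logit]), round(sigmoid(logit), 4))
```

A network whose only output is a constant 1 cannot separate two classes, so a validation accuracy stuck at the share of the positive class is a typical symptom of this setup.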

【Comments】:

After running it I get this error: raise ValueError(err.message) ValueError: Negative dimension size caused by subtracting 2 from 1 for 'pool2/MaxPool' (op: 'MaxPool') with input shapes: [?,1,160,128].
Tell me the shape of the training set X_train. It should be a numpy array.
Printing X_train.shape gives dimensions of (993, 3, 640, 480)
That means there are 993 images, each a 640 x 480 pixel RGB image.
OK, you should change the input tensor to your dimensions, (3, 640, 480).
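
A note on the error reported in this thread: its numbers are consistent with the channels-first array being read as channels-last. If `(3, 640, 480)` is taken as height 3, width 640 and 480 channels, the first (2, 4) pooling leaves a height of 1 and a width of 160, and the second pooling cannot take a window of 2 from a height of 1. This is an interpretation of the traceback, not something the thread states. The sketch below traces the three (2, 4) max-pooling layers of the model in plain Python; `trace_pools` is a hypothetical helper and assumes 'same'-padded convolutions (which keep height and width) and non-overlapping pooling windows.

```python
def pool_out(size, pool):
    """Output length of a non-overlapping max-pool without padding."""
    return (size - pool) // pool + 1


def trace_pools(height, width, pools):
    """Return (height, width) after each pooling layer, in order."""
    shapes = []
    for index, (pool_h, pool_w) in enumerate(pools, start=1):
        if height < pool_h or width < pool_w:
            raise ValueError(
                "pool%d: window (%d, %d) does not fit input (%d, %d)"
                % (index, pool_h, pool_w, height, width))
        height, width = pool_out(height, pool_h), pool_out(width, pool_w)
        shapes.append((height, width))
    return shapes


POOLS = [(2, 4), (2, 4), (2, 4)]  # pool1, pool2, pool3 in the model

# Intended reading: a 640 x 480 spatial grid.
print(trace_pools(640, 480, POOLS))  # [(320, 120), (160, 30), (80, 7)]

# Misread as channels-last: height 3, width 640.
try:
    trace_pools(3, 640, POOLS)
except ValueError as error:
    print(error)  # pool2: window (2, 4) does not fit input (1, 160)
```

If that reading is right, the fix is to make the dim ordering setting and the data layout agree, either by setting the 'th' ordering before building the model or by feeding channels-last arrays.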

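【Solution 2】:

Dropout: plain Dropout has turned out to be ineffective for CNNs. Try changing the first Dropout layer to SpatialDropout2D. The second Dropout serves a standard Dense layer, so plain Dropout is the right choice there. A CNN builds a set of overlapping pyramids. Standard Dropout merely turns some of those pyramids into stumps. SpatialDropout zeroes out some pyramids completely, which is a better pattern of "forgetting".
Repeated structure: use several copies of a Conv2D->Conv2D->SpatialDropout chain, each with more filters and a smaller image. All successful image-processing designs use multiple repeated blocks.
Your data: a spectrogram has two different measurements as the two dimensions of the "image". The normal design for image processing treats all neighboring pixels as contributing equally to an output feature. You may want many pixels along a row to contribute to a feature map, but only one or two rows. So perhaps each pyramid needs a long, narrow base.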
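
The difference between the two kinds of dropout described in the first point can be shown on a toy tensor. The following is a plain-Python sketch, not Keras code: `standard_dropout` and `spatial_dropout` are hypothetical helpers that mimic the training-time behaviour of `Dropout` and `SpatialDropout2D` on a list of feature maps, assuming inverted-dropout scaling by 1 / (1 - rate). In a Keras model the change itself amounts to swapping the first `Dropout(0.25)` for `SpatialDropout2D(0.25)`.

```python
import random


def standard_dropout(feature_maps, rate, rng):
    """Zero individual activations independently, rescale the survivors."""
    keep = 1.0 - rate
    return [[[value / keep if rng.random() < keep else 0.0 for value in row]
             for row in fmap]
            for fmap in feature_maps]


def spatial_dropout(feature_maps, rate, rng):
    """Zero whole feature maps (channels) at once, rescale the survivors."""
    keep = 1.0 - rate
    dropped = []
    for fmap in feature_maps:
        scale = 1.0 / keep if rng.random() < keep else 0.0
        dropped.append([[value * scale for value in row] for row in fmap])
    return dropped


rng = random.Random(0)
# Eight 4x4 feature maps filled with ones, so the masks are easy to read.
maps = [[[1.0] * 4 for _ in range(4)] for _ in range(8)]

per_unit = standard_dropout(maps, rate=0.5, rng=rng)
per_map = spatial_dropout(maps, rate=0.5, rng=rng)

for fmap in per_map:
    values = {value for row in fmap for value in row}
    print(sorted(values))  # each map is entirely 0.0 or entirely 2.0
```

Because neighbouring activations inside one feature map are strongly correlated, dropping single units leaves most of the information recoverable from the neighbours; dropping the whole map removes it.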

【Comments】:

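【Solution 3】:

Here are some initial modifications you can make to your code, checking the results after each:

Increase the number of layers
Increase the layer sizes
Change the learning rate
Change the optimizer (Adam usually works better than SGD, ...)
Shuffle your data (perhaps your test data contains samples that are far from the training samples)
Try adding batch normalization layers
Sometimes increasing the number of MFCC features also helps (instead of 640, 480, try extracting more features)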
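
For the shuffling suggestion, one option is a class-stratified shuffle before splitting, so that the training and test sets follow the same class proportions. The sketch below uses only the standard library; `stratified_split` is a hypothetical helper, the 80/20 ratio, the seed and the label counts are arbitrary choices for illustration, and the indices it returns would be used to index the real arrays (for example `X[train_idx]` with NumPy).

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Shuffle sample indices per class and split them into train/test lists."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = int(round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    # Shuffle again so the classes are interleaved within each split.
    rng.shuffle(train_idx)
    rng.shuffle(test_idx)
    return train_idx, test_idx


# Made-up labels for illustration: 600 samples of class 0, 400 of class 1.
labels = [0] * 600 + [1] * 400
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 800 200
```

If recordings from the same source can end up as several spectrograms, splitting by source rather than by image would be the safer variant, since otherwise near-duplicates can leak across the split.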

【Comments】: