Training 3dconv neural network fails; loss converges at .6931

【Question title】: Training 3dconv neural network fails; loss converges at .6931
【Posted】: 2020-08-29 01:46:36
【Question】:

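I wrote a script that trains a neural network on .nii files as input, based on the TensorFlow tutorial https://www.tensorflow.org/tutorials/load_data/images. I changed it slightly to use NiBabel and .nii files, but it still follows the same basic structure. However, I'm running into a problem: my loss converges to 0.6931, which I believe is because the model starts guessing the same thing regardless of the input. This happens regardless of image shape or batch size, so I don't think the model is learning. Can anyone spot a fatal flaw in my code? I have tried: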

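- Callbacks that change the learning rate
- Changing the data, cleaning it and reorganizing it
- Changing the ratio of samples per class
- Different optimizers and loss functions
- A simple Dense-Dense-Dense model, but that didn't seem to work either, as it wouldn't even start training
- A repeated dataset with a fixed size (although I'm not clear what difference that makes)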

```python
import os

import nibabel as nib
import numpy as np
import tensorflow as tf

# imgshape (the target volume shape) is defined elsewhere in the script.


# Gets the label of the image; the label determines how TensorFlow will classify the image
def get_label(file_path):
    # Convert the path to a list of path components
    parts = tf.strings.split(file_path, os.path.sep)
    # The fourth-from-last component is the class directory
    return float(parts[-4] == "class1")


# Reads the data from a .nii file and returns a NumPy ndarray that is compatible with TensorFlow.
# It calls .numpy(), so it must run eagerly (e.g. wrapped in tf.py_function inside Dataset.map).
def decode_img(img):
    img = nib.load(img.numpy().decode('utf-8'))
    # Load the voxel data as a floating-point NumPy ndarray
    data = img.get_fdata()
    # Resize img (np.resize repeats/truncates the flattened array; it does not interpolate)
    data = np.resize(data, imgshape)
    # Min-max normalize to [0, 1]
    data_max = np.amax(data)
    data_min = np.amin(data)
    data = (data - data_min) / (data_max - data_min)
    return data


# Processes a path to return an image data and label pair
def process_path(file_path):
    # Gets the file's label
    label = get_label(file_path)
    img = decode_img(file_path)
    return img, label
```

I use these functions to process my data, mapping them over my list-files dataset.
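`np.resize` does not resample: when the target shape differs from the source shape, it repeats or truncates the flattened voxel array in C order, so the resized volume can end up spatially scrambled. A minimal sketch of an alternative for the resize and normalization steps in `decode_img`, assuming a single-channel 3-D volume and plain nearest-neighbour resampling (the helper names `resize_nearest` and `prepare_volume` are hypothetical; `scipy.ndimage.zoom` or `skimage.transform.resize` would give proper interpolation):

```python
import numpy as np


def resize_nearest(volume: np.ndarray, target_shape: tuple) -> np.ndarray:
    """Resample a 3-D volume to target_shape by nearest-neighbour lookup.

    Unlike np.resize, which repeats or truncates the flattened array,
    this maps every output voxel back to a source voxel along each axis,
    so spatial structure is preserved.
    """
    # For each axis, pick the source index that each output position maps to.
    index_grids = [
        np.minimum((np.arange(out) * (src / out)).astype(int), src - 1)
        for src, out in zip(volume.shape, target_shape)
    ]
    # np.ix_ builds an open mesh so the per-axis index arrays combine correctly.
    return volume[np.ix_(*index_grids)]


def prepare_volume(volume: np.ndarray, target_shape: tuple) -> np.ndarray:
    """Hypothetical helper: resize, min-max normalize and add a channel axis."""
    data = resize_nearest(volume, target_shape).astype(np.float32)
    lo, hi = data.min(), data.max()
    # Guard against a constant volume, which would otherwise divide by zero.
    if hi > lo:
        data = (data - lo) / (hi - lo)
    else:
        data = np.zeros_like(data)
    # Conv3D with data_format="channels_last" expects (depth, height, width, channels).
    return data[..., np.newaxis]
```

The trailing channel axis matches what `Conv3D(..., data_format="channels_last")` expects, so the model's `input_shape` would then be the 3-D target shape plus a final `1`.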

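```python
import tensorflow as tf

# BATCH_SIZE is defined elsewhere in the script.


def configure_for_performance(ds):
    # ds = ds.cache(filename='cachefile')
    ds = ds.cache()
    ds = ds.shuffle(buffer_size=1000)
    # repeat() makes the dataset infinite, so fit() relies on steps_per_epoch
    ds = ds.repeat()
    ds = ds.batch(BATCH_SIZE)
    ds = ds.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
    return ds
```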

I took this directly from the TensorFlow tutorial.
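Because `configure_for_performance` calls `ds.repeat()`, the dataset never runs out, and Keras uses `steps_per_epoch` (and, for a repeated validation set, `validation_steps`) to decide where an epoch ends. The `model.fit` call passes `BATCH_SIZE` for both, which ties the epoch length to the batch size rather than to the dataset size. If the intent is for one epoch to be one pass over the data, a sketch of the usual calculation (the dataset sizes below are hypothetical):

```python
import math


def steps_for_one_pass(num_examples: int, batch_size: int) -> int:
    """Number of batches needed to see every example once.

    With ds.repeat() the dataset is infinite, so this count is what
    marks the end of an "epoch" for model.fit.
    """
    return math.ceil(num_examples / batch_size)


# Hypothetical dataset sizes, for illustration only.
NUM_TRAIN, NUM_VAL, BATCH_SIZE = 400, 100, 8
steps_per_epoch = steps_for_one_pass(NUM_TRAIN, BATCH_SIZE)   # 50 batches
validation_steps = steps_for_one_pass(NUM_VAL, BATCH_SIZE)    # 13 batches
```

These values would then be passed as `steps_per_epoch=steps_per_epoch, validation_steps=validation_steps` in `model.fit`.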

```python
import tensorflow as tf

# imgshape, BATCH_SIZE, train_ds and val_ds are defined elsewhere in the script.

# Create a sequential network
model = tf.keras.Sequential([
    tf.keras.layers.Convolution3D(
        4, 4, padding='same', data_format="channels_last", input_shape=imgshape, activation='tanh'),
    tf.keras.layers.MaxPooling3D(padding='same'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Convolution3D(4, 4, padding='same', activation='tanh'),
    tf.keras.layers.MaxPooling3D(padding='same'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Convolution3D(4, 4, padding='same', activation='tanh'),
    tf.keras.layers.MaxPooling3D(padding='same'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Convolution3D(4, 4, padding='same', activation='tanh'),
    tf.keras.layers.MaxPooling3D(padding='same'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2048, activation='tanh'),
    tf.keras.layers.Dense(1024, activation='tanh'),
    tf.keras.layers.Dense(512, activation='tanh'),
    tf.keras.layers.Dense(256, activation='tanh'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam',
              # from_logits=True makes the loss apply its own sigmoid to the model output
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.summary()
model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=500,
    steps_per_epoch=BATCH_SIZE,
    validation_steps=BATCH_SIZE
)
```

This is my model. I use 3D convolutions much as one would use 2D convolutions in conventional image classification.
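The final layer here is `Dense(1, activation='sigmoid')`, but the loss is `BinaryCrossentropy(from_logits=True)`, which applies a sigmoid of its own before computing the cross-entropy. The NumPy sketch below is a hand-written re-implementation for illustration, not Keras' code. It shows what that combination does: outputs already in (0, 1) get squashed a second time into roughly (0.5, 0.731), so a class-0 example can never reach a loss below ln 2 ≈ 0.6931.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def bce_from_logits(y_true, logits):
    """Binary cross-entropy as from_logits=True computes it: sigmoid first."""
    p = sigmoid(logits)
    return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))


# Outputs of a final Dense(1, activation='sigmoid') always lie in (0, 1).
sigmoid_outputs = np.linspace(0.0, 1.0, 11)
# Treated as logits, they are squashed again: sigmoid(sigmoid(z)).
effective_probs = sigmoid(sigmoid_outputs)
# Best achievable loss for a class-0 example under this double sigmoid.
best_class0_loss = bce_from_logits(0.0, sigmoid_outputs).min()
```

In Keras, the two consistent pairings are a final `Dense(1)` with no activation plus `BinaryCrossentropy(from_logits=True)`, or the existing `Dense(1, activation='sigmoid')` plus `BinaryCrossentropy()` (where `from_logits` defaults to `False`).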

Any suggestions would be greatly appreciated!

【Discussion】:

Could you add the training log produced by running model.fit()?

Tags: python tensorflow machine-learning neural-network nibabel

【Solution 1】:

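Your code for loading the images looks fine, but I can't test it myself because I'm not sure how your data is stored. Also, the fact that your model starts training suggests the error probably isn't here. If you want to be sure, you can display the images with matplotlib to check that they load correctly.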
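Following the matplotlib suggestion, a minimal sketch that plots the central slice along each axis of a decoded volume. The function names are hypothetical, and the volume is assumed to be a plain 3-D array without a channel axis:

```python
import numpy as np


def middle_slices(volume: np.ndarray):
    """Return the central 2-D slice along each of the three axes of a 3-D volume."""
    x, y, z = (s // 2 for s in volume.shape)
    return volume[x, :, :], volume[:, y, :], volume[:, :, z]


def show_middle_slices(volume: np.ndarray, title: str = "") -> None:
    """Plot the three central slices side by side for a quick visual check."""
    # Imported here so middle_slices() stays usable without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 3, figsize=(9, 3))
    for ax, sl in zip(axes, middle_slices(volume)):
        ax.imshow(sl.T, cmap="gray", origin="lower")
        ax.axis("off")
    fig.suptitle(title)
    plt.show()
```

For example, `show_middle_slices(nib.load(path).get_fdata(), title=path)`, with `path` standing in for one of your .nii files, run on a few files from each class directory, would show whether the volumes look sensible and whether each class folder really contains what you expect.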

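I would start by making your model as simple as possible while still working, and test whether it still converges to 0.6931 or to some other number. Then try different activation functions, e.g. relu. Another approach could be to use batch normalization. My theory is that very large or very small values are going into your tanh functions, which pushes their outputs close to -1 or 1 every time. This also prevents further training, because the gradients become very small. Switching to relu may fix this for large positive values, but not for negative ones, where relu's gradient is zero. Batch normalization will keep your values away from the extremes where tanh only outputs -1 or 1.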
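To illustrate the saturation argument numerically, here is a NumPy sketch using hypothetical pre-activation values (not taken from your model). It compares the average tanh gradient, 1 - tanh(x)^2, before and after a batch-norm-style standardization. A real `tf.keras.layers.BatchNormalization` layer also learns a scale and shift and uses moving statistics at inference time; this sketch leaves those out.

```python
import numpy as np


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)**2, which vanishes as |x| grows."""
    return 1.0 - np.tanh(x) ** 2


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return (x > 0).astype(float)


def standardize(x, eps=1e-3):
    """Batch-norm-style normalization over the batch axis, without the
    learned scale and shift (gamma/beta) of a real BatchNormalization layer."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)


rng = np.random.default_rng(0)
# Hypothetical pre-activations that have drifted far from zero.
pre_acts = rng.normal(loc=8.0, scale=2.0, size=(256, 4))
grad_before = tanh_grad(pre_acts).mean()
grad_after = tanh_grad(standardize(pre_acts)).mean()
```

Here `grad_before` comes out far smaller than `grad_after`. `relu_grad` shows the other half of the argument: its gradient is 1 for any positive input, however large, but 0 for every negative one.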


【Solution 2】:

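If you always converge to exactly the same loss, then in my experience there is only one explanation: your data loader is coded incorrectly. What happens is that the images and labels don't match, so the model is trying to learn pure randomness. In that case, the best it can do is output the "average" correct answer. I suspect the 0.69 value comes from your data labels, e.g. you have 69% class 1 and 31% class 0.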
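This hypothesis can be checked numerically. A model that outputs the same probability for every input does best by predicting the positive-class fraction p, and its mean binary cross-entropy is then the label entropy -(p ln p + (1 - p) ln(1 - p)). By that formula, a 69%/31% split would give about 0.619. A value of 0.6931 ≈ ln 2 instead corresponds to a 50/50 split, or to a model stuck predicting 0.5 for everything. A minimal sketch (the function name is hypothetical; labels are assumed to be 0/1 values like those returned by `get_label`):

```python
import math
from collections import Counter


def constant_prediction_bce(labels):
    """Lowest mean binary cross-entropy reachable by a model that ignores
    its input and always predicts the same probability.

    The best constant is the positive-class fraction p, and the resulting
    loss is the label entropy -(p*ln p + (1-p)*ln(1-p)).
    """
    counts = Counter(labels)
    # Works for int or float labels, since 1 == 1.0 and both hash the same.
    p = counts[1] / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))
```

Running it on the labels of your training files gives the baseline that the training loss has to beat before the model can be said to learn anything from the images.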
