Demonstrating overfitting with Keras on 2D data: answers

【Question title】: Demonstrating overfitting with Keras on 2D data
【Posted】: 2018-03-13 05:56:41
【Question】:

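I am a computer science teacher, currently building an introductory course on deep learning. Python and the Keras framework are my tools of choice.

I would like to show my students what overfitting is by training a model of increasing complexity on some predefined 2D data, as is done at the end of this example.

The same idea appears in a programming activity for Andrew Ng's course on neural networks tuning.

However, no matter how hard I try, I cannot replicate this behavior with Keras. Using the same dataset and hyperparameters, the decision boundaries are always "smoother" and the model never fits the noisy points in the dataset. See my results below, and click here to browse the relevant code. Here is the relevant excerpt: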

# Varying the hidden layer size to observe underfitting and overfitting
plt.figure(figsize=(16, 32))
hidden_layer_dimensions = [1, 2, 3, 4, 5, 20, 50]
for i, hidden_layer_size in enumerate(hidden_layer_dimensions):
    fig = plt.subplot(4, 2, i+1)
    plt.title('Hidden Layer size: {:d}'.format(hidden_layer_size))

    model = Sequential()
    model.add(Dense(hidden_layer_size, activation='tanh', input_shape=(2,)))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(SGD(lr=1.0), 'binary_crossentropy', metrics=['accuracy'])
    history = model.fit(data, targets, verbose=0, epochs=50)

    plot_decision_boundary(lambda x: model.predict(x) > 0.5, data, targets, fig)

Am I doing something wrong? Is there some internal optimization mechanism at work in Keras? Can I mitigate it with other compilation options?

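【Comments】:

When posting in the future, have a look at: Should 'Hi', 'thanks', taglines, and salutations be removed from posts?
Please add your code to the question; the link may stop working in the future, which would make your question useless.

Tags: python machine-learning neural-network keras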

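【Answer 1】:

You can also increase the number of epochs and use 'relu' as the activation to get sharp edges like those in Andrew Ng's example. I ran your notebook in Colaboratory with a single-layer network of 50 neurons, and added noise to your moons so as to get separate colored regions. Please take a look, and don't forget to activate the GPU (Runtime / Change runtime type).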

# Varying the hidden layer size to observe underfitting and overfitting
plt.figure(figsize=(16, 32))
hidden_layer_dimensions = [50]
for i, hidden_layer_size in enumerate(hidden_layer_dimensions):
    fig = plt.subplot(4, 2, i+1)
    plt.title('Hidden Layer size: {:d}'.format(hidden_layer_size))

    model = Sequential()
    model.add(Dense(hidden_layer_size, activation='relu', input_shape=(2,)))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(SGD(lr=1.0), 'binary_crossentropy', metrics=['accuracy'])
    history = model.fit(data, targets, verbose=0, epochs=5000)

    plot_decision_boundary(lambda x: model.predict(x) > 0.5, data, targets, fig)

5000 epochs + relu (looks like what you want)

5000 epochs + tanh (tanh smooths the curve too much for your purposes)
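One way to see why 'relu' gives sharper edges than 'tanh': with a single hidden layer, the value fed into the final sigmoid is a piecewise-linear function of the input when the hidden activation is ReLU, so the 0.5 decision boundary is made of straight segments. With tanh, that value is curved almost everywhere. Here is a minimal NumPy sketch of this. The weights `W`, `b` and `v` are made up for illustration and do not come from the trained model.

```python
import numpy as np

# Hypothetical weights for a 2-input, 3-unit hidden layer (not from a trained model)
W = np.array([[1.0, -1.0, 0.5],
              [0.5, 1.0, -1.0]])
b = np.array([0.0, 0.5, -0.5])
v = np.array([1.0, -1.0, 0.5])  # hidden-to-output weights


def logit(points, activation):
    """Pre-sigmoid output of the one-hidden-layer network."""
    return activation(points @ W + b) @ v


def relu(z):
    return np.maximum(z, 0.0)


# Walk along a straight line (the x-axis) in input space
t = np.linspace(-3.0, 3.0, 601)
line = np.stack([t, np.zeros_like(t)], axis=1)

# Second differences are (numerically) zero wherever the function is linear
relu_bends = int(np.sum(np.abs(np.diff(logit(line, relu), n=2)) > 1e-9))
tanh_bends = int(np.sum(np.abs(np.diff(logit(line, np.tanh), n=2)) > 1e-9))

print("grid points where the ReLU logit bends:", relu_bends)
print("grid points where the tanh logit bends:", tanh_bends)
```

The ReLU version bends only at a handful of grid points (at most a couple per hidden unit), while the tanh version bends at nearly all of them.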

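【Comments】:

【Answer 2】:

Your problem is that all of your examples are single-layer neural networks that differ only in size! If you print the weights, you will notice that after you increase the layer size (for example from 5 to 50), the weights of the additional neurons (45 of them in this example) are close to zero, so the resulting models are effectively the same.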
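To check the "weights close to zero" claim on your own model, you can look at the incoming weights of each hidden unit. Below is a small sketch of such a check. The helper `near_zero_units` and the tolerance `tol` are my own illustrative choices, and the weight values are made up rather than taken from a trained network.

```python
import numpy as np


def near_zero_units(kernel, bias, tol=1e-2):
    """Indices of hidden units whose incoming weights and bias are all below tol in magnitude."""
    kernel = np.asarray(kernel)
    bias = np.asarray(bias)
    # kernel has shape (n_inputs, n_units): column j holds the incoming weights of unit j
    largest = np.maximum(np.abs(kernel).max(axis=0), np.abs(bias))
    return np.flatnonzero(largest < tol)


# Made-up weights for illustration (not taken from a trained model)
kernel = np.array([[0.9, 0.001, -1.2, 0.0],
                   [-0.4, 0.002, 0.7, 0.003]])
bias = np.array([0.1, 0.0, -0.3, 0.001])

idle = near_zero_units(kernel, bias)
print(idle.tolist())  # [1, 3]
```

With a Keras model, `kernel, bias = model.layers[0].get_weights()` returns the arrays of the first Dense layer in this layout.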

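You need to increase the depth of your neural network to see overfitting. For example, I changed your code so that the first two examples are single-layer neural networks and the third ([30, 30, 30, 30]) is a four-layer neural network (the complete source code is here):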

# Generate moon-shaped data with fewer samples and more noise
# data, targets = make_moons(500, noise=0.45)
from sklearn.datasets import make_moons, make_classification

data, targets = make_classification(n_samples=200, n_features=2, n_redundant=0, n_informative=2,
                                    random_state=2, n_clusters_per_class=2)
plot_data(data, targets)
plt.figure(figsize=(16, 32))
hidden_layer_dimensions = [[2], [20], [30, 30, 30, 30]]

for i, hidden_layer_sizes in enumerate(hidden_layer_dimensions):
    fig = plt.subplot(4, 2, i+1)
    plt.title('Hidden Layer size: {}'.format(str(hidden_layer_sizes)))
    model = Sequential()
    for j, layer_size in enumerate(hidden_layer_sizes):
        if j == 0:
            model.add(Dense(layer_size, activation='tanh', input_shape=(2,)))
        else:
            model.add(Dense(layer_size, activation='tanh'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(SGD(lr=0.1), 'binary_crossentropy', metrics=['accuracy'])
    history = model.fit(data, targets, verbose=0, epochs=500)
    plot_decision_boundary(lambda x: model.predict(x) > 0.5, data, targets, fig)

Here is the result:

You can also use TensorFlow Playground to achieve your goal. Please check it out! It has a nice interactive user interface.
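To put the three architectures from this answer in perspective, it can help to count their trainable parameters. This is a rough sketch only. The helper `count_parameters` is hypothetical (it is not a Keras function) and simply applies the usual weights-plus-biases count for fully connected layers.

```python
def count_parameters(n_inputs, hidden_layer_sizes, n_outputs=1):
    """Trainable parameters (weights + biases) of a fully connected network."""
    total = 0
    fan_in = n_inputs
    for size in list(hidden_layer_sizes) + [n_outputs]:
        total += fan_in * size + size  # weight matrix plus one bias per unit
        fan_in = size
    return total


for sizes in [[2], [20], [30, 30, 30, 30]]:
    print(sizes, count_parameters(2, sizes))
# [2] 9
# [20] 81
# [30, 30, 30, 30] 2911
```

The four-layer network has far more parameters than there are samples (200), which is consistent with it having enough capacity to fit the noise.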

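【Comments】:

【Answer 3】:

I finally managed to overfit my data by significantly increasing the number of gradient descent and parameter update steps. It works with both tanh and ReLU activation functions.

Here is the updated line: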

history = model.fit(x_train, y_train, verbose=0, epochs=5000, batch_size=200)

The complete code is here, and the results are shown below.
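For a sense of scale, here is a back-of-the-envelope count of how many mini-batch updates the original call and the updated call perform. This is a sketch under one assumption: `n_samples = 500` is a guess at the training set size, not a figure stated in this answer. The default `batch_size` of 32 is what Keras `fit` uses when none is given.

```python
import math


def count_updates(n_samples, epochs, batch_size=32):
    """Number of mini-batch parameter updates over a whole training run."""
    steps_per_epoch = math.ceil(n_samples / batch_size)
    return epochs * steps_per_epoch


n_samples = 500  # assumption: the real dataset size may differ

before = count_updates(n_samples, epochs=50)                    # original call, default batch size
after = count_updates(n_samples, epochs=5000, batch_size=200)   # updated call
print(before, after)  # 800 15000
```

Under that assumption, the updated call performs roughly 19 times as many parameter updates as the original one.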

【Comments】: