【Title】: Getting nan for loss function when using StandardScaler instead of MinMax
【Posted】: 2020-03-12 17:51:32
【Question】:

I am currently running a regression neural network with one hidden layer. When I use the MinMax scaler, the loss function does not become nan, but the validation loss spikes at epoch 65. That is why I wanted to use the Standard Scaler for normalization instead.

Training on 2456 samples, validating on 614 samples:

MinMax:

Epoch 1/200
2456/2456 [==============================] - 1s 208us/sample - loss: 792.2849 - accuracy: 0.5297 - val_loss: 132.7681 - val_accuracy: 0.5228
Epoch 65/200
2456/2456 [==============================] - 0s 21us/sample - loss: 215.0770 - accuracy: 0.4919 - val_loss: 12554.3564 - val_accuracy: 0.0033
Epoch 200/200
2456/2456 [==============================] - 0s 20us/sample - loss: 331.2774 - accuracy: 0.3103 - val_loss: 44.3924 - val_accuracy: 0.5212

StandardScaler:

Epoch 1/200
2456/2456 [==============================] - 0s 22us/sample - loss: nan - accuracy: 0.5297 - val_loss: nan - val_accuracy: 0.5228
...
Epoch 200/200
2456/2456 [==============================] - 0s 12us/sample - loss: nan - accuracy: 0.5297 - val_loss: nan - val_accuracy: 0.5228
in: x_train
out:array([[-0.82125176, -0.73628742, -0.22547625, ..., -0.15796663,
        -0.75079814,  0.00855544],
       [-0.82125176, -0.73628742, -0.22547625, ..., -0.15796663,
        -0.75079814,  0.00855544],
       [-0.82125176, -0.73628742, -0.22547625, ..., -0.15796663,
        -0.75079814,  0.00855544],
       ...,
       [ 0.90140878,  1.14083087, -0.22547625, ..., -0.02445519,
         1.41760241,  0.09675613],
       [ 0.76359594,  1.14083087,  0.3660863 , ...,  0.30325472,
         1.09146087, -0.01575742],
       [-0.82125176, -0.73628742, -0.22547625, ..., -0.15796663,
        -0.75079814,  0.00855544]]) 

This is not a problem when I use MinMaxScaler.

import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler

# for classification problems (class labels):
#x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, stratify=y, random_state=42)
# not stratifying: want to keep the distribution underlying the data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# MinMax shifts linearly into [0, 1] (can also be (-1, 1));
# StandardScaler scales with respect to the standard deviation.
scaler1 = MinMaxScaler()
scaler2 = StandardScaler()
scaler2.fit(x_train)  # the scaler looks at the training data only and learns what to scale
x_train = scaler2.transform(x_train)
x_test = scaler2.transform(x_test)


model = tf.keras.Sequential()
# regularize the model to reduce the size of the coefficients
# 64 units in the hidden layer
model.add(tf.keras.layers.Dense(64, input_shape=(9,), activation='relu'))
model.add(tf.keras.layers.Dense(1))
#model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

opt = tf.keras.optimizers.SGD()
#loss = tf.keras.losses.binary_crossentropy
loss = tf.keras.losses.mean_squared_error

# define, then compile the model
model.compile(optimizer=opt,
              loss=loss,
              metrics=["accuracy"])

model.fit(x_train, y_train, batch_size=128, epochs=200, validation_data=(x_test, y_test))
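Before switching optimizers, it is worth confirming that the scaled inputs are themselves finite. A minimal diagnostic sketch (the toy array below is hypothetical, standing in for x_train; note that StandardScaler maps a zero-variance column to zeros rather than NaN, so finite scaled inputs would point at the training dynamics, e.g. the learning rate, rather than at the scaler):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy stand-in for x_train: column 0 is constant, column 1 varies.
x_train = np.array([[0.0, 1.0],
                    [0.0, 2.0],
                    [0.0, 3.0]])

scaler = StandardScaler()
x_scaled = scaler.fit_transform(x_train)

# All scaled values should be finite (no NaN/inf).
print(np.isfinite(x_scaled).all())  # → True

# StandardScaler replaces a zero std with 1, so the constant
# column is centered but never divided by zero.
print(scaler.scale_)  # → [1.         0.81649658]
```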


On the data side: below are the head(10) and tail(5) of the dataframe that I convert into numpy arrays x: (3070, 9) and y: (3070, 1). I use the PAL column as my "Y" values and want to predict them using an "X" matrix made up of all the other columns in the dataframe, minus the PAL column and the Players column.



Position    eFG%    iFG Reb Ast T/O Blk PF  PER*    PAL
0   PG  0.562   0.30    6.8 10.5    3.3 0.8 5.0 22.8    3.205000
1   SG  0.000   0.00    0.0 0.0 0.0 0.0 0.0 0.0 0.000000
2   SF  0.000   0.00    0.0 0.0 0.0 0.0 0.0 0.0 0.000000
3   PF  0.000   0.00    0.0 0.0 0.0 0.0 0.0 0.0 0.000000
4   C   0.000   0.00    0.0 0.0 0.0 0.0 0.0 0.0 0.000000
5   PG  0.465   0.24    7.1 9.3 3.5 0.3 5.7 16.7    -0.125000
6   SG  0.500   0.25    4.3 5.3 4.3 0.4 3.9 11.5    1.271667
7   SF  0.000   0.00    0.0 0.0 0.0 0.0 0.0 0.0 0.000000
8   PF  0.000   0.00    0.0 0.0 0.0 0.0 0.0 0.0 0.000000
9   C   0.000   0.00    0.0 0.0 0.0 0.0 0.0 0.0 0.000000
...
3065    PG  0.000   0.00    0.0 0.0 0.0 0.0 0.0 0.0 0.000000
3066    SG  0.000   0.00    0.0 0.0 0.0 0.0 0.0 0.0 0.000000
3067    SF  0.000   0.00    0.0 0.0 0.0 0.0 0.0 0.0 0.000000
3068    PF  0.000   0.00    0.0 0.0 0.0 0.0 0.0 0.0 0.000000
3069    C   0.577   0.52    16.9    4.3 2.4 2.1 6.8 25.7    3.658333
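For reference, turning a frame like this into the x and y arrays described above could look like the sketch below. The frame name `df`, the label-encoding of `Position`, and the two-row sample are illustrative assumptions, not from the question (the Players column it mentions is not visible in this excerpt, so it is not dropped here):

```python
import pandas as pd

# Hypothetical two-row frame mirroring the columns shown above.
df = pd.DataFrame({
    'Position': ['PG', 'SG'],
    'eFG%': [0.562, 0.0], 'iFG': [0.30, 0.0], 'Reb': [6.8, 0.0],
    'Ast': [10.5, 0.0], 'T/O': [3.3, 0.0], 'Blk': [0.8, 0.0],
    'PF': [5.0, 0.0], 'PER*': [22.8, 0.0],
    'PAL': [3.205, 0.0],
})

# PAL is the regression target.
y = df['PAL'].to_numpy().reshape(-1, 1)

# Position is categorical; encode it numerically (simple integer codes
# here) so the 9 remaining columns form a numeric X of shape (n, 9).
X = df.drop(columns=['PAL']).copy()
X['Position'] = X['Position'].astype('category').cat.codes
X = X.to_numpy(dtype=float)

print(X.shape, y.shape)  # → (2, 9) (2, 1)
```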

Thanks in advance for your help! Edit: I have added more information about my dataset.

【Comments】:

  • Can you upload the data?
  • makis, do you want the whole dataframe? I just added the first 10 and last 5 rows, but all the other columns are very similar.

Tags: tensorflow machine-learning keras scikit-learn deep-learning


【Answer 1】:

Instead of the SGD (standard gradient descent) optimizer, try RMSProp (Root Mean Square Propagation). In my case this solved the problem.
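A minimal sketch of that swap on the question's model (the learning_rate and clipnorm values are illustrative choices, not part of the original answer; clipnorm additionally caps the gradient norm, another common guard against NaN losses):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, input_shape=(9,), activation='relu'),
    tf.keras.layers.Dense(1),
])

# RMSprop adapts the step size per parameter; clipnorm caps the
# global gradient norm so a single bad batch cannot blow up the weights.
opt = tf.keras.optimizers.RMSprop(learning_rate=1e-3, clipnorm=1.0)
model.compile(optimizer=opt, loss='mse')
```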

【Discussion】:
