【Title】: How to fix this "loss is NaN" problem in PyTorch of this RNN with GRU?
【Posted】: 2021-01-07 01:40:49
【Question】:

I'm completely new to PyTorch and have been trying out a few models. I want to make a simple prediction of stock market prices and found the following code:

I load the dataset with pandas, then split it into training and test data and load those into PyTorch DataLoaders for later use in training. The model is defined in the GRU class. The real problem, however, seems to be the optimization. I think the problem could be exploding gradients. I considered adding gradient clipping, but isn't the GRU design actually supposed to prevent exploding gradients, or am I wrong here? What could cause the loss to become NaN immediately (already in the first epoch)?

from sklearn.preprocessing import MinMaxScaler

import time
import pandas as pd
import numpy as np

import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

batch_size = 200
input_dim = 1
hidden_dim = 32
num_layers = 2
output_dim = 1
num_epochs = 10

nvda = pd.read_csv('dataset/stocks/NVDA.csv')
price = nvda[['Close']]
scaler = MinMaxScaler(feature_range=(-1, 1))
price['Close'] = scaler.fit_transform(price['Close'].values.reshape(-1, 1))

def split_data(stock, lookback):
    data_raw = stock.to_numpy()  # convert to numpy array
    data = []

    # create all possible sequences of length lookback
    for index in range(len(data_raw) - lookback):
        data.append(data_raw[index: index + lookback])

    data = np.array(data)
    test_set_size = int(np.round(0.2 * data.shape[0]))
    train_set_size = data.shape[0] - (test_set_size)

    x_train = data[:train_set_size, :-1, :]
    y_train = data[:train_set_size, -1, :]

    x_test = data[train_set_size:, :-1, :]
    y_test = data[train_set_size:, -1, :]

    return [x_train, y_train, x_test, y_test]


lookback = 20  # choose sequence length
x_train, y_train, x_test, y_test = split_data(price, lookback)

train_data = TensorDataset(torch.from_numpy(x_train).float(), torch.from_numpy(y_train).float())
train_data = DataLoader(train_data, shuffle=True, batch_size=batch_size, drop_last=True)

test_data = TensorDataset(torch.from_numpy(x_test).float(), torch.from_numpy(y_test).float())
test_data = DataLoader(test_data, shuffle=True, batch_size=batch_size, drop_last=True)


class GRU(nn.Module):
    def __init__(self, input_dim, hidden_dim, num_layers, output_dim):
        super(GRU, self).__init__()
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers

        self.gru = nn.GRU(input_dim, hidden_dim, num_layers, batch_first=True, dropout=0.2)
        self.fc = nn.Linear(hidden_dim, output_dim)
        self.relu = nn.ReLU()

    def forward(self, x, h):

        out, h = self.gru(x, h)
        out = self.fc(self.relu(out[:, -1]))
        return out, h

    def init_hidden(self, batch_size):
        weight = next(self.parameters()).data
        hidden = weight.new(self.num_layers, batch_size, self.hidden_dim).zero_()
        return hidden


model = GRU(input_dim=input_dim, hidden_dim=hidden_dim, output_dim=output_dim, num_layers=num_layers)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0000000001)
model.train()

start_time = time.time()

h = model.init_hidden(batch_size)
for epoch in range(1, num_epochs+1):
    for x, y in train_data:
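        # h.data detaches the hidden state, so backprop does not reach into previous batches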
        h = h.data
        model.zero_grad()
        y_train_pred, h = model(x, h)
        loss = criterion(y_train_pred, y)
        print("Epoch ", epoch, "MSE: ", loss.item())
        loss.backward()
        optimizer.step()


training_time = time.time() - start_time
print("Training time: {}".format(training_time))

Here is the dataset I used.
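
A minimal diagnostic sketch for questions like this, assuming the names from the code above (model, train_data, criterion, h): it checks each stage for NaNs and enables PyTorch's anomaly detection, which makes backward() raise at the first operation that produces NaN/Inf.

import torch

torch.autograd.set_detect_anomaly(True)

for x, y in train_data:
    h = h.detach()                        # cut the graph link to the previous batch
    y_pred, h = model(x, h)
    print("NaN in input: ", torch.isnan(x).any().item())
    print("NaN in output:", torch.isnan(y_pred).any().item())
    loss = criterion(y_pred, y)
    print("NaN in loss:  ", torch.isnan(loss).item())
    loss.backward()                       # raises here if a backward op yields NaN/Inf
    break                                 # one batch is enough for a first look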

【Comments】:

  • Hey! The GRU doesn't prevent exploding gradients; it prevents vanishing gradients, so you can still apply gradient clipping. You should also fit your scaler on the training data only and use it to transform the test data. And you shouldn't apply a ReLU activation before the layer that produces your output (I think that's why you get the NaN loss), so remove it. Finally, your learning rate may simply be far too small; you should start from the default, i.e. 1e-3. (A sketch of these fixes follows these comments.)
  • I applied gradient clipping and removed the ReLU layer, but it still doesn't work. The clipping value I used was 1.
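
A sketch of the fixes suggested in the first comment, assuming the names from the question's code (price, model, criterion, train_data, h); the 0.8 split point only roughly mirrors the 80/20 window split in split_data:

import numpy as np
import torch
import torch.nn as nn
from sklearn.preprocessing import MinMaxScaler

# 1) Fit the scaler on the training rows only, then reuse it on the test rows.
n_train = int(len(price) * 0.8)
close = price['Close'].values.reshape(-1, 1)
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = np.concatenate([scaler.fit_transform(close[:n_train]),  # fit on train only
                         scaler.transform(close[n_train:])])     # transform-only on test

# 2) In GRU.forward, drop the ReLU before the output layer:
#        out = self.fc(out[:, -1])

# 3) Use Adam's default learning rate and clip gradients before each step.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for x, y in train_data:
    h = h.detach()
    model.zero_grad()
    y_pred, h = model(x, h)
    loss = criterion(y_pred, y)
    loss.backward()
    nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip, then step
    optimizer.step()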

Tags: machine-learning neural-network pytorch recurrent-neural-network


【Answer 1】:

Not sure if this is the cause, but have you preprocessed and cleaned the data? I don't know, but there may be some missing values, or something otherwise odd. I checked here https://ca.finance.yahoo.com/quote/NVDA/history?p=NVDA and there seem to be some inconsistencies every other row. Like I said, I don't know whether that's it, but it might be.
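
A quick sanity check along these lines, assuming the same CSV path as in the question; pd.to_numeric with errors='coerce' also flags rows whose 'Close' value is a non-numeric string such as 'null':

import pandas as pd

nvda = pd.read_csv('dataset/stocks/NVDA.csv')
print(nvda.isna().sum())                                            # missing values per column
print(pd.to_numeric(nvda['Close'], errors='coerce').isna().sum())   # non-numeric 'Close' rows
nvda = nvda.dropna()                                                # drop incomplete rows, if any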

【Discussion】:

  • My test data is preprocessed as shown in the example; I went through it and I don't think it needs any cleaning.