LSTM Time-Series produces shifted forecast? Answer

【Question Title】: LSTM Time-Series produces shifted forecast?
【Posted】: 2022-04-07 02:50:34
【Question Description】:

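I am doing time-series forecasting with an LSTM NN and Keras. There are two input features (precipitation and temperature), and the single target to predict is the groundwater level.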

It seems to work quite well, although there is a significant shift between the actual data and the output (see figure).

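Now I have read that this can be a typical sign of the network not working properly, because it seems to be mimicking the output, and that "what the model is actually doing is, when predicting the value at time 't+1', it simply uses the value at time 't' as its prediction" (https://towardsdatascience.com/how-not-to-use-machine-learning-for-time-series-forecasting-avoiding-the-pitfalls-19f9d7adf424).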
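A quick way to test that suspicion is to compare the network against the naive "persistence" forecast the quoted article describes, where the prediction for t+1 is simply the observed value at t. The following is a minimal numpy sketch; `persistence_forecast`, `mse` and the synthetic `gwl` series are hypothetical stand-ins, not part of the asker's code. If the model's MSE on the same (scaled) target days is not clearly lower than this baseline, it is probably doing little more than repeating the previous level.

```python
import numpy as np

def persistence_forecast(y):
    """Naive baseline: the forecast for day t+1 is the observed value on day t.

    Returns (predictions, targets) aligned so predictions[i] forecasts targets[i].
    """
    y = np.asarray(y, dtype=float).ravel()
    return y[:-1], y[1:]

def mse(a, b):
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return float(np.mean((a - b) ** 2))

# Hypothetical smooth series standing in for a groundwater level
t = np.arange(200)
gwl = 10 + np.sin(t / 20.0)

pred, target = persistence_forecast(gwl)
baseline_mse = mse(pred, target)
print(baseline_mse)  # small, because the series changes little from one day to the next
```

On the question's data, the baseline would be computed on the scaled test targets and compared with the MSE that `model.evaluate` reports for the test split.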

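However, in my case this should not actually be possible, because the target value is not used as an input variable. I am using a multivariate time series with two features that are independent of the output feature. Moreover, the predicted values are not shifted into the future (t+1); instead they seem to lag behind (t-1).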
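Whether the output really lags (t-1) or leads (t+1) can be measured rather than judged from the plot: shift the predictions against the truth and see which offset gives the smallest error. A minimal sketch, assuming `y_true` and `y_pred` are equal-length 1-D arrays already aligned by sample index; the function name `best_lag` and the sine test data are hypothetical.

```python
import numpy as np

def best_lag(y_true, y_pred, max_lag=10):
    """Return the shift k in [-max_lag, max_lag] that minimizes the MSE
    between y_pred[t] and y_true[t - k].

    k > 0: predictions lag behind the truth (they repeat past values).
    k < 0: predictions lead the truth.
    """
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    n = len(y_true)
    best_k, best_err = 0, np.inf
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            a, b = y_pred[k:], y_true[:n - k]   # compare y_pred[t] with y_true[t - k]
        else:
            a, b = y_pred[:n + k], y_true[-k:]  # compare y_pred[t] with y_true[t + |k|]
        err = np.mean((a - b) ** 2)
        if err < best_err:
            best_k, best_err = k, err
    return best_k

# Synthetic check: one prediction that lags by one day, one that leads by two days
t = np.arange(300)
y_true = np.sin(t / 20.0)
lagging = np.concatenate([y_true[:1], y_true[:-1]])  # pred[t] = true[t - 1]
leading = np.concatenate([y_true[2:], y_true[-2:]])  # pred[t] = true[t + 2]
print(best_lag(y_true, lagging))  # 1
print(best_lag(y_true, leading))  # -2
```

A result of 1 on the real test predictions would confirm the one-day lag described above; a result of 0 would suggest the visible offset comes from how the curves are plotted.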

Does anyone know what might be causing this problem?

Here is the full code of my network:

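# Imports (data, window, batch, epoch, act, tensorboard and the helper module H,
# which provides create2feature_data and r2_score, are defined elsewhere)
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras import layers, Input, Model

# Split in Input and Output Data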
x_1 = data[['MeanT']].values
x_2 = data[['Precip']].values
y = data[['Z_424A_6857']].values

# Scale Data
x = np.hstack([x_1, x_2])
scaler = MinMaxScaler(feature_range=(0, 1))
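x = scaler.fit_transform(x)
# Continue with the scaled columns (previously the unscaled x_1/x_2 were windowed below)
x_1, x_2 = x[:, [0]], x[:, [1]]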

scaler_out = MinMaxScaler(feature_range=(0, 1))
y = scaler_out.fit_transform(y)

# Reshape Data: sample k uses days k .. k+window-1 and targets day k+window
x_1, x_2, y = H.create2feature_data(x_1, x_2, y, window)
train_size = int(len(x_1) * .8)
test_size = int(len(x_1))  # end index of the test split, i.e. all remaining samples

x_1 = np.expand_dims(x_1, 2)  # 3D tensor (nr. of samples, nr. of timesteps, nr. of features)
x_2 = np.expand_dims(x_2, 2)
y = np.expand_dims(y, 1)

# Split Training Data
x_1_train = x_1[:train_size]
x_2_train = x_2[:train_size]
y_train = y[:train_size]

# Split Test Data
x_1_test = x_1[train_size:test_size]
x_2_test = x_2[train_size:test_size]
y_test = y[train_size:test_size]

# Define Model Input Sets
inputA = Input(shape=(window, 1))
inputB = Input(shape=(window, 1))

# Build Model Branch 1
# batch_input_shape=(batch, 30, 1) removed from both GRUs: it contradicted
# Input(shape=(window, 1)) and is only meant for the first layer of a Sequential model
branch_1 = layers.GRU(16, activation=act, dropout=0, return_sequences=False, stateful=False)(inputA)
branch_1 = layers.Dense(8, activation=act)(branch_1)
#branch_1 = layers.Dropout(0.2)(branch_1)
branch_1 = Model(inputs=inputA, outputs=branch_1)

# Build Model Branch 2
branch_2 = layers.GRU(16, activation=act, dropout=0, return_sequences=False, stateful=False)(inputB)
branch_2 = layers.Dense(8, activation=act)(branch_2)
#branch_2 = layers.Dropout(0.2)(branch_2)
branch_2 = Model(inputs=inputB, outputs=branch_2)

# Combine Model Branches
combined = layers.concatenate([branch_1.output, branch_2.output])
 
# apply a FC layer and then a regression prediction on the combined outputs
comb = layers.Dense(6, activation=act)(combined)
comb = layers.Dense(1, activation="linear")(comb)
 
# Accept the inputs of the two branches and then output a single value
model = Model(inputs=[branch_1.input, branch_2.input], outputs=comb)
model.compile(loss='mse', optimizer='adam', metrics=['mse', H.r2_score])

model.summary()

# Training (the GRUs are not stateful, so model.reset_states() is not needed)
model.fit([x_1_train, x_2_train], y_train, epochs=epoch, batch_size=batch, validation_split=0.2, callbacks=[tensorboard])

# Evaluation
print('Train evaluation')
print(model.evaluate([x_1_train, x_2_train], y_train))

print('Test evaluation')
print(model.evaluate([x_1_test, x_2_test], y_test))

# Predictions
predictions_train = model.predict([x_1_train, x_2_train])
predictions_test = model.predict([x_1_test, x_2_test])

predictions_train = np.reshape(predictions_train, (-1,1))
predictions_test = np.reshape(predictions_test, (-1,1))

# Reverse Scaling
predictions_train = scaler_out.inverse_transform(predictions_train)
predictions_test = scaler_out.inverse_transform(predictions_test)

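# Plot results on the original day axis: the windowed targets start at day `window`,
# so each prediction is drawn at the same day as the value it predicts
y_true = scaler_out.inverse_transform(np.reshape(y, (-1, 1)))
plt.figure(figsize=(15, 6))
plt.plot(range(window, window + len(y_true)), y_true, color='blue', label='True GWL')
plt.plot(range(window, window + train_size), predictions_train, color='red', label='Predicted GWL (Training)')
plt.plot(range(window + train_size, window + test_size), predictions_test, color='green', label='Predicted GWL (Test)')
plt.title('GWL Prediction')
plt.xlabel('Day')
plt.ylabel('GWL')
plt.legend()
plt.show()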

I use a batch size of 30, a lookback of 90 time steps, and the total data size is about 7500 time steps.
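For reference, with the preprocessing in the question (one window per target day, the first 80% of the windows for training), those numbers work out as follows. Note that `batch_size` counts windows, not time steps: each batch fed to one branch has shape (30, 90, 1). The helper `window_counts` is hypothetical and only reproduces the arithmetic of the question's code.

```python
def window_counts(n_timesteps, window, batch_size, train_frac=0.8):
    """Reproduce the sample counts implied by the question's preprocessing."""
    n_samples = n_timesteps - window          # create2feature_data yields one window per i in [window, n)
    train_size = int(n_samples * train_frac)  # same rule as int(len(x_1) * .8)
    test_size = n_samples - train_size        # remaining windows form the test split
    batch_shape = (batch_size, window, 1)     # (windows per batch, timesteps, features) per branch
    return n_samples, train_size, test_size, batch_shape

print(window_counts(7500, 90, 30))  # (7410, 5928, 1482, (30, 90, 1))
```

With `validation_split=0.2`, Keras additionally holds out the last 20% of those 5928 training windows for validation, so the network is fitted on fewer windows than `train_size`.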

Any help would be greatly appreciated :-) Thanks!

【Question Discussion】:

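Some things are unclear here. Are you using the time steps in [t-90:t] to predict step t+1, or the value 10 days after t, i.e. t+10? Also, what is the accuracy on your training set? It looks like your model is not properly trained, since there is a large difference between the predictions and the actual training data points, i.e. high bias. However, if you use early stopping, you may have a large bias...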
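The two setups the commenter asks about differ only in which target day is paired with each input window. A minimal numpy sketch that builds both cases; the name `make_windows` and its `horizon` parameter are hypothetical, not from the question. With `horizon=1` it pairs each window with the same target day as the asker's `create2feature_data`.

```python
import numpy as np

def make_windows(X, y, window, horizon=1):
    """Pair each input window with the target `horizon` steps after its last day.

    Sample j uses X[j : j + window] (last day t = j + window - 1) and
    targets y[t + horizon]: horizon=1 predicts t+1, horizon=10 predicts t+10.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    if X.ndim == 1:
        X = X[:, None]  # (n,) -> (n, 1): one feature
    n = len(X) - window - horizon + 1  # number of complete (window, target) pairs
    if n <= 0:
        raise ValueError("series too short for this window and horizon")
    inputs = np.stack([X[j:j + window] for j in range(n)])  # (n, window, n_features)
    targets = np.array([y[j + window - 1 + horizon] for j in range(n)])
    return inputs, targets

# Usage with the day index as data, so every value shows which day it comes from
days = np.arange(20)
inp1, tgt1 = make_windows(days, days, window=5, horizon=1)
inp10, tgt10 = make_windows(days, days, window=5, horizon=10)
print(inp1[0, :, 0], tgt1[0])    # window covers days 0..4, target is day 5
print(inp10[0, :, 0], tgt10[0])  # window covers days 0..4, target is day 14
```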
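@euren Hey, thanks for your comment. Yes, I use the time steps in [t-90:t] to predict step t+1. On my training set the error is about 0.002 (MSE), on the test set about 0.014 (MSE). So far I am not using early stopping. What bothers me is the difference, or rather the shift, between the predicted and the actual training data. Do you know how I can compensate for this shift?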
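If you are not using early stopping, then an MSE loss of 0.02 is high; normally you should easily get e-5 or e-6. This is also evident from the shift between the predicted and the true training data. Maybe you are not shifting the data correctly, i.e. in the preprocessing. You should probably check there.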
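Also, 90 days is too long; an LSTM without attention cannot work effectively over such a long range, and even with attention it does not reach its full potential. I suggest switching to 10 time steps instead.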
@eugen: OK. I do the preprocessing with my create2feature_data function, which is where I create the sliding windows for the data:

def create2feature_data(x_1, x_2, y, window_size = 1):
    inp_1, inp_2, out = [], [], []
    for i in range(window_size, len(x_1)):
        inp_1.append(x_1[i-window_size:i, 0])
        inp_2.append(x_2[i-window_size:i, 0])
        out.append(y[i, 0])
    return(np.array(inp_1), np.array(inp_2), np.array(out))

Apart from that, I don't shift anything. Is there anything I could do differently in the preprocessing?
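One way to check this windowing for an off-by-one error is to run it on the time index itself, so every value records which day it came from. The sketch below copies `create2feature_data` from the comment (only formatting changed) and checks that each target is exactly one day after the last day of its window. It also makes explicit that target k is day k + window_size: the first window_size days get no prediction, so predictions have to be plotted starting at that day to line up with the raw series.

```python
import numpy as np

def create2feature_data(x_1, x_2, y, window_size=1):
    # Copied from the comment: sliding windows over two (n, 1) feature arrays
    inp_1, inp_2, out = [], [], []
    for i in range(window_size, len(x_1)):
        inp_1.append(x_1[i - window_size:i, 0])  # days i-window_size .. i-1
        inp_2.append(x_2[i - window_size:i, 0])
        out.append(y[i, 0])                      # target is day i
    return np.array(inp_1), np.array(inp_2), np.array(out)

# Feed the day index itself through the function
n, window = 200, 90
days = np.arange(n, dtype=float).reshape(-1, 1)
a, b, target = create2feature_data(days, days, days, window)

gap = target - a[:, -1]  # target day minus last input day, per sample
print(np.unique(gap))    # [1.] -> every target is exactly t+1
print(target[0])         # 90.0 -> the first prediction belongs to day `window`
```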

Tags: keras neural-network deep-learning time-series prediction

【Solution 1】:

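My answer may not be relevant two years later, but I ran into a similar problem while experimenting with an LSTM encoder-decoder model. I solved it by scaling the input data to the range -1 .. 1 instead of 0 .. 1 as in your example.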
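In the question's code this corresponds to passing `feature_range=(-1, 1)` to the `MinMaxScaler` calls. The numpy sketch below only spells out what that transform computes; the class `RangeScaler` is a hypothetical stand-in for scikit-learn's `MinMaxScaler`, and fitting it on the training rows and reusing it for later rows is a design choice assumed here. The answer reports that the rescaling helped for an encoder-decoder model; it is not guaranteed to remove the shift in this case.

```python
import numpy as np

class RangeScaler:
    """Column-wise min-max scaling to feature_range, approximately what
    sklearn's MinMaxScaler(feature_range=(-1, 1)) computes."""

    def __init__(self, feature_range=(-1.0, 1.0)):
        self.lo, self.hi = feature_range

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.min_ = X.min(axis=0)
        span = X.max(axis=0) - self.min_
        self.span_ = np.where(span == 0, 1.0, span)  # avoid division by zero for constant columns
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        return self.lo + (X - self.min_) / self.span_ * (self.hi - self.lo)

    def inverse_transform(self, Z):
        Z = np.asarray(Z, dtype=float)
        return self.min_ + (Z - self.lo) / (self.hi - self.lo) * self.span_

# Hypothetical two-feature training rows (e.g. temperature, precipitation)
X_train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
s = RangeScaler((-1, 1)).fit(X_train)
scaled = s.transform(X_train)          # rows map to -1, 0 and 1 in each column
restored = s.inverse_transform(scaled) # back to the original units
print(scaled)
```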

【Discussion】: