Extremely poor prediction: LSTM time-series answers

【Question title】: Extremely poor prediction: LSTM time-series
【Posted】: 2018-10-07 19:52:25
【Question】:

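I tried to implement an LSTM model for time-series prediction. Below is my trial code. It runs without errors. You can also try it without any other dependencies.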

import numpy as np, pandas as pd, matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed, Bidirectional
from sklearn.metrics import mean_squared_error, accuracy_score
from scipy.stats import linregress
from sklearn.utils import shuffle

fi = 'pollution.csv'
raw = pd.read_csv(fi, delimiter=',')
raw = raw.drop('Dates', axis=1)
print (raw.shape)

scaler = MinMaxScaler(feature_range=(-1, 1))
raw = scaler.fit_transform(raw)

time_steps = 7
def create_ds(data, t_steps):
    data = pd.DataFrame(data)
    data_s = data.copy()
    for i in range(t_steps):
        data = pd.concat([data, data_s.shift(-(i+1))], axis = 1)   
    data.dropna(axis=0, inplace=True)
    return data.values

ds = create_ds(raw, time_steps)
print (ds.shape)
n_feats = raw.shape[1]
n_obs = time_steps * n_feats

n_rows = ds.shape[0]
train_size = int(n_rows * 0.8)

train_data = ds[:train_size, :]
train_data = shuffle(train_data)

test_data = ds[train_size:, :]

x_train = train_data[:, :n_obs]
y_train = train_data[:, n_obs:]
x_test = test_data[:, :n_obs]
y_test = test_data[:, n_obs:]

x_train = x_train.reshape(1, x_train.shape[0], x_train.shape[1])
y_train = y_train.reshape(1, y_train.shape[0], y_train.shape[1])
x_test = x_test.reshape(1, x_test.shape[0], x_test.shape[1])

print (x_train.shape)
print (y_train.shape)
print (x_test.shape)
print (y_test.shape)

model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(None, x_train.shape[2]), stateful=True, batch_size=1))
model.add(LSTM(32, return_sequences=True, stateful=True))
model.add(LSTM(n_feats, return_sequences=True, stateful=True)) 

model.compile(loss='mse', optimizer='rmsprop')
model.fit(x_train, y_train, epochs=10, batch_size=1, verbose=2)  
y_predict = model.predict(x_test)
y_predict = y_predict.reshape(y_predict.shape[1], y_predict.shape[2])

y_predict = scaler.inverse_transform(y_predict)

y_test = scaler.inverse_transform(y_test)
y_test = y_test[:,0]
y_predict = y_predict[:,0]

print (y_test.shape)
print (y_predict.shape)

plt.plot(y_test, label='True')
plt.plot(y_predict,  label='Predict')
plt.legend()
plt.show()

However, the prediction is extremely poor. How can I improve it? Do you have any ideas for improving it?

Any ideas for improving the prediction by redesigning the architecture and/or the layers?

【Question discussion】:

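The data looks pretty random. Maybe this is the best an LSTM can do without overfitting. A good rule of thumb: if you can't predict the data yourself, you shouldn't expect a neural network to do it.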
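This prediction actually looks pretty good... unless there is some rule about the period of the oscillations, in which case a more powerful model could capture that period. But if the period doesn't follow any pattern, this is a good prediction.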
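One way to check whether a series has a regular oscillation period at all is its autocorrelation. This is a technique added here, not something from the thread: `autocorr` and `dominant_period` are hypothetical helpers, and the sine wave is synthetic data standing in for a real column. A minimal sketch:

```python
import numpy as np

def autocorr(x, lag):
    """Pearson correlation between the series and itself shifted by `lag` steps."""
    return np.corrcoef(x[:-lag], x[lag:])[0, 1]

def dominant_period(x, max_lag):
    """First local maximum of the autocorrelation after it first turns negative.
    Returns None if no such peak is found up to max_lag."""
    ac = [autocorr(x, k) for k in range(1, max_lag + 2)]   # ac[k - 1] is lag k
    start = next((k for k in range(1, max_lag + 1) if ac[k - 1] < 0), None)
    if start is None:
        return None                                        # never decorrelates: no clear cycle
    for k in range(start + 1, max_lag + 1):
        if ac[k - 1] >= ac[k - 2] and ac[k - 1] >= ac[k]:  # peak at lag k
            return k
    return None

t = np.arange(400)
print(dominant_period(np.sin(2 * np.pi * t / 20), max_lag=60))  # 20
```

On noisy real data the peak, if any, will be much weaker; a weak or missing peak is consistent with the "no pattern in the period" case described above.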

Tags: python tensorflow deep-learning keras keras-layer

【Solution 1】:

If you want to use the model from my code (the link you posted), you need to shape the data correctly: (1 sequence, total_time_steps, 5 features).

Important: I don't know whether this is the best approach or the best model, but this model predicts the input 7 time steps ahead (time_shift=7).
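The "predict 7 steps ahead" setup boils down to two slices of the same series plus a leading batch axis. Here is a minimal numpy sketch on a toy array (20 steps, 5 features, made up to mimic the shape of pollution.csv); `shift_pairs` is a hypothetical helper name:

```python
import numpy as np

def shift_pairs(seq, time_shift):
    """Split a (steps, features) array into inputs and targets that lie
    `time_shift` steps ahead, then add the leading batch axis Keras expects."""
    x = seq[:-time_shift]            # every step except the last `time_shift`
    y = seq[time_shift:]             # the same series, `time_shift` steps later
    return x[np.newaxis], y[np.newaxis]   # (1 sequence, steps, features)

seq = np.arange(100, dtype=float).reshape(20, 5)   # toy series: 20 steps, 5 features
x, y = shift_pairs(seq, time_shift=7)
print(x.shape, y.shape)  # (1, 13, 5) (1, 13, 5)
```

So the target at step t is simply the input at step t + 7, and the whole series is a single training sample.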

Data and initial variables

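import pandas as pd, matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import LSTM

fi = 'pollution.csv'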
raw = pd.read_csv(fi, delimiter=',')
raw = raw.drop('Dates', axis=1)
print("raw shape:")
print (raw.shape)
#(1789,5) - 1789 time steps / 5 features

scaler = MinMaxScaler(feature_range=(-1, 1))
raw = scaler.fit_transform(raw)

time_shift = 7 #shift is the number of steps we are predicting ahead
n_rows = raw.shape[0] #n_rows is the number of time steps of our sequence
n_feats = raw.shape[1]
train_size = int(n_rows * 0.8)


#I couldn't understand how "ds" worked, so I simply removed it because in the code below it's not necessary

#getting the train part of the sequence
train_data = raw[:train_size, :] #first train_size steps, all 5 features
test_data = raw[train_size:, :] #I'll use the beginning of the data as state adjuster


#train_data = shuffle(train_data) !!!!!! we cannot shuffle time steps!!! we lose the sequence doing this

x_train = train_data[:-time_shift, :] #the entire train data, except the last shift steps 
x_test = test_data[:-time_shift,:] #the entire test data, except the last shift steps
x_predict = raw[:-time_shift,:] #the entire raw data, except the last shift steps

y_train = train_data[time_shift:, :] 
y_test = test_data[time_shift:,:]
y_predict_true = raw[time_shift:,:]

x_train = x_train.reshape(1, x_train.shape[0], x_train.shape[1]) #ok shape (1,steps,5) - 1 sequence, many steps, 5 features
y_train = y_train.reshape(1, y_train.shape[0], y_train.shape[1])
x_test = x_test.reshape(1, x_test.shape[0], x_test.shape[1])
y_test = y_test.reshape(1, y_test.shape[0], y_test.shape[1])
x_predict = x_predict.reshape(1, x_predict.shape[0], x_predict.shape[1])
y_predict_true = y_predict_true.reshape(1, y_predict_true.shape[0], y_predict_true.shape[1])

print("\nx_train:")
print (x_train.shape)
print("y_train")
print (y_train.shape)
print("x_test")
print (x_test.shape)
print("y_test")
print (y_test.shape)

Model

Your model isn't very powerful for this task, so I tried a bigger one (on the other hand, this one is too powerful):

model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(None, x_train.shape[2])))
model.add(LSTM(128, return_sequences=True))
model.add(LSTM(256, return_sequences=True))
model.add(LSTM(128, return_sequences=True))
model.add(LSTM(64, return_sequences=True))
model.add(LSTM(n_feats, return_sequences=True)) 

model.compile(loss='mse', optimizer='adam')

Fitting

Notice that I had to train for more than 2000 epochs for the model to get good results.
I added validation data so we can compare the training and test losses.

#notice that I'm predicting from the ENTIRE sequence, including x_train
#it's important for the model to adjust its states before predicting the end
model.fit(x_train, y_train, epochs=1000, batch_size=1, verbose=2, validation_data=(x_test,y_test))

Prediction

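Important: since we are predicting the end of the sequence from its beginning, the model must see the beginning to adjust its internal states. That's why I predict on the entire data (x_predict), not just the test data.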
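To see why the beginning matters, here is a minimal numpy sketch. It uses a plain tanh RNN with random, untrained weights as a stand-in for the LSTM (an assumption for illustration: an LSTM has gates, but it also carries a state from step to step). Like a non-stateful Keras layer, `run_rnn` starts from a zero state on every call:

```python
import numpy as np

rng = np.random.default_rng(0)
n_feats, n_units = 5, 8
# hypothetical fixed weights standing in for a trained recurrent layer
W_x = rng.normal(scale=0.5, size=(n_feats, n_units))
W_h = rng.normal(scale=0.5, size=(n_units, n_units))

def run_rnn(seq):
    """Simple tanh RNN returning the hidden state at every step (like return_sequences=True)."""
    h = np.zeros(n_units)            # the state starts at zero on each call
    outputs = []
    for x_t in seq:                  # any number of steps works
        h = np.tanh(x_t @ W_x + h @ W_h)
        outputs.append(h)
    return np.array(outputs)

series = rng.normal(size=(50, n_feats))
full = run_rnn(series)[-10:]         # last 10 outputs after seeing the whole history
tail = run_rnn(series[-10:])         # the same 10 steps, computed from a cold (zero) state
print(np.abs(full - tail).max())     # non-zero: the prefix changed the state
```

The same inputs produce different outputs depending on what the network saw before them, which is the reason for predicting from x_predict rather than from the test part alone.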

y_predict_model = model.predict(x_predict)

print("\ny_predict_true:")
print (y_predict_true.shape)
print("y_predict_model: ")
print (y_predict_model.shape)


def plot(true, predicted, divider):

    predict_plot = scaler.inverse_transform(predicted[0])
    true_plot = scaler.inverse_transform(true[0])

    predict_plot = predict_plot[:,0]
    true_plot = true_plot[:,0]

    plt.figure(figsize=(16,6))
    plt.plot(true_plot, label='True',linewidth=5)
    plt.plot(predict_plot,  label='Predict',color='y')

    if divider > 0:
        maxVal = max(true_plot.max(),predict_plot.max())
        minVal = min(true_plot.min(),predict_plot.min())

        plt.plot([divider,divider],[minVal,maxVal],label='train/test limit',color='k')

    plt.legend()
    plt.show()

test_size = n_rows - train_size
print("test length: " + str(test_size))

plot(y_predict_true,y_predict_model,train_size)
plot(y_predict_true[:,-2*test_size:],y_predict_model[:,-2*test_size:],test_size)

Showing the entire data

Showing the end of it for more detail

Notice that this model is overfitting: it learns the training data well but gets bad results on the test data.

To fix this, you have to experiment: try smaller models, dropout layers, and other techniques that prevent overfitting.
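Keras has a `Dropout` layer, and its `LSTM` layer also accepts `dropout` and `recurrent_dropout` arguments. To show what dropout does to activations, here is a minimal numpy sketch of inverted dropout (as far as I know, the scaling scheme Keras applies during training); `inverted_dropout` is a hypothetical helper, not a Keras function:

```python
import numpy as np

def inverted_dropout(activations, rate, rng, training=True):
    """Zero a random fraction `rate` of units and rescale the survivors by 1/(1-rate),
    so the expected activation is unchanged; at inference it is the identity."""
    if not training or rate == 0.0:
        return activations
    keep = rng.random(activations.shape) >= rate   # True for units that survive
    return activations * keep / (1.0 - rate)

rng = np.random.default_rng(42)
a = np.ones((1000, 64))
dropped = inverted_dropout(a, rate=0.2, rng=rng)
print((dropped == 0).mean())   # close to 0.2
print(dropped.mean())          # close to 1.0
```

Because a different random subset of units is silenced on every training step, the network can't rely on memorizing exact co-activations, which is why dropout tends to reduce overfitting.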

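Notice also that this data very likely contains a lot of randomness, which means the model won't be able to learn anything useful from that part. When you make smaller models to avoid overfitting, you may also find that the model makes worse predictions on the training data.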

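Finding the perfect model isn't easy; it's an open question, and you have to experiment. Maybe LSTM models simply aren't the solution. Maybe your data just isn't predictable, and so on. There is no definite answer to this.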

How to know whether the model is good

By passing validation data to training, you can compare the loss on the training data with the loss on the test data.

Train on 1 samples, validate on 1 samples
Epoch 1/1000
9s - loss: 0.4040 - val_loss: 0.3348
Epoch 2/1000
4s - loss: 0.3332 - val_loss: 0.2651
Epoch 3/1000
4s - loss: 0.2656 - val_loss: 0.2035
Epoch 4/1000
4s - loss: 0.2061 - val_loss: 0.1696
Epoch 5/1000
4s - loss: 0.1761 - val_loss: 0.1601
Epoch 6/1000
4s - loss: 0.1697 - val_loss: 0.1476
Epoch 7/1000
4s - loss: 0.1536 - val_loss: 0.1287
Epoch 8/1000
.....

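Both should go down together. When the test (validation) loss stops going down but the training loss keeps improving, your model has started to overfit.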
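Keras can watch for this automatically through the `keras.callbacks.EarlyStopping` callback (e.g. `monitor='val_loss'` with some `patience`, passed via `fit(..., callbacks=[...])`). The underlying rule can be sketched in plain Python; `best_epoch` is a hypothetical helper, and the last three loss values below are invented to show a rising validation loss:

```python
def best_epoch(val_losses, patience=3):
    """Return the 1-based epoch with the lowest validation loss, scanning until
    `patience` consecutive epochs fail to improve on the best value so far."""
    best, best_idx, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_idx, wait = loss, epoch, 0   # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:                    # no improvement for `patience` epochs
                break
    return best_idx

# epochs 1-7 are the val_loss values from the log above;
# epochs 8-10 are made-up values that rise again, for illustration only
val = [0.3348, 0.2651, 0.2035, 0.1696, 0.1601, 0.1476, 0.1287, 0.1301, 0.1350, 0.1420]
print(best_epoch(val))  # 7
```

Training past the returned epoch only fits the training data more closely; keeping the weights from that epoch is the usual choice.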

Trying other models

The best I could do (but I didn't really try much) was with this model:

model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(None, x_train.shape[2])))
model.add(LSTM(128, return_sequences=True))
model.add(LSTM(128, return_sequences=True))
model.add(LSTM(64, return_sequences=True))
model.add(LSTM(n_feats, return_sequences=True)) 

model.compile(loss='mse', optimizer='adam')

At the point where the losses were about:

loss: 0.0389 - val_loss: 0.0437

After this point, the validation loss starts to rise (so training beyond this point is completely useless).

Result:

This shows that all the model could learn is very general behavior, such as the regions with higher values.

But the high frequencies are either too random, or the model isn't good enough...

【Discussion】:

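What create_ds does is use all 7 lagged steps (t-7, t-6, t-5, t-4, t-3, t-2, t-1) of the 5 features. So a total of 7*5=35 features goes into X (train_x or test_x), while 5 features go into Y (train_y or test_y). In your answer you only use the t-7 variables as X. Could you somehow adapt your answer to use the 35 features?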
@hiker, I'm afraid I can't. I'm not really a machine-learning expert, you know... I'm just "good at using Keras".
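@DanielMöller I think one of the biggest sources of bias in this program is raw = scaler.fit_transform(raw). It scales the training and test data together, which biases the predictions. What do you think?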
I think this data is fairly random; there isn't much that can be done.
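Regarding the 35 features produced by create_ds: for an LSTM they can be reshaped into 7 time steps of 5 features each, rather than treated as one step of 35 features. A minimal numpy sketch on a toy array; `make_windows` is a hypothetical helper intended to produce the same row layout as create_ds (which uses pandas `shift`):

```python
import numpy as np

def make_windows(data, time_steps):
    """Same row layout as create_ds: time_steps consecutive observations
    (time_steps * n_feats columns) followed by the observation right after them."""
    n_rows = data.shape[0]
    rows = [data[i:i + time_steps + 1].ravel() for i in range(n_rows - time_steps)]
    return np.array(rows)

data = np.arange(60, dtype=float).reshape(12, 5)    # toy data: 12 steps, 5 features
time_steps, n_feats = 7, 5
ds = make_windows(data, time_steps)                  # shape (5, 40)
n_obs = time_steps * n_feats                         # 35 input columns
x = ds[:, :n_obs].reshape(-1, time_steps, n_feats)   # (samples, 7, 5): 7 steps of 5 features
y = ds[:, n_obs:]                                    # (samples, 5): the step after each window
print(ds.shape, x.shape, y.shape)
```

Each sample is then an independent 7-step window, so shuffling samples is harmless, unlike shuffling the time steps of a single long sequence.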

【Solution 2】:

You could consider changing your model:

import numpy as np, pandas as pd, matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed, Bidirectional
from sklearn.metrics import mean_squared_error, accuracy_score
from scipy.stats import linregress
from sklearn.utils import shuffle

fi = 'pollution.csv'
raw = pd.read_csv(fi, delimiter=',')
raw = raw.drop('Dates', axis=1)
print (raw.shape)

scaler = MinMaxScaler(feature_range=(-1, 1))
raw = scaler.fit_transform(raw)

time_steps = 7
def create_ds(data, t_steps):
    data = pd.DataFrame(data)
    data_s = data.copy()
    for i in range(t_steps):
        data = pd.concat([data, data_s.shift(-(i+1))], axis = 1)   
    data.dropna(axis=0, inplace=True)
    return data.values

ds = create_ds(raw, time_steps)
print (ds.shape)
n_feats = raw.shape[1]
n_obs = time_steps * n_feats

n_rows = ds.shape[0]
train_size = int(n_rows * 0.8)

train_data = ds[:train_size, :]
train_data = shuffle(train_data)

test_data = ds[train_size:, :]

x_train = train_data[:, :n_obs]
y_train = train_data[:, n_obs:]
x_test = test_data[:, :n_obs]
y_test = test_data[:, n_obs:]

print (x_train.shape)
print (x_test.shape)
print (y_train.shape)
print (y_test.shape)

x_train = x_train.reshape(x_train.shape[0], time_steps, n_feats)
x_test = x_test.reshape(x_test.shape[0], time_steps, n_feats)

print (x_train.shape)
print (x_test.shape)
print (y_train.shape)
print (y_test.shape)

model = Sequential()
model.add(LSTM(64, input_shape=(time_steps, n_feats), return_sequences=True))
model.add(LSTM(32, return_sequences=False))
model.add(Dense(n_feats))

model.compile(loss='mse', optimizer='rmsprop')
model.fit(x_train, y_train, epochs=10, batch_size=1, verbose=1, shuffle=False)

y_predict = model.predict(x_test)
print (y_predict.shape)
y_predict = scaler.inverse_transform(y_predict)

y_test = scaler.inverse_transform(y_test)
y_test = y_test[:,0]
y_predict = y_predict[:,0]

print (y_test.shape)
print (y_predict.shape)

plt.plot(y_test, label='True')
plt.plot(y_predict,  label='Predict')
plt.legend()
plt.show()

But I really don't see the merits of your implementation:

* both x and y are 3d (1,steps,features) rather than x in 3d (samples, time-steps, features) and y in 2d (samples, features)
* input_shape=(None, x_train.shape[2])
* last layer - model.add(LSTM(n_feats, return_sequences=True, stateful=True))
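The shape difference in these bullets comes down to `return_sequences`. With `return_sequences=True` a Keras LSTM returns its output at every step; with `False` it returns only the last step's output, so a `Dense` head yields one prediction per sample. A numpy sketch of just the shapes, with random numbers standing in for real layer outputs (an assumption; nothing here is trained):

```python
import numpy as np

rng = np.random.default_rng(1)
samples, time_steps, units, n_feats = 4, 7, 32, 5

# stand-in for what a recurrent layer computes at every step: (samples, time_steps, units)
per_step = rng.normal(size=(samples, time_steps, units))

seq_out = per_step               # like return_sequences=True:  (samples, time_steps, units)
last_out = per_step[:, -1, :]    # like return_sequences=False: (samples, units), last step only

# a Dense(n_feats) head (bias omitted here) maps each sample's last output to one prediction,
# which is why y can be 2D: (samples, n_feats)
W = rng.normal(size=(units, n_feats))
y_hat = last_out @ W
print(seq_out.shape, last_out.shape, y_hat.shape)  # (4, 7, 32) (4, 32) (4, 5)
```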

Someone else may be able to provide a better answer.

【Discussion】:

I followed @Daniel Möller's code because I thought it had merit. github.com/danmoller/TestRepo/blob/master/TestBookLSTM.ipynb
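@hiker, I'm looking at your code, but there are some very important differences from mine. (1) x_train contains 35 features (it should contain only 5). (2) It seems you are shuffling the data, so you lose the order of the steps. (3) You are training a stateful=True model without resetting the states. Note that in my code the first model is not stateful; only the second one is. The purpose of the second model is to output one step at a time indefinitely, feeding each step back as the next input, and I didn't train that second model. These differences, of course, change everything.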
Now, obviously there's no rule like "your data should be the same as mine", but your model certainly has to be adapted to your data. Mine wasn't.
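About x and y being 3D: your answer also uses 3D input (that's a Keras rule; it's impossible to train an LSTM with data that isn't 3D). input_shape=(None, features) means you can feed any number of time steps (you don't need exactly 7). Another difference: a length of 7 means you're training on small time windows, while my model is designed to train on the entire sequence at once. Finally, about using an LSTM instead of Dense as the last layer: that's just one possibility in model design (you can have any layers you want). Which is better, I don't know; testing might answer that.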
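The "output one step and feed it back as the next input" idea from the comment above can be sketched as a plain loop. This is only an illustration: the linked notebook isn't reproduced here, `rollout` and `trend_step` are hypothetical, and `trend_step` (which just extends the last difference) stands in for a trained stateful model. In Keras the state would live inside the stateful layer and be cleared with `model.reset_states()`; here it is passed explicitly:

```python
import numpy as np

def rollout(step_fn, last_step, state, n_steps):
    """Generate n_steps future steps by feeding each prediction back in as the
    next input; `state` is carried between calls, like a stateful RNN's state."""
    preds = []
    for _ in range(n_steps):
        last_step, state = step_fn(last_step, state)
        preds.append(last_step)
    return np.array(preds)

def trend_step(x, prev):
    """Hypothetical one-step model: extend the last difference; the state is the previous step."""
    return x + (x - prev), x

future = rollout(trend_step, last_step=np.array([2.0]), state=np.array([1.0]), n_steps=3)
print(future.ravel())  # [3. 4. 5.]
```

Because each prediction becomes the next input, errors compound over the horizon, so long free-running forecasts tend to drift.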

【Solution 3】:

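I'm not sure what you can do; this data looks like it has no discernible pattern. If I can't see one, I doubt an LSTM can. Your prediction does look like a decent regression line, though.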

【Discussion】:

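While that's a good general idea, and you're quite possibly right, one of the big points of neural networks is finding patterns our brains might not find. But that doesn't mean every dataset has such patterns.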

【Solution 4】:

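Reading the original code, the author seems to have scaled the dataset first and then split it into training and test subsets. This means information about the test subset (e.g., its range and volatility) has "leaked" into the training subset.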

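The recommended approach is to split into training and test subsets first, compute the scaling parameters from the training subset only, and then use those parameters to scale the training and test subsets separately.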
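With scikit-learn's MinMaxScaler this amounts to calling `fit` (or `fit_transform`) on the training split only and `transform` on the test split. The same idea as a self-contained numpy sketch; the helpers and the six-value series are hypothetical, with the last value chosen as an outlier that only the test split sees:

```python
import numpy as np

def fit_minmax(train, lo=-1.0, hi=1.0):
    """Learn per-feature min and range from the TRAINING split only."""
    d_min = train.min(axis=0)
    span = train.max(axis=0) - d_min
    span = np.where(span == 0, 1.0, span)   # avoid dividing by zero on constant features
    return d_min, span, lo, hi

def transform(data, params):
    """Map values to [lo, hi] using the training statistics."""
    d_min, span, lo, hi = params
    return lo + (data - d_min) / span * (hi - lo)

def inverse(scaled, params):
    """Undo transform(), e.g. to plot predictions in the original units."""
    d_min, span, lo, hi = params
    return d_min + (scaled - lo) / (hi - lo) * span

raw = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [9.0]])   # toy single-feature series
train_size = int(len(raw) * 0.8)          # 4 rows; split BEFORE scaling
train, test = raw[:train_size], raw[train_size:]

params = fit_minmax(train)                # min=1, range=3, from train only
train_s = transform(train, params)        # approximately -1, -0.33, 0.33, 1
test_s = transform(test, params)          # approximately 1.67, 4.33: outside [-1, 1]
```

Test values falling outside [-1, 1] are expected with this approach; they are exactly the information that scaling on the full dataset would have leaked into training.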

【Discussion】:

【Solution 5】:

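I'm building a model to predict data like this myself. I created a SMOTErnn solution to add synthetic data as past data, and I found that when using TimeseriesGenerator, the larger the stride (for a given batch_size), the better the results.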
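For reference, Keras' `TimeseriesGenerator` (in `keras.preprocessing.sequence`) yields windows of `length` past steps paired with the target that follows, starting a new window every `stride` steps. The numpy sketch below mirrors that layout as I understand it (with `sampling_rate=1`, `start_index=0`, and no batching); `strided_windows` is a hypothetical helper, not the Keras class:

```python
import numpy as np

def strided_windows(data, targets, length, stride=1):
    """Windows of `length` past steps ending just before index i, paired with targets[i],
    for i = length, length + stride, length + 2 * stride, ..."""
    ends = range(length, len(data), stride)
    x = np.array([data[i - length:i] for i in ends])
    y = np.array([targets[i] for i in ends])
    return x, y

series = np.arange(10, dtype=float)
x, y = strided_windows(series, series, length=3, stride=2)
print(x.tolist())  # [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0], [6.0, 7.0, 8.0]]
print(y.tolist())  # [3.0, 5.0, 7.0, 9.0]
```

A larger stride gives fewer, less overlapping windows. Whether that improves results on this dataset is the poster's observation; the sketch only shows what the parameter changes.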

【Discussion】: