【Title】: Keras LSTM + TensorFlow and a number sequence (improve loss)
【Posted】: 2018-06-05 19:21:17
【Description】:

First of all, I am running with the following setup:

  • Running on Windows 10
  • Python 3.6.2
  • TensorFlow 1.8.0
  • Keras 2.1.6

I am trying to predict, or at least guess, the following number sequence: https://codepen.io/anon/pen/RJRPPx (limited to 20,000 test entries); the full sequence contains about 1 million records.

Here is the code (run.py):

import lstm
import time
import matplotlib.pyplot as plt

def plot_results(predicted_data, true_data):
    fig = plt.figure(facecolor='white')
    ax = fig.add_subplot(111)
    ax.plot(true_data, label='True Data')
    plt.plot(predicted_data, label='Prediction')
    plt.legend()
    plt.show()

def plot_results_multiple(predicted_data, true_data, prediction_len):
    fig = plt.figure(facecolor='white')
    ax = fig.add_subplot(111)
    ax.plot(true_data, label='True Data')
    #Pad the list of predictions to shift it in the graph to its correct start
    for i, data in enumerate(predicted_data):
        padding = [None for p in range(i * prediction_len)]
        plt.plot(padding + data, label='Prediction')
        plt.legend()
    plt.show()

#Main Run Thread
if __name__=='__main__':
    global_start_time = time.time()
    epochs  = 10
    seq_len = 50

    print('> Loading data... ')

    X_train, y_train, X_test, y_test = lstm.load_data('dice_amplified/primeros_20_mil.csv', seq_len, True)

    print('> Data Loaded. Compiling...')

    model = lstm.build_model([1, 50, 100, 1])

    model.fit(
        X_train,
        y_train,
        batch_size = 512,
        epochs=epochs,  # 'nb_epoch' is the deprecated Keras 1 name for this argument
        validation_split=0.05)

    predictions = lstm.predict_sequences_multiple(model, X_test, seq_len, 50)
    #predicted = lstm.predict_sequence_full(model, X_test, seq_len)
    #predicted = lstm.predict_point_by_point(model, X_test)        

    print('Training duration (s) : ', time.time() - global_start_time)
    plot_results_multiple(predictions, y_test, 50)

I have tried:

  • Increasing and decreasing the number of epochs.
  • Increasing and decreasing the batch size.
  • Amplifying the data.

The training log below is from a run with:

  • epochs = 10
  • batch_size = 512
  • validation_split = 0.05

Also, as far as I understand, the loss should decrease as the number of epochs increases? That does not seem to be happening!

Using TensorFlow backend.
> Loading data...
> Data Loaded. Compiling...
> Compilation Time :  0.03000473976135254
Train on 17056 samples, validate on 898 samples
Epoch 1/10
17056/17056 [==============================] - 31s 2ms/step - loss: 29927.0164 - val_loss: 289.8873
Epoch 2/10
17056/17056 [==============================] - 29s 2ms/step - loss: 29920.3513 - val_loss: 290.1069
Epoch 3/10
17056/17056 [==============================] - 29s 2ms/step - loss: 29920.4602 - val_loss: 292.7868
Epoch 4/10
17056/17056 [==============================] - 27s 2ms/step - loss: 29915.0955 - val_loss: 286.7317
Epoch 5/10
17056/17056 [==============================] - 26s 2ms/step - loss: 29913.6961 - val_loss: 298.7889
Epoch 6/10
17056/17056 [==============================] - 26s 2ms/step - loss: 29920.2068 - val_loss: 287.5138
Epoch 7/10
17056/17056 [==============================] - 28s 2ms/step - loss: 29914.0650 - val_loss: 295.2230
Epoch 8/10
17056/17056 [==============================] - 25s 1ms/step - loss: 29912.8860 - val_loss: 295.0592
Epoch 9/10
17056/17056 [==============================] - 28s 2ms/step - loss: 29907.4067 - val_loss: 286.9338
Epoch 10/10
17056/17056 [==============================] - 46s 3ms/step - loss: 29914.6869 - val_loss: 289.3236

Any suggestions? How can I improve this? Thanks!

Contents of lstm.py:

import os
import time
import warnings
import numpy as np
from numpy import newaxis
from keras.layers.core import Dense, Activation, Dropout
from keras.layers.recurrent import LSTM
from keras.models import Sequential

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' #Hide messy TensorFlow warnings
warnings.filterwarnings("ignore") #Hide messy Numpy warnings

def load_data(filename, seq_len, normalise_window):
    f = open(filename, 'rb').read()
    data = f.decode().split('\n')

    sequence_length = seq_len + 1
    result = []
    for index in range(len(data) - sequence_length):
        result.append(data[index: index + sequence_length])

    if normalise_window:
        result = normalise_windows(result)

    result = np.array(result, dtype=float) #cast to float: windows are still strings when normalise_window is False

    row = round(0.9 * result.shape[0])
    train = result[:int(row), :]
    np.random.shuffle(train)
    x_train = train[:, :-1]
    y_train = train[:, -1]
    x_test = result[int(row):, :-1]
    y_test = result[int(row):, -1]

    x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
    x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))  

    return [x_train, y_train, x_test, y_test]

def normalise_windows(window_data):
    normalised_data = []
    for window in window_data:
        normalised_window = [((float(p) / float(window[0])) - 1) for p in window]
        normalised_data.append(normalised_window)
    return normalised_data

def build_model(layers):
    model = Sequential()

    model.add(LSTM(
        layers[1],  # 'units'; replaces the deprecated 'output_dim' argument
        input_shape=(layers[1], layers[0]),
        return_sequences=True))
    model.add(Dropout(0.2))

    model.add(LSTM(
        layers[2],
        return_sequences=False))
    model.add(Dropout(0.2))

    model.add(Dense(layers[3]))  # 'output_dim' is deprecated in Keras 2
    model.add(Activation("linear"))

    start = time.time()
    model.compile(loss="mse", optimizer="rmsprop")
    print("> Compilation Time : ", time.time() - start)
    return model

def predict_point_by_point(model, data):
    #Predict each timestep given the last sequence of true data, in effect only predicting 1 step ahead each time
    predicted = model.predict(data)
    predicted = np.reshape(predicted, (predicted.size,))
    return predicted

def predict_sequence_full(model, data, window_size):
    #Shift the window by 1 new prediction each time, re-run predictions on new window
    curr_frame = data[0]
    predicted = []
    for i in range(len(data)):
        predicted.append(model.predict(curr_frame[newaxis,:,:])[0,0])
        curr_frame = curr_frame[1:]
        curr_frame = np.insert(curr_frame, [window_size-1], predicted[-1], axis=0)
    return predicted

def predict_sequences_multiple(model, data, window_size, prediction_len):
    #Predict sequence of 50 steps before shifting prediction run forward by 50 steps
    prediction_seqs = []
    for i in range(int(len(data)/prediction_len)):
        curr_frame = data[i*prediction_len]
        predicted = []
        for j in range(prediction_len):
            predicted.append(model.predict(curr_frame[newaxis,:,:])[0,0])
            curr_frame = curr_frame[1:]
            curr_frame = np.insert(curr_frame, [window_size-1], predicted[-1], axis=0)
        prediction_seqs.append(predicted)
    return prediction_seqs
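The rolling-window feedback used by predict_sequence_full and predict_sequences_multiple can be sanity-checked without Keras. A minimal sketch (the helper name roll_forward and the mean-returning stub predictor are illustrative assumptions, not part of the original code) of how each prediction is pushed back into the input window:

```python
import numpy as np

def roll_forward(predict_fn, first_frame, window_size, prediction_len):
    """Mimics the inner loop of predict_sequences_multiple for one window:
    each new prediction is appended and the oldest timestep is dropped."""
    curr_frame = first_frame  # shape (window_size, 1)
    predicted = []
    for _ in range(prediction_len):
        predicted.append(predict_fn(curr_frame[np.newaxis, :, :])[0, 0])
        curr_frame = curr_frame[1:]
        curr_frame = np.insert(curr_frame, [window_size - 1], predicted[-1], axis=0)
    return predicted

# Stub predictor (an assumption standing in for model.predict): returns the window mean.
stub = lambda batch: np.array([[batch[0].mean()]])

frame = np.arange(5, dtype=float).reshape(5, 1)  # window [0, 1, 2, 3, 4]
preds = roll_forward(stub, frame, window_size=5, prediction_len=3)
print(preds)  # with the mean stub, approximately [2.0, 2.4, 2.68]
```

This makes it easy to see that any bias in a single-step prediction compounds over the 50-step rollout, which is why the flat predictions in the plots track the training problem above.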

Appendix:

Following nuric's suggestion, I modified the model as follows:

def build_model(layers):
    model = Sequential()
    model.add(LSTM(layers[1], input_shape=(layers[1], layers[0]), return_sequences=True))
    model.add(Dropout(0.2))
    model.add(LSTM(layers[2], return_sequences=False))
    model.add(Dropout(0.2))
    model.add(Dense(layers[3]))
    model.add(Activation("linear"))
    model.add(Dense(64, input_dim=50, activation='relu'))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(1))
    start = time.time()
    model.compile(loss="mse", optimizer="rmsprop")
    print("> Compilation Time : ", time.time() - start)
    return model

Still a bit lost on this one…

【Discussion】:

    Tags: python tensorflow keras sequence lstm


    【Solution 1】:

    Even though you normalise the input, you don't normalise the output. The LSTM has a tanh output by default, which means you will have a limited feature space, i.e. the dense layer won't be able to regress to large numbers.
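    To make this concrete: the question's normalise_windows scales each input window relative to its first value, and the same scaling can be applied to the target, with an inverse transform applied to predictions afterwards. A minimal numpy sketch of that idea (the helper names are illustrative assumptions, not from the original code):

```python
import numpy as np

def normalise_pair(window, target):
    """Scale a window and its target relative to the window's first value,
    mirroring normalise_windows from the question but covering the output too."""
    base = float(window[0])
    norm_window = [float(p) / base - 1 for p in window]
    norm_target = float(target) / base - 1
    return norm_window, norm_target, base

def denormalise(prediction, base):
    """Invert the scaling to recover a value in the original range."""
    return (prediction + 1) * base

w, t, base = normalise_pair([200.0, 210.0, 190.0], 220.0)
# w is approximately [0.0, 0.05, -0.05]; t is approximately 0.1,
# so the target now lives in the same small range as the inputs.
print(denormalise(t, base))  # recovers roughly 220.0
```

    With the target in the same bounded range as the inputs, the tanh-limited feature space stops being a bottleneck, and the loss values become comparable across windows.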

    You have a fixed-length numeric input of shape (50,). You can pass that directly to Dense layers with relu activation, which will perform much better on this regression task, something like:

    model = Sequential()
    model.add(Dense(64, input_dim=50, activation='relu'))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(1))
    

    For regression it is better to use l2 regularisers instead of Dropout, since you are not really doing feature extraction as you would for classification.
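    What the l2 suggestion amounts to: instead of randomly dropping units, a penalty lambda * sum(w**2) is added to the loss, shrinking the weights towards zero. A minimal numpy sketch of the regularised objective (the lambda and weight values are made-up numbers for illustration):

```python
import numpy as np

def mse_with_l2(y_true, y_pred, weights, lam=0.01):
    """Mean squared error plus an L2 (ridge) weight penalty --
    the regularised loss suggested for regression instead of Dropout."""
    mse = np.mean((y_true - y_pred) ** 2)
    penalty = lam * np.sum(weights ** 2)
    return mse + penalty

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.9, 3.2])
weights = np.array([0.5, -1.0, 2.0])

# mse ~ 0.02, penalty = 0.01 * 5.25 = 0.0525, total ~ 0.0725
print(mse_with_l2(y_true, y_pred, weights))
```

    In Keras this corresponds to passing kernel_regularizer=regularizers.l2(0.01) to a Dense layer rather than following it with a Dropout layer.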

    【Comments】:

    • Thanks a lot @nuric, but I'm a bit new to this… could you explain further with a code example? I get the idea, but I don't even know where to start. Thanks again!
    • I suggest looking into deep learning regression models and how they differ from classification problems. You may not even need a deep learning approach; a random-forest-style method such as xgboost might work better.
    • Thanks @nuric; the problem is that I know even less about random forests than about deep learning. From what I understood, I tried modifying the model by adding the 3 Dense layers you mentioned; I should probably remove some of the earlier layers… As for xgboost, I will start reading about it right away.
    • By any chance, do you know of any good tutorial I could use as a reference? The only one I found is here: quantinsti.com/blog/…, which uses a single sequence but is written in R… something similar in Python would be ideal.
    • This post might help.