【Question title】: Train Keras Stateful LSTM return_seq=true not learning
【Posted】: 2017-08-06 07:25:12
【Question description】:

Consider this minimal runnable example:

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
import numpy as np
import matplotlib.pyplot as plt


max = 30
step = 0.5
n_steps = int(30/0.5)

x = np.arange(0,max,step)
x = np.cos(x)*(max-x)/max

y = np.roll(x,-1)
y[-1] = x[-1]

shape = (n_steps,1,1)
batch_shape = (1,1,1)

x = x.reshape(shape)
y = y.reshape(shape)

model = Sequential()
model.add(LSTM(50, return_sequences=True, stateful=True, batch_input_shape=batch_shape))
model.add(LSTM(50, return_sequences=True, stateful=True))

model.add(Dense(1))

model.compile(loss='mse', optimizer='rmsprop')

for i in range(1000):
    model.reset_states()
    model.fit(x,y,nb_epoch=1, batch_size=1)
    p = model.predict(x, batch_size=1)
    plt.clf()
    plt.axis([-1,31, -1.1, 1.1])
    plt.plot(x[:, 0, 0], '*')
    plt.plot(y[:,0,0],'o')
    plt.plot(p[:,0,0],'.')
    plt.draw()
    plt.pause(0.001)

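As stated in the Keras API documentation at https://keras.io/layers/recurrent/:

the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch

So I am using batch_size = 1 and trying to predict, at each time step, the next value of a decaying cosine function. The predictions (the red dots in the plot) should land inside the green circles if the script were predicting correctly, but it does not converge. Any ideas on how to make it learn?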
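The data setup in the script above can be checked in isolation with NumPy alone. This is a minimal sketch that assumes the same constants as the question; the names `t_max`, `signal`, and `target` are illustrative and were chosen here to avoid shadowing the built-in `max`. It only confirms that each target is the next value of the decaying cosine and shows the (samples, timesteps, features) layout. It does not train anything.

```python
import numpy as np

t_max = 30.0                 # same range as `max` in the question (renamed here)
step = 0.5
n_steps = int(t_max / step)  # 60 time steps

t = np.arange(0, t_max, step)
signal = np.cos(t) * (t_max - t) / t_max  # cosine with linearly decaying amplitude

target = np.roll(signal, -1)  # target[i] is signal[i + 1]
target[-1] = signal[-1]       # the last step has no successor, so repeat it

# One time step per sample: (samples, timesteps, features) = (60, 1, 1)
x = signal.reshape(n_steps, 1, 1)
y = target.reshape(n_steps, 1, 1)

print(x.shape, y.shape)
```

Because every sample holds a single time step, the network only sees the history of the sequence through the state carried from one batch to the next, which is why the order of the batches and the timing of `reset_states` matter.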

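【Question comments】:

Try calling reset_states before predicting.
Still not converging :(
Then try mae instead of mse.
No difference. Are you sure this can be fixed just by changing hyperparameters? Right now it is not learning anything at all.
That is strange, but I would also try model.train_on_batch(x, y) instead of your model.fit. The reason is that rmsprop's parameters are reset after each epoch.

Tags: machine-learning tensorflow neural-network deep-learning keras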

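【Solution 1】:

The problem lay in calling model.fit separately for each epoch. In that case the optimizer's parameters are reset, which harms the training process. The other point is to call reset_states before predicting as well: if it is not called, the states left over from fit become the starting states for the prediction, which may also be harmful. The final code is as follows: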

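for epoch in range(1000):
    # Start every pass over the sequence from a zero state.
    model.reset_states()
    tot_loss = 0
    # Feed one time step per batch, in order, so the state carries forward.
    for batch in range(n_steps):
        batch_loss = model.train_on_batch(x[batch:batch+1], y[batch:batch+1])
        tot_loss += batch_loss

    print("Loss: " + str(tot_loss / float(n_steps)))
    # Clear the state left over from training before predicting.
    model.reset_states()
    p = model.predict(x, batch_size=1)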
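To see why the second reset_states call matters, here is a toy illustration in plain NumPy rather than Keras. `ToyStatefulRNN` is a hypothetical class invented for this sketch, with fixed random weights and no training. It only mimics the one property at issue: the hidden state persists across calls until it is explicitly reset.

```python
import numpy as np


class ToyStatefulRNN:
    """Hypothetical single-layer tanh RNN for illustration; NOT the Keras LSTM."""

    def __init__(self, units=4, seed=0):
        rng = np.random.RandomState(seed)
        self.w_in = rng.normal(size=(1, units))
        self.w_rec = rng.normal(size=(units, units)) * 0.5
        self.w_out = rng.normal(size=(units, 1))
        self.h = np.zeros(units)  # hidden state kept between calls

    def reset_states(self):
        self.h = np.zeros_like(self.h)

    def predict(self, xs):
        outs = []
        for v in xs:
            # The new state depends on the current input AND the previous state.
            self.h = np.tanh(np.array([v]) @ self.w_in + self.h @ self.w_rec)
            outs.append((self.h @ self.w_out)[0])
        return np.array(outs)


xs = np.cos(np.arange(0, 5, 0.5))

model = ToyStatefulRNN()
first = model.predict(xs)    # starts from the zero state
carried = model.predict(xs)  # starts from the state left by the previous call
model.reset_states()
fresh = model.predict(xs)    # starts from the zero state again

print(np.allclose(first, fresh))    # same starting state, same outputs
print(np.allclose(first, carried))  # leftover state changes the outputs
```

The same input produces different outputs depending on the state the model happens to be in, so a prediction made right after a training pass starts from the end-of-sequence state unless it is reset first.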

【Comments】: