【Question Title】: LSTM Model in Keras with Auxiliary Inputs
【Posted】: 2017-10-05 08:43:57
【Question】:

I have a dataset with 2 columns, each containing a set of documents. I have to match each document in Col A against the documents provided in Col B. This is a supervised classification problem, so my training data contains a label column indicating whether a pair of documents matches.

To solve this problem, I created a set of features, say f1-f25 (by comparing the 2 documents), and then trained a binary classifier on these features. This approach works reasonably well, but now I would like to evaluate deep learning models (specifically LSTM models) on this problem.
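For reference, the feature-based baseline described above can be sketched roughly as follows. This is a minimal illustration only: it assumes the 25 pairwise comparison features have already been computed, uses randomly generated placeholder data and labels, and the choice of logistic regression is an assumption, not the classifier actually used.

```python
# Hedged sketch of a feature-based baseline: a binary classifier trained on
# 25 precomputed document-pair comparison features (all data here is dummy).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 25))            # placeholder for features f1-f25
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)   # placeholder match/no-match labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(clf.score(X_test, y_test))            # held-out accuracy
```

Any binary classifier (gradient boosting, random forest, etc.) could be substituted here; the point is only that the 25 features feed a standard supervised model.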

I am using the keras library in Python. After going through the keras documentation and other tutorials available online, I managed to do the following:

from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model
import keras

# Each document contains a series of 200 words.
# The necessary text pre-processing steps have been completed to transform
# each doc into a fixed-length sequence.
main_input1 = Input(shape=(200,), dtype='int32', name='main_input1')
main_input2 = Input(shape=(200,), dtype='int32', name='main_input2')

# Next I add a word embedding layer (embed_matrix is created separately
# for each word in my vocabulary by reading from a pre-trained embedding model).
x = Embedding(output_dim=300, input_dim=20000,
              input_length=200, weights=[embed_matrix])(main_input1)
y = Embedding(output_dim=300, input_dim=20000,
              input_length=200, weights=[embed_matrix])(main_input2)

# Next pass each embedded sequence separately through an LSTM layer to
# transform the sequence of vectors into a single fixed-size vector.
lstm_out_x1 = LSTM(32)(x)
lstm_out_x2 = LSTM(32)(y)

# Concatenate the 2 layers and stack a dense layer on top.
x = keras.layers.concatenate([lstm_out_x1, lstm_out_x2])
x = Dense(64, activation='relu')(x)
# Generate an intermediate output.
auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(x)

# Add the auxiliary input - it contains 25 features for each document pair.
auxiliary_input = Input(shape=(25,), name='aux_input')

# Merge the aux output with the aux input and stack dense layers on top.
main_input = keras.layers.concatenate([auxiliary_output, auxiliary_input])
x = Dense(64, activation='relu')(main_input)
x = Dense(64, activation='relu')(x)

# Finally add the main output layer.
main_output = Dense(1, activation='sigmoid', name='main_output')(x)

model = Model(inputs=[main_input1, main_input2, auxiliary_input], outputs=main_output)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

model.fit([x1, x2, aux_input], y,
          epochs=3, batch_size=32)

However, when I score this on the training data, I get the same probability score for all cases. The issue seems to be related to the way the auxiliary input is fed in (when I remove the auxiliary input, the model produces meaningful output). I also tried inserting the auxiliary input at different places in the network, but somehow I couldn't get this to work.
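One thing worth ruling out (an assumption on my part, not a confirmed diagnosis): if the 25 auxiliary features are on very different scales, the dense layers downstream can saturate and push the sigmoid toward a near-constant output. A minimal sketch of standardizing them before calling model.fit, where `aux_input_raw` is a placeholder name for the (n_samples, 25) feature array:

```python
# Hedged sketch: standardize the 25 auxiliary features to zero mean and
# unit variance before feeding them to the network (dummy data below).
import numpy as np
from sklearn.preprocessing import StandardScaler

aux_input_raw = np.random.uniform(0, 1000, size=(500, 25))  # placeholder array
scaler = StandardScaler()
aux_input_scaled = scaler.fit_transform(aux_input_raw)

print(aux_input_scaled.shape)   # (500, 25): same shape, rescaled per column
```

The fitted scaler would then also be applied to the auxiliary features at scoring time.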

Any pointers?

【Question Discussion】:

  • Not sure whether this is intended, but the auxiliary output is only (1,). Is that really what you expect? Merging 25 auxiliary inputs with just a single value? -- Is the model before the auxiliary output meant to be "not trainable", with only the last part being trained?
  • Well, yes. It is a binary classification model, so the final output is (1,). Should the auxiliary output be different? I am just feeding in the extra set of 25 features as the auxiliary input, hence the (25,) shape.
  • Have you tried more epochs?

Tags: keras keras-layer


【Solution 1】:

I found the answer at https://datascience.stackexchange.com/questions/17099/adding-features-to-time-series-model-lstm from Mr. Philippe Remy, who wrote a library to condition RNNs on auxiliary inputs. I used his library and it was very helpful.

# 10 stations
# 365 days
# 3 continuous variables A, B and C, where C is the target.
# 2 condition variables of dim=5 and dim=1. The first cond is one-hot, the second is continuous.
import numpy as np
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

from cond_rnn import ConditionalRNN

stations = 10  # 10 stations.
time_steps = 365  # 365 days.
continuous_variables_per_station = 3  # A,B,C where C is the target.
condition_variables_per_station = 2  # 2 variables of dim 5 and 1.
condition_dim_1 = 5
condition_dim_2 = 1

np.random.seed(123)
continuous_data = np.random.uniform(size=(stations, time_steps, continuous_variables_per_station))
condition_data_1 = np.zeros(shape=(stations, condition_dim_1))
condition_data_1[:, 0] = 1  # dummy.
condition_data_2 = np.random.uniform(size=(stations, condition_dim_2))

window = 50  # we split series in 50 days (look-back window)

x, y, c1, c2 = [], [], [], []
for i in range(window, continuous_data.shape[1]):
    x.append(continuous_data[:, i - window:i])
    y.append(continuous_data[:, i])
    c1.append(condition_data_1)  # just replicate.
    c2.append(condition_data_2)  # just replicate.

# now we have (batch_dim, station_dim, time_steps, input_dim).
x = np.array(x)
y = np.array(y)
c1 = np.array(c1)
c2 = np.array(c2)

print(x.shape, y.shape, c1.shape, c2.shape)

# let's collapse the station_dim in the batch_dim.
x = np.reshape(x, [-1, window, x.shape[-1]])
y = np.reshape(y, [-1, y.shape[-1]])
c1 = np.reshape(c1, [-1, c1.shape[-1]])
c2 = np.reshape(c2, [-1, c2.shape[-1]])

print(x.shape, y.shape, c1.shape, c2.shape)

model = Sequential(layers=[
    ConditionalRNN(10, cell='GRU'),  # num_cells = 10
    Dense(units=1, activation='linear')  # regression problem.
])

model.compile(optimizer='adam', loss='mse')
model.fit(x=[x, c1, c2], y=y, epochs=2, validation_split=0.2)

【Discussion】:

    【Solution 2】:

    Well, this has been open for a couple of months and people are voting on it.
    I recently did something very similar using this dataset, which can be used to predict credit card defaults. It contains categorical data about the customers (gender, education level, marital status, etc.) as well as their payment history as a time series, so I had to merge time series with non-sequential data. My solution is very similar to yours, combining an LSTM with a Dense, and I tried to adapt that approach to your problem. What worked for me was putting dense layers on the auxiliary input.

    Also, in your case a shared layer makes sense, so that the same weights are used to "read" both documents. My proposal, to be tested on your data:

    from keras.layers import Input, Embedding, LSTM, Dense
    from keras.models import Model
    import keras
    
    # Each document contains a series of 200 words.
    # The necessary text pre-processing steps have been completed to transform
    # each doc into a fixed-length sequence.
    main_input1 = Input(shape=(200,), dtype='int32', name='main_input1')
    main_input2 = Input(shape=(200,), dtype='int32', name='main_input2')
    
    # Next I add a word embedding layer (embed_matrix is created separately
    # for each word in my vocabulary by reading from a pre-trained embedding model).
    x1 = Embedding(output_dim=300, input_dim=20000,
                   input_length=200, weights=[embed_matrix])(main_input1)
    x2 = Embedding(output_dim=300, input_dim=20000,
                   input_length=200, weights=[embed_matrix])(main_input2)
    
    # Next pass each embedded sequence through an LSTM layer to transform the
    # sequence of vectors into a single fixed-size vector.
    # Comment Manngo: Here I changed to a shared layer.
    # Also renamed y, as it was confusing.
    # Now x and y are x1 and x2.
    lstm_reader = LSTM(32)
    lstm_out_x1 = lstm_reader(x1)
    lstm_out_x2 = lstm_reader(x2)
    
    # Concatenate the 2 layers and stack dense layers on top.
    x = keras.layers.concatenate([lstm_out_x1, lstm_out_x2])
    x = Dense(64, activation='relu')(x)
    x = Dense(32, activation='relu')(x)
    # Generate an intermediate output.
    # Comment Manngo: This is created as a dead end.
    # It will not be used as an input to any of the layers below.
    auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(x)
    
    # Add the auxiliary input - it contains 25 features for each document pair.
    # Comment Manngo: Dense branch on the comparison features.
    # Keep the Input in its own variable so it can be passed to Model() below.
    auxiliary_input = Input(shape=(25,), name='aux_input')
    aux_branch = Dense(64, activation='relu')(auxiliary_input)
    aux_branch = Dense(32, activation='relu')(aux_branch)
    
    # OLD: merge aux output with aux input and stack dense layers on top.
    # Comment Manngo: this actually merges the dense branch feeding the aux
    # output with the dense branch processing the aux input.
    main_input = keras.layers.concatenate([x, aux_branch])
    main = Dense(64, activation='relu')(main_input)
    main = Dense(64, activation='relu')(main)
    
    # Finally add the main output layer.
    main_output = Dense(1, activation='sigmoid', name='main_output')(main)
    
    # Compile
    # Comment Manngo: also define the weighting of the outputs,
    # main as 1, auxiliary as 0.5.
    model = Model(inputs=[main_input1, main_input2, auxiliary_input],
                  outputs=[main_output, auxiliary_output])
    model.compile(optimizer='adam',
                  loss={'main_output': 'binary_crossentropy', 'aux_output': 'binary_crossentropy'},
                  loss_weights={'main_output': 1., 'aux_output': 0.5},
                  metrics=['accuracy'])
    
    # Train the model on main_output, with auxiliary_output as a support.
    # Comment Manngo: Unknown information marked with placeholders ____.
    # We have 3 inputs: x1 and x2 (the 2 word sequences) and aux_in (the 25
    # features). We have 2 outputs: main and auxiliary; both have the same
    # (binary) targets y.
    model.fit({'main_input1': __x1__, 'main_input2': __x2__, 'aux_input': __aux_in__},
              {'main_output': __y__, 'aux_output': __y__},
              epochs=1000,
              batch_size=__,
              validation_split=0.1,
              callbacks=[____])
    

    I don't know how much this helps, since I don't have your data and couldn't try it. Still, this is my best shot.
    For obvious reasons, I did not run the code above.

    【Discussion】:

    • I am working on longitudinal medical data and trying to understand what you did. The two concatenated LSTM layers pick up two different sets of inputs. Am I right?
    • Yes, in my wording those are x1 and x2.
    • @Manngo Hi, I also have to merge time series with non-sequential data, to predict meteorological variables at different locations (which are distinguished by the non-sequential data). Could you share what you did for this? In my case, the time series at the different locations have different lengths.
    • @Basilique Do you mean multiple predictions from 1 model, one per location? For time series of different lengths, you could perhaps look at PLSTM, which supports sampling the variables but within the same time window.
    • @Manngo I have two time-dependent features T and P and two non-time-dependent variables S and D. The target Q is also a time-dependent variable. I want one global model trained on the information from all 500 of my stations, rather than training 500 local individual models. I would like the global model to have two branches: an upstream branch that I feed the non-time-series variables into, followed by a downstream branch that I feed the time series of the 500 locations into. I used a generator for the local models; I don't know how to combine the generator with an embedding layer.