【Question Title】: Connecting an embedding layer of size (3, 50) to an LSTM
【Posted】: 2021-12-27 22:46:11
【Question Description】:

How do I connect an embedding layer of size (3, 50) to an LSTM?

An array of shape (3, 50) is fed to the input "layer_i_emb"; it holds three time steps, each an array of length 50 storing product identifiers.

I tried reshaping it before connecting, but that didn't work either. The Embedding layer adds an extra dimension, and the LSTM can't accept that extra dimension. It seems you have to convert everything to raw tf tensors and manipulate them by hand, which is awful.

layer_i_inp = Input(shape = (3,50), name = 'item')
layer_i_emb = Embedding(output_dim = EMBEDDING_DIM*2,
                        input_dim = us_it_count[0]+1,
                        input_length = (3,50),
                        name = 'item_embedding')(layer_i_inp) 

layer_i_emb = Reshape([3,50, EMBEDDING_DIM*2])(layer_i_emb)

layer_i_emb = LSTM(MAX_FEATURES, dropout = 0.4, recurrent_dropout = 0.4, return_sequences = True)(layer_i_emb)
layer_i_emb = LSTM(MAX_FEATURES, dropout = 0.4, recurrent_dropout = 0.4, return_sequences = True)(layer_i_emb)
layer_i_emb = LSTM(MAX_FEATURES, dropout = 0.4, recurrent_dropout = 0.4)(layer_i_emb)

layer_i_emb = Flatten()(layer_i_emb)

【Question Comments】:

    Tags: python tensorflow keras recurrent-neural-network embedding


    【Solution 1】:

    The problem is that the Embedding layer outputs a 3D tensor, while the LSTM layer expects a 2D input of shape (timesteps, features) (both counts exclude the batch dimension). Here are a couple of options you can try:
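To see the mismatch concretely, here is a minimal sketch (toy sizes are assumed purely for illustration; they are not the question's real data):

```python
import tensorflow as tf

# Assumed toy sizes, for illustration only.
batch, orders, ids_per_order, emb_dim = 2, 3, 50, 8

ids = tf.random.uniform((batch, orders, ids_per_order), maxval=100, dtype=tf.int32)
emb = tf.keras.layers.Embedding(100, emb_dim)(ids)
print(emb.shape)  # (2, 3, 50, 8): Embedding appends an axis, giving a 4D tensor

# An LSTM expects (batch, timesteps, features), i.e. 3D including the batch axis,
# so feeding `emb` to it directly fails:
# tf.keras.layers.LSTM(32)(emb)  # raises a shape error (expected ndim=3, got ndim=4)
```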

    Option 1

    import tensorflow as tf
    
    samples = 100
    orders = 3
    product_ids_per_order = 50
    max_product_id = 120
    
    data = tf.random.uniform((samples, orders, product_ids_per_order), maxval=max_product_id, dtype=tf.int32)
    Y = tf.random.uniform((samples,), maxval=2, dtype=tf.int32)
    
    EMBEDDING_DIM = 5
    
    item_input = tf.keras.layers.Input(shape = (orders, product_ids_per_order), name = 'item')
    embedding_layer = tf.keras.layers.Embedding(
                            max_product_id + 1,
                            output_dim = EMBEDDING_DIM,
                            input_length = product_ids_per_order,
                            name = 'item_embedding')
    
    # Map each time step with 50 product ids to an embedding vector of size 5
    outputs = []
    for i in range(orders):
      tensor = embedding_layer(item_input[:, i, :])
      layer_i_emb = tf.keras.layers.LSTM(32, dropout = 0.4, recurrent_dropout = 0.4, return_sequences = True)(tensor)
      layer_i_emb = tf.keras.layers.LSTM(32, dropout = 0.4, recurrent_dropout = 0.4, return_sequences = True)(layer_i_emb)
      layer_i_emb = tf.keras.layers.LSTM(32, dropout = 0.4, recurrent_dropout = 0.4)(layer_i_emb)
      outputs.append(layer_i_emb)
      
    output = tf.keras.layers.Concatenate(axis=1)(outputs)
    output = tf.keras.layers.Dense(1, activation='sigmoid')(output)
    model = tf.keras.Model(item_input, output)
    model.compile(optimizer='adam', loss=tf.keras.losses.BinaryCrossentropy())
    model.fit(data, Y)
    
    4/4 [==============================] - 15s 1s/step - loss: 0.6926
    

    Option 2

    import tensorflow as tf
    
    samples = 100
    orders = 3
    product_ids_per_order = 50
    max_product_id = 120
    
    EMBEDDING_DIM = 5
    
    item_input = tf.keras.layers.Input(shape = (orders, product_ids_per_order), name = 'item')
    embedding_layer = tf.keras.layers.Embedding(
                            max_product_id + 1,
                            output_dim = EMBEDDING_DIM,
                            input_length = product_ids_per_order,
                            name = 'item_embedding')
    
    # Map each time step with 50 product ids to an embedding vector of size 5
    inputs = []
    for i in range(orders):
      tensor = embedding_layer(item_input[:, i, :])
      tensor = tf.keras.layers.Reshape([product_ids_per_order*EMBEDDING_DIM])(tensor)
      tensor = tf.expand_dims(tensor, axis=1)
      inputs.append(tensor)
    
    embedding_inputs = tf.keras.layers.Concatenate(axis=1)(inputs)
    layer_i_emb = tf.keras.layers.LSTM(32, dropout = 0.4, recurrent_dropout = 0.4, return_sequences = True)(embedding_inputs)
    layer_i_emb = tf.keras.layers.LSTM(32, dropout = 0.4, recurrent_dropout = 0.4, return_sequences = True)(layer_i_emb)
    layer_i_emb = tf.keras.layers.LSTM(32, dropout = 0.4, recurrent_dropout = 0.4)(layer_i_emb)
    output = tf.keras.layers.Dense(1, activation='sigmoid')(layer_i_emb)
    model = tf.keras.Model(item_input, output)
    print(model.summary())
    
    Model: "model_11"
    __________________________________________________________________________________________________
     Layer (type)                   Output Shape         Param #     Connected to                     
    ==================================================================================================
     item (InputLayer)              [(None, 3, 50)]      0           []                               
                                                                                                      
     tf.__operators__.getitem_41 (S  (None, 50)          0           ['item[0][0]']                   
     licingOpLambda)                                                                                  
                                                                                                      
     tf.__operators__.getitem_42 (S  (None, 50)          0           ['item[0][0]']                   
     licingOpLambda)                                                                                  
                                                                                                      
     tf.__operators__.getitem_43 (S  (None, 50)          0           ['item[0][0]']                   
     licingOpLambda)                                                                                  
                                                                                                      
     item_embedding (Embedding)     (None, 50, 5)        605         ['tf.__operators__.getitem_41[0][
                                                                     0]',                             
                                                                      'tf.__operators__.getitem_42[0][
                                                                     0]',                             
                                                                      'tf.__operators__.getitem_43[0][
                                                                     0]']                             
                                                                                                      
     reshape_10 (Reshape)           (None, 250)          0           ['item_embedding[0][0]']         
                                                                                                      
     reshape_11 (Reshape)           (None, 250)          0           ['item_embedding[1][0]']         
                                                                                                      
     reshape_12 (Reshape)           (None, 250)          0           ['item_embedding[2][0]']         
                                                                                                      
     tf.expand_dims_9 (TFOpLambda)  (None, 1, 250)       0           ['reshape_10[0][0]']             
                                                                                                      
     tf.expand_dims_10 (TFOpLambda)  (None, 1, 250)      0           ['reshape_11[0][0]']             
                                                                                                      
     tf.expand_dims_11 (TFOpLambda)  (None, 1, 250)      0           ['reshape_12[0][0]']             
                                                                                                      
     concatenate_13 (Concatenate)   (None, 3, 250)       0           ['tf.expand_dims_9[0][0]',       
                                                                      'tf.expand_dims_10[0][0]',      
                                                                      'tf.expand_dims_11[0][0]']      
                                                                                                      
     lstm_34 (LSTM)                 (None, 3, 32)        36224       ['concatenate_13[0][0]']         
                                                                                                      
     lstm_35 (LSTM)                 (None, 3, 32)        8320        ['lstm_34[0][0]']                
                                                                                                      
     lstm_36 (LSTM)                 (None, 32)           8320        ['lstm_35[0][0]']                
                                                                                                      
     dense_11 (Dense)               (None, 1)            33          ['lstm_36[0][0]']                
                                                                                                      
    ==================================================================================================
    Total params: 53,502
    Trainable params: 53,502
    Non-trainable params: 0
    __________________________________________________________________________________________________
    None
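
A more compact equivalent of Option 2's per-order reshape loop (a sketch under the same assumed shapes) wraps Flatten in TimeDistributed, which collapses each order's 50 embedding vectors into a single feature vector per time step, with no manual slicing of the input:

```python
import tensorflow as tf

orders, product_ids_per_order, max_product_id = 3, 50, 120
EMBEDDING_DIM = 5

item_input = tf.keras.layers.Input(shape=(orders, product_ids_per_order), name='item')
# (None, 3, 50) -> (None, 3, 50, 5)
x = tf.keras.layers.Embedding(max_product_id + 1, EMBEDDING_DIM)(item_input)
# Flatten the last two axes per time step: (None, 3, 50, 5) -> (None, 3, 250)
x = tf.keras.layers.TimeDistributed(tf.keras.layers.Flatten())(x)
x = tf.keras.layers.LSTM(32, dropout=0.4, recurrent_dropout=0.4)(x)
output = tf.keras.layers.Dense(1, activation='sigmoid')(x)
model = tf.keras.Model(item_input, output)
```

This gives the LSTM the same (None, 3, 250) input as the Concatenate of reshaped slices in Option 2.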
    

    【Discussion】:

    • Some of this is already in the code in the colab, and it is still not clear how to handle it. The tensor has this shape because at each step it takes three time steps into account, in each of which we have a basket of goods (50 product ids), and the product ids need to be in embedded form because there are 20 thousand of them. In other words, I submit 50 product ids per time step. The embedding is there so that the network does not treat product id 3 as more similar to id 4 than to ids 1 or 50 just because the numbers are close.
    • I don't quite understand what you are trying to do... What kind of input do you want to feed to your first LSTM layer? What does your data look like? Could you add some examples to your question?
    • Yes, you've understood the input correctly. The input is [(None, 3, 50)]: three orders, each containing 50 products. How do I explain this to the network? If I feed it in without embeddings, then the network perceives 4…
    • We have three such arrays (three orders, i.e. three time steps, so that the network learns what the user bought before). So, going into the LSTM layer, it would first see the first 50 embeddings, then the next 50, then the next 50. So how do I implement that? So that the network understands these are product ids rather than numeric values, and so that the LSTM layer receives the data for each order separately (3 orders per pass).
    • Updated the answer.