[Question Title]: LSTM with self-attention for multi-class text classification
[Posted]: 2020-11-24 11:05:16
[Question Description]:

I am following the self-attention-in-Keras tutorial from this link: How to add attention layer to a Bi-LSTM

I want to apply a Bi-LSTM to multi-class text classification with 3 classes.

I tried to apply attention in my code, but I get the following error. How can I solve this? Can anyone help me?

Incompatible shapes: [100,3] vs. [64,3]
     [[Node: training_1/Adam/gradients/loss_11/dense_14_loss/mul_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@training_1/Adam/gradients/loss_11/dense_14_loss/mul_grad/Reshape_1"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](training_1/Adam/gradients/loss_11/dense_14_loss/mul_grad/Shape, training_1/Adam/gradients/loss_11/dense_14_loss/mul_grad/Shape_1)]]




from keras import backend as K
from keras.layers import Layer


class attention(Layer):

    def __init__(self, return_sequences=False):
        self.return_sequences = return_sequences
        super(attention, self).__init__()

    def build(self, input_shape):
        # W: one scoring weight per feature; b: one bias per timestep
        self.W = self.add_weight(name="att_weight", shape=(input_shape[-1], 1),
                                 initializer="normal")
        self.b = self.add_weight(name="att_bias", shape=(input_shape[1], 1),
                                 initializer="zeros")
        super(attention, self).build(input_shape)

    def call(self, x):
        # x is expected to be 3D: (batch, timesteps, features)
        e = K.tanh(K.dot(x, self.W) + self.b)   # per-timestep score
        a = K.softmax(e, axis=1)                # attention weights over time

        output = x * a                          # weighted sequence

        if self.return_sequences:
            return output                       # 3D weighted sequence

        return K.sum(output, axis=1)            # 2D context vector
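
For context, here is a minimal shape check of this layer (the timestep and feature sizes are made up for illustration). It shows that the layer consumes a 3D sequence and returns either a 3D weighted sequence or a 2D context vector:

from keras.layers import Input

seq = Input(shape=(10, 8))  # (batch, timesteps=10, features=8)
print(K.int_shape(attention(return_sequences=True)(seq)))   # (None, 10, 8)
print(K.int_shape(attention(return_sequences=False)(seq)))  # (None, 8)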




model = Sequential()
model.add(Embedding(17666, 100, input_length=409))
model.add(Bidirectional(LSTM(32, return_sequences=False)))
model.add(attention(return_sequences=True)) # receive 3D and output 2D
model.add(Dropout(0.3))
model.add(Dense(3, activation='softmax'))


model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()

from keras.callbacks import EarlyStopping
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=3)
history777 = model.fit(x_train, y_train,
                       batch_size=100,
                       epochs=30,
                       validation_data=(x_val, y_val),
                       callbacks=[es])
The model summary:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_14 (Embedding)     (None, 409, 100)          1766600   
_________________________________________________________________
bidirectional_14 (Bidirectio (None, 64)                34048     
_________________________________________________________________
attention_14 (attention)     (None, 64)                128       
_________________________________________________________________
dropout_6 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_14 (Dense)             (None, 3)                 195       
=================================================================
Total params: 1,800,971
Trainable params: 1,800,971
Non-trainable params: 0
_________________________________________________________________




[Question Discussion]:

    Tags: python keras deep-learning conv-neural-network


    [Solution 1]:

    Pay attention to how the return_sequences parameter is set in the LSTM and in the attention layer.

    Your output is 2D, so the last return_sequences must be set to False, while the earlier ones must be set to True. The attention layer computes one weight per timestep, so it needs the full 3D sequence (batch, timesteps, features) from the LSTM; feeding it the 2D final state misaligns the axes and causes the shape mismatch you see.

    Your model should be:

    model = Sequential()
    model.add(Embedding(max_words, emb_dim, input_length=max_len))
    model.add(Bidirectional(LSTM(32, return_sequences=True))) # return_sequences=True
    model.add(attention(return_sequences=False)) # return_sequences=False
    model.add(Dropout(0.3))
    model.add(Dense(3, activation='softmax'))
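
    As a sanity check, here is a minimal end-to-end sketch with random data (the vocabulary size, sequence length, and sample count are assumed for illustration, not taken from your setup), showing that this return_sequences arrangement trains without the broadcast error:

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Embedding, Bidirectional, LSTM, Dropout, Dense
    from keras.utils import to_categorical

    max_words, emb_dim, max_len = 1000, 100, 50  # assumed toy sizes

    # Random integer sequences and one-hot labels for 3 classes
    x_toy = np.random.randint(1, max_words, size=(128, max_len))
    y_toy = to_categorical(np.random.randint(0, 3, size=128), num_classes=3)

    model = Sequential()
    model.add(Embedding(max_words, emb_dim, input_length=max_len))
    model.add(Bidirectional(LSTM(32, return_sequences=True)))  # keep the full sequence
    model.add(attention(return_sequences=False))               # collapse the time axis
    model.add(Dropout(0.3))
    model.add(Dense(3, activation='softmax'))

    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_toy, y_toy, batch_size=100, epochs=1)  # runs without the shape error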
    

    Here is the complete example: https://colab.research.google.com/drive/13l5eAHS5uTUsdqyQNm1Dr4JEXg7Fl2Bo?usp=sharing

    [Discussion]:
