【Posted】: 2020-07-05 05:36:41
【Question】:
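I am currently developing a model to detect emotion from text using deep learning. I have a relatively small labelled dataset (~7,500 samples) with 7 different emotions as classes. I built a CNN that reaches about 63% accuracy, but when I tried an RNN using LSTM and a CNN-RNN that also uses LSTM, neither seems to train properly: both always end up predicting the same class. I believe my models are fundamentally sound and that the problem lies in some of the parameters. I split the dataset into 85% for training (20% of which is held out for validation) and the remaining 15% for testing. My embedding matrix is built from the Google News word2vec word representations, and the word index is built with the Keras Tokenizer.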
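To make the split concrete, the sketch below works out the sample counts it implies. It assumes the 85%/15% cut is taken by simple truncation over the 7,535 labelled samples (the sum of the per-emotion counts in the post), and it relies on Keras `validation_split` holding out the last fraction of the training arrays. The variable names are illustrative and do not come from the original code.

```python
# Sample counts implied by the 85/15 split plus validation_split=0.2.
# Assumption: sizes are truncated to integers; the post does not say how
# the train/test cut was actually made.
total_samples = 7535          # sum of the per-emotion counts in the post
train_fraction = 0.85
validation_split = 0.2        # value passed to model.fit in the post

n_train_full = int(total_samples * train_fraction)
n_test = total_samples - n_train_full

# Keras keeps the first (1 - validation_split) share for fitting and
# holds out the remainder for validation.
n_fit = int(n_train_full * (1 - validation_split))
n_val = n_train_full - n_fit

for name, n in [("fit", n_fit), ("validation", n_val), ("test", n_test)]:
    print(f"{name:>10}: {n:5d} samples ({n / total_samples:.1%})")
```

Under these assumptions the effective proportions are roughly 68% for fitting, 17% for validation and 15% for testing.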
Dataset breakdown:
Emotion    Count
anger      1086
disgust    1074
fear       1086
guilt      1062
joy        1089
sadness    1080
shame      1058
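Because the classes are nearly balanced, a model that always predicts the same class should score only about one in seven. The sketch below computes that reference point from the counts above; the English class names and variable names are chosen here for illustration.

```python
# Per-emotion counts copied from the dataset breakdown.
class_counts = {
    "anger": 1086, "disgust": 1074, "fear": 1086, "guilt": 1062,
    "joy": 1089, "sadness": 1080, "shame": 1058,
}

total = sum(class_counts.values())
largest_class, largest_count = max(class_counts.items(), key=lambda kv: kv[1])

# Accuracy of a degenerate classifier that always outputs the largest class.
majority_baseline = largest_count / total
# Accuracy expected from uniform random guessing over 7 classes.
uniform_chance = 1 / len(class_counts)

print(f"total samples:      {total}")
print(f"majority baseline:  {majority_baseline:.2%} (always '{largest_class}')")
print(f"uniform chance:     {uniform_chance:.2%}")
```

A validation accuracy that stays near this baseline is consistent with the single-class predictions described in the question.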
CNN implementation
# Standalone Keras 2.x imports, assumed from the layer names in the summary.
from keras.models import Sequential, Model
from keras.layers import (Embedding, Conv1D, GlobalMaxPooling1D, Concatenate,
                          Dropout, Dense, Activation)

# input_dim, output_dim, embedding_matrix, max_len and callbacks_list are
# defined elsewhere in the script (not shown in the post).
def make_model(kernel_sizes, num_filters, dropout, hidden_units):
    # One Embedding -> Conv1D -> GlobalMaxPooling1D branch per kernel size.
    submodels = []
    for kernel_size in kernel_sizes:
        submodel = Sequential()
        submodel.add(Embedding(input_dim=input_dim,
                               output_dim=output_dim,
                               weights=[embedding_matrix],
                               input_length=max_len,
                               trainable=True))
        submodel.add(Conv1D(filters=num_filters, kernel_size=kernel_size,
                            padding='same', activation='relu', strides=1))
        submodel.add(GlobalMaxPooling1D())
        submodels.append(submodel)
    submodel_outputs = [model.output for model in submodels]
    submodel_inputs = [model.input for model in submodels]
    # Merge the branches and classify.
    merged = Concatenate(axis=1)(submodel_outputs)
    x = Dropout(dropout)(merged)
    if hidden_units > 0:
        x = Dense(hidden_units, activation='relu')(x)
        x = Dropout(dropout)(x)
    x = Dense(7, activation='softmax', kernel_initializer="uniform")(x)
    # Note: this applies a sigmoid on top of the softmax output.
    out = Activation('sigmoid')(x)
    model = Model(submodel_inputs, out)
    model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['acc'])
    return model

def fit_model(model, kernel_sizes, num_epochs, batch_size, x_train, y_train):
    # Every branch receives the same padded sequences.
    x_train = [x_train] * len(kernel_sizes)
    history = model.fit(x_train, y_train, batch_size=batch_size,
                        epochs=num_epochs, validation_split=0.2)
    return history

kernel_sizes = [2, 6]
num_filters = 100
dropout = 0.6
num_hidden = 270
callbacks = callbacks_list  # defined here but not passed to fit_model
num_epochs = 15
batch_size = 64
model = make_model(kernel_sizes, num_filters, dropout, num_hidden)
model.summary()
history = fit_model(model, kernel_sizes, num_epochs, batch_size, x_train, y_train)
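One detail of the output head may be worth checking: the final `Dense` layer already uses a softmax activation, and `Activation('sigmoid')` is then applied to its output. The pure-Python sketch below shows numerically what that composition does to a vector of scores. The logits are made-up values for illustration, and the post does not establish whether this is the cause of the training problem.

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical scores for the 7 emotion classes (not from the post).
logits = [4.0, 0.5, 0.0, -1.0, -2.0, 0.2, 0.1]

probs = softmax(logits)                  # what Dense(7, 'softmax') emits
squashed = [sigmoid(p) for p in probs]   # what Activation('sigmoid') adds

print("softmax :", [round(p, 3) for p in probs], "sum =", round(sum(probs), 3))
print("sigmoid :", [round(s, 3) for s in squashed], "sum =", round(sum(squashed), 3))
```

Since softmax outputs lie between 0 and 1, the sigmoid maps every class into the narrow band between 0.5 and about 0.731, and the values no longer sum to 1, which is not what `categorical_crossentropy` normally expects.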
Model: "model_1"
Layer (type)                      Output Shape        Param #    Connected to
embedding_1_input (InputLayer)    (None, 179)         0
embedding_2_input (InputLayer)    (None, 179)         0
embedding_1 (Embedding)           (None, 179, 300)    2729400    embedding_1_input[0][0]
embedding_2 (Embedding)           (None, 179, 300)    2729400    embedding_2_input[0][0]
conv1d_1 (Conv1D)                 (None, 179, 100)    60100      embedding_1[0][0]
conv1d_2 (Conv1D)                 (None, 179, 100)    180100     embedding_2[0][0]
global_max_pooling1d_1 (GlobalM   (None, 100)         0          conv1d_1[0][0]
global_max_pooling1d_2 (GlobalM   (None, 100)         0          conv1d_2[0][0]
concatenate_1 (Concatenate)       (None, 200)         0          global_max_pooling1d_1[0][0]
                                                                 global_max_pooling1d_2[0][0]
dropout_1 (Dropout)               (None, 200)         0          concatenate_1[0][0]
dense_1 (Dense)                   (None, 270)         54270      dropout_1[0][0]
dropout_2 (Dropout)               (None, 270)         0          dense_1[0][0]
dense_2 (Dense)                   (None, 7)           1897       dropout_2[0][0]
activation_1 (Activation)         (None, 7)           0          dense_2[0][0]
Total params: 5,755,167
Trainable params: 5,755,167
Non-trainable params: 0
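The parameter counts in this summary can be reproduced by hand, which is a quick way to confirm that the layers are wired as intended. The sketch below uses the usual formulas for embedding, 1D convolution and dense layers with biases; the helper names are illustrative, and the vocabulary size is inferred from the embedding row (2,729,400 / 300) rather than stated in the post.

```python
def embedding_params(vocab_size, dim):
    return vocab_size * dim

def conv1d_params(kernel_size, in_channels, filters):
    # One kernel of shape (kernel_size, in_channels) per filter, plus a bias.
    return kernel_size * in_channels * filters + filters

def dense_params(in_units, out_units):
    return in_units * out_units + out_units

embedding_dim = 300
vocab_size = 2729400 // embedding_dim   # inferred from the summary

layers = {
    "embedding_1": embedding_params(vocab_size, embedding_dim),
    "embedding_2": embedding_params(vocab_size, embedding_dim),
    "conv1d_1": conv1d_params(2, embedding_dim, 100),
    "conv1d_2": conv1d_params(6, embedding_dim, 100),
    "dense_1": dense_params(200, 270),   # 2 branches x 100 pooled filters
    "dense_2": dense_params(270, 7),
}

total_params = sum(layers.values())
for name, count in layers.items():
    print(f"{name:<12} {count:>9,}")
print(f"{'total':<12} {total_params:>9,}")
```

Note that each kernel size gets its own trainable copy of the embedding matrix, so the two embeddings account for most of the total.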
[Figure: training and validation results for the CNN]
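RNN implementation
# Standalone Keras 2.x imports, assumed from the layer names in the summary.
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dropout, Dense, Activation

# input_dim, output_dim, embedding_matrix, max_len, tensorboard and early are
# defined elsewhere in the script (not shown in the post).
def make_model(lstm_units, dropout, hidden_units):
    model = Sequential()
    # Frozen word2vec embeddings feeding a single LSTM layer.
    model.add(Embedding(input_dim=input_dim,
                        output_dim=output_dim,
                        weights=[embedding_matrix],
                        input_length=max_len,
                        trainable=False))
    model.add(LSTM(lstm_units))
    model.add(Dropout(dropout))
    if hidden_units > 0:
        model.add(Dense(hidden_units, activation='elu'))
        model.add(Dropout(dropout))
    model.add(Dense(7, activation='softmax', kernel_initializer="uniform"))
    # Note: this applies a sigmoid on top of the softmax output.
    model.add(Activation('sigmoid'))
    model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['acc'])
    return model

lstm_units = 120
dropout = 0.5
hidden_units = 550
callbacks = [tensorboard, early]  # defined here but not passed to fit_model
num_epochs = 20
batch_size = 60
model = make_model(lstm_units, dropout, hidden_units)
model.summary()
# This call omits kernel_sizes; the single-input fit_model it uses is not shown in the post.
history = fit_model(model, num_epochs, batch_size, x_train, y_train)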
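Model: "sequential_6"
Layer (type)                 Output Shape        Param #
embedding_6 (Embedding)      (None, 179, 300)    2729400
lstm_8 (LSTM)                (None, 120)         202080
dropout_5 (Dropout)          (None, 120)         0
dense_6 (Dense)              (None, 550)         66550
dropout_6 (Dropout)          (None, 550)         0
dense_7 (Dense)              (None, 7)           3857
activation_3 (Activation)    (None, 7)           0
Total params: 3,001,887
Trainable params: 272,487
Non-trainable params: 0 + 2,729,400 = 2,729,400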
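The LSTM row and the trainable/non-trainable split can be checked the same way. The sketch below uses the standard parameter count for an LSTM layer with biases (four gates, each with an input kernel, a recurrent kernel and a bias); the helper names are illustrative and not part of the original code.

```python
def lstm_params(input_dim, units):
    # Four gates (input, forget, cell, output), each with an input kernel,
    # a recurrent kernel and a bias vector.
    return 4 * (units * (input_dim + units) + units)

def dense_params(in_units, out_units):
    return in_units * out_units + out_units

embedding = 2729400                       # frozen: trainable=False in the post
lstm = lstm_params(input_dim=300, units=120)
hidden = dense_params(120, 550)
output = dense_params(550, 7)

trainable = lstm + hidden + output
total = trainable + embedding

print(f"lstm_8 params:        {lstm:,}")
print(f"trainable params:     {trainable:,}")
print(f"non-trainable params: {embedding:,}")
print(f"total params:         {total:,}")
```

With the embedding frozen, only the LSTM and the two dense layers are updated during training, unlike the CNN variant where the embeddings are trainable.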
[Figure: RNN training and validation scores]
CNN-RNN implementation
# Standalone Keras 2.x imports, assumed from the layer names in the summary.
from keras.models import Sequential, Model
from keras.layers import (Embedding, Conv1D, MaxPooling1D, LSTM, Concatenate,
                          Dropout, Dense, Activation)

# input_dim, output_dim, embedding_matrix, max_len, tensorboard and early are
# defined elsewhere in the script (not shown in the post).
def make_model(kernel_sizes, num_filters, dropout, hidden_units, lstm_units):
    # One Embedding -> Conv1D -> MaxPooling1D -> LSTM branch per kernel size.
    submodels = []
    for kernel_size in kernel_sizes:
        submodel = Sequential()
        submodel.add(Embedding(input_dim=input_dim,
                               output_dim=output_dim,
                               weights=[embedding_matrix],
                               input_length=max_len,
                               trainable=True))
        submodel.add(Conv1D(filters=num_filters, kernel_size=kernel_size,
                            padding='same', activation='relu', strides=1))
        submodel.add(MaxPooling1D(pool_size=2, strides=2))
        submodel.add(Dropout(dropout))
        submodel.add(LSTM(lstm_units))
        submodels.append(submodel)
    submodel_outputs = [model.output for model in submodels]
    submodel_inputs = [model.input for model in submodels]
    # Merge the branches and classify.
    merged = Concatenate(axis=1)(submodel_outputs)
    x = Dropout(dropout)(merged)
    if hidden_units > 0:
        x = Dense(hidden_units, activation='relu')(x)
        x = Dropout(dropout)(x)
    x = Dense(7, activation='softmax', kernel_initializer="uniform")(x)
    # Note: this applies a sigmoid on top of the softmax output.
    out = Activation('sigmoid')(x)
    model = Model(submodel_inputs, out)
    model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['acc'])
    return model

kernel_sizes = [2, 3, 6]
num_filters = 100
dropout = 0.6
num_hidden = 270
lstm_units = 80
callbacks = [tensorboard, early]  # defined here but not passed to fit_model
num_epochs = 20
batch_size = 64
model = make_model(kernel_sizes, num_filters, dropout, num_hidden, lstm_units)
model.summary()
history = fit_model(model, kernel_sizes, num_epochs, batch_size, x_train, y_train)
Model: "model_2"
Layer (type)                       Output Shape        Param #    Connected to
embedding_8_input (InputLayer)     (None, 179)         0
embedding_9_input (InputLayer)     (None, 179)         0
embedding_10_input (InputLayer)    (None, 179)         0
embedding_8 (Embedding)            (None, 179, 300)    2729400    embedding_8_input[0][0]
embedding_9 (Embedding)            (None, 179, 300)    2729400    embedding_9_input[0][0]
embedding_10 (Embedding)           (None, 179, 300)    2729400    embedding_10_input[0][0]
conv1d_8 (Conv1D)                  (None, 179, 100)    60100      embedding_8[0][0]
conv1d_9 (Conv1D)                  (None, 179, 100)    90100      embedding_9[0][0]
conv1d_10 (Conv1D)                 (None, 179, 100)    180100     embedding_10[0][0]
max_pooling1d_7 (MaxPooling1D)     (None, 89, 100)     0          conv1d_8[0][0]
max_pooling1d_8 (MaxPooling1D)     (None, 89, 100)     0          conv1d_9[0][0]
max_pooling1d_9 (MaxPooling1D)     (None, 89, 100)     0          conv1d_10[0][0]
dropout_9 (Dropout)                (None, 89, 100)     0          max_pooling1d_7[0][0]
dropout_10 (Dropout)               (None, 89, 100)     0          max_pooling1d_8[0][0]
dropout_11 (Dropout)               (None, 89, 100)     0          max_pooling1d_9[0][0]
lstm_2 (LSTM)                      (None, 80)          57920      dropout_9[0][0]
lstm_3 (LSTM)                      (None, 80)          57920      dropout_10[0][0]
lstm_4 (LSTM)                      (None, 80)          57920      dropout_11[0][0]
concatenate_3 (Concatenate)        (None, 240)         0          lstm_2[0][0]
                                                                  lstm_3[0][0]
                                                                  lstm_4[0][0]
dropout_12 (Dropout)               (None, 240)         0          concatenate_3[0][0]
dense_3 (Dense)                    (None, 270)         65070      dropout_12[0][0]
dropout_13 (Dropout)               (None, 270)         0          dense_3[0][0]
dense_4 (Dense)                    (None, 7)           1897       dropout_13[0][0]
activation_2 (Activation)          (None, 7)           0          dense_4[0][0]
Total params: 8,759,227
Trainable params: 8,759,227
Non-trainable params: 0
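Two figures in this summary are easy to derive and help when reading it: the sequence length that reaches each LSTM after pooling, and how much of the model consists of embedding weights. The sketch below assumes the default `'valid'` padding for `MaxPooling1D`; the helper and variable names are illustrative.

```python
def pooled_length(length, pool_size, strides):
    # Output length of MaxPooling1D with the default 'valid' padding.
    return (length - pool_size) // strides + 1

max_len = 179
# Conv1D uses padding='same' with stride 1, so it keeps all 179 time steps.
steps_into_lstm = pooled_length(max_len, pool_size=2, strides=2)

embedding_per_branch = 2729400   # from the summary
num_branches = 3                 # one per kernel size in [2, 3, 6]
total_params = 8759227           # from the summary
embedding_share = num_branches * embedding_per_branch / total_params

print(f"time steps fed to each LSTM: {steps_into_lstm}")
print(f"share of parameters in embeddings: {embedding_share:.1%}")
```

Each LSTM therefore processes 89 steps of 100 features, and the three trainable embedding copies make up the large majority of the 8.76 million parameters being fitted to roughly 7,500 samples.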
[Figure: CNN-RNN training and validation scores]
[Figure: CNN-RNN confusion matrix]
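A confusion matrix makes the single-class behaviour easy to confirm: all counts fall into one column. The pure-Python sketch below builds one from integer labels and reports the share of the most frequently predicted class. The labels are toy values invented for illustration, not results from the post, and the function names are hypothetical.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, num_classes):
    # Rows are true classes, columns are predicted classes.
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def dominant_prediction(y_pred):
    # Most frequently predicted class and the fraction of predictions it takes.
    label, count = Counter(y_pred).most_common(1)[0]
    return label, count / len(y_pred)

# Toy data: two samples per class, with every prediction collapsed to class 4.
y_true = [0, 1, 2, 3, 4, 5, 6] * 2
y_pred = [4] * len(y_true)

matrix = confusion_matrix(y_true, y_pred, num_classes=7)
label, share = dominant_prediction(y_pred)

for row in matrix:
    print(row)
print(f"class {label} receives {share:.0%} of all predictions")
```

With real model output, `y_pred` would typically come from taking the argmax of the predicted probabilities for each sample.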
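I know there is no magic formula or one-size-fits-all approach for neural networks; I am just looking for some guidance on where I may have gone wrong in implementing the CNN-RNN and the RNN.
Apologies in advance for any formatting errors, as this is the first question I have asked. If any other information is needed, please let me know.
Thank you very much.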
【Comments】:
Tags: python keras nlp conv-neural-network recurrent-neural-network