原文链接:http://www.one2know.cn/nlp19/

  • 使用IMDB情绪数据来比较CNN和RNN两种方法,预处理与上节相同
from __future__ import print_function
import numpy as np
import pandas as pd
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense,Dropout,Embedding,LSTM,Bidirectional
from keras.datasets import imdb
from sklearn.metrics import accuracy_score,classification_report

# 限制最大的特征数
max_features = 15000
max_len = 300
batch_size = 64

# 加载数据
(x_train,y_train),(x_test,y_test) = imdb.load_data(num_words=max_features)
print(len(x_train),'train observations')
print(len(x_test),'test observations')

输出:

Using TensorFlow backend.
25000 train observations
25000 test observations
  • 如何实现
    1.预处理
    2.LSTM模型的构建和验证
    3.模型评估
  • 代码
from __future__ import print_function
import numpy as np
import pandas as pd
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense,Dropout,Embedding,LSTM,Bidirectional
from keras.datasets import imdb
from sklearn.metrics import accuracy_score,classification_report

# 限制最大的特征数
max_features = 15000
max_len = 300
batch_size = 64

# 加载数据
(x_train,y_train),(x_test,y_test) = imdb.load_data(num_words=max_features)
# print(len(x_train),'train observations')
# print(len(x_test),'test observations')

# 通过序列填充将所有的数据整合为一个固定维度,提高运行效率
x_train_2 = sequence.pad_sequences(x_train,maxlen=max_len)
x_test_2 = sequence.pad_sequences(x_test,maxlen=max_len)
print('x_train_2.shape:',x_train_2.shape)
print('x_test_2.shape:',x_test_2.shape)
y_train = np.array(y_train)
y_test = np.array(y_test)

# keras框架 => 双向LSTM模型
# 双向LSTM网络有前向和后向连接,使句子中的单词可以同时与左右词汇产生连接
model = Sequential()
model.add(Embedding(max_features,128,input_length=max_len)) # 嵌入层将维数降到128
model.add(Bidirectional(LSTM(64))) # 双向LSTM层
model.add(Dropout(0.5)) # 随机失活
model.add(Dense(1,activation='sigmoid')) # 稠密层 将情感分类0或1
model.compile('adam','binary_crossentropy',metrics=['accuracy']) # 二元交叉熵
print(model.summary())

model.fit(x_train_2,y_train,batch_size=batch_size,epochs=4,validation_split=0.2)

# 预测及结果
y_train_predclass = model.predict_classes(x_train_2,batch_size=1000)
y_test_predclass = model.predict_classes(x_test_2,batch_size=1000)
y_train_predclass.shape = y_train.shape
y_test_predclass.shape = y_test.shape
print('\n\nLSTM Bidirectional Sentiment Classification - Train accuracy:',
      round(accuracy_score(y_train,y_train_predclass),3))
print('\nLSTM Bidirectional Sentiment Classification of Training data\n',
      classification_report(y_train,y_train_predclass))
print('\nLSTM Bidirectional Sentiment Classification - Train Confusion Matrix\n\n',
      pd.crosstab(y_train,y_train_predclass,rownames=['Actuall'],colnames=['Predicted']))
print('\nLSTM Bidirectional Sentiment Classification - Test accuracy:',
      round(accuracy_score(y_test,y_test_predclass),3))
print('\nLSTM Bidirectional Sentiment Classification of Test data\n',
      classification_report(y_test,y_test_predclass))
print('\nLSTM Bidirectional Sentiment Classification - Test Confusion Matrix\n\n',
      pd.crosstab(y_test,y_test_predclass,rownames=['Actuall'],colnames=['Predicted']))

输出:

Using TensorFlow backend.
x_train_2.shape: (25000, 300)
x_test_2.shape: (25000, 300)
WARNING:tensorflow:From D:\Python37\Lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From D:\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_1 (Embedding)      (None, 300, 128)          1920000   
_________________________________________________________________
bidirectional_1 (Bidirection (None, 128)               98816     
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 129       
=================================================================
Total params: 2,018,945
Trainable params: 2,018,945
Non-trainable params: 0
_________________________________________________________________
None
WARNING:tensorflow:From D:\Python37\Lib\site-packages\tensorflow\python\ops\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Train on 20000 samples, validate on 5000 samples
Epoch 1/4
2019-07-07 20:03:45.649853: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2

   64/20000 [..............................] - ETA: 18:21 - loss: 0.6915 - acc: 0.5781
  128/20000 [..............................] - ETA: 13:04 - loss: 0.6918 - acc: 0.5938
  192/20000 [..............................] - ETA: 11:14 - loss: 0.6915 - acc: 0.5729
  256/20000 [..............................] - ETA: 10:19 - loss: 0.6917 - acc: 0.5469
  320/20000 [..............................] - ETA: 9:45 - loss: 0.6915 - acc: 0.5469 
  此处省略一堆epoch的一堆操作

LSTM Bidirectional Sentiment Classification - Train accuracy: 0.955

LSTM Bidirectional Sentiment Classification of Training data
               precision    recall  f1-score   support

           0       0.96      0.95      0.95     12500
           1       0.95      0.96      0.95     12500

    accuracy                           0.95     25000
   macro avg       0.95      0.95      0.95     25000
weighted avg       0.95      0.95      0.95     25000

LSTM Bidirectional Sentiment Classification - Train Confusion Matrix

 Predicted      0      1
Actuall                
0          11928    572
1            561  11939

LSTM Bidirectional Sentiment Classification - Test accuracy: 0.859

LSTM Bidirectional Sentiment Classification of Test data
               precision    recall  f1-score   support

           0       0.86      0.86      0.86     12500
           1       0.86      0.85      0.86     12500

    accuracy                           0.86     25000
   macro avg       0.86      0.86      0.86     25000
weighted avg       0.86      0.86      0.86     25000


LSTM Bidirectional Sentiment Classification - Test Confusion Matrix

 Predicted      0      1
Actuall                
0          10809   1691
1           1829  10671
time============== 2080.618681907654

分类:

技术点:

相关文章: