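【Posted at】: 2017-08-02 16:35:13
【Question】:
I wrote this RNN text classification system in Keras with the help of tutorials from the web. It runs fine and produces output. Could someone check my code and tell me whether my implementation is correct?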
# LSTM with dropout for sequence classification
import numpy
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.layers import LSTM
from keras.preprocessing import sequence,text
from keras.layers.embeddings import Embedding
import pandas as pd
# fix random seed for reproducibility
numpy.random.seed(7)
#fetching sms spam dataset
url = 'https://raw.githubusercontent.com/justmarkham/pydata-dc-2016-tutorial/master/sms.tsv'
sms = pd.read_table(url, header=None, names=['label', 'message'])
#binarizing
sms['label_num'] = sms.label.map({'ham':0, 'spam':1})
sms.head()
X = sms.message
y = sms.label_num
print(X.shape)
print(y.shape)
###################################
tk = text.Tokenizer(nb_words=200, lower=True)
tk.fit_on_texts(X)
x = tk.texts_to_sequences(X)
print(len(tk.word_counts))
###################################
max_len = 80
print("max_len %d" % max_len)
print('Pad sequences (samples x time)')
x = sequence.pad_sequences(x, maxlen=max_len)
max_features = 200
print('Build model...')
model = Sequential()
model.add(Embedding(max_features, 128, input_length=max_len, dropout=0.2))
model.add(LSTM(128, dropout_W=0.2, dropout_U=0.2))
model.add(Dense(1))
model.add(Activation('sigmoid'))
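# show_accuracy is no longer a fit() argument; accuracy is requested through metrics in compile()
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
model.fit(x, y=y, batch_size=500, nb_epoch=1, verbose=1, validation_split=0.2, shuffle=True)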
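One thing worth checking in code like the above is that every index produced by the tokenizer fits inside the Embedding layer's input dimension. The pure-Python sketch below imitates, in simplified form, what the Tokenizer and pad_sequences steps do: words are ranked by frequency, index 0 is reserved for padding, only words with an index below the word limit are kept, and sequences are left-padded or left-truncated to a fixed length. The helper names (`build_word_index`, `texts_to_sequences`, `pad_pre`) and the toy texts are made up for illustration; the real Tokenizer also strips punctuation and may break frequency ties differently.

```python
from collections import Counter

def build_word_index(texts):
    """Rank words by frequency; the most frequent word gets index 1 (0 is kept for padding)."""
    counts = Counter(w for t in texts for w in t.lower().split())
    # Descending count; ties are broken alphabetically here (an assumption for determinism)
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {w: i + 1 for i, w in enumerate(ranked)}

def texts_to_sequences(texts, word_index, num_words):
    """Map words to indices, dropping every word whose index is not below num_words."""
    return [[word_index[w] for w in t.lower().split()
             if w in word_index and word_index[w] < num_words]
            for t in texts]

def pad_pre(seq, maxlen):
    """Keep the last maxlen items, then left-pad with zeros up to maxlen."""
    seq = seq[-maxlen:]
    return [0] * (maxlen - len(seq)) + seq

# Toy data standing in for the SMS messages
texts = ["free prize call now", "call me now", "are you free now"]
num_words = 4   # plays the role of nb_words / max_features
max_len = 4     # plays the role of maxlen / input_length

word_index = build_word_index(texts)
sequences = texts_to_sequences(texts, word_index, num_words)
padded = [pad_pre(s, max_len) for s in sequences]

# Every index lies in [0, num_words), so an embedding with num_words rows is large enough
largest_index = max(max(row) for row in padded)
print(padded, largest_index)
```

Under these assumptions the indices range from 0 to the word limit minus one, which is why setting max_features equal to the tokenizer's word limit (200 and 200 in the question) is consistent.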
【Comments】:
-
I'm voting to close this question because it belongs on codereview.stackexchange.com
-
How do you decide what max_features, input_length, and 128 should be?
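On that last comment: in the question's code, max_features is the vocabulary size given to the Embedding layer, input_length is the padded sequence length, the first 128 is the embedding dimension, and the second 128 is the number of LSTM units. The two 128s are hyperparameters usually tuned against validation results. The first two can be estimated from the data. A rough sketch follows, assuming whitespace tokenization; the helper names and the toy texts are hypothetical, and the 75% and 50% thresholds are arbitrary choices rather than recommendations.

```python
import math
from collections import Counter

def length_percentile(texts, pct):
    """Smallest token count that is >= the length of at least pct percent of the texts."""
    lengths = sorted(len(t.split()) for t in texts)
    k = max(0, int(math.ceil(pct * len(lengths) / 100.0)) - 1)
    return lengths[k]

def vocab_size_for_coverage(texts, coverage):
    """How many of the most frequent words are needed to cover this fraction of all tokens."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts.values())
    covered = 0
    for n, (_, c) in enumerate(counts.most_common(), start=1):
        covered += c
        if covered >= coverage * total:
            return n
    return len(counts)

# Toy data standing in for the SMS messages
texts = ["free prize call now", "call me now", "are you free now", "ok"]

# Candidate sequence length: long enough for 75% of the messages
candidate_len = length_percentile(texts, 75)

# Candidate vocabulary: enough top words to cover half of all tokens,
# plus one because index 0 is reserved for padding
candidate_features = vocab_size_for_coverage(texts, 0.5) + 1

print(candidate_len, candidate_features)
```

On the real dataset the same two numbers could then be passed as maxlen/input_length and as the tokenizer word limit/max_features, and adjusted if validation accuracy suffers.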
Tags: python python-2.7 neural-network keras recurrent-neural-network