【Posted】: 2020-09-03 16:58:35
【Problem Description】:
I am trying to train a simple TensorFlow model to detect the sentiment of tweets. The dtypes and sizes of the arrays are consistent, and the model trains fine when recurrent_dropout is set to some float value. However, that disables cuDNN, and I would really like the speed-up (wouldn't we all), but whenever I remove the recurrent_dropout argument, training crashes before the end of the first epoch.
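For context on that trade-off, here is a minimal sketch of the two LSTM configurations, assuming TF 2.x (per the Keras docs, the fused cuDNN kernel is only used when the layer keeps the default activations and has recurrent_dropout=0, unroll=False, use_bias=True):

import tensorflow as tf

# Falls back to the generic (non-cuDNN) kernel because recurrent_dropout is non-zero
slow_lstm = tf.keras.layers.LSTM(512, dropout=0.2, recurrent_dropout=0.2, return_sequences=True)

# Eligible for the fused cuDNN kernel on GPU (default tanh/sigmoid activations, no recurrent dropout)
fast_lstm = tf.keras.layers.LSTM(512, dropout=0.2, return_sequences=True)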
Below is the relevant code (imports and csv loading omitted), followed by the final input dimensions and the error output. I have also figured out why Colab seemed to be cutting down the training data: Colab displays the number of steps after the data has been split into batches, so with the default batch size of 32 we get 859 steps, as sketched below. The crash when recurrent_dropout is not used is still a problem. Side note: this code is a very rough draft, with the data cleaning all done in the same notebook, hence the lack of the usual formatting.
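A minimal sketch of that batching arithmetic (the 27481 sequence count is taken from the shapes reported further down):

import math

num_sequences = 27481   # training sequences, per the shapes reported below
batch_size = 32          # Keras default batch size in model.fit
print(math.ceil(num_sequences / batch_size))   # 859 steps shown on the Colab progress bar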
def remove_case(X):
    removed_case = []
    X = X.copy()
    for text in X:
        text = str(text).lower()
        removed_case.append(text)
    X = removed_case
    return X

def remove_hyperlinks(X):
    removed_hyperlinks = []
    X = X.copy()
    for text in X:
        text = str(text)
        text = re.sub(r'http\S+', '', text)
        text = re.sub(r'https\S+', '', text)
        text = re.sub(r'www\S+', '', text)
        removed_hyperlinks.append(text)
    X = removed_hyperlinks
    return X

def remove_punctuation(X):
    removed_punc = []
    X = X.copy()
    for text in X:
        text = str(text)
        text = "".join([char for char in text if char not in punctuation])
        removed_punc.append(text)
    X = removed_punc
    return X

def split_text(X):
    split_tweets = []
    X = X.copy()
    for text in X:
        text = str(text).split()
        split_tweets.append(text)
    X = split_tweets
    return X

def map_sentiment(X, l, m, n):
    keys = ['negative', 'neutral', 'positive']
    values = [l, m, n]
    dictionary = dict(zip(keys, values))
    X = X.copy()
    X = X.map(dictionary)
    return X
# def sentiment_to_onehot(X):
#     sentiment_foofs = []
#     X = X.copy()
#     for integer in X:
#         if integer == "negative":    # Negative
#             integer = [1, 0, 0]
#         elif integer == "neutral":   # Neutral
#             integer = [0, 1, 0]
#         elif integer == "positive":  # Positive
#             integer = [0, 0, 1]
#         else:
#             break
#         sentiment_foofs.append(integer)
#     X = sentiment_foofs
#     return X
train_no_punc_lowercase = train.copy()
train_no_punc_lowercase['text'] = remove_case(train_no_punc_lowercase['text'])
train_no_punc_lowercase['text'] = remove_hyperlinks(train_no_punc_lowercase['text'])
train_no_punc_lowercase['text'] = remove_punctuation(train_no_punc_lowercase['text'])
train_no_punc_lowercase['sentiment'] = map_sentiment(train_no_punc_lowercase['sentiment'], 0, 1, 2)
train_no_punc_lowercase.head()
test_no_punc_lowercase = test.copy()
test_no_punc_lowercase['text'] = remove_case(test_no_punc_lowercase['text'])
test_no_punc_lowercase['text'] = remove_hyperlinks(test_no_punc_lowercase['text'])
test_no_punc_lowercase['text'] = remove_punctuation(test_no_punc_lowercase['text'])
test_no_punc_lowercase['sentiment'] = map_sentiment(test_no_punc_lowercase['sentiment'], 0, 1, 2)
features = train.columns.tolist()
features.remove('textID') # all unique, high cardinality feature
features.remove('selected_text') # target
target = 'selected_text'
X_train_no_punc_lowercase = train_no_punc_lowercase[features]
y_train_no_punc_lowercase = train_no_punc_lowercase[target]
X_test_no_punc_lowercase = test_no_punc_lowercase[features]
def stemming_column(df_column):
    ps = PorterStemmer()
    stemmed_word_list = []
    for i, string in enumerate(df_column):
        tokens = word_tokenize(string)
        new_string = ""
        for j, words in enumerate(tokens):
            new_string = new_string + ps.stem(words) + " "
        stemmed_word_list.append(new_string)
    return stemmed_word_list

def create_lookup_table(list1, list2):
    main_list = []
    lookup_dict = {}
    i = 1  # used to create a value in the dictionary
    main_list.append(list1)
    main_list.append(list2)
    for list in main_list:
        for string in list:
            for word in string.split():
                if word not in lookup_dict:
                    lookup_dict[word] = i
                    i += 1
    return lookup_dict

def encode(input_list, input_dict):
    encoded_list = []
    for string in input_list:
        sentence_list = []
        for word in string.split():
            sentence_list.append(input_dict[word])  # value lookup from dictionary.. int
        encoded_list.append(sentence_list)
    return encoded_list

def pad_data(list_of_lists):
    padded_data = tf.keras.preprocessing.sequence.pad_sequences(list_of_lists, padding='post')
    return padded_data

def create_array_sentiment_integers(list):
    sent_int_list = []
    for sentiment in list:
        sent_int_list.append(sentiment)
    return np.asarray(sent_int_list, dtype=np.int32)
X_train_stemmed_list = stemming_column(X_train_no_punc_lowercase['text'])
X_test_stemmed_list = stemming_column(X_test_no_punc_lowercase['text'])
lookup_table = create_lookup_table(X_train_stemmed_list, X_test_stemmed_list)
X_train_encoded_list = encode(X_train_stemmed_list, lookup_table)
X_train_padded_data = pad_data(X_train_encoded_list)
Y_train = create_array_sentiment_integers(train_no_punc_lowercase['sentiment'])
max_features = 3 # 3 choices 0, 1, 2
Y_train_final = np.zeros((Y_train.shape[0], max_features), dtype=np.float32)
Y_train_final[np.arange(Y_train.shape[0]), Y_train] = 1.0
input_dimension = len(lookup_table) + 1
output_dimension = 64
input_length = 33
model = Sequential()
model.add(tf.keras.layers.Embedding(input_dim=input_dimension,
                                    output_dim=output_dimension,
                                    input_length=input_length,
                                    mask_zero=True))
model.add(tf.keras.layers.LSTM(512, dropout=0.2, recurrent_dropout=0.2, return_sequences=True))
model.add(tf.keras.layers.Dense(256, activation='sigmoid'))
model.add(tf.keras.layers.Dropout(0.2))
model.add(Dense(3, activation='softmax'))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(X_train_padded_data, Y_train_final, validation_split=0.20, epochs=10)
model.save('Tweet_sentiment.model')
Also, here are the shapes of the dataset:
x train shape: (27481, 33, 1)
x train type: <class 'numpy.ndarray'>
y train shape: (27481, 3)
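(A quick sanity check like the following can print these shapes before fitting; a minimal sketch, assuming the variable names from the code above:)

print("x train shape:", X_train_padded_data.shape, "x train type:", type(X_train_padded_data))
print("y train shape:", Y_train_final.shape)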
Error output:
Epoch 1/3
363/859 [===========>..................] - ETA: 9s - loss: 0.5449 - accuracy: 0.5674
---------------------------------------------------------------------------
UnknownError Traceback (most recent call last)
<ipython-input-103-1d4af3962607> in <module>()
----> 1 model.fit(X_train_padded_data, Y_train_final, epochs=3,)
8 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
58 ctx.ensure_initialized()
59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60 inputs, attrs, num_outputs)
61 except core._NotOkStatusException as e:
62 if name is not None:
UnknownError: [_Derived_] CUDNN_STATUS_BAD_PARAM
in tensorflow/stream_executor/cuda/cuda_dnn.cc(1496): 'cudnnSetRNNDataDescriptor( data_desc.get(), data_type, layout, max_seq_length, batch_size, data_size, seq_lengths_array, (void*)&padding_fill)'
[[{{node cond_38/then/_0/CudnnRNNV3}}]]
[[sequential_5/lstm_4/StatefulPartitionedCall]] [Op:__inference_train_function_36098]
Function call stack:
train_function -> train_function -> train_function
【Comments】:
- Can you show the shape of your dataset?
- @ZabirAlNazi Sure, just added it to the original post right before the error output.
- @NickP, it looks like the Number of Time Steps is very high, so when you apply recurrent_dropout it doesn't crash because some Time Steps are dropped. What is the value of input_dimension? Also, please share the complete code so that we can help you. Thanks!
- @TensorflowWarriors Hi, thanks for the reply; I have updated the post to include the actual code. I will admit there may be some gaps in my understanding of the input parameters and how to handle time steps properly. Thanks for reaching out.
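(Regarding the input_dimension question above, a minimal sketch of how the relevant values can be inspected from the posted code; the actual vocabulary size is not given in the post:)

print("vocabulary size:", len(lookup_table))
print("input_dimension:", len(lookup_table) + 1)    # +1 keeps index 0 free for post-padding
print("time steps:", X_train_padded_data.shape[1])  # 33, matching input_length above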
Tags: python python-3.x keras google-colaboratory tensorflow2.0