[Posted at]: 2021-05-05 12:57:00
[Problem description]:
I have the following code:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping
import numpy as np
from numpy.random import seed
from tensorflow import random
seed(42)
random.set_seed(43)
X = [
    'may it all be fine in the world',
    'this is not for me',
    'pffff ugly bike',
    'dropping by to say leave me alone',
    'getting sarcastic by now',  # a comma was missing here, silently concatenating this string with the next
    'how would one satisfy his or her needs when the earth is boiling'
]
# one label per sentence; the sixth label is assumed here so that len(y) == len(X)
y = [1, 2, 4, 5, 3, 2]
tokenizer = Tokenizer(num_words = 13)
tokenizer.fit_on_texts(X)
X_train_seq = tokenizer.texts_to_sequences(X)
X_train_seq_padded = pad_sequences(X_train_seq, maxlen = 15)
model = Sequential()
model.add(Dense(16, input_dim = 15, activation = 'relu', name = 'hidden-1'))
model.add(Dense(16, activation = 'relu', name = 'hidden-2'))
model.add(Dense(16, activation = 'relu', name = 'hidden-3'))
model.add(Dense(5, activation='softmax', name = 'output_layer'))
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics=['accuracy'])
class CustomCallback(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        print('finished an epoch')
        zin = 'dropping by to say leave her alone'
        # texts_to_sequences expects a list of texts, not a bare string
        # (a bare string would be iterated character by character)
        zin = tokenizer.texts_to_sequences([zin])
        zin = pad_sequences(zin, maxlen = 15)
        print(model.predict(zin))
        print(np.argmax(model.predict(zin), axis=-1))
callbacks = [EarlyStopping(monitor = 'accuracy', patience = 5, mode = 'max'), CustomCallback()]
from sklearn.preprocessing import LabelBinarizer
encoder = LabelBinarizer()
y = encoder.fit_transform(y)
history = model.fit(X_train_seq_padded, y, epochs = 100, batch_size = 100, callbacks = callbacks)
Inside the callback I expect model.predict() to return something like this (since there are 5 possible classes):
[0.4534534, 0.5634246, 0.0045623, 0.0004536, 0.0000056]
and np.argmax(model.predict(zin), axis=-1) to be a single number: 1, 2, 3, 4 or 5.
But the output I actually receive (one epoch shown) is:
How should I interpret this, and how can I extract the actual class the model predicts for the sentence?
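One point worth noting (a sketch, assuming the same LabelBinarizer used for training): the encoder stores its classes in sorted order, and np.argmax returns a 0-based column index into that order, not a label 1 to 5. The index can be mapped back through encoder.classes_; the prediction row below is hypothetical, standing in for one model.predict(zin) output:

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

encoder = LabelBinarizer()
encoder.fit_transform([1, 2, 4, 5, 3])   # same labels as y above
print(encoder.classes_)                  # classes are stored sorted: [1 2 3 4 5]

# 'pred' stands in for one model.predict(zin) row of 5 softmax scores
# (hypothetical numbers, for illustration only).
pred = np.array([[0.05, 0.70, 0.10, 0.10, 0.05]])
idx = np.argmax(pred, axis=-1)   # 0-based column index
label = encoder.classes_[idx]    # map the index back to the original label
```

Equivalently, encoder.inverse_transform(pred) performs the same argmax-and-map step directly on the score rows.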
[Discussion]:
- I think you are looking for something like:

      # set up the data to predict
      X_test = ['may it all be fine in the world']
      tokenizer.fit_on_texts(X_test)
      X_test_seq = tokenizer.texts_to_sequences(X_test)
      X_test_seq_padded = pad_sequences(X_test_seq, maxlen = 15)

  and then predicting and displaying the result:

      my_labels = [1,2,4,5,3]
      print(f"Prediction for {X_test} is {my_labels[np.argmax(model.predict(X_test_seq_padded))]}")
- Shouldn't I use the same tokenizer for all texts?
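Regarding that last comment: yes, refitting the tokenizer on test texts can grow or reorder the word index the model was trained against, so the tokenizer fitted on the training texts should only be applied to new text, never refitted. A minimal pure-Python stand-in (a hypothetical simplification, not the keras Tokenizer, which additionally orders words by frequency) illustrates the idea:

```python
# Minimal stand-in for a word-index tokenizer (hypothetical, for illustration).
def fit_vocab(texts):
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1  # indices start at 1; 0 is reserved for padding
    return vocab

def to_sequences(vocab, texts):
    # Words unseen at fit time are dropped, mirroring the keras default.
    return [[vocab[w] for w in t.lower().split() if w in vocab] for t in texts]

train = ['dropping by to say leave me alone']
vocab = fit_vocab(train)  # built ONCE, on the training data only
seqs = to_sequences(vocab, ['dropping by to say leave her alone'])
# 'her' was never seen during fitting, so it is dropped from the sequence.
```

With the real keras Tokenizer the same rule holds: call fit_on_texts once on the training corpus, then only texts_to_sequences on new sentences; unseen words are silently dropped unless an oov_token is configured.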
Tags: python tensorflow keras neural-network