【Posted】: 2020-09-15 11:05:12
【Question】:
I am trying to train a model on the IMDB movie review dataset to extract sentiment from text.
The code is as follows:
import tensorflow as tf
import tensorflow_datasets as tfds
import numpy as np
imdb, info = tfds.load('imdb_reviews', with_info=True, as_supervised=True)
train_data, test_data = imdb['train'], imdb['test']
training_sentences = []
training_labels = []
testing_sentences = []
testing_labels = []
for s,l in train_data:
    training_sentences.append(str(s.numpy()))
    training_labels.append(str(l.numpy()))
for s,l in test_data:
    testing_sentences.append(str(s.numpy()))
    testing_labels.append(str(l.numpy()))
training_labels_final = np.array(training_labels)
testing_labels_final = np.array(testing_labels)
vocab_size = 10000
embedding_dim = 16
max_length = 120
trunc_type = 'post'
oov_tok = '<OOV>'
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
tokenizer = Tokenizer(num_words = vocab_size, oov_token = oov_tok)
tokenizer.fit_on_texts(training_sentences)
word_index = tokenizer.word_index
sequence = tokenizer.texts_to_sequences(training_sentences)
padded = pad_sequences(sequence, maxlen=max_length, truncating=trunc_type)
testing_sequences = tokenizer.texts_to_sequences(testing_sentences)
testing_padded = pad_sequences(testing_sequences, maxlen=max_length, truncating=trunc_type)
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(6, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
But when I fit the model with this code:
num_epochs=10
model.fit(padded, training_labels_final, epochs=num_epochs, validation_data=(testing_padded, testing_labels_final))
this error pops up:
UnimplementedError: Cast string to float is not supported
[[node Cast (defined at <ipython-input-12-7ee16367fdd2>:2)]] [Op:__inference_train_function_302373]
Function call stack: train_function
I don't know how to proceed; I'm stuck.
【Comments】:
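A likely cause, judging from the error text alone: both loops wrap each label in str(...), so np.array(training_labels) ends up holding strings, and the loss computation cannot cast strings to float. The snippet below is a minimal sketch of that behavior using only NumPy. The `raw_labels` list is a hypothetical stand-in for the labels the dataset yields, not the real data.

```python
import numpy as np

# Hypothetical stand-in for the integer labels yielded by the dataset
# (0 = negative review, 1 = positive review).
raw_labels = [np.int64(0), np.int64(1), np.int64(1)]

# What the loops in the question do: str() turns every label into text,
# so the resulting array has a string dtype instead of a numeric one.
string_labels = np.array([str(label) for label in raw_labels])
print(string_labels.dtype.kind)   # 'U' means a unicode string array

# Option 1: keep the labels numeric from the start (no str() call).
numeric_labels = np.array(raw_labels, dtype=np.float32)

# Option 2: convert an already-built string array back to numbers.
recovered_labels = string_labels.astype(np.float32)

print(numeric_labels.dtype, recovered_labels.dtype)
```

Applied to the code in the question, this would presumably mean appending l.numpy() rather than str(l.numpy()) in both loops, so that training_labels_final and testing_labels_final are numeric arrays when passed to model.fit. This is an inference from the error message, not something confirmed in the thread.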
Tags: python python-3.x tensorflow nlp tensorflow2.0