【发布时间】:2020-07-24 06:25:57
【问题描述】:
我正在尝试使用预训练的 word2vector 模型来创建词嵌入,但是当我尝试从 word2vec genism 模型创建权重矩阵时出现以下错误:
代码:
import gensim
w2v_model = gensim.models.KeyedVectors.load_word2vec_format("/content/drive/My Drive/GoogleNews-vectors-negative300.bin.gz", binary=True)
vocab_size = len(tokenizer.word_index) + 1
print(vocab_size)
EMBEDDING_DIM=300
# Function to create weight matrix from word2vec gensim model
def get_weight_matrix(model, vocab):
# total vocabulary size plus 0 for unknown words
vocab_size = len(vocab) + 1
# define weight matrix dimensions with all 0
weight_matrix = np.zeros((vocab_size, EMBEDDING_DIM))
# step vocab, store vectors using the Tokenizer's integer mapping
for word, i in vocab.items():
weight_matrix[i] = model[word]
return weight_matrix
embedding_vectors = get_weight_matrix(w2v_model, tokenizer.word_index)
我收到以下错误:
【问题讨论】: