【Posted】: 2023-12-08 01:08:02
【Problem description】:
I built a bidirectional LSTM for sentiment analysis of news headlines, but while training the model the loss does not improve and stays around 0.6–0.7. I am clearly doing something wrong, and I wonder whether it is related to the embedding layer.
I feed each batch into the network iteratively with a batch size of 10 and a sentence length of 30 words. My vocabulary size is 5745, so after one-hot encoding the tensor has shape (10, 30, 5745).
My embedding layer has num_embeddings = 5745 and embed_dim = 100, so when I call self.embedding(input) the output shape is (10, 30, 5745, 100).
I expected the output shape to be (10, 30, 100).
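To make the shapes concrete, here is a small standalone check (the sizes 10, 30, 5745 and 100 are the ones described above; the data is just dummy values):
import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=5745, embedding_dim=100)

# Feeding the one-hot tensor: nn.Embedding treats every 0/1 entry as an index,
# so each word contributes 5745 lookups and an extra dimension appears.
one_hot = torch.zeros(10, 30, 5745, dtype=torch.long)
one_hot[..., 0] = 1                       # dummy "words"
print(embedding(one_hot).shape)           # torch.Size([10, 30, 5745, 100])

# Feeding plain word indices of shape (10, 30) instead, just to compare shapes:
indices = torch.randint(0, 5745, (10, 30))
print(embedding(indices).shape)           # torch.Size([10, 30, 100])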
So I used this line of code:
embeddings = torch.max(embeddings, dim=2)
But I am not sure it does what I expect for each word / one-hot vector, namely:
If I have a one-hot vector representing a word with shape (5745, 1) and an embedding matrix with shape (100, 5745), I should get a (100, 1) embedding vector, and therefore an output of shape (10, 30, 100) by running the line above? Maybe my reasoning is wrong and that is what is hurting my final results.
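This is the shape check I ran for the torch.max step and for the matrix-product picture above (again dummy data, same sizes):
import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=5745, embedding_dim=100)
one_hot = torch.zeros(10, 30, 5745, dtype=torch.long)
one_hot[..., 0] = 1                      # dummy "words"

out = embedding(one_hot)                 # (10, 30, 5745, 100)
values, argmax = torch.max(out, dim=2)   # torch.max returns a (values, indices) pair
print(values.shape)                      # torch.Size([10, 30, 100])

# The matrix-product picture for a single word: a (100, 5745) matrix times a
# (5745, 1) one-hot column gives a (100, 1) vector.
one_hot_col = torch.zeros(5745, 1)
one_hot_col[42] = 1.0                    # arbitrary word index
vec = embedding.weight.T @ one_hot_col   # embedding.weight has shape (5745, 100)
print(vec.shape)                         # torch.Size([100, 1])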
The RNN:
import numpy as np
import torch
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, output_dim, n_layers, dropout):
        # call the __init__ of the nn.Module parent
        super(RNN, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim,
                               hidden_dim,
                               n_layers,
                               dropout=dropout,
                               bidirectional=True)
        # linear transformation from the concatenated hidden states to the output
        self.decoder = nn.Linear(hidden_dim * 2, output_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, inputs):
        # (batch_size, timesteps, embed_dim)
        embeddings = self.dropout(self.embedding(inputs))
        embeddings = torch.max(embeddings, dim=2)
        embeddings = embeddings[0].type(torch.cuda.FloatTensor)
        # output of every timestep, plus the final hidden and cell states
        output, (hidden, cell) = self.encoder(embeddings)
        # concatenate the last hidden states of the forward and backward directions
        merge = self.dropout(torch.cat((hidden[-2, :, :], hidden[-1, :, :]), dim=1))
        output = self.decoder(merge)
        return output
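For completeness, the model can be instantiated along these lines (hidden_dim, n_layers and dropout below are placeholder values, not necessarily the ones I used; output_dim is 1 because the labels are binary):
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# placeholder hyperparameters; only vocab_size and embed_dim come from the description above
model = RNN(vocab_size=5745, embed_dim=100, hidden_dim=256,
            output_dim=1, n_layers=2, dropout=0.5).to(device)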
Training:
def train(model, text, label, epochs, lr=0.001):
    model.train()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    # BCEWithLogitsLoss carries out both the sigmoid and the binary cross entropy
    # steps (see the small check after this function)
    criterion = nn.BCEWithLogitsLoss()
    counter = 0
    for e in range(epochs):
        for x, y in batches(text, label):
            x = one_hot_encode(x, vocab)
            x = torch.from_numpy(x).to(device)
            output = model(x)
            # print(torch.cuda.memory_summary(device=None, abbreviated=False))
            y = torch.from_numpy(y)
            y = torch.unsqueeze(y, 1).to(device)
            # print(output.shape)
            # print(y.shape)
            loss = criterion(output, y.float())
            acc = binaryAccuracy(output, y)
            # zero the gradients to avoid accumulation before backpropagation
            opt.zero_grad()
            loss.backward()
            opt.step()
            counter += 1
        print("Epoch {}/{}".format(e + 1, epochs),
              "Loss: {}".format(loss.item()),
              "accuracy: {}".format(acc))
One-hot encoding:
def one_hot_encode(arr, n_labels):
    # one row per word position, one column per vocabulary entry
    one_hot = np.zeros((np.multiply(*arr.shape), n_labels), dtype=np.int64)
    # set the column corresponding to each word index to 1
    one_hot[np.arange(one_hot.shape[0]), arr.flatten()] = 1
    # reshape back to (batch, timesteps, vocab_size)
    one_hot = one_hot.reshape((*arr.shape, n_labels))
    return one_hot
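A quick sanity check of this helper on a toy vocabulary of 3 words (not my real data):
import numpy as np

arr = np.array([[1, 0, 2],
                [2, 1, 0]])              # 2 "sentences" of 3 word indices each
encoded = one_hot_encode(arr, 3)
print(encoded.shape)                     # (2, 3, 3)
print(encoded[0, 0])                     # [0 1 0] -> word index 1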
Batching:
def batches(text, label, num_seqs=10):
    counter = 0
    # create empty arrays with the specified number of columns
    x = np.array([], dtype=int).reshape(0, findMaxLen())
    y = np.array([], dtype=int).reshape(0, 1)
    for sent, l in zip(text, label):
        # create an array of zeros with length 30 (the maximum sentence length)
        tmp1 = np.zeros((findMaxLen()), dtype=int)
        # tmp1 = np.randint(0, high=vocab, size=findMaxLen(), dtype=int)
        # create a 1d array for the label
        tmp2 = np.atleast_1d(np.array(l))
        for ind, wrd in enumerate(sent):
            if wrd in uniqueWrds and ind < 30:
                tmp1[ind] = word_to_index[wrd]
        # append the encoded sentence and its label to the batch arrays
        x = np.vstack([x, tmp1])
        y = np.vstack([y, tmp2])
        counter += 1
        if counter == num_seqs:
            yield x, np.squeeze(y, 1)
            counter = 0
            x = np.array([], dtype=int).reshape(0, findMaxLen())
            y = np.array([], dtype=int).reshape(0, 1)
【Comments】:
Tags: python pytorch lstm one-hot-encoding word-embedding