【Posted】: 2019-12-24 06:33:49
【Problem description】:
I'm building a classifier for a QA bot, and I have a dataset of 8k questions with 149 distinct answers.
I've run into a problem while training my model: the loss does not drop the way I expect, so I'm asking for your help...
Here is my approach:
I use word2vec to get word vectors, then feed them through a GRU-based network to get sentence vectors. The w2v model was trained on wiki data and has worked well in another NLP project of mine. The GRU code was written by a senior classmate and, as far as I can tell, it also works fine.
import torch
import torch.nn as nn

# Part of the code for getting sentence vectors
input_size = 400
hidden_dim = 400
num_layers = 1
gru = nn.GRU(input_size, hidden_dim, num_layers, batch_first=True)
h0 = torch.rand(num_layers, 7187, hidden_dim)  # (num_layers, batch, hidden_dim)
# shape of input: [dataset_len, max_sentence_len, input_feature]
inputSet = torch.tensor(x_train, dtype=torch.float)
sentenceVecs, hidden = gru(inputSet, h0)
sentenceVecs = sentenceVecs[:, -1, :]  # last time step as the sentence vector
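As a standalone shape check, the encoder step can be reproduced with random data (a minimal sketch; the batch size and sequence length here are made up, only the 400-dimensional feature size comes from the snippet above):

```python
import torch
import torch.nn as nn

input_size, hidden_dim, num_layers = 400, 400, 1
batch, max_len = 8, 12  # small stand-ins for the real dataset sizes

gru = nn.GRU(input_size, hidden_dim, num_layers, batch_first=True)
x = torch.rand(batch, max_len, input_size)  # (batch, seq_len, features)

# h0 may be omitted entirely; PyTorch then uses zeros of the right shape,
# so the hidden state's batch dimension always matches the input.
out, hidden = gru(x)
sentence_vecs = out[:, -1, :]  # last time step as the sentence vector
print(sentence_vecs.shape)     # torch.Size([8, 400])
```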
Here is my classifier model:
from argparse import Namespace
import torch
import torch.nn as nn
import torch.nn.functional as F

args = Namespace(
    dataset_file='dataset/waimai_10k_tw.pkl',
    model_save_path='torchmodel/pytorch_bce.model',
    # Training hyperparameters
    batch_size=100,
    learning_rate=0.002,
    min_learning_rate=0.002,
    num_epochs=200,
)

class JWP(nn.Module):
    def __init__(self, n_feature, n_hidden, n_hidden2, n_hidden3, n_output):
        super(JWP, self).__init__()
        self.hidden = nn.Linear(n_feature, n_hidden)
        self.hidden2 = nn.Linear(n_hidden, n_hidden2)
        self.hidden3 = nn.Linear(n_hidden2, n_hidden3)
        self.out = nn.Linear(n_hidden3, n_output)

    def forward(self, x, apply_softmax=False):
        x = F.relu(self.hidden(x).squeeze())
        x = F.relu(self.hidden2(x).squeeze())
        x = F.relu(self.hidden3(x).squeeze())
        if apply_softmax:
            # torch.softmax requires an explicit dim argument
            x = torch.softmax(self.out(x), dim=-1)
        else:
            x = self.out(x)
        return x
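One thing worth noting about the loss values reported further down: nn.CrossEntropyLoss applies log-softmax internally, so the network must return raw logits during training (apply_softmax=False), and a model that predicts near-uniformly over 149 classes lands at a loss of about ln(149) ≈ 5.0, close to the observed plateau. A minimal check:

```python
import math
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()

# All-zero logits over 149 classes are a perfectly uniform prediction;
# the loss of a model that has learned nothing is ln(149),
# regardless of which class the target is.
num_classes = 149
logits = torch.zeros(1, num_classes)
target = torch.tensor([0])
print(loss_fn(logits, target).item())  # ≈ 5.004
print(math.log(num_classes))           # ≈ 5.004
```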
Training code:
lr = args.learning_rate
min_lr = args.min_learning_rate

def adjust_learning_rate(optimizer, epoch):
    global lr
    if epoch % 10 == 0 and epoch != 0:
        lr = lr * 0.65
        if lr < min_lr:
            lr = min_lr
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
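For reference, the manual decay above roughly corresponds to PyTorch's built-in StepLR scheduler (a sketch; the step_size and gamma mirror the `% 10` and `0.65` values from the function above, though StepLR has no minimum-lr floor):

```python
import torch
import torch.nn as nn

net = nn.Linear(400, 149)  # placeholder for the real model
optimizer = torch.optim.SGD(net.parameters(), lr=0.002)

# Multiply the lr by 0.65 every 10 epochs, like the manual loop
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.65)

for epoch in range(10):
    # ... one epoch of training would go here ...
    scheduler.step()

print(optimizer.param_groups[0]['lr'])  # 0.002 * 0.65 after 10 epochs
```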
if __name__ == "__main__":
    EPOCH = args.num_epochs
    net = JWP(400, 325, 275, 225, 149)
    # net = JWP(400, 250, 149)
    # net = JWP(400, 149)
    print(net)
    optimizer = torch.optim.SGD(net.parameters(), lr=lr)
    loss_func = torch.nn.CrossEntropyLoss()
    for t in range(EPOCH):
        adjust_learning_rate(optimizer, t)
        """
        Train phase
        """
        net.train()
        TrainLoss = 0.0
        # Train batch
        for step, (batchData, batchTarget) in enumerate(trainDataLoader):
            optimizer.zero_grad()
            out = net(batchData)
            loss = loss_func(out, batchTarget)
            # .item() detaches the scalar so the graph isn't kept across batches
            TrainLoss = TrainLoss + loss.item()
            loss.backward()
            optimizer.step()
        TrainLoss = TrainLoss / (step + 1)  # epoch loss
        """
        Result
        """
        print(
            "epoch:", t + 1,
            "train_loss:", round(TrainLoss, 3),
            "LR:", lr
        )
Is my model too simple, or am I using the wrong approach? The loss stays stuck around 4.6 and won't go any lower...
epoch: 2898 train_loss: 4.643 LR: 0.002
epoch: 2899 train_loss: 4.643 LR: 0.002
epoch: 2900 train_loss: 4.643 LR: 0.002
epoch: 2901 train_loss: 4.643 LR: 0.002
【Comments】:
-
The question is whether you see a decrease during the initial training phase. Is the loss reported after 2900 epochs clearly lower than at the start? Then your model may have converged (or, more precisely, if you are actually evaluating the training loss, you are underfitting). In that case, try reducing the model's complexity by using fewer hidden nodes, or raise the learning rate.
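Building on that comment, a common first experiment when SGD plateaus is an adaptive optimizer. A hedged sketch with a stand-in model (Adam, the 1e-3 learning rate, and the toy data are suggestions, not part of the original code):

```python
import torch
import torch.nn as nn

# Stand-in for the JWP network; same input/output sizes as the question.
net = nn.Sequential(nn.Linear(400, 225), nn.ReLU(), nn.Linear(225, 149))

# Adam adapts the step size per parameter and often makes progress
# where plain SGD at a small fixed lr stalls.
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.rand(100, 400)           # fake batch of sentence vectors
y = torch.randint(0, 149, (100,))  # fake answer labels
first_loss = None
for _ in range(20):
    optimizer.zero_grad()
    loss = loss_fn(net(x), y)
    if first_loss is None:
        first_loss = loss.item()
    loss.backward()
    optimizer.step()
print(first_loss, loss.item())  # the loss should drop on this toy batch
```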
Tags: python nlp pytorch multilabel-classification