【Question title】: Runtime error while running PyTorch model on local machine
【Posted】: 2020-10-05 14:27:50
【Question】:

I am running this notebook locally:

https://github.com/udacity/deep-learning-v2-pytorch/blob/master/sentiment-rnn/Sentiment_RNN_Solution.ipynb

Everything works fine until I start training the model:

# training params

epochs = 4 # 3-4 is approx where I noticed the validation loss stop decreasing

counter = 0
print_every = 100
clip=5 # gradient clipping

# move model to GPU, if available
if(train_on_gpu):
    net.cuda()

net.train()
# train for some number of epochs
for e in range(epochs):
    # initialize hidden state
    h = net.init_hidden(batch_size)

    # batch loop
    for inputs, labels in train_loader:
        counter += 1

        if(train_on_gpu):
            inputs, labels = inputs.cuda(), labels.cuda()

        # Creating new variables for the hidden state, otherwise
        # we'd backprop through the entire training history
        h = tuple([each.data for each in h])

        # zero accumulated gradients
        net.zero_grad()

        # get the output from the model
        output, h = net(inputs, h)

        # calculate the loss and perform backprop
        loss = criterion(output.squeeze(), labels.float())
        loss.backward()
        # `clip_grad_norm` helps prevent the exploding gradient problem in RNNs / LSTMs.
        nn.utils.clip_grad_norm_(net.parameters(), clip)
        optimizer.step()

        # loss stats
        if counter % print_every == 0:
            # Get validation loss
            val_h = net.init_hidden(batch_size)
            val_losses = []
            net.eval()
            for inputs, labels in valid_loader:

                # Creating new variables for the hidden state, otherwise
                # we'd backprop through the entire training history
                val_h = tuple([each.data for each in val_h])

                if(train_on_gpu):
                    inputs, labels = inputs.cuda(), labels.cuda()

                output, val_h = net(inputs, val_h)
                val_loss = criterion(output.squeeze(), labels.float())

                val_losses.append(val_loss.item())

            net.train()
            print("Epoch: {}/{}...".format(e+1, epochs),
                  "Step: {}...".format(counter),
                  "Loss: {:.6f}...".format(loss.item()),
                  "Val Loss: {:.6f}".format(np.mean(val_losses)))

The error that occurs:

RuntimeError                              Traceback (most recent call last)
<ipython-input-31-9f7dea11cb7b> in <module>
     32 
     33         # get the output from the model
---> 34         output, h = net(inputs, h)
     35 
     36         # calculate the loss and perform backprop

c:\users\asus\.conda\envs\pytorch\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

<ipython-input-16-b99cefc1dc61> in forward(self, x, hidden)
     36 
     37         # embeddings and lstm_out
---> 38         embeds = self.embedding(x)
     39         lstm_out, hidden = self.lstm(embeds, hidden)
     40 

c:\users\asus\.conda\envs\pytorch\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

c:\users\asus\.conda\envs\pytorch\lib\site-packages\torch\nn\modules\sparse.py in forward(self, input)
    110 
    111     def forward(self, input):
--> 112         return F.embedding(
    113             input, self.weight, self.padding_idx, self.max_norm,
    114             self.norm_type, self.scale_grad_by_freq, self.sparse)

c:\users\asus\.conda\envs\pytorch\lib\site-packages\torch\nn\functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   1722         # remove once script supports set_grad_enabled
   1723         _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1724     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   1725 
   1726 

RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.IntTensor instead (while checking arguments for embedding)

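I don't understand why this happens. I searched online for a solution, and what I found said I need to move both my model and my data to the GPU. I did that, but the problem persists.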

【Question comments】:

    Tags: python deep-learning pytorch


    【Solution 1】:

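    You are trying to embed inputs that are given as 32-bit integers (torch.int). As the error message states, the embedding layer expects its indices as 64-bit integers (torch.long); they are indices into the embedding table, so they cannot be floats either.

    inputs needs to be converted to torch.long: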

    inputs = inputs.to(torch.long)
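
As a minimal sketch of that conversion, the snippet below builds a small embedding layer and looks up a batch of 32-bit indices after casting them. The sizes and the sample tensor are hypothetical and chosen only for illustration; they are not taken from the notebook.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
vocab_size, embedding_dim = 10, 4
embedding = nn.Embedding(vocab_size, embedding_dim)

# A batch of token indices stored as 32-bit integers,
# the dtype reported in the error message.
inputs = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.int32)

# Cast to 64-bit integers before the lookup; the values are unchanged.
inputs_long = inputs.to(torch.long)
embeds = embedding(inputs_long)

print(inputs.dtype, inputs_long.dtype)
print(embeds.shape)
```

One possible source of the 32-bit tensor, which the post does not confirm: NumPy versions before 2.0 use a 32-bit default integer on Windows, so an array created with dtype=int becomes torch.int32 after torch.from_numpy.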
    

    It looks like you removed the conversion to long, because in the notebook it is done inside the model:

    # embeddings and lstm_out
    x = x.long()
    embeds = self.embedding(x)
    

    However, the line x = x.long() (which is equivalent to using .to(torch.long)) is missing from the forward method shown in your stack trace.
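
To illustrate the model-side cast, here is a rough sketch of a forward method that converts its input before the embedding lookup. TinySentimentRNN and its layer sizes are hypothetical stand-ins, reduced to an embedding and an LSTM; this is not the notebook's actual model.

```python
import torch
import torch.nn as nn


class TinySentimentRNN(nn.Module):
    """Hypothetical stand-in for the notebook's model: embedding + LSTM only."""

    def __init__(self, vocab_size=10, embedding_dim=4, hidden_dim=8):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)

    def forward(self, x, hidden=None):
        # Cast indices to 64-bit integers so the lookup accepts
        # int32 input regardless of how the data was loaded.
        x = x.long()
        embeds = self.embedding(x)
        lstm_out, hidden = self.lstm(embeds, hidden)
        return lstm_out, hidden


net = TinySentimentRNN()
x = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.int32)
lstm_out, hidden = net(x)
print(lstm_out.shape)
```

Casting inside forward keeps the training and validation loops unchanged, whereas casting in the loop has to be repeated wherever the model is called.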

    【Comments】:

    • I see it now. The sentiment RNN notebook has since been updated, and I downloaded my copy several days ago. My bad.