【Question title】: Expected object of device type cuda but got device type cpu
【Posted】: 2020-02-24 20:52:56
【Question】:

I am trying to switch my network training from the CPU to the GPU, but I keep getting the following error:

Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _thnn_conv2d_forward
Error occurs, No graph saved
Traceback (most recent call last):

  File "<ipython-input-6-2720a5ea768d>", line 12, in <module>
    tb.add_graph(network, images)

  File "E:\Anaconda\lib\site-packages\torch\utils\tensorboard\writer.py", line 707, in add_graph
    self._get_file_writer().add_graph(graph(model, input_to_model, verbose))

  File "E:\Anaconda\lib\site-packages\torch\utils\tensorboard\_pytorch_graph.py", line 291, in graph
    raise e

  File "E:\Anaconda\lib\site-packages\torch\utils\tensorboard\_pytorch_graph.py", line 285, in graph
    trace = torch.jit.trace(model, args)

  File "E:\Anaconda\lib\site-packages\torch\jit\__init__.py", line 882, in trace
    check_tolerance, _force_outplace, _module_class)

  File "E:\Anaconda\lib\site-packages\torch\jit\__init__.py", line 1034, in trace_module
    module._c._create_method_from_trace(method_name, func, example_inputs, var_lookup_fn, _force_outplace)

  File "E:\Anaconda\lib\site-packages\torch\nn\modules\module.py", line 530, in __call__
    result = self._slow_forward(*input, **kwargs)

  File "E:\Anaconda\lib\site-packages\torch\nn\modules\module.py", line 516, in _slow_forward
    result = self.forward(*input, **kwargs)

  File "<ipython-input-5-cd44a4e4fb73>", line 52, in forward
    t = F.relu(self.conv1(t))

  File "E:\Anaconda\lib\site-packages\torch\nn\modules\module.py", line 530, in __call__
    result = self._slow_forward(*input, **kwargs)

  File "E:\Anaconda\lib\site-packages\torch\nn\modules\module.py", line 516, in _slow_forward
    result = self.forward(*input, **kwargs)

  File "E:\Anaconda\lib\site-packages\torch\nn\modules\conv.py", line 345, in forward
    return self.conv2d_forward(input, self.weight)

  File "E:\Anaconda\lib\site-packages\torch\nn\modules\conv.py", line 342, in conv2d_forward
    self.padding, self.dilation, self.groups)

RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _thnn_conv2d_forward

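I think it is saying that the argument is on the CPU, but I moved it to the GPU in the training section.

I have the following code.

Convolutional neural network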

import torch.nn as nn
import torch.nn.functional as F

class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

        self.fc1 = nn.Linear(in_features=12*4*4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        # Two conv + max-pool stages
        t = F.relu(self.conv1(t))
        t = F.max_pool2d(t, kernel_size=2, stride=2)

        t = F.relu(self.conv2(t))
        t = F.max_pool2d(t, kernel_size=2, stride=2)

        # Flatten, then fully connected layers
        t = F.relu(self.fc1(t.reshape(-1, 12*4*4)))
        t = F.relu(self.fc2(t))
        t = self.out(t)

        return t

Training section

parameters = dict(
    lr = [.01, .001]
    , batch_size = [10, 100, 1000]
    , shuffle = [True, False]
    )

param_values = [v for v in parameters.values()]
param_values

for lr, batch_size, shuffle in product(*param_values):
    network = Network()
    network.to(device)
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=shuffle)
    optimizer = optim.Adam(network.parameters(), lr=lr)
    images, labels = next(iter(train_loader))
    grid = torchvision.utils.make_grid(images)

    comment = f' batch_size={batch_size} lr={lr} shuffle={shuffle}'
    tb = SummaryWriter(comment=comment)
    tb.add_image('images', grid)
    tb.add_graph(network, images)

    for epoch in range(10):

        total_loss = 0
        total_correct = 0

        for batch in train_loader: # Get batch
            images, labels = batch
            images = images.to(device)  # Changing data to gpu
            preds = network(images)
            loss = F.cross_entropy(preds, labels)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            total_loss += loss.item() * batch_size
            total_correct += get_num_correct(preds, labels)

        tb.add_scalar('Loss:', total_loss, epoch)
        tb.add_scalar('Number Correct:', total_correct, epoch)
        tb.add_scalar('Accuracy:', total_correct/len(train_set), epoch)

        #tb.add_histogram('conv1.bias', network.conv1.bias, epoch)
        #tb.add_histogram('conv1.weight', network.conv1.weight, epoch)
        #tb.add_histogram('conv1.weight.grad', network.conv1.weight.grad, epoch)

        for name, weight in network.named_parameters():
            tb.add_histogram(name, weight, epoch)
            tb.add_histogram(f'{name}.grad', weight.grad, epoch)

        print("epoch:", epoch, "total_correct:", total_correct, "loss:", total_loss)

    tb.close()

I am new to deep learning, so any help would be appreciated. Thanks.

【Question discussion】:

    Tags: python deep-learning pytorch conv-neural-network


    【Solution 1】:

    You forgot to move your labels to the GPU, i.e.:

    labels = labels.to(device)
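
To put that line in context, here is a minimal sketch of one training step in which both the inputs and the labels are moved to the model's device before the forward pass and the loss. The tiny `model`, the synthetic batch, and the CPU fallback in `device` are assumptions made for illustration; they are not the question's `Network` or `train_loader`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumption: fall back to the CPU when no GPU is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the question's Network, kept tiny for brevity.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Synthetic batch standing in for one item from train_loader.
images = torch.randn(8, 1, 28, 28)
labels = torch.randint(0, 10, (8,))

# Move BOTH tensors to the model's device before the forward pass and the loss.
images = images.to(device)
labels = labels.to(device)

preds = model(images)
loss = F.cross_entropy(preds, labels)

optimizer.zero_grad()
loss.backward()
optimizer.step()

print(preds.device, labels.device, loss.item())
```

Note that `Tensor.to(device)` returns a new tensor rather than modifying the original in place, which is why the result is assigned back to `images` and `labels`.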
    

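    You also need to move the batch that is fetched before the training loop (the one passed to tb.add_graph) to the GPU: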

    images, labels = next(iter(train_loader))    
    images = images.to(device)
    labels = labels.to(device)
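
The traceback in the question shows `tb.add_graph` calling `torch.jit.trace(model, args)`, so the example input has to live on the same device as the model's weights. The sketch below reproduces that step with `torch.jit.trace` directly, so it does not need TensorBoard. The 28x28 synthetic input and the `same_device` helper are assumptions added for illustration (the helper is hypothetical, not a PyTorch API); the layers mirror the question's `Network`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Network(nn.Module):
    # Same layers as the network in the question.
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5)
        self.conv2 = nn.Conv2d(6, 12, kernel_size=5)
        self.fc1 = nn.Linear(12 * 4 * 4, 120)
        self.fc2 = nn.Linear(120, 60)
        self.out = nn.Linear(60, 10)

    def forward(self, t):
        t = F.max_pool2d(F.relu(self.conv1(t)), kernel_size=2, stride=2)
        t = F.max_pool2d(F.relu(self.conv2(t)), kernel_size=2, stride=2)
        t = F.relu(self.fc1(t.reshape(-1, 12 * 4 * 4)))
        t = F.relu(self.fc2(t))
        return self.out(t)

# Assumption: fall back to the CPU when no GPU is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
network = Network().to(device)

# Synthetic 1x28x28 batch (assumed input size; it matches 12*4*4 in fc1).
images = torch.randn(4, 1, 28, 28)

def same_device(module, tensor):
    """Hypothetical helper: True if all parameters share the tensor's device type."""
    return all(p.device.type == tensor.device.type for p in module.parameters())

# Move the example input first, then trace, as add_graph does internally.
images = images.to(device)
assert same_device(network, images)

traced = torch.jit.trace(network, images)
out = traced(images)
print(out.shape)
```

A check like `same_device` before tracing or logging the graph is one way to surface a device mismatch with a clearer message than the one raised inside the convolution call.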
    

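    【Discussion】:

    • I just ran it and it didn't work. Is anything else missing?
    • @Friday101 Can you edit the question to paste the full error?
    • @Friday101 You also need to move the images and labels fetched before the for loop to the GPU. Check the edit.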