[Question title]: Runtime error when reading data from a training dataset in PyTorch
[Posted]: 2020-06-12 14:05:40
[Question]:

I have a data sample in my training dataset; if I print the data I can view it, but when accessing it to train I keep getting RuntimeError: Expected object of scalar type Double but got scalar type Float for argument #2 'weight' in call to _thnn_conv2d_forward. I cannot figure out why this happens. I have also attached a picture at the end to better show the error message.
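The message itself points at a dtype mismatch: the convolution's weights are float32 ("Float"), while the input batch is float64 ("Double", which is what numpy produces by default). A minimal sketch (not the asker's model) that reproduces the error and the fix:

```python
import torch
import torch.nn as nn

# nn.Conv2d weights default to float32, so a float64 (Double) input fails.
conv = nn.Conv2d(in_channels=3, out_channels=12, kernel_size=3, padding=1)

x64 = torch.rand(1, 3, 8, 8, dtype=torch.float64)  # e.g. built from numpy float64
try:
    conv(x64)
except RuntimeError:
    print("RuntimeError raised, as in the question")

out = conv(x64.float())  # casting the input to float32 resolves the mismatch
print(out.dtype)  # torch.float32
```

The same cast can be applied either where the model is called (`model(data.float())`) or once inside the dataset, so the loader never yields Double tensors.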

The labels.txt file looks like this (image names linking to images in another folder, with the corresponding center point (x, y) and radius):

0000,   0.67 ,   0.69 ,   0.26 
0001,   0.69 ,   0.33 ,   0.3  
0002,   0.16 ,   0.27 ,   0.15 
0003,   0.54 ,   0.33 ,   0.17 
0004,   0.32 ,   0.45 ,   0.3  
0005,   0.78 ,   0.26 ,   0.17 
0006,   0.44 ,   0.49 ,   0.19 
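For a file in this format, one way the dtype problem can sneak in is during parsing: numpy defaults to float64. A hypothetical parser (not from the question) that keeps the labels in float32:

```python
import numpy as np

# Hypothetical helper: read "name, x, y, radius" lines into a float32 array,
# so tensors built from it later are Float rather than Double.
def load_labels(path):
    names, rows = [], []
    with open(path) as f:
        for line in f:
            parts = [p.strip() for p in line.split(",")]
            if len(parts) == 4:
                names.append(parts[0])
                rows.append([float(v) for v in parts[1:]])
    return names, np.asarray(rows, dtype=np.float32)
```

Note that `torch.from_numpy` on a float64 array yields a Double tensor, which is exactly the dtype the error message complains about.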

Edit: here are the loss function and optimizer I am using:

optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()
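One thing worth flagging here: nn.CrossEntropyLoss is a classification loss and expects targets of type Long holding class indices, which matches the second error mentioned in the comments below. For continuous (x, y, radius) targets, a regression loss such as nn.MSELoss accepts float targets of the same shape as the output. A small sketch of the difference:

```python
import torch
import torch.nn as nn

# MSELoss takes float targets shaped like the output, suitable for
# regressing three continuous values; CrossEntropyLoss would instead
# want a (batch,) tensor of Long class indices.
outputs = torch.rand(4, 3)   # stand-in network output: (batch, 3)
targets = torch.rand(4, 3)   # float (x, y, radius) labels
loss = nn.MSELoss()(outputs, targets)
print(loss.item())
```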

My model-validation function looks like this:

def validate_model(model, loader):
    model.eval() # eval mode (batchnorm uses moving mean/variance instead of mini-batch mean/variance)
                  # (dropout is set to zero)

    val_running_loss = 0.0
    val_running_correct = 0

    for i, data in enumerate(loader):  # avoid shadowing the builtin "int"
        data, target = data['image'].to(device), data['labels'].to(device)
        output = model(data)
        loss = my_loss(output, target)

        val_running_loss = val_running_loss + loss.item()
        _, preds = torch.max(output.data, 1)

        val_running_correct = val_running_correct + (preds == target).sum().item()

    avg_loss = val_running_loss/len(loader.dataset)
    val_accuracy = 100. * val_running_correct/len(loader.dataset)

    #----------------------------------------------
    # implementation needed here 
    #----------------------------------------------
    return avg_loss, val_accuracy

I have a fit function that computes the training loss:

def fit(model, train_dataloader):
    model.train()
    train_running_loss = 0.0
    train_running_correct = 0
    for i, data in enumerate(train_dataloader):
        print(data)
        #I believe this is causing the error, but not sure why.
        data, target = data['image'].to(device), data['labels'].to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = my_loss(output, target)
        train_running_loss = train_running_loss + loss.item()
        _, preds = torch.max(output.data, 1)
        train_running_correct = train_running_correct + (preds == target).sum().item()
        loss.backward()
        optimizer.step()
    train_loss = train_running_loss/len(train_dataloader.dataset)
    train_accuracy = 100. * train_running_correct/len(train_dataloader.dataset)

    print(f'Train Loss: {train_loss:.4f}, Train Acc: {train_accuracy:.2f}')

    return train_loss, train_accuracy

And the following train_model function, which stores the losses and accuracies in lists:

train_losses , train_accuracy = [], []
validation_losses , val_accuracy = [], []

def train_model(model,
                optimizer,
                train_loader,
                validation_loader,
                train_losses,
                validation_losses,
                epochs=1):

    """
    Trains a neural network. 
    Args:
        model               - model to be trained
        optimizer           - optimizer used for training
        train_loader        - loader from which data for training comes 
        validation_loader   - loader from which data for validation comes (maybe at the end, you use test_loader)
        train_losses        - adding train loss value to this list for future analysis
        validation_losses   - adding validation loss value to this list for future analysis
        epochs              - number of runs over the entire data set 
    """

    #----------------------------------------------
    # implementation needed here 
    #----------------------------------------------

    for epoch in range(epochs):
        train_epoch_loss, train_epoch_accuracy = fit(model, train_loader)
        val_epoch_loss, val_epoch_accuracy = validate_model(model, validation_loader)
        train_losses.append(train_epoch_loss)
        train_accuracy.append(train_epoch_accuracy)
        validation_losses.append(val_epoch_loss)
        val_accuracy.append(val_epoch_accuracy)

    return

I get the runtime error when I run the following:

train_model(model, 
            optimizer,
            train_loader, 
            validation_loader, 
            train_losses, 
            validation_losses,
            epochs=2)

Error: RuntimeError: Expected object of scalar type Double but got scalar type Float for argument #2 'weight' in call to _thnn_conv2d_forward

A screenshot of the error message is also here: ERROR

Edit: this is what my model looks like. I am supposed to detect circles in images, with centers and radii given in the labels.txt file, and paint over them. The painting function was given; I created the model as well as the training and validation.

class CircleNet(nn.Module):    # nn.Module is parent class  
    def __init__(self):
        super(CircleNet, self).__init__()  #calls init of parent class
        #----------------------------------------------
        # implementation needed here 
        #----------------------------------------------
        #keep dimensions of input image: (I-F+2P)/S +1= (128-3+2)/1 + 1 = 128

        #RGB image = input channels = 3. Use 12 filters for first 2 convolution layers, then double
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=12, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(in_channels=12, out_channels=12, kernel_size=3, stride=1, padding=1)
        self.conv3 = nn.Conv2d(in_channels=12, out_channels=24, kernel_size=3, stride=1, padding=1)
        self.conv4 = nn.Conv2d(in_channels=24, out_channels=32, kernel_size=3, stride=1, padding=1)

        #Pooling to reduce sizes, and dropout to prevent overfitting
        self.pool = nn.MaxPool2d(kernel_size=2)
        self.relu = nn.ReLU()

        self.drop = nn.Dropout2d(p=0.25)
        self.norm1 = nn.BatchNorm2d(12)
        self.norm2 = nn.BatchNorm2d(24)

        # There are 2 pooling layers, each with kernel size of 2. Output size: 128/(2*2) = 32
        # Have 3 output features, corresponding to x-pos, y-pos, radius. 
        self.fc = nn.Linear(in_features=32 * 32 * 32, out_features=3)

    def forward(self, x):
        """
        Feed forward through network
        Args:
            x - input to the network

        Returns "x", which is the network's output
        """

        #----------------------------------------------
        # implementation needed here 
        #----------------------------------------------
        #Conv1
        out = self.conv1(x)
        out = self.pool(out)
        out = self.relu(out)
        out = self.norm1(out)
        #Conv2
        out = self.conv2(out)
        out = self.pool(out)
        out = self.relu(out)
        out = self.norm1(out)
        #Conv3
        out = self.conv3(out)
        out = self.drop(out)
        #Conv4
        out = self.conv4(out)
        out = F.dropout(out, training=self.training)
        out = out.view(-1, 32 * 32 * 32)
        out = self.fc(out)


        return out

Edit: in case it helps, here is my custom loss function:

criterion = nn.CrossEntropyLoss()

def my_loss(outputs, labels):

    """
    Args:
        outputs - output of network ([batch size, 3]) 
        labels  - desired labels  ([batch size, 3])
    """

    loss = torch.zeros(1, dtype=torch.float, requires_grad=True)
    loss = loss.to(device)

    loss = criterion(outputs, labels)

    #----------------------------------------------
    # implementation needed here 
    #----------------------------------------------

    # Observe: If you need to iterate and add certain values to loss defined above
    # you cannot write: loss +=... because this will raise the error: 
    # "Leaf variable was used in an inplace operation"
    # Instead, to avoid this error write: loss = loss + ...       

    return loss
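If the three outputs are meant to regress (x, y, radius), one possible my_loss (my assumption, not necessarily the course's intended answer) swaps in a regression criterion:

```python
import torch
import torch.nn as nn

# One possible regression form of my_loss: MSE between the 3 predicted
# and 3 target values.
criterion_mse = nn.MSELoss()

def my_loss_regression(outputs, labels):
    # .float() guards against float64 (Double) labels coming from numpy
    return criterion_mse(outputs, labels.float())

demo = my_loss_regression(torch.rand(2, 3), torch.rand(2, 3).double())
print(demo.item())
```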

The data loaders (given to me):

train_dir      = "./train/"
validation_dir = "./validation/"
test_dir       = "./test/"


train_dataset = ShapesDataset(train_dir)

train_loader = DataLoader(train_dataset, 
                          batch_size=32,
                          shuffle=True)



validation_dataset = ShapesDataset(validation_dir)

validation_loader = DataLoader(validation_dataset, 
                               batch_size=1,
                               shuffle=False)



test_dataset = ShapesDataset(test_dir)

test_loader = DataLoader(test_dataset, 
                          batch_size=1,
                          shuffle=False)


print("train loader examples     :", len(train_dataset)) 
print("validation loader examples:", len(validation_dataset))
print("test loader examples      :", len(test_dataset))
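ShapesDataset was provided, but if it can be edited, the cleanest fix is often to return float32 tensors from `__getitem__` itself, so every loader yields the right dtype. A hypothetical sketch of that conversion (the real ShapesDataset internals are not shown in the question):

```python
import numpy as np
import torch

# Hypothetical conversion helper: cast numpy data to float32 before
# wrapping in tensors, so DataLoader batches are never float64 (Double).
def make_sample(image_np, labels_np, fname):
    return {
        "image": torch.from_numpy(image_np.astype(np.float32)),
        "labels": torch.from_numpy(labels_np.astype(np.float32)),
        "fname": fname,
    }
```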

Edit: this function for viewing images, target circle labels, and network outputs was also given:

"""
View the first image of a given number of batches, assuming that a model has been created.
Currently, the lines that assume a model has been created are commented out. Without a model,
you can view target labels and the corresponding images.
This is given to you so that you may see how loaders and model can be used. 
"""

loader = train_loader # choose from which loader to show images
batches_to_show = 2
with torch.no_grad():
    for i, data in enumerate(loader, 0): # 0 means that counting starts at zero
        inputs = (data['image']).to(device)   # has shape (batch_size, 3, 128, 128)
        labels = (data['labels']).to(device)  # has shape (batch_size, 3)
        img_fnames = data['fname']            # list of length batch_size

        #outputs = model(inputs.float())
        img = Image.open(img_fnames[0])

        print("showing image: ", img_fnames[0])

        labels_str = [float("{0:.2f}".format(x)) for x in labels[0]]

        #outputs_np_arr = outputs[0]  # could call ".numpy()" to convert tensor to numpy array
        #outputs_str = [float("{0:.2f}".format(x)) for x in outputs_np_arr]
        print("Target labels :", labels_str)
        #print("network coeffs:", outputs_str)
        print()
        #img.show()

        if (i + 1) == batches_to_show:
            break

Here is the output I am getting; the drawn circle should cover the existing one: Output I am getting. Any ideas would help.

[Comments]:

  • From the stack trace, it looks like your data should be float but is double. Can you try setting data to data['image'].to(device).float(), and do the same for target?
  • @kimbo I did try that, and then I got the same error except that it expected a Long type but found Float: RuntimeError: Expected object of scalar type Long but got scalar type Float for argument #2 'target' in call to _thnn_nll_loss_forward
  • Try taking the .float() off the target
  • @kimbo No, unfortunately I got the same error about expecting Long but getting Float
  • The original error is raised when you run model(data). What does your model look like?

Tags: python runtime-error pytorch conv-neural-network training-data


[Solution 1]:

I basically added the following line (in both the validate_model and fit functions):

 _, target = torch.max(target.data, 1)

below the line _, preds = torch.max(output.data, 1), so that the predictions and targets have the same length. I also changed the loss function from CrossEntropyLoss to MSELoss.

Then, in the same functions, I changed the line output = model(data) to output = model(data.float()).
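Under those changes, one training step looks roughly like this (a sketch with a stand-in model, not the asker's CircleNet):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 3))  # stand-in model
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

data = torch.rand(2, 3, 8, 8, dtype=torch.float64)  # a float64 (Double) batch
target = torch.rand(2, 3)                           # float32 (x, y, r) labels

optimizer.zero_grad()
output = model(data.float())      # the cast that fixes the original error
loss = criterion(output, target)  # MSELoss instead of CrossEntropyLoss
loss.backward()
optimizer.step()
print(loss.item())
```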

[Discussion]:

  • @kimbo Yes. But I still have one problem: I am not getting the desired output. It does draw circles, but not on top of the existing ones. I think the cause is the my_loss function. I added the output image to my question.