PyTorch Problem with Custom Dataset Class: Question and Answers

【Question title】: PyTorch Problem with Custom Dataset Class
【Posted】: 2020-10-20 12:12:17
【Question】:

First, I created a custom dataset to load images from my dataframe (which contains the image file paths and the corresponding int labels):

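import torch
import torch.nn as nn
import torch.optim as optim
from skimage import io  # assumed source of io.imread (scikit-image); not shown in the original post
from torchvision import models

class Dataset(torch.utils.data.Dataset):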

    def __init__(self, dataframe, transform=None):
        self.frame = dataframe
        self.transform = transform

    def __len__(self):
        return len(self.frame)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()

        filename = self.frame.iloc[idx, 0]
        image = torch.from_numpy(io.imread(filename).transpose((2, 0, 1))).float()
        label = self.frame.iloc[idx, 1]
        sample = {'image': image, 'label': label}
        if self.transform:
            sample = self.transform(sample)
        return sample

Then I use a pre-existing model architecture, as follows:

model = models.densenet161()
num_ftrs = model.classifier.in_features
model.classifier = nn.Linear(num_ftrs, 10)  # where 10 is my number of classes

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

Finally, to train, I did the following:

model.train()  # switch to train mode
        
for epoch in range(5):
    for i, sample in enumerate(train_set):  # where train_set is an instance of my Dataset class
        optimizer.zero_grad()
        image, label = sample['image'].unsqueeze(0), torch.Tensor(sample['label']).long()
        output = model(image)

        loss = criterion(output, label)
        loss.backward()
        optimizer.step()

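However, the line loss = criterion(output, label) raises an error: ValueError: Expected input batch_size (1) to match target batch_size (2). Can someone show me how to use a custom dataset correctly, especially how to load data in batches? Also, why am I getting this ValueError? Thanks!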
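For the batching part of the question, here is a minimal sketch of a training loop driven by torch.utils.data.DataLoader. It is not the asker's code: DictDataset is a hypothetical stand-in that serves random tensors instead of reading image files, and the tiny linear model is a placeholder for the DenseNet so the example runs quickly. The dict layout of each sample matches the question; the default collate function stacks the 'image' entries and turns the integer 'label' entries into one long tensor per batch.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader


class DictDataset(Dataset):
    """Hypothetical stand-in: random tensors instead of images read from disk."""

    def __init__(self, num_samples=8, num_classes=10):
        self.images = torch.randn(num_samples, 3, 32, 32)
        self.labels = torch.randint(0, num_classes, (num_samples,)).tolist()

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # Same dict layout as in the question; the label stays a plain int.
        return {'image': self.images[idx], 'label': self.labels[idx]}


# Tiny placeholder model so the sketch runs quickly; not the DenseNet from the question.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# The DataLoader groups samples into batches of 4 and reshuffles every epoch.
loader = DataLoader(DictDataset(), batch_size=4, shuffle=True)

model.train()
for epoch in range(2):
    for batch in loader:
        optimizer.zero_grad()
        images, labels = batch['image'], batch['label']  # shapes (4, 3, 32, 32) and (4,)
        output = model(images)                           # shape (4, 10)
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step()
```

Because the loader already adds the batch dimension, the loop needs neither unsqueeze(0) nor a manual conversion of the label.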

【Comments on the question】:

How do you construct train_set from the Dataset class?
@Shawn Zhang Don't return a dictionary at the end of your __getitem__; return a tuple instead, like this:

    image_tens = self.transforms(image)
    return image_tens, torch.tensor(labels)
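A minimal sketch of what this comment seems to suggest, assuming a dataset whose __getitem__ returns an (image, label) tuple. TupleDataset is a hypothetical name, and random tensors stand in for the images loaded from disk.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class TupleDataset(Dataset):
    """Hypothetical stand-in for the question's dataset: random tensors replace image files."""

    def __init__(self, num_samples=8, num_classes=10):
        self.images = torch.randn(num_samples, 3, 32, 32)
        self.labels = torch.randint(0, num_classes, (num_samples,))

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # Return an (image, label) tuple; the label is a 0-dim long tensor.
        return self.images[idx], self.labels[idx]


loader = DataLoader(TupleDataset(), batch_size=4, shuffle=True)

for images, labels in loader:
    # Default collation stacks the samples along a new leading batch dimension.
    print(images.shape, labels.shape)
```

With tuples, each batch unpacks directly in the for statement. The default collate function also accepts dict samples, so the choice between the two layouts is largely a matter of taste.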

Tags: pytorch

【Solution 1】:

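Please check the following lines:

label = self.frame.iloc[idx, 1] in the dataset definition: print this value to double-check whether it is returning two ints.

image, label = sample['image'].unsqueeze(0), torch.Tensor(sample['label']).long() in the training code: check the shapes of these tensors.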
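As a sketch of the shape check suggested above: one plausible cause of the mismatch, assuming the label reaches torch.Tensor as the plain integer 2 (the post does not show the dataframe, so this value is hypothetical), is that torch.Tensor(n) treats an integer argument as a size rather than as a value.

```python
import torch

label = 2  # hypothetical integer class label, as one dataframe cell might hold

# torch.Tensor(n) with a Python int n allocates an uninitialized 1-D tensor of length n.
as_size = torch.Tensor(label).long()
# torch.tensor([n]) stores the value n itself in a 1-D tensor of length 1.
as_value = torch.tensor([label], dtype=torch.long)

print(as_size.shape)   # torch.Size([2])
print(as_value.shape)  # torch.Size([1])

# With a batch of one image, CrossEntropyLoss expects logits of shape (1, num_classes)
# and a target of shape (1,).
logits = torch.randn(1, 10)
loss = torch.nn.CrossEntropyLoss()(logits, as_value)
```

Under that assumption, the target has as many entries as the label's value, which would match the reported target batch_size (2) for a label of 2.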

【Comments】: