How do I load up an image and convert it to a proper tensor for PyTorch?

【Question title】: How do I load up an image and convert it to a proper tensor for PyTorch?
【Posted】: 2018-10-29 10:01:05
【Question】:

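I am trying to custom-load some image files (JPG files) together with some labels and feed them into a convolutional neural network (CNN) in PyTorch, following the example here. However, there still seem to be no decent end-to-end tutorials. The problem I am seeing is the following.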

RuntimeError: thnn_conv2d_forward is not implemented for type
torch.ByteTensor

My Dataset looks like this.

class ImageData(Dataset):
    def __init__(self, width=256, height=256, transform=None):
        self.width = width
        self.height = height
        self.transform = transform
        y, x = get_images() #y is a list of labels, x is a list of file paths
        self.y = y
        self.x = x

    def __getitem__(self, index):
        img = Image.open(self.x[index]) # use pillow to open a file
        img = img.resize((self.width, self.height)) # resize the file to 256x256
        img = img.convert('RGB') #convert image to RGB channel
        if self.transform is not None:
            img = self.transform(img)

        img = np.asarray(img).transpose(-1, 0, 1) # we have to change the dimensions from width x height x channel (WHC) to channel x width x height (CWH)
        img = torch.from_numpy(np.asarray(img)) # create the image tensor
        label = torch.from_numpy(np.asarray(self.y[index]).reshape([1, 1])) # create the label tensor
        return img, label

    def __len__(self):
        return len(self.x)

The CNN is taken from here and modified to handle NCWH (batch x channel x width x height), as follows.

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 256, 256)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
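
A side note on the layer shapes: nn.Conv2d takes (in_channels, out_channels, kernel_size), so nn.Conv2d(3, 256, 256) requests 256 output channels and a 256x256 kernel rather than describing a 256x256 input, and conv2 then expects 6 input channels that conv1 does not produce. Below is a minimal sketch of how the tutorial network could be sized consistently, assuming 3x256x256 inputs and 10 classes. The class name SketchNet and the fc1 input size are assumptions worked out for this example and do not come from the original post.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SketchNet(nn.Module):
    """Hypothetical variant of the tutorial network sized for 3x256x256 inputs."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)    # 3 input channels, 6 filters, 5x5 kernel
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)   # in_channels must equal conv1's out_channels
        # Spatial size: 256 -> 252 (conv1) -> 126 (pool) -> 122 (conv2) -> 61 (pool)
        self.fc1 = nn.Linear(16 * 61 * 61, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, num_classes)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(x.size(0), -1)          # flatten everything except the batch axis
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)


sketch_net = SketchNet()
dummy_batch = torch.zeros(2, 3, 256, 256)  # float32 batch of two blank images
sketch_out = sketch_net(dummy_batch)       # one row of class scores per image
```

With a different input resolution, the fc1 input size would have to be recomputed in the same way.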

The training loop is also taken from the same tutorial and looks like this.

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(dataloader, 0):
        # get the inputs
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

However, the RuntimeError mentioned above is thrown. Any ideas on what I am doing wrong?

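Also, I know that without transposing the image data it has the shape WHC, but the NN model requires CWH. The problem is that if we change from WHC to CWH, we can no longer simply plot the images when iterating over the DataLoader.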

data = ImageData()
dataloader = DataLoader(data, batch_size=10, shuffle=True, num_workers=1)
imgs, labels = next(iter(dataloader))
plt.imshow(imgs.numpy()[0,:,:,:])
plt.show()

Trying to do so throws the following error.

TypeError: Invalid dimensions for image data

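To me it is a hassle that Pillow gives you WHC, which you can plot, while the PyTorch CNN wants CWH. Any ideas on how to avoid so many transformations, consistently or easily, while still being able to both plot the data and feed it to the CNN? Or is this WHC vs. CWH mismatch just something we have to live with?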
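
One common way to live with the layout mismatch is to keep tensors channel-first for the network and move the channel axis to the end only at plot time. This is a minimal sketch, assuming a batch shaped (batch, channel, height, width). The random `imgs` tensor is a stand-in for what the DataLoader would return, and the matplotlib calls are left commented out so the snippet runs without a display.

```python
import torch

# Stand-in for one batch from the DataLoader: 10 RGB images, channel-first.
imgs = torch.rand(10, 3, 256, 256)

# matplotlib's imshow expects (height, width, channel), so permute one image
# from (C, H, W) to (H, W, C) just before plotting.
first_image = imgs[0].permute(1, 2, 0).numpy()

# import matplotlib.pyplot as plt
# plt.imshow(first_image)
# plt.show()
```

The permutation only changes how the axes are viewed, so the batch that goes to the network stays untouched.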

Without transposing the image, feeding it to the CNN throws the following error instead.

RuntimeError: Given groups=1, weight[256, 3, 256, 256], so expected
input[10, 256, 256, 3] to have 3 channels, but got 256 channels instead

【Comments】:

Tags: python image-processing pillow pytorch convolutional-neural-network

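【Answer 1】:

conv2d operates on float tensors. It is also common and good practice to normalize input images before passing them to the neural network.

I would add the line img = img/255 immediately before you convert it to a Torch tensor in __getitem__. It will then be converted to a float tensor rather than a byte tensor, and will therefore be compatible with the conv2d method.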
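
A minimal sketch of that conversion, assuming the image is first turned into a NumPy array, since a Pillow Image does not support division directly. The in-memory image built with Image.new is a stand-in for a file opened with Image.open. The cast to float32 is an addition here: dividing a uint8 array by 255 yields float64, whereas nn.Conv2d weights are float32 by default.

```python
import numpy as np
import torch
from PIL import Image

# Stand-in for Image.open(path): a 64x48 (width x height) solid-colour RGB image.
img = Image.new('RGB', (64, 48), color=(255, 128, 0))

# Pillow -> NumPy gives (height, width, channel); scale to [0, 1] as float32.
arr = np.asarray(img, dtype=np.float32) / 255.0
arr = arr.transpose(2, 0, 1)  # -> (channel, height, width)

# Copy into contiguous memory, then wrap as a float32 tensor for conv2d.
tensor = torch.from_numpy(np.ascontiguousarray(arr))
```

Inside __getitem__, these lines would replace the transpose and torch.from_numpy calls on the image.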

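【Comments】:

This does not work: TypeError: unsupported operand type(s) for /: 'Image' and 'float'
img = np.asarray(img)/255? You already did that on the transpose line, so I am not sure why it is needed again, but that seems to be the problem.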

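【Answer 2】:

Your training data "inputs" is of type ByteTensor, but torch.conv2d(), like most operations, only supports floating-point tensors such as FloatTensor and DoubleTensor. So you just need to add this: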

inputs = inputs.type(torch.FloatTensor)

or

inputs = inputs.type(torch.DoubleTensor)

before you feed it into the network.
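
A small sketch of where the cast fits, assuming a channel-first uint8 batch like the one the Dataset in the question produces. The layer and the random batch are placeholders, `inputs.float()` is shorthand for the FloatTensor cast, and the division by 255 is an optional addition that follows the normalization advice in Answer 1. If DoubleTensor is used instead, the model would most likely need to be converted as well (for example with net.double()), because layer weights are float32 by default.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 6, 5)  # placeholder layer; weights are float32 by default

# Stand-in for a batch from the DataLoader: uint8 (ByteTensor), channel-first.
inputs = torch.randint(0, 256, (4, 3, 32, 32), dtype=torch.uint8)

inputs = inputs.float() / 255.0  # cast to float32 and scale to [0, 1]
outputs = conv(inputs)           # the input is floating point, so conv2d accepts it
```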

【Comments】: