Why am I getting a divide by zero error here?

【Question title】: Why am I getting a divide by zero error here?
【Posted】: 2020-10-14 23:14:52
【Question description】:

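So I'm following the tutorial in the docs on custom datasets. I'm using the MNIST dataset instead of the fancy dataset from the tutorial. This is the extension of the Dataset class that I wrote: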

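import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset
from torchvision import transforms

class KaggleMNIST(Dataset):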

    def __init__(self, csv_file, transform=None):
        self.pixel_frame = pd.read_csv(csv_file)
        self.transform = transform

    def __len__(self):
        return len(self.pixel_frame)

    def __getitem__(self, index):
        if torch.is_tensor(index):
            index = index.tolist()

        image = self.pixel_frame.iloc[index, 1:]
        image = np.array([image])

        if self.transform:
            image = self.transform(image)

        return image

It works until I try to use a transform on it:

tsf = transforms.Compose([transforms.ToTensor(), 
                          transforms.Normalize((0.5,), (0.5,))
                         ])
                          
trainset = KaggleMNIST('train/train.csv', transform=tsf)

image0 = trainset[0]

I looked at the stack trace, and the normalization appears to happen in this line of code:

c:\program files\python38\lib\site-packages\torchvision\transforms\functional.py in normalize(tensor, mean, std, inplace)
--> 218     tensor.sub_(mean[:, None, None]).div_(std[:, None, None])

So I don't understand why there is a division by zero, since std should be 0.5, which is nowhere near a small value.

Thanks for your help!

EDIT:

This doesn't answer my question, but I found that if I change these lines of code:

image = self.pixel_frame.iloc[index, 1:] 
image = np.array([image])

to

image = self.pixel_frame.iloc[index, 1:].to_numpy(dtype='float64').reshape(1, -1)

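the problem goes away. Basically, making sure the data type is float64 solved it. I'm still not sure why the problem exists in the first place, so I'd still be glad to get a well-explained answer!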

Tags: python pytorch

【Solution 1】:

The data is read in with dtype int64:

img = np.array([pixel_frame.iloc[0, 1:]])
img.dtype
# output
dtype('int64')

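This forces the mean and std to be converted to int64 as well. Since std is 0.5, the integer cast truncates it to 0, which raises the following error: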

>>> tsf(img)
ValueError: std evaluated to zero after conversion to torch.int64, leading to division by zero.

This happens because, during normalization, the mean and std are converted to the dtype of the input tensor:

def normalize(tensor, mean, std, inplace=False):
    ...
    dtype = tensor.dtype
    mean = torch.as_tensor(mean, dtype=dtype, device=tensor.device)
    std = torch.as_tensor(std, dtype=dtype, device=tensor.device)
    if (std == 0).any():
        raise ValueError('std evaluated to zero after conversion to {}, leading to division by zero.'.format(dtype))
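
The cast in the excerpt above can be reproduced in isolation. This is only a minimal sketch, not part of the original answer: it calls `torch.as_tensor` on the same `(0.5,)` tuple passed to `Normalize`, once with an integer dtype and once with a floating-point dtype, to show where the zero comes from. The variable names `std_int` and `std_float` are made up for illustration.

```python
import torch

std = (0.5,)

# Casting to an integer dtype truncates 0.5 toward zero.
std_int = torch.as_tensor(std, dtype=torch.int64)
# Casting to a floating-point dtype keeps the value.
std_float = torch.as_tensor(std, dtype=torch.float64)

print(std_int)    # tensor([0])
print(std_float)  # tensor([0.5000], dtype=torch.float64)

# This is the same condition that the excerpt checks before raising ValueError.
print((std_int == 0).any().item())  # True
```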

That is why converting the dtype to float fixes the error.
