Moving member tensors with module.to() in PyTorch

[Question title]: Moving member tensors with module.to() in PyTorch
[Posted]: 2019-07-09 09:49:12
[Question]:

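I am building a variational autoencoder (VAE) in PyTorch and am having trouble writing device-agnostic code. The autoencoder is a subclass of nn.Module with an encoder and a decoder network, which are nn.Module subclasses as well. All weights of the network can be moved from one device to another by calling net.to(device).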

The problem I have is with the reparametrization trick:

encoding = mu + noise * sigma

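The noise is a tensor of the same size as mu and sigma, saved as a member variable of the autoencoder module. It is initialized in the constructor and resampled in place at each training step. I do it this way to avoid constructing a new noise tensor at every step and pushing it to the desired device. Additionally, I want to keep the noise fixed during evaluation. Here is the code: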

class VariationalGenerator(nn.Module):
    def __init__(self, input_nc, output_nc):
        super(VariationalGenerator, self).__init__()

        self.input_nc = input_nc
        self.output_nc = output_nc
        embedding_size = 128

        self._train_noise = torch.randn(batch_size, embedding_size)
        self._eval_noise = torch.randn(1, embedding_size)
        self.noise = self._train_noise

        # Create encoder
        self.encoder = Encoder(input_nc, embedding_size)
        # Create decoder
        self.decoder = Decoder(output_nc, embedding_size)

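    def train(self, mode=True):
        super(VariationalGenerator, self).train(mode)
        self.noise = self._train_noise
        return self  # nn.Module.train() returns self; keep that contract

    def eval(self):
        super(VariationalGenerator, self).eval()
        self.noise = self._eval_noise
        return self  # nn.Module.eval() returns self as well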

    def forward(self, inputs):
        # Calculate parameters of embedding space
        mu, log_sigma = self.encoder.forward(inputs)
        # Resample noise if training
        if self.training:
            self.noise.normal_()
        # Reparametrize noise to embedding space
        inputs = mu + self.noise * torch.exp(0.5 * log_sigma)
        # Decode to image
        inputs = self.decoder(inputs)

        return inputs, mu, log_sigma

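When I now move the autoencoder to the GPU with net.to('cuda:0'), I get an error in the forward pass because the noise tensor is not moved.

I don't want to add a device argument to the constructor, because then the module still could not be moved to another device later on. I also tried wrapping the noise in nn.Parameter so that net.to() affects it, but that causes an error from the optimizer, because the noise is flagged as requires_grad=False.

Does anyone have a way to move the whole module with net.to()?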

[Question comments]:

Tags: python deep-learning gpu pytorch autoencoder

[Solution 1]:

Use this:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Now, for the model and for every tensor you use:

net.to(device)
input = input.to(device)

[Comments]:

That is not the issue. The issue is that the noise tensor is not moved together with the weights when calling net.to(device).
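
The behaviour described in this comment can be reproduced without a GPU. The sketch below uses a dtype conversion as a stand-in for a device move, on the understanding that both go through the same nn.Module.to() path. The class name Toy is hypothetical and not part of the question.

```python
import torch
import torch.nn as nn


class Toy(nn.Module):  # hypothetical stand-in for VariationalGenerator
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)    # registered parameters
        self.noise = torch.randn(1, 4)   # plain attribute, not registered


net = Toy().to(torch.float64)  # dtype change stands in for a device move

print(net.linear.weight.dtype)  # the parameters follow .to()
print(net.noise.dtype)          # the plain tensor attribute is left behind
```

With a real device move, the same mismatch is what produces the error in forward().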

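[Solution 2]:

After some trial and error, I found two ways:

Use buffers: by replacing self._train_noise = torch.randn(batch_size, embedding_size) with self.register_buffer('_train_noise', torch.randn(batch_size, embedding_size)), the noise tensor is added to the module as a buffer. This makes net.to(device) affect it as well. In addition, the tensor is now part of the state_dict.

Override net.to(device): with this approach, the noise stays out of the state_dict.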

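def to(self, device):
    # Move parameters and buffers first; nn.Module.to() returns the module itself
    new_self = super(VariationalGenerator, self).to(device)
    # Then move the unregistered noise tensors by hand
    new_self._train_noise = new_self._train_noise.to(device)
    new_self._eval_noise = new_self._eval_noise.to(device)

    return new_self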
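
For comparison, the first approach (a registered buffer) might look like the sketch below. BufferedToy and its default sizes are hypothetical, and a dtype conversion again stands in for the device move so that the example runs on CPU.

```python
import torch
import torch.nn as nn


class BufferedToy(nn.Module):  # hypothetical minimal module
    def __init__(self, batch_size=8, embedding_size=128):
        super().__init__()
        # register_buffer makes the tensor part of the module's state
        self.register_buffer('_train_noise', torch.randn(batch_size, embedding_size))


net = BufferedToy().to(torch.float64)

print(net._train_noise.dtype)              # the buffer followed .to()
print('_train_noise' in net.state_dict())  # and it is saved with the model
```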

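[Comments]:

Approach 2 would probably be better done by overriding _apply rather than just to; then .cuda() and so on would all work as well. (see my answer)

[Solution 3]:

A better version of tilman151's second approach is probably to override _apply rather than to. That way net.cuda(), net.float(), etc. all work as well, since they all call _apply rather than to (as can be seen in the source, which is simpler than you might think):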

def _apply(self, fn):
    super(VariationalGenerator, self)._apply(fn)
    self._train_noise = fn(self._train_noise)
    self._eval_noise = fn(self._eval_noise)
    return self
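
A minimal sketch of how such an override behaves, using a hypothetical NoisyToy module. Extra arguments are forwarded to the parent _apply as a precaution, on the assumption that its signature may differ between PyTorch versions. _apply is a private method, so treat this as illustrative rather than a supported extension point.

```python
import torch
import torch.nn as nn


class NoisyToy(nn.Module):  # hypothetical minimal module
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)
        self._eval_noise = torch.randn(1, 4)  # plain attribute on purpose

    def _apply(self, fn, *args, **kwargs):
        # Let nn.Module handle parameters, buffers and submodules first
        super()._apply(fn, *args, **kwargs)
        # Then apply the same conversion to the unregistered tensor
        self._eval_noise = fn(self._eval_noise)
        return self


net = NoisyToy().double()              # .double() goes through _apply
dtype_after_double = net._eval_noise.dtype
net = net.to(torch.float32)            # .to() goes through _apply as well
dtype_after_to = net._eval_noise.dtype
print(dtype_after_double, dtype_after_to)
```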

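[Comments]:

The _apply method seems to have been deliberately left by the developers as something not meant to be overridden. Instead, I would suggest registering the tensor as a buffer with the persistent=False argument, since that keeps it out of the state_dict. Wouldn't that be the more stable way?
@lamo_738 Yes, I don't think non-persistent buffers existed when this answer was written, but that seems to be the best option now.
I have a list of Sequential objects that I want to move to the GPU. When doing so, I get the following error in the _apply function: AttributeError: 'Sequential' object has no attribute 'is_floating_point'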
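
One possible cause of this error, assuming the modules are held in a plain Python list and fn is called on each of them inside the override: fn is written for tensors, not for modules. Under that assumption, an alternative is to keep the modules in an nn.ModuleList, so that the default _apply recurses into them and no override is needed. The class name Stack is hypothetical.

```python
import torch
import torch.nn as nn


class Stack(nn.Module):  # hypothetical container
    def __init__(self):
        super().__init__()
        # nn.ModuleList registers each Sequential as a submodule
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(4, 4), nn.ReLU()) for _ in range(3)]
        )


net = Stack().to(torch.float64)  # dtype change stands in for a device move
dtypes = {p.dtype for p in net.parameters()}
print(dtypes)
```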

[Solution 4]:

With this, you can apply the same arguments to both your tensors and the module:

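def to(self, *args, **kwargs):
    # Forward positional arguments too, so that net.to('cuda:0') also works
    module = super(VariationalGenerator, self).to(*args, **kwargs)
    module._train_noise = self._train_noise.to(*args, **kwargs)
    module._eval_noise = self._eval_noise.to(*args, **kwargs)

    return module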

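[Comments]:

[Solution 5]:

You can use nn.Module buffers and parameters: both are taken into account when calling .to(device) and are moved to the device. Parameters are updated by the optimizer (so they need requires_grad=True); buffers are not.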
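
The difference can be checked directly. In this sketch (ParamVsBuffer is a hypothetical name), a frozen nn.Parameter still shows up in parameters(), while a buffer does not, so an optimizer built from parameters() never sees the buffer.

```python
import torch
import torch.nn as nn


class ParamVsBuffer(nn.Module):  # hypothetical minimal module
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(4))                       # trainable
        self.frozen = nn.Parameter(torch.randn(4), requires_grad=False)  # still a parameter
        self.register_buffer('noise', torch.randn(4))                    # state, not a parameter


net = ParamVsBuffer()
param_names = [name for name, _ in net.named_parameters()]
buffer_names = [name for name, _ in net.named_buffers()]
print(param_names)
print(buffer_names)

# Hand only the trainable parameters to the optimizer
trainable = [p for p in net.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1)
```

Filtering on requires_grad is one way to keep frozen parameters away from the optimizer if a parameter is used instead of a buffer.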

So in your case, I would write the constructor as:

    def __init__(self, input_nc, output_nc):
        super(VariationalGenerator, self).__init__()

        self.input_nc = input_nc
        self.output_nc = output_nc
        embedding_size = 128

        # --- CHANGED LINES ---
        self.register_buffer('_train_noise', torch.randn(batch_size, embedding_size))
        self.register_buffer('_eval_noise', torch.randn(1, embedding_size))
        # --- CHANGED LINES ---

        self.noise = self._train_noise

        # Create encoder
        self.encoder = Encoder(input_nc, embedding_size)
        # Create decoder
        self.decoder = Decoder(output_nc, embedding_size)
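
A variant of this constructor, sketched under two assumptions: the installed PyTorch supports register_buffer(..., persistent=False), as suggested in the comments under solution 3, and batch_size is passed in explicitly rather than read from an outer scope. It also replaces the self.noise alias with a property, because .to() can replace a buffer with a new tensor, and a previously stored alias would then keep pointing at the old one. The class name NoiseHolder is hypothetical, and the encoder and decoder are omitted.

```python
import torch
import torch.nn as nn


class NoiseHolder(nn.Module):  # hypothetical; encoder and decoder omitted
    def __init__(self, batch_size=8, embedding_size=128):
        super().__init__()
        # persistent=False keeps the buffers out of state_dict
        self.register_buffer(
            '_train_noise', torch.randn(batch_size, embedding_size), persistent=False
        )
        self.register_buffer(
            '_eval_noise', torch.randn(1, embedding_size), persistent=False
        )

    @property
    def noise(self):
        # Looked up on every access, so it always returns the current buffer
        return self._train_noise if self.training else self._eval_noise


net = NoiseHolder().to(torch.float64)  # dtype change stands in for a device move
print(net.noise.dtype, net.noise.shape)  # training mode: batch-sized noise
net.eval()
print(net.noise.shape)                   # eval mode: the fixed single-row noise
print(list(net.state_dict().keys()))     # non-persistent buffers are not saved
```

With the property in place, the train() and eval() overrides from the question are no longer needed to switch between the two tensors.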

[Comments]: