【Question Title】: CUDA out of memory in Google Colab
【Posted】: 2020-11-16 16:19:53
【Question】:

I'm trying to reproduce a GAN paper (StarGAN v2), so I want to train the model in Google Colab (with a reduced dataset). However, I'm running into this error:

Start training...
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:3063: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
Traceback (most recent call last):
  File "main.py", line 182, in <module>
    main(args)
  File "main.py", line 59, in main
    solver.train(loaders)
  File "/content/drive/My Drive/stargan-v2/core/solver.py", line 131, in train
    nets, args, x_real, y_org, y_trg, x_refs=[x_ref, x_ref2], masks=masks)
  File "/content/drive/My Drive/stargan-v2/core/solver.py", line 259, in compute_g_loss
    x_rec = nets.generator(x_fake, s_org, masks=masks)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/drive/My Drive/stargan-v2/core/model.py", line 181, in forward
    x = block(x, s)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/drive/My Drive/stargan-v2/core/model.py", line 117, in forward
    out = self._residual(x, s)
  File "/content/drive/My Drive/stargan-v2/core/model.py", line 109, in _residual
    x = F.interpolate(x, scale_factor=2, mode='nearest')
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 3132, in interpolate
    return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors)
RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 15.90 GiB total capacity; 14.73 GiB already allocated; 195.88 MiB free; 14.89 GiB reserved in total by PyTorch)

I already lowered batch_size, but that didn't work for me. Do you have any ideas? How can I fix this?

Thanks.

【Comments】:

  • Which GAN paper are you trying to reproduce? If you give me the name, I can revise my answer to address it specifically.

Tags: python google-colaboratory


【Solution 1】:

If you are not on the Pro tier of Google Colab, your memory allocation is subject to a fairly restrictive cap. From the Google Colab FAQ...

The amount of memory available in Colab virtual machines varies over time (but is stable for the lifetime of the VM)... Colab may sometimes automatically assign you a VM with extra memory when it detects that you are likely to need it. Users interested in more memory and more reliable access to it in Colab may want to look at Colab Pro.
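To see how much of the VM's GPU memory is actually in use before the crash, you can query PyTorch's allocator directly. This is a generic diagnostic sketch (not from the StarGAN v2 code) using standard `torch.cuda` calls:

```python
import torch

def gpu_memory_summary(device: int = 0) -> str:
    """Summarize total / allocated / reserved memory on one GPU."""
    if not torch.cuda.is_available():
        return "No GPU available"
    props = torch.cuda.get_device_properties(device)
    gib = 1024 ** 3
    return (
        f"{props.name}: {props.total_memory / gib:.2f} GiB total, "
        f"{torch.cuda.memory_allocated(device) / gib:.2f} GiB allocated, "
        f"{torch.cuda.memory_reserved(device) / gib:.2f} GiB reserved"
    )

print(gpu_memory_summary())
```

The "reserved" figure matches the "reserved in total by PyTorch" number in the traceback: memory the caching allocator holds but that may not all be in live tensors.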

You already have a good handle on the problem, since you know that lowering batch_size is a good stopgap. Ultimately, though, if you want to reproduce this research, you will have to switch to a training approach that can accommodate the amount of data you appear to need.
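One common such approach is gradient accumulation: run several small micro-batches and step the optimizer once, so the effective batch size stays the same while far fewer activations sit in GPU memory at once. The sketch below is generic (it is not the StarGAN v2 training loop; the tiny model and batch sizes are placeholders):

```python
import torch
import torch.nn as nn

# Placeholder model/loss; substitute your real generator training step.
model = nn.Linear(16, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

effective_batch = 8   # the batch size the paper's config expects
micro_batch = 2       # what actually fits in GPU memory
accum_steps = effective_batch // micro_batch

data = torch.randn(effective_batch, 16)
targets = torch.randn(effective_batch, 1)

optimizer.zero_grad()
for i in range(accum_steps):
    x = data[i * micro_batch:(i + 1) * micro_batch]
    y = targets[i * micro_batch:(i + 1) * micro_batch]
    # Scale the loss so accumulated gradients match a full-batch step.
    loss = criterion(model(x), y) / accum_steps
    loss.backward()   # gradients accumulate across micro-batches
optimizer.step()
```

Mixed-precision training (`torch.cuda.amp`) and gradient checkpointing are other standard options if accumulation alone is not enough.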

【Discussion】:
