【Question Title】: CUDA out of memory in Google Colab
【Posted】: 2020-11-16 16:19:53
【Question】:

I'm trying to reproduce a GAN paper (StarGAN v2), so I want to train the model in Google Colab (with a reduced dataset). However, I'm running into this error:

Start training...
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:3063: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
Traceback (most recent call last):
  File "main.py", line 182, in <module>
    main(args)
  File "main.py", line 59, in main
    solver.train(loaders)
  File "/content/drive/My Drive/stargan-v2/core/solver.py", line 131, in train
    nets, args, x_real, y_org, y_trg, x_refs=[x_ref, x_ref2], masks=masks)
  File "/content/drive/My Drive/stargan-v2/core/solver.py", line 259, in compute_g_loss
    x_rec = nets.generator(x_fake, s_org, masks=masks)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/drive/My Drive/stargan-v2/core/model.py", line 181, in forward
    x = block(x, s)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/drive/My Drive/stargan-v2/core/model.py", line 117, in forward
    out = self._residual(x, s)
  File "/content/drive/My Drive/stargan-v2/core/model.py", line 109, in _residual
    x = F.interpolate(x, scale_factor=2, mode='nearest')
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 3132, in interpolate
    return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors)
RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 15.90 GiB total capacity; 14.73 GiB already allocated; 195.88 MiB free; 14.89 GiB reserved in total by PyTorch)

I already lowered batch_size, but that didn't work for me. Do you have any ideas? How can I fix this?

Thanks.

【Comments】:

  • Which GAN paper are you trying to reproduce? If you give me the name, I can revise my answer to address it specifically.

Tags: python google-colaboratory


【Solution 1】:

If you are not on the Pro tier of Google Colab, your memory allocation is subject to a fairly restrictive cap. From the Google Colab FAQ...

The amount of memory available in Colab virtual machines varies over time (but is stable for the lifetime of the VM)... Colab may sometimes automatically assign you a VM with extra memory when it detects that you are likely to need it. Users interested in more memory and more reliable access to it in Colab may want to look at Colab Pro.
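To see how much of the VM's GPU memory is actually in use before the crash, you can query PyTorch's allocator directly. This is a generic diagnostic sketch (not from the StarGAN v2 code) using standard `torch.cuda` calls:

```python
import torch

def gpu_memory_summary(device: int = 0) -> str:
    """Summarize total / allocated / reserved memory on one GPU."""
    if not torch.cuda.is_available():
        return "No GPU available"
    props = torch.cuda.get_device_properties(device)
    gib = 1024 ** 3
    return (
        f"{props.name}: {props.total_memory / gib:.2f} GiB total, "
        f"{torch.cuda.memory_allocated(device) / gib:.2f} GiB allocated, "
        f"{torch.cuda.memory_reserved(device) / gib:.2f} GiB reserved"
    )

print(gpu_memory_summary())
```

The "reserved" figure matches the "reserved in total by PyTorch" number in the traceback: memory the caching allocator holds but that may not all be in live tensors.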

You already have a good handle on the problem, since you know that lowering batch_size is a good stopgap. Ultimately, though, if you want to reproduce this research, you will have to switch to a training approach that can accommodate the amount of data you appear to need.
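One common such approach is gradient accumulation: run several small micro-batches and step the optimizer once, so the effective batch size stays the same while far fewer activations sit in GPU memory at once. The sketch below is generic (it is not the StarGAN v2 training loop; the tiny model and batch sizes are placeholders):

```python
import torch
import torch.nn as nn

# Placeholder model/loss; substitute your real generator training step.
model = nn.Linear(16, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

effective_batch = 8   # the batch size the paper's config expects
micro_batch = 2       # what actually fits in GPU memory
accum_steps = effective_batch // micro_batch

data = torch.randn(effective_batch, 16)
targets = torch.randn(effective_batch, 1)

optimizer.zero_grad()
for i in range(accum_steps):
    x = data[i * micro_batch:(i + 1) * micro_batch]
    y = targets[i * micro_batch:(i + 1) * micro_batch]
    # Scale the loss so accumulated gradients match a full-batch step.
    loss = criterion(model(x), y) / accum_steps
    loss.backward()   # gradients accumulate across micro-batches
optimizer.step()
```

Mixed-precision training (`torch.cuda.amp`) and gradient checkpointing are other standard options if accumulation alone is not enough.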

【Discussion】:
