[Posted]: 2019-06-13 21:45:54
[Problem description]:
My code runs fine in an iPython terminal, but fails with an out-of-memory error, as shown below.
/home/abigail/anaconda3/envs/tf_gpuenv/bin/python -Xms1280m -Xmx4g /home/abigail/PycharmProjects/MLNN/src/test.py
Using TensorFlow backend.
Epoch 1/150
2019-01-19 22:12:39.539156: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-01-19 22:12:39.588899: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-01-19 22:12:39.589541: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 750 Ti major: 5 minor: 0 memoryClockRate(GHz): 1.0845
pciBusID: 0000:01:00.0
totalMemory: 1.95GiB freeMemory: 59.69MiB
2019-01-19 22:12:39.589552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
Traceback (most recent call last):
File "/home/abigail/PycharmProjects/MLNN/src/test.py", line 20, in <module>
model.fit(X, Y, epochs=150, batch_size=10)
File "/home/abigail/anaconda3/envs/tf_gpuenv/lib/python3.6/site-packages/keras/engine/training.py", line 1039, in fit
validation_steps=validation_steps)
File "/home/abigail/anaconda3/envs/tf_gpuenv/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 199, in fit_loop
outs = f(ins_batch)
File "/home/abigail/anaconda3/envs/tf_gpuenv/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2697, in __call__
if hasattr(get_session(), '_make_callable_from_options'):
File "/home/abigail/anaconda3/envs/tf_gpuenv/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 186, in get_session
_SESSION = tf.Session(config=config)
File "/home/abigail/anaconda3/envs/tf_gpuenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1551, in __init__
super(Session, self).__init__(target, graph, config=config)
File "/home/abigail/anaconda3/envs/tf_gpuenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 676, in __init__
self._session = tf_session.TF_NewSessionRef(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory
Process finished with exit code 1
In PyCharm, I first edited "Help -> Edit Custom VM Options":
-Xms1280m
-Xmx4g
This did not solve the problem. I then edited "Run -> Edit Configurations -> Interpreter options":
-Xms1280m -Xmx4g
It still gives the same error. My Linux desktop has plenty of memory (64 GB). How can I solve this problem?
By the way, in PyCharm, if I don't use the GPU, there is no error.
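(The `-Xms`/`-Xmx` flags are JVM heap options; CPython ignores them, and they cannot affect GPU memory either way. A common workaround, sketched below on the assumption that Keras is running on the TensorFlow 1.x backend shown in the traceback, is to stop TensorFlow from claiming nearly all GPU memory when the session is created:)

```python
# Sketch of a possible fix, not a definitive one: assumes TF 1.x
# and standalone Keras, as in the traceback above.
import tensorflow as tf
from keras import backend as K

config = tf.ConfigProto()
# Allocate GPU memory on demand instead of grabbing it all up front.
config.gpu_options.allow_growth = True
# Alternatively, cap the fraction of GPU memory this process may use:
# config.gpu_options.per_process_gpu_memory_fraction = 0.4
K.set_session(tf.Session(config=config))
```

(Note this only helps if enough GPU memory is actually free; with only ~60 MiB free as in the log, whatever is holding the rest must release it first.)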
EDIT:
In [5]: exit
(tf_gpuenv) abigail@abigail-XPS-8910:~/nlp/MLMastery/DLwithPython/code/chapter_07$ nvidia-smi
Sun Jan 20 00:41:49 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 415.25 Driver Version: 415.25 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 750 Ti Off | 00000000:01:00.0 On | N/A |
| 38% 54C P0 2W / 38W | 1707MiB / 1993MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 770 G /usr/bin/akonadi_archivemail_agent 2MiB |
| 0 772 G /usr/bin/akonadi_sendlater_agent 2MiB |
| 0 774 G /usr/bin/akonadi_mailfilter_agent 2MiB |
| 0 1088 G /usr/lib/xorg/Xorg 166MiB |
| 0 1440 G kwin_x11 60MiB |
| 0 1446 G /usr/bin/krunner 1MiB |
| 0 1449 G /usr/bin/plasmashell 60MiB |
| 0 1665 G ...quest-channel-token=3687002912233960986 137MiB |
| 0 20728 C ...ail/anaconda3/envs/tf_gpuenv/bin/python 1255MiB |
+-----------------------------------------------------------------------------+
[Discussion]:
-
An important hint I can see there: in your error output, most (almost all) of the GPU memory is already used up. When I check the same thing on a TensorFlow sample program running on my GPU, I see: 2019-01-20 16:38:00.550256: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1392] Found device 0 with properties: name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.607 pciBusID: 0000:01:00.0 totalMemory: 11.00GiB freeMemory: 9.10GiB. You need a lot more free memory than 59 MB. Check what is using up all your memory and reallocate accordingly.
-
But as you can see, I already passed "python -Xms1280m -Xmx4g" to increase the memory available to the interpreter.
-
@SteveCarter, my desktop has 64 GB of RAM. Did you configure something to get 11 GB of total memory? In my case, the total memory shown here is 1.95 GiB. Do you develop in PyCharm?
-
You need to look at the GPU's memory, which in this case is only about 2 GB. Your settings configure desktop (system) memory, not GPU memory. What platform are you running on? Is it some kind of Linux? If it is Linux, could you post the output of running "nvidia-smi" on the command line? Yes, I develop in PyCharm, but I have never needed to allocate memory the way you did.
-
2 GB of GPU RAM is not much, and the model probably needs more memory than you have. You seem to think this is a PyCharm problem, but it is not.
Tags: python tensorflow keras pycharm