【Posted】: 2023-03-18 20:25:02
【Question】:
I have a machine with 8 GPUs (4 GTX 1080 Ti with 11 GB of RAM and 4 RTX 1080), and I cannot get TensorFlow to use them correctly (or at all).
When I run
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
it prints
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 5295519098812813462
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 12186007115805339517
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:1"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 17706271046686153881
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:2"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 14710290295129432533
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:3"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 1381213064943868400
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:4"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 12093982778662340719
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:5"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 682960671898108683
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:6"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 9901240111105546679
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:7"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 8442134369143872649
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 1687638086072792879
physical_device_desc: "device: XLA_CPU device"
].
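If I try to use the GPUs for anything, nvidia-smi reports them as occupied but running at 0% utilization, and the speed of the task shows that TensorFlow is only using the CPU.
On other machines with the same setup, the listing also includes both '/device:GPU:2' and '/device:XLA_GPU:2' (for example), and TensorFlow can use them without any problem.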
I have seen similar questions and their proposed solutions, but none of them seem to work.
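To make the symptom easy to compare across machines, here is a minimal sketch that counts devices per type and flags the case where XLA_GPU entries appear without any plain GPU entries. The helper names `summarize_device_types` and `local_device_types` are hypothetical (they are not part of TensorFlow), and the sample list is simply the device types from the listing above.

```python
from collections import Counter


def summarize_device_types(device_types):
    """Count devices per type and flag the 'XLA_GPU but no GPU' symptom."""
    counts = Counter(device_types)
    # True when XLA_GPU devices are listed but no regular GPU device is.
    xla_only = counts.get("XLA_GPU", 0) > 0 and counts.get("GPU", 0) == 0
    return dict(counts), xla_only


def local_device_types():
    """Return the device types TensorFlow reports on this machine."""
    # Imported lazily so the helper above also works without TensorFlow.
    from tensorflow.python.client import device_lib
    return [d.device_type for d in device_lib.list_local_devices()]


# Device types taken from the listing in this question.
broken = ["CPU"] + ["XLA_GPU"] * 8 + ["XLA_CPU"]
counts, xla_only = summarize_device_types(broken)
print(counts, xla_only)
```

On a real machine, `summarize_device_types(local_device_types())` would give the same summary for the local install, so the working and non-working machines can be compared directly.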
【Discussion】:
Tags: python tensorflow gpu