【Posted】: 2021-08-18 20:28:42
【Question】:
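Here's the background:
I don't know much about deep learning, and I'm not the one who wrote the code. I'm following someone else's procedure to test an AI model, and I tried the same process on 3 different laptops. I assumed the laptop with better hardware would train faster, but that turned out not to be the case.
Judging from the code, it uses Keras with a TensorFlow backend.
I did some research and tried to speed up the process, for example by using the GPU. But then I found that the GPU load on both of the newer laptops stayed between 0% and 1%, so it seems neither of them was actually using the GPU.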
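One way to double-check the "0% to 1%" reading is to ask the NVIDIA driver directly while training runs, since Windows Task Manager's default GPU graphs (such as "3D") may not reflect CUDA compute load. The sketch below assumes `nvidia-smi` (installed with the NVIDIA driver) is on PATH; the function name `gpu_utilization_percent` is hypothetical and only for illustration.

```python
import shutil
import subprocess
from typing import List, Optional


def gpu_utilization_percent() -> Optional[List[int]]:
    """Return current utilization (%) of each NVIDIA GPU.

    Returns None if nvidia-smi is not on PATH or the query fails.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    # One line per GPU, e.g. "37"; skip anything non-numeric such as "[N/A]".
    return [int(line.strip()) for line in result.stdout.splitlines()
            if line.strip().isdigit()]


if __name__ == "__main__":
    # Run this in a second terminal while training is in progress.
    print(gpu_utilization_percent())
```

If this also reports values near 0 during training, the GPU is genuinely idle rather than just under-reported.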
So I figured TensorFlow might not be detecting the GPU, and I tried tensorflow-gpu and installed CUDA and cuDNN...
>>> from tensorflow.python.client import device_lib
>>> print(device_lib.list_local_devices())
2021-08-18 17:17:00.307495: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2021-08-18 17:17:00.312631: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2021-08-18 17:17:00.364157: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1070 computeCapability: 6.1
coreClock: 1.645GHz coreCount: 16 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 238.66GiB/s
2021-08-18 17:17:00.364352: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2021-08-18 17:17:00.397938: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2021-08-18 17:17:00.427946: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2021-08-18 17:17:00.435072: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2021-08-18 17:17:00.478467: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2021-08-18 17:17:00.495200: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2021-08-18 17:17:00.559633: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2021-08-18 17:17:00.560557: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2021-08-18 17:17:04.129809: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-18 17:17:04.129968: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2021-08-18 17:17:04.130734: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2021-08-18 17:17:04.132802: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/device:GPU:0 with 6788 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 2340425778646607054
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 7118530151
locality {
bus_id: 1
links {
}
}
incarnation: 4718765836722936952
physical_device_desc: "device: 0, name: NVIDIA GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1"
]
Even though tensorflow-gpu does seem to detect the GPU, training still isn't any faster, and the laptop with no GPU and an older CPU is actually faster.
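Detecting the GPU (as in the log above) is not the same as running operations on it. A minimal sketch, assuming TensorFlow 2.x, that turns on TensorFlow's device-placement logging and reports where a small matrix multiply actually executed; `run_with_placement_log` is a hypothetical helper name, and the function simply returns None if TensorFlow is not installed.

```python
import importlib.util


def run_with_placement_log():
    """Run one small matmul with device-placement logging enabled.

    Returns the list of GPUs TensorFlow sees, or None if TensorFlow
    is not installed in this environment.
    """
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf

    # Print the device (e.g. CPU:0 or GPU:0) each op is assigned to.
    tf.debugging.set_log_device_placement(True)
    gpus = tf.config.list_physical_devices("GPU")
    a = tf.random.uniform((1000, 1000))
    b = tf.random.uniform((1000, 1000))
    c = tf.matmul(a, b)
    print("GPUs visible:", gpus)
    print("matmul ran on:", c.device)
    return gpus


if __name__ == "__main__":
    run_with_placement_log()
```

If the printed device ends in `GPU:0`, TensorFlow is placing ops on the GPU, and a slow training loop would then point to something other than device detection.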
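The new laptop runs at about 1 it/s, while the old laptop runs at 9 it/s. I also have an even older laptop that runs at 5-6 it/s.
To train on the 14 GB dataset, I estimate the old laptop would need 30 days, while the new laptop might need 45 days.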
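At a constant throughput, wall-clock time is total iterations divided by it/s, so time scales inversely with the rate: a machine at 1 it/s needs about 9 times as long as one at 9 it/s for the same work. A small sketch of that arithmetic; the iteration count below is a hypothetical placeholder, since the post does not state it.

```python
SECONDS_PER_DAY = 86_400


def estimate_training_days(total_iterations: int, iters_per_second: float) -> float:
    """Wall-clock days needed for total_iterations at a constant rate."""
    if iters_per_second <= 0:
        raise ValueError("iters_per_second must be positive")
    return total_iterations / iters_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    iterations = 10_000_000  # hypothetical workload size, not from the post
    for label, rate in [("old laptop", 9.0), ("older laptop", 5.5), ("new laptop", 1.0)]:
        print(f"{label}: {estimate_training_days(iterations, rate):.1f} days")
```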
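What puzzles me is this: with the same code and libraries, shouldn't the hardware be what determines training speed? Or am I misunderstanding something?
【Discussion】: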
-
Did you also change the code to specify which parts should run on the GPU?
Tags: tensorflow artificial-intelligence hardware
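Regarding explicitly choosing where code runs: TensorFlow 2.x generally places ops on a visible GPU by default, but `tf.device` can pin them, which makes a direct CPU-vs-GPU comparison possible. A rough benchmark sketch, assuming TensorFlow 2.x; `time_matmul` is a hypothetical helper, and it returns None when TensorFlow or the requested device is unavailable. If the GPU matmul is much faster here while training is not, the bottleneck is plausibly elsewhere in the training loop (for example, data loading on the CPU).

```python
import importlib.util
import time


def time_matmul(device: str, size: int = 2000, repeats: int = 10):
    """Average seconds per (size x size) matmul pinned to `device`.

    `device` is e.g. "/CPU:0" or "/GPU:0". Returns None if TensorFlow
    is not installed or no device of that kind is visible.
    """
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf

    kind = device.strip("/").split(":")[0]  # "CPU" or "GPU"
    if not tf.config.list_physical_devices(kind):
        return None
    with tf.device(device):
        a = tf.random.uniform((size, size))
        b = tf.random.uniform((size, size))
        tf.matmul(a, b).numpy()  # warm-up: library/kernel initialisation
        start = time.perf_counter()
        for _ in range(repeats):
            c = tf.matmul(a, b)
        c.numpy()  # wait for queued (possibly asynchronous) GPU work to finish
        return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    print("CPU:", time_matmul("/CPU:0"))
    print("GPU:", time_matmul("/GPU:0"))
```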