【Question Title】: Same code, same libraries, but why does my training run slower on a new laptop than on an old one?
【Posted】: 2021-08-18 20:28:42
【Question Description】:

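Here is the background:

I don't know much about deep learning, and I didn't write the code. I am following someone else's procedure to test an AI model. I ran the same procedure on 3 different laptops. I expected the laptop with better hardware to train faster, but that is not what happened.

Judging from the code, it appears to use Keras with a TensorFlow backend.

I did some research and tried to speed things up, for example by using the GPU. But then I found that the GPU load on both laptops stayed between 0% and 1%. It seems neither laptop was using the GPU.

So I thought TensorFlow might not be recognizing the GPU, and I tried tensorflow-gpu and installed CUDA and cuDNN...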

>>> from tensorflow.python.client import device_lib
>>> print(device_lib.list_local_devices())
2021-08-18 17:17:00.307495: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2021-08-18 17:17:00.312631: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2021-08-18 17:17:00.364157: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 1070 computeCapability: 6.1
coreClock: 1.645GHz coreCount: 16 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 238.66GiB/s
2021-08-18 17:17:00.364352: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2021-08-18 17:17:00.397938: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2021-08-18 17:17:00.427946: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2021-08-18 17:17:00.435072: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2021-08-18 17:17:00.478467: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2021-08-18 17:17:00.495200: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2021-08-18 17:17:00.559633: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2021-08-18 17:17:00.560557: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2021-08-18 17:17:04.129809: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-18 17:17:04.129968: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0
2021-08-18 17:17:04.130734: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
2021-08-18 17:17:04.132802: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/device:GPU:0 with 6788 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 2340425778646607054
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 7118530151
locality {
  bus_id: 1
  links {
  }
}
incarnation: 4718765836722936952
physical_device_desc: "device: 0, name: NVIDIA GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1"
]
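The same check can also be made through the public `tf.config` API instead of the internal `tensorflow.python.client` module. The following is a minimal sketch, assuming TensorFlow 2.1 or later; the helper name `describe_devices` is hypothetical and not part of the original post.

```python
import tensorflow as tf


def describe_devices():
    """Return the names of the physical devices TensorFlow can see, grouped by type."""
    return {
        "CPU": [d.name for d in tf.config.list_physical_devices("CPU")],
        "GPU": [d.name for d in tf.config.list_physical_devices("GPU")],
    }


devices = describe_devices()
print("TensorFlow version:", tf.__version__)
# False here means the installed build cannot use an NVIDIA GPU at all.
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("Visible devices:", devices)
```

An empty `GPU` list would point to a driver, CUDA, or cuDNN mismatch. A non-empty list, as in the log above, only shows that the GPU is visible, not that the training loop actually uses it.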

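Even though tensorflow-gpu does appear to recognize the GPU, training did not get any faster, and the laptop with no GPU and an older CPU is actually faster.

The new laptop runs at about 1 it/s, while the old laptop runs at 9 it/s. I also have an even older laptop that runs at 5~6 it/s.

I now need to train on a 14 GB dataset. I estimate it would take 30 days on the old laptop, and possibly 45 days on the new one.

What puzzles me is this: with the same code and the same libraries, isn't hardware the next thing that determines training speed? Or am I misunderstanding something?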
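One way to separate hardware from software effects is to time the same operation on each device TensorFlow reports. The sketch below assumes TensorFlow 2.1 or later in eager mode; the helper `time_matmul`, the matrix size, and the repeat count are arbitrary illustrative choices, and it measures a single raw operation rather than the training loop from the question.

```python
import time

import tensorflow as tf


def time_matmul(device_name, size=512, repeats=5):
    """Average seconds per (size x size) matrix multiplication on one device."""
    with tf.device(device_name):
        a = tf.random.uniform((size, size))
        b = tf.random.uniform((size, size))
        tf.matmul(a, b).numpy()  # warm-up run, excludes one-time setup cost
        start = time.perf_counter()
        for _ in range(repeats):
            c = tf.matmul(a, b)
        c.numpy()  # copying the result to the host waits for queued GPU work
        return (time.perf_counter() - start) / repeats


# Time every CPU and GPU device that TensorFlow exposes on this machine.
timings = {
    dev.name: time_matmul(dev.name)
    for dev in tf.config.list_logical_devices()
    if dev.device_type in ("CPU", "GPU")
}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.2f} ms per matmul")
```

If the GPU is clearly faster here but training is still slow, the bottleneck is more likely elsewhere, for example in data loading or in operations that stay on the CPU.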

【Question Comments】:

  • Did you also change the code to specify which parts should run on the GPU?

Tags: tensorflow artificial-intelligence hardware


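【Solution 1】:

If you want a particular operation to run on a device of your choice instead of the one selected automatically, you can use tf.device to create a device context; all operations within that context will run on the same specified device.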

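import tensorflow as tf

# Log the device each operation is assigned to
tf.debugging.set_log_device_placement(True)

# Place tensors on the CPU
with tf.device('/CPU:0'):
  a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
  b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# matmul is outside the device context, so TensorFlow places it
# automatically (on the GPU if one is available)
c = tf.matmul(a, b)
print(c)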
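Since the question uses Keras, the same device context can be wrapped around model construction and training. This is a minimal sketch, assuming TensorFlow 2.x with `tf.keras`; the random data, layer sizes, and batch size are hypothetical stand-ins, not the code from the question.

```python
import numpy as np
import tensorflow as tf

# Use the first GPU if TensorFlow can see one, otherwise fall back to the CPU.
gpus = tf.config.list_physical_devices("GPU")
device = "/GPU:0" if gpus else "/CPU:0"

# Hypothetical stand-in data: 256 samples, 20 features, binary labels.
x = np.random.rand(256, 20).astype("float32")
y = np.random.randint(0, 2, size=(256, 1)).astype("float32")

with tf.device(device):
    # Variables created and operations run inside this block target `device`.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(20,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    history = model.fit(x, y, batch_size=32, epochs=1, verbose=0)

print("Trained on", device, "with final loss", history.history["loss"][-1])
```

With a model this small, the GPU may well be no faster than the CPU, because per-batch overhead dominates; larger batches or larger models are where a GPU is more likely to help.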

【Comments】:
