【Question title】: GPU works slower than CPU in TensorFlow, why?
【Posted】: 2019-07-10 07:05:39
【Question】:

The following code:

import time
import tensorflow as tf

tf.enable_eager_execution()


def time_matmul(x):
    # Time 10 back-to-back eager matmuls of x with itself (wall-clock time).
    start = time.time()
    for loop in range(10):
        tf.matmul(x, x)

    result = time.time() - start

    print("10 loops: {:0.2f}ms".format(1000 * result))


# Force execution on CPU
print("On CPU:")
with tf.device("CPU:0"):
    x = tf.random_uniform([1000, 1000])
    assert x.device.endswith("CPU:0")
    time_matmul(x)

# Force execution on GPU #0 if available
print("On GPU:")
if tf.test.is_gpu_available():
    with tf.device("GPU:0"):  # Or GPU:1 for the 2nd GPU, GPU:2 for the 3rd etc.
        x = tf.random_uniform([1000, 1000])
        assert x.device.endswith("GPU:0")
        time_matmul(x)

gives the following output:

On CPU:
2019-02-16 11:12:33.724828: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2019-02-16 11:12:34.056651: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.7715
pciBusID: 0000:01:00.0
totalMemory: 8.00GiB freeMemory: 6.63GiB
2019-02-16 11:12:34.056984: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-02-16 11:12:34.501349: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-16 11:12:34.501515: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-02-16 11:12:34.501612: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-02-16 11:12:34.501855: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6384 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
10 loops: 206.00ms
On GPU:
2019-02-16 11:12:34.718164: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-02-16 11:12:34.718377: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-16 11:12:34.718540: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-02-16 11:12:34.718641: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-02-16 11:12:34.718832: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 6384 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
10 loops: 239.52ms

The CPU is a Core i7 and the GPU is a GTX 1070.

What is going on?

【Discussion】:

    Tags: tensorflow gpu cpu


    【Answer 1】:

    On the same (Windows) machine, with 100 loops:

    On CPU:
    100 loops: 2100.54ms
    On GPU:
    100 loops: 229.00ms
    

    On a Linux machine with a Core i5, the same GPU model and better RAM, with 10 loops:

    On CPU:
    10 loops: 112.44ms
    On GPU:
    10 loops: 113.44ms
    

    On the same Linux machine, with 100 loops:

    On CPU:
    100 loops: 1120.60ms
    On GPU:
    100 loops: 107.55ms
    
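To put numbers on these runs, each timing can be modelled as a fixed overhead plus a per-loop cost and solved from the 10-loop and 100-loop measurements. This is a minimal sketch under that linear-model assumption, which is ours rather than the answerer's; the helper name `split_overhead` is hypothetical. The Windows 10-loop figures are the ones from the question, which the answer says came from the same machine.

```python
def split_overhead(n1, t1, n2, t2):
    """Fit t = fixed + per_loop * n through two (loop count, ms) points.

    Hypothetical helper for illustration. Returns (fixed_ms, per_loop_ms).
    """
    per_loop = (t2 - t1) / (n2 - n1)
    fixed = t1 - per_loop * n1
    return fixed, per_loop


# (10-loop ms, 100-loop ms) as reported in the question and the answer.
runs = {
    "Windows CPU": (206.00, 2100.54),
    "Windows GPU": (239.52, 229.00),
    "Linux CPU": (112.44, 1120.60),
    "Linux GPU": (113.44, 107.55),
}

for name, (t10, t100) in runs.items():
    fixed, per_loop = split_overhead(10, t10, 100, t100)
    print("{}: fixed ~ {:.1f} ms, per loop ~ {:.2f} ms".format(name, fixed, per_loop))
```

With these inputs the CPU rows come out at roughly 21 ms and 11 ms per loop with a fixed cost near zero, while the GPU rows show a fixed cost of roughly 240 ms and 114 ms and a per-loop cost that is slightly negative, i.e. indistinguishable from zero within timing noise.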

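    So the problem lies in internal data transfer: on both machines the GPU time stays roughly constant (about 230 to 240 ms on Windows, about 110 ms on Linux) whether 10 or 100 loops are run, while the CPU time grows roughly tenfold. The GPU figure is therefore dominated by a fixed overhead, not by the matrix multiplications themselves.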
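One way to probe where a fixed GPU cost comes from is to time the loop more carefully than the question's `time_matmul` does. That function starts the clock before the very first call and stops it as soon as the last call returns. If the first call pays one-time setup costs, and if a GPU call can return before its kernel has finished (both are assumptions here, not something the answer verifies), the measurement mixes those effects together. Below is a minimal, framework-agnostic sketch of a timer that runs warm-up calls outside the timed region and then waits on the last result before stopping the clock. The helper `benchmark` and its `warmup`/`sync` parameters are hypothetical names of ours.

```python
import time


def benchmark(fn, loops, warmup=1, sync=None):
    """Return elapsed wall-clock milliseconds for `loops` calls of fn().

    warmup: calls made before the clock starts, so one-time setup is excluded.
    sync: optional callable applied to the last result before the clock stops;
        for an asynchronous backend it should block until that result exists.
        Waiting only on the last result assumes the calls complete in order.
    """
    for _ in range(warmup):
        result = fn()
        if sync is not None:
            sync(result)  # make sure warm-up work is finished before timing

    start = time.perf_counter()
    result = None
    for _ in range(loops):
        result = fn()
    if sync is not None and result is not None:
        sync(result)  # include any still-pending work in the measurement
    return 1000 * (time.perf_counter() - start)


if __name__ == "__main__":
    # Pure-Python stand-in workload so the sketch runs without TensorFlow.
    print("{:0.2f}ms".format(benchmark(lambda: sum(range(10000)), 10)))
```

In TensorFlow eager mode, `fn` could be `lambda: tf.matmul(x, x)` inside the question's `with tf.device("GPU:0"):` block, and `sync` could be `lambda r: r.numpy()`. Calling `.numpy()` copies the result to host memory, so it cannot return before the value has been computed. Comparing runs with `warmup=0` and `warmup=1` would show how much of the fixed cost comes from the first call.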

    【Comments】:

    • Could you explain the "internal data transfer" problem in more detail? I have the same issue. Thanks.