【Question title】: AWS EC2 TensorFlow GPU not working
【Posted】: 2018-08-25 20:44:14
【Question】:

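I have an AWS EC2 instance (p2.xlarge) running this AMI:

Deep Learning AMI (Ubuntu) Version 5.0 - ami-7336d50e

Comes with the latest binaries of deep learning frameworks pre-installed in separate virtual environments: MXNet, TensorFlow, Caffe, Caffe2, PyTorch, Keras, Chainer, Theano and CNTK. Fully configured with NVidia CUDA, cuDNN and NCCL

I am trying to build an RNN with Keras. When I launch my program, I get this: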

 I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.7.5 locally
 I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
 I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.7.5 locally
 I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
 I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.7.5 locally

Once Keras starts, I get this:

W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:1e.0
Total memory: 11.17GiB
Free memory: 11.10GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0)
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 12639 get requests, put_count=6277 evicted_count=1000 eviction_rate=0.159312 and unsatisfied allocation rate=0.590395
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110

But training is not fast: my MacBook Pro is faster than my EC2 instance, and after every epoch I get this warning:

tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 4156 get requests, put_count=8233 evicted_count=4000 eviction_rate=0.48585 and unsatisfied allocation rate=0.000481232
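The counters in that PoolAllocator line can be read mechanically. The sketch below is only an illustration: the function name `parse_pool_allocator` and the regular expression are assumptions written for the line format shown in this question, not part of TensorFlow. For the numbers quoted here, the logged `eviction_rate` agrees with `evicted_count / put_count`. The same message appears in the earlier startup log with an `I` prefix, which suggests it is logged at info level.

```python
import re

# Log line copied from the question (source file prefix omitted).
LINE = (
    "PoolAllocator: After 4156 get requests, put_count=8233 "
    "evicted_count=4000 eviction_rate=0.48585 and "
    "unsatisfied allocation rate=0.000481232"
)

# Hypothetical pattern, written only for the line format shown above.
PATTERN = re.compile(
    r"After (?P<gets>\d+) get requests, put_count=(?P<puts>\d+) "
    r"evicted_count=(?P<evicted>\d+) eviction_rate=(?P<eviction_rate>[\d.e+-]+) "
    r"and unsatisfied allocation rate=(?P<unsatisfied_rate>[\d.e+-]+)"
)


def parse_pool_allocator(line):
    """Return the counters of a PoolAllocator log line as a dict."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a PoolAllocator line")
    fields = match.groupdict()
    return {
        "gets": int(fields["gets"]),
        "puts": int(fields["puts"]),
        "evicted": int(fields["evicted"]),
        "eviction_rate": float(fields["eviction_rate"]),
        "unsatisfied_rate": float(fields["unsatisfied_rate"]),
    }


stats = parse_pool_allocator(LINE)
# Recompute the eviction rate from the raw counters and compare it
# with the value TensorFlow printed.
recomputed = stats["evicted"] / stats["puts"]
print(stats)
print("recomputed eviction rate:", round(recomputed, 5))
```
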

I have installed keras_gpu and tensorflow_gpu, and I am using the virtual environment for Keras 2 with TensorFlow.
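The startup log in this question already reports `/gpu:0` being created on the Tesla K80. As a further check from inside the same virtual environment, a minimal sketch like the following lists the GPU devices TensorFlow can see. The helper name `visible_gpus` is hypothetical; `device_lib.list_local_devices()` is assumed to be available in the installed TensorFlow build, and the helper returns `None` when TensorFlow cannot be imported.

```python
def visible_gpus():
    """Return names of GPU devices TensorFlow reports, or None if TensorFlow is not importable."""
    try:
        # Imported lazily so the helper degrades gracefully without TensorFlow.
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    devices = device_lib.list_local_devices()
    return [d.name for d in devices if d.device_type == "GPU"]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif not gpus:
        print("TensorFlow is installed but reports no GPU device")
    else:
        print("GPU devices:", gpus)
```

An empty list would point to a CPU-only TensorFlow package being active in the environment. A non-empty list means the GPU is visible, and the slowdown has to be explained some other way.
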

If I am doing something wrong, can you tell me what? How can a simple little MacBook be faster than an EC2 instance with these specs?

p2.xlarge (11.75 ECUs, 4 vCPUs, 2.7 GHz, E5-2686v4, 61 GiB memory, EBS only)

【Question comments】:

    Tags: tensorflow amazon-ec2 keras


    【Answer 1】:

    The answer is simple. On the EC2 instance (p2.xlarge) the GPU is a Tesla K80, which in TensorFlow gives roughly a 4x to 10x speedup over a CPU. My MacBook has 8 CPUs, which is enough to close that gap.

    【Comments】:
