【Posted】:2017-05-08 15:58:21
【Question】:
This is the message I get when I run a script to check whether TensorFlow is working:
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I noticed that it mentions SSE4.2 and AVX.
- What are SSE4.2 and AVX?
- How do SSE4.2 and AVX improve CPU computations for TensorFlow tasks?
- How can I get TensorFlow compiled with these two instruction sets?
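The warnings say the instructions "are available on your machine". As a minimal sketch of how that can be checked by hand, assuming an x86 Linux system where /proc/cpuinfo has a "flags" line that names these features as sse4_2 and avx, the following reads that line. The function names cpu_flags and supported are hypothetical helpers for illustration, not part of TensorFlow.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    # No 'flags' line found (e.g. non-x86 or non-Linux system).
    return set()


def supported(cpuinfo_text, wanted=("sse4_2", "avx")):
    """Map each wanted feature name to True/False depending on whether the CPU reports it."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        # Assumption: the file exists only on Linux; treat anything else as "unknown".
        text = ""
    print(supported(text))
```

If a feature shows up as True here but TensorFlow still prints the warning, the installed binary was simply built without that instruction set enabled.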
【Comments】:
-
I like to build with these flags on a Xeon E5 v3:
bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --config=cuda -k //tensorflow/tools/pip_package:build_pip_package
Compared with the official release, this made 8k matmul on the CPU 3x faster (0.35 -> 1.05 T ops/sec).
-
Don't forget:
NOTE on gcc 5 or later: the binary pip packages available on the TensorFlow website are built with gcc 4, which uses the older ABI. To make your build compatible with the older ABI, you need to add --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" to your bazel build command. ABI compatibility allows custom ops built against the TensorFlow pip package to continue to work against your built package.
(From tensorflow.org/install/install_sources)
-
I have some TF binaries compiled with support for these instructions at github.com/lakshayg/tensorflow-build. You may find them helpful.
-
@IvanKush I added that flag and still cannot import tensorflow (it compiles fine). If you have built it successfully with gcc 5, please take a look at: stackoverflow.com/questions/45877158/…
-
If you are on Ubuntu 16.04, we provide builds for almost every variant you might need at github.com/mind/wheels
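For anyone building from source instead, the flags quoted in the comments above can be assembled mechanically. This is only a sketch under stated assumptions: the helper bazel_command and the GCC_OPTION table are hypothetical, the feature names follow the Linux /proc/cpuinfo spelling, and -msse4.2 is the GCC option for SSE4.2, which the comments do not mention explicitly. The --cxxopt and --config=cuda switches and the build target are taken from the comments as written.

```python
# Assumed mapping from /proc/cpuinfo feature names to GCC -m options.
GCC_OPTION = {
    "sse4_2": "-msse4.2",
    "avx": "-mavx",
    "avx2": "-mavx2",
    "fma": "-mfma",
}


def bazel_command(cpu_flags, cuda=False, old_abi=False):
    """Build the argument list for an optimized TensorFlow pip-package build."""
    args = ["bazel", "build", "-c", "opt"]
    # Add one --copt per instruction set the CPU reports.
    for name, option in GCC_OPTION.items():
        if name in cpu_flags:
            args.append("--copt=" + option)
    if old_abi:
        # Per the quoted NOTE: needed with gcc 5 or later for old-ABI compatibility.
        args.append("--cxxopt=-D_GLIBCXX_USE_CXX11_ABI=0")
    if cuda:
        args.append("--config=cuda")
    args.append("//tensorflow/tools/pip_package:build_pip_package")
    return args


if __name__ == "__main__":
    # Example: a CPU that reports SSE4.2 and AVX, built with gcc 5 or later.
    print(" ".join(bazel_command({"sse4_2", "avx"}, old_abi=True)))
```

The function returns a list rather than a string so it can be passed to subprocess.run without shell quoting; the printed form is only for copying into a terminal.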
Tags: tensorflow x86 compiler-optimization simd compiler-options