【Question title】: Is it possible to build a Deep Water/TensorFlow model in H2O without CUDA?
【Posted】: 2017-04-25 00:46:57
【Question】:

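My goal is to integrate H2O with TensorFlow on a machine that has no CUDA.

Since TensorFlow supports both CPU and GPU execution, I expected the H2O/TensorFlow integration to be possible without CUDA. However, the mention of CUDA software in the system specifications of Deep Water confuses me.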

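I tried to build a Deep Water/TensorFlow model in H2O Flow, but it failed. The steps I performed:

  1. Downloaded the H2O standalone JAR;
  2. Created a data frame in H2O Flow as usual;
  3. Tried to build a model with Deep Water as the algorithm and tensorflow as the backend;
  4. Got the following exception: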
java.lang.RuntimeException: Unable to initialize the native Deep Learning backend: No backend found. Cannot build a Deep Water model.
    at hex.deepwater.DeepWaterModelInfo.setupNativeBackend(DeepWaterModelInfo.java:246)
    at hex.deepwater.DeepWaterModelInfo.<init>(DeepWaterModelInfo.java:193)
    at hex.deepwater.DeepWaterModel.<init>(DeepWaterModel.java:225)
    at hex.deepwater.DeepWater$DeepWaterDriver.buildModel(DeepWater.java:127)
    at hex.deepwater.DeepWater$DeepWaterDriver.computeImpl(DeepWater.java:114)
    at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:169)
    at hex.deepwater.DeepWater$DeepWaterDriver.compute2(DeepWater.java:107)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1220)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

So my questions are:

  1. Is it possible to build a Deep Water/TensorFlow model in H2O without CUDA?
  2. If yes, what should I do to make it work? If not, are there other options for integrating H2O and TensorFlow without CUDA?

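Update 1

I set the gpu parameter to false and tried to build the model again with every available backend. caffe and tensorflow both produce the same stack trace as shown above. mxnet also fails, but with two different stack traces.

mxnet (first attempt to build the model):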

java.lang.RuntimeException: Unable to initialize the native Deep Learning backend: null
    at hex.deepwater.DeepWaterModelInfo.setupNativeBackend(DeepWaterModelInfo.java:246)
    at hex.deepwater.DeepWaterModelInfo.<init>(DeepWaterModelInfo.java:193)
    at hex.deepwater.DeepWaterModel.<init>(DeepWaterModel.java:225)
    at hex.deepwater.DeepWater$DeepWaterDriver.buildModel(DeepWater.java:127)
    at hex.deepwater.DeepWater$DeepWaterDriver.computeImpl(DeepWater.java:114)
    at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:169)
    at hex.deepwater.DeepWater$DeepWaterDriver.compute2(DeepWater.java:107)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1220)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

mxnet (subsequent attempts):

java.lang.RuntimeException: Unable to initialize the native Deep Learning backend: Could not initialize class deepwater.backends.mxnet.MXNetBackend$MXNetLoader
    at hex.deepwater.DeepWaterModelInfo.setupNativeBackend(DeepWaterModelInfo.java:246)
    at hex.deepwater.DeepWaterModelInfo.<init>(DeepWaterModelInfo.java:193)
    at hex.deepwater.DeepWaterModel.<init>(DeepWaterModel.java:225)
    at hex.deepwater.DeepWater$DeepWaterDriver.buildModel(DeepWater.java:127)
    at hex.deepwater.DeepWater$DeepWaterDriver.computeImpl(DeepWater.java:114)
    at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:169)
    at hex.deepwater.DeepWater$DeepWaterDriver.compute2(DeepWater.java:107)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1220)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

Update 2

Environment:

  • SW: CentOS Linux release 7.3.1611 (Core), Java HotSpot 64-Bit Server VM (build 25.121-b13, mixed mode);
  • HW: a virtual machine running on a Xeon CPU E5-2620 v4, with 4 cores and 8 GB RAM. No physical GPU is available; lspci -vnn | grep VGA returns 00:0f.0 VGA compatible controller [0300]: VMware SVGA II Adapter [15ad:0405] (prog-if 00 [VGA controller])
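
The lspci check above can also be scripted. The following is only a minimal sketch: the function names are made up for illustration, and it relies on the fact that the PCI vendor ID 10de belongs to NVIDIA while 15ad belongs to VMware. It parses the text that `lspci -nn` prints and reports whether any display controller is an NVIDIA device.

```python
import re

NVIDIA_VENDOR_ID = "10de"  # PCI vendor ID assigned to NVIDIA

def display_vendor_ids(lspci_output):
    """Return the PCI vendor IDs of the display controllers found in `lspci -nn` output."""
    ids = []
    for line in lspci_output.splitlines():
        if "VGA compatible controller" in line or "3D controller" in line:
            # [vvvv:dddd] is the vendor:device pair; [0300] (the class code) has no colon
            pairs = re.findall(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]", line)
            if pairs:
                ids.append(pairs[-1][0])
    return ids

def has_nvidia_gpu(lspci_output):
    """True if any display controller in the output carries the NVIDIA vendor ID."""
    return NVIDIA_VENDOR_ID in display_vendor_ids(lspci_output)

# The line reported in the environment description above
sample = ("00:0f.0 VGA compatible controller [0300]: "
          "VMware SVGA II Adapter [15ad:0405] (prog-if 00 [VGA controller])")
print(display_vendor_ids(sample), has_nvidia_gpu(sample))
```

On the line quoted above, the only display controller is the VMware virtual adapter, so no CUDA-capable device is visible to the VM.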

I cleared my /tmp directory and tried mxnet again. On the first attempt I got a new exception:

java.lang.RuntimeException: Unable to initialize the native Deep Learning backend: /tmp/libmxnet.so: libcudart.so.8.0: cannot open shared object file: No such file or directory
    at hex.deepwater.DeepWaterModelInfo.setupNativeBackend(DeepWaterModelInfo.java:246)
    at hex.deepwater.DeepWaterModelInfo.<init>(DeepWaterModelInfo.java:193)
    at hex.deepwater.DeepWaterModel.<init>(DeepWaterModel.java:225)
    at hex.deepwater.DeepWater$DeepWaterDriver.buildModel(DeepWater.java:127)
    at hex.deepwater.DeepWater$DeepWaterDriver.computeImpl(DeepWater.java:114)
    at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:169)
    at hex.deepwater.DeepWater$DeepWaterDriver.compute2(DeepWater.java:107)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1220)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

The file /tmp/libmxnet.so exists, and its permissions are -rw-rw-r--
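
The exception message suggests that the file itself is readable and that the failure comes from a dependency of libmxnet.so (libcudart.so.8.0, the CUDA runtime) not being found. One way to confirm this is to list the unresolved dependencies with ldd. The sketch below is illustrative only: the helper names are hypothetical, and the sample ldd output is made up for the demonstration rather than captured from the machine described above.

```python
import subprocess

def missing_libraries(ldd_output):
    """Return the names of shared libraries that `ldd` reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing

def check_library(path):
    """Run `ldd` on a shared object and return its unresolved dependencies."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_libraries(result.stdout)

# Hypothetical ldd output, for illustration only
sample = (
    "\tlibcudart.so.8.0 => not found\n"
    "\tlibstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f0000000000)\n"
    "\tlibc.so.6 => /lib64/libc.so.6 (0x00007f0000000000)\n"
)
print(missing_libraries(sample))
```

Calling `check_library("/tmp/libmxnet.so")` on the affected machine would show whether the bundled mxnet library was linked against CUDA; if libcudart appears in the result, that particular binary cannot be loaded on a machine without the CUDA runtime, whatever the gpu setting is.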

【Question discussion】:

    Tags: h2o


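    【Solution 1】:

    The answer to your first question is as follows:

    You can definitely run Deep Water without a GPU; it will just be slow. When using Flow, you can disable the gpu setting (it defaults to TRUE).

    You can also set gpu to false in a Flow cell, like this:

    "gpu": false

    However, your main problem is that none of the backends (mxnet, tensorflow, caffe) is available to run your code. We did test the gpu flag setting with mxnet. Please investigate the errors above further.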
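
To make the gpu setting concrete, here is a minimal sketch that assembles the text of a Flow cell with gpu disabled. It rests on assumptions: the `buildModel 'deepwater', {...}` cell form and the parameter names training_frame and response_column follow the usual Flow conventions but are not confirmed by this thread, and the frame and column names are placeholders. The helper itself is hypothetical.

```python
import json

def deepwater_flow_cell(training_frame, response_column, backend="tensorflow", gpu=False):
    """Build the text of a Flow cell that requests a Deep Water model.

    Assumption: Flow accepts `buildModel '<algo>', {json parameters}`.
    """
    params = {
        "training_frame": training_frame,    # placeholder frame key
        "response_column": response_column,  # placeholder column name
        "backend": backend,
        "gpu": gpu,                          # Python False is serialized as JSON false
    }
    return "buildModel 'deepwater', " + json.dumps(params)

print(deepwater_flow_cell("train.hex", "label"))
```

Setting the flag only tells Deep Water not to use a GPU; it does not help if the native backend library cannot be loaded in the first place, which is what the stack traces in the question point to.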

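    【Comments】:

    • Please provide your OS, GPU, and environment details.
    • Any suggestions?
    • Building the latest Deep Water code and disabling the GPU flag now works with TensorFlow, so you can use the TF backend without CUDA.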