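【Posted】: 2017-04-25 00:46:57
【Question】:
My goal is to integrate H2O with TensorFlow on a machine without CUDA.
Since TensorFlow supports both CPU and GPU execution, I expected the H2O/TensorFlow integration to be possible without CUDA. However, the mention of CUDA software in the system specifications of Deep Water confuses me.
I tried to build a Deep Water/TensorFlow model in H2O Flow, but it failed. The steps I took:
- downloaded the H2O standalone JAR;
- created a data frame in H2O Flow as usual;
- tried to build a model with Deep Water as the algorithm and tensorflow as the backend;
- got the following exception: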
java.lang.RuntimeException: Unable to initialize the native Deep Learning backend: No backend found. Cannot build a Deep Water model.
at hex.deepwater.DeepWaterModelInfo.setupNativeBackend(DeepWaterModelInfo.java:246)
at hex.deepwater.DeepWaterModelInfo.<init>(DeepWaterModelInfo.java:193)
at hex.deepwater.DeepWaterModel.<init>(DeepWaterModel.java:225)
at hex.deepwater.DeepWater$DeepWaterDriver.buildModel(DeepWater.java:127)
at hex.deepwater.DeepWater$DeepWaterDriver.computeImpl(DeepWater.java:114)
at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:169)
at hex.deepwater.DeepWater$DeepWaterDriver.compute2(DeepWater.java:107)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1220)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
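So my questions are:
- Is it possible to build a Deep Water/TensorFlow model in H2O without CUDA?
- If so, what should I do to make it work? If not, are there other options for integrating H2O and TensorFlow without CUDA?
Update 1:
I set the gpu parameter to false and tried building the model again with every available backend. caffe and tensorflow both produce the same stack trace as shown above. mxnet also fails, but with two different stack traces.
mxnet (first attempt to build the model):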
java.lang.RuntimeException: Unable to initialize the native Deep Learning backend: null
at hex.deepwater.DeepWaterModelInfo.setupNativeBackend(DeepWaterModelInfo.java:246)
at hex.deepwater.DeepWaterModelInfo.<init>(DeepWaterModelInfo.java:193)
at hex.deepwater.DeepWaterModel.<init>(DeepWaterModel.java:225)
at hex.deepwater.DeepWater$DeepWaterDriver.buildModel(DeepWater.java:127)
at hex.deepwater.DeepWater$DeepWaterDriver.computeImpl(DeepWater.java:114)
at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:169)
at hex.deepwater.DeepWater$DeepWaterDriver.compute2(DeepWater.java:107)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1220)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
mxnet (subsequent attempts):
java.lang.RuntimeException: Unable to initialize the native Deep Learning backend: Could not initialize class deepwater.backends.mxnet.MXNetBackend$MXNetLoader
at hex.deepwater.DeepWaterModelInfo.setupNativeBackend(DeepWaterModelInfo.java:246)
at hex.deepwater.DeepWaterModelInfo.<init>(DeepWaterModelInfo.java:193)
at hex.deepwater.DeepWaterModel.<init>(DeepWaterModel.java:225)
at hex.deepwater.DeepWater$DeepWaterDriver.buildModel(DeepWater.java:127)
at hex.deepwater.DeepWater$DeepWaterDriver.computeImpl(DeepWater.java:114)
at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:169)
at hex.deepwater.DeepWater$DeepWaterDriver.compute2(DeepWater.java:107)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1220)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Update 2:
Environment:
- SW: CentOS Linux release 7.3.1611 (Core), Java HotSpot 64-Bit Server VM (build 25.121-b13, mixed mode);
- HW: a virtual machine running on a Xeon CPU E5-2620 v4, with 4 cores and 8 GB of RAM. No physical GPU is available;
lspci -vnn | grep VGA returns 00:0f.0 VGA compatible controller [0300]: VMware SVGA II Adapter [15ad:0405] (prog-if 00 [VGA controller])
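The lspci check above can also be scripted. The following is only a minimal sketch: the function name nvidia_display_adapters is hypothetical, and it assumes that a CUDA-capable card would show up in lspci output as a VGA or 3D controller line mentioning NVIDIA.

```python
import re
from typing import List


def nvidia_display_adapters(lspci_output: str) -> List[str]:
    """Return the lspci lines that describe NVIDIA display controllers.

    Assumption: GPUs are listed under the "VGA compatible controller"
    or "3D controller" PCI classes, as in typical lspci output.
    """
    hits = []
    for line in lspci_output.splitlines():
        is_display = re.search(r"VGA compatible controller|3D controller", line)
        if is_display and "nvidia" in line.lower():
            hits.append(line.strip())
    return hits


# The line reported by lspci on the virtual machine described above.
sample = (
    "00:0f.0 VGA compatible controller [0300]: "
    "VMware SVGA II Adapter [15ad:0405] (prog-if 00 [VGA controller])"
)
print(nvidia_display_adapters(sample))  # prints []
```

An empty list for this machine is consistent with the statement that only the VMware virtual adapter is present.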
I cleared my /tmp directory and tried mxnet again. On the first attempt I got a new exception:
java.lang.RuntimeException: Unable to initialize the native Deep Learning backend: /tmp/libmxnet.so: libcudart.so.8.0: cannot open shared object file: No such file or directory
at hex.deepwater.DeepWaterModelInfo.setupNativeBackend(DeepWaterModelInfo.java:246)
at hex.deepwater.DeepWaterModelInfo.<init>(DeepWaterModelInfo.java:193)
at hex.deepwater.DeepWaterModel.<init>(DeepWaterModel.java:225)
at hex.deepwater.DeepWater$DeepWaterDriver.buildModel(DeepWater.java:127)
at hex.deepwater.DeepWater$DeepWaterDriver.computeImpl(DeepWater.java:114)
at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:169)
at hex.deepwater.DeepWater$DeepWaterDriver.compute2(DeepWater.java:107)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1220)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
The file /tmp/libmxnet.so exists, and its permissions are -rw-rw-r--.
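Since the message names libcudart.so.8.0 rather than /tmp/libmxnet.so itself, the failure looks like an unresolved dependency of the library and not a problem with the file or its permissions. A minimal sketch for reproducing the load outside the JVM, assuming Python is available on the machine (the helper name try_load is hypothetical):

```python
import ctypes
import os


def try_load(path):
    """Attempt to load a shared library; return (ok, message)."""
    if not os.path.exists(path):
        return False, "file not found: " + path
    try:
        ctypes.CDLL(path)
    except OSError as exc:
        # The dynamic loader's message typically names the dependency
        # it could not resolve, if that is the cause of the failure.
        return False, str(exc)
    return True, "loaded"


if __name__ == "__main__":
    # Path taken from the exception message above.
    print(try_load("/tmp/libmxnet.so"))
```

If the second element of the result again mentions libcudart.so.8.0, that would suggest the bundled libmxnet.so was linked against the CUDA runtime, independent of the gpu setting.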
【Discussion】:
Tags: h2o