【Question Title】: Distributed Tensorflow Tensorforest
【Posted】: 2018-11-05 00:09:53
【Question】:

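I'm new to distributed processing and would like to know how to train a TensorForest model in a distributed setting. I understand how this is done for neural networks, but not for TensorForest, which is a random forest implementation built on the TensorFlow framework.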

【Comments】:

    Tags: tensorflow machine-learning parallel-processing distributed-computing random-forest


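    【Answer 1】:

    I recently dug into this topic. Since TensorForestEstimator derives from tf.contrib.learn.Estimator, it should be usable in a distributed training environment.

    The problem I ran into is how to configure device assignment correctly. The TensorForestEstimator constructor takes a device_assigner keyword argument, documented as follows: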

    device_assigner: An object instance that controls how trees get assigned to devices. If None, will use tensor_forest.RandomForestDeviceAssigner.

    The documentation is inaccurate. The default is actually an instance of tf.contrib.framework.VariableDeviceChooser:

    https://github.com/tensorflow/tensorflow/blob/v1.12.0/tensorflow/contrib/tensor_forest/python/tensor_forest.py#L380

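    The code instantiates VariableDeviceChooser with no arguments, so it is meant to run without parameter servers. That is fine in a single-machine environment, but not in a distributed one. I tried passing a VariableDeviceChooser instantiated with the number of parameter servers inferred from the data in TF_CONFIG.

    Here is the error message I observed when the session started during the training operation.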
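
For reference, a minimal sketch of inferring the parameter-server count from TF_CONFIG, assuming the usual JSON layout with a "cluster" entry that maps job names to lists of host:port addresses. The helper name count_ps_tasks and the host names are hypothetical and only serve as illustration; this is not the code used in the answer.

```python
import json
import os


def count_ps_tasks(environ=None):
    """Return the number of "ps" tasks listed in TF_CONFIG (0 if unset)."""
    environ = os.environ if environ is None else environ
    # TF_CONFIG is a JSON string; treat a missing or empty value as "{}".
    tf_config = json.loads(environ.get("TF_CONFIG") or "{}")
    cluster = tf_config.get("cluster") or {}
    return len(cluster.get("ps") or [])


# Hypothetical cluster used only to exercise the helper.
example_env = {
    "TF_CONFIG": json.dumps({
        "cluster": {
            "master": ["master-0:2222"],
            "ps": ["ps-0:2222", "ps-1:2222"],
            "worker": ["worker-0:2222"],
        },
        "task": {"type": "worker", "index": 0},
    })
}

num_ps = count_ps_tasks(example_env)
print(num_ps)
```

The resulting count would then be the value handed to VariableDeviceChooser (presumably through its num_tasks argument) before passing the instance as device_assigner. As the traceback in this answer shows, that step alone was not enough to get a working device placement.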

      File "/home/ubuntu/.pyenv/versions/cmle-1_12-py-3_5/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
        return fn(*args)
      File "/home/ubuntu/.pyenv/versions/cmle-1_12-py-3_5/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1317, in _run_fn
        self._extend_graph()
      File "/home/ubuntu/.pyenv/versions/cmle-1_12-py-3_5/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1352, in _extend_graph
        tf_session.ExtendSession(self._session)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation device_dummy_0/Initializer/random_uniform/RandomUniform: Could not satisfy explicit device specification '' because the node {{colocation_node device_dummy_0/Initializer/random_uniform/RandomUniform}} was colocated with a group of nodes that required incompatible device '/job:ps/task:0/device:CPU:0'
    Colocation Debug Info:
    Colocation group had the following types and devices: 
    IsVariableInitialized: CPU 
    Assign: CPU 
    Identity: CPU XLA_CPU 
    VariableV2: CPU  
    Mul: CPU XLA_CPU 
    Add: CPU XLA_CPU 
    Sub: CPU XLA_CPU 
    RandomUniform: CPU XLA_CPU 
    Const: CPU XLA_CPU 
    
    Colocation members and user-requested devices:
      device_dummy_0/Initializer/random_uniform/shape (Const) 
      device_dummy_0/Initializer/random_uniform/min (Const) 
      device_dummy_0/Initializer/random_uniform/max (Const) 
      device_dummy_0/Initializer/random_uniform/RandomUniform (RandomUniform) 
      device_dummy_0/Initializer/random_uniform/sub (Sub) 
      device_dummy_0/Initializer/random_uniform/mul (Mul) 
      device_dummy_0/Initializer/random_uniform (Add) 
      device_dummy_0 (VariableV2) /job:ps/task:0/device:CPU:0   
      device_dummy_0/Assign (Assign) /job:ps/task:0/device:CPU:0
      device_dummy_0/read (Identity) /job:ps/task:0/device:CPU:0
      report_uninitialized_variables/IsVariableInitialized_1 (IsVariableInitialized) /job:ps/task:0/device:CPU:0  
      report_uninitialized_variables_1/IsVariableInitialized_1 (IsVariableInitialized) /job:ps/task:0/device:CPU:0
      save/Assign_1 (Assign) /job:ps/task:0/device:CPU:0
    
         [[{{node device_dummy_0/Initializer/random_uniform/RandomUniform}} = RandomUniform[T=DT_INT32, _class=["loc:@device_dummy_0"], dtype=DT_FLOAT, seed=0, seed2=0](device_dummy_0/Initializer/random_uniform/shape)]]
    

    【Comments】:
