【Question Title】: Place loaded frozen model on specific gpu device in Tensorflow
【Posted】: 2020-02-07 03:52:55
【Question】:

I have a frozen model and 4 GPUs. I want to run inference over as much data as possible, as fast as possible. Essentially, I want data parallelism: the same model performing inference on 4 batches at once, one batch per GPU.

This is roughly what I want to do:

import numpy as np
import tensorflow as tf

def return_ops():
    # load the frozen graph (model_path points to the .pb file)
    with tf.Graph().as_default() as graph:
        od_graph_def = tf.GraphDef()
        with tf.gfile.GFile(model_path, 'rb') as fid:
            serialized_graph = fid.read()
            od_graph_def.ParseFromString(serialized_graph)
            tf.import_graph_def(od_graph_def, name='')

    inputs = []
    outputs = []
    with graph.as_default() as g:
        for gpu in ['/gpu:0', '/gpu:1', '/gpu:2', '/gpu:3']:
            with tf.device(gpu):
                image_tensor = g.get_tensor_by_name('input:0')
                get_embeddings = g.get_tensor_by_name('embeddings:0')
            inputs.append(image_tensor)
            outputs.append(get_embeddings)

    return inputs, outputs, g

However, when I run

# sample batch
x = np.ones((100, 160, 160, 3))
# get ops (return_ops returns three values)
image_tensor_list, emb_list, graph = return_ops()
# construct feed dict: the same batch goes to every copy
feed_dict = {it: x for it in image_tensor_list}

# run the ops
with tf.Session(graph=graph, config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    inf = sess.run(emb_list, feed_dict=feed_dict)

everything runs on /gpu:0 when I check with nvidia-smi.
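nvidia-smi only shows per-device memory and utilization; to see where TensorFlow actually assigns each op, the TF 1.x session config can log the placement decisions. A minimal config sketch (reusing the `graph`, `emb_list`, and `feed_dict` defined above):

```python
# Log every op's device assignment when the session is created (TF 1.x).
config = tf.ConfigProto(allow_soft_placement=True,
                        log_device_placement=True)
with tf.Session(graph=graph, config=config) as sess:
    sess.run(emb_list, feed_dict=feed_dict)
```

Each op's assigned device is printed to stderr at session creation, which makes it easy to confirm whether anything actually landed on /gpu:1 through /gpu:3.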

However, I can run

with tf.device("/gpu:1"):
    t = tf.range(1000)

with tf.Session() as sess:
    sess.run(t)

and see activity on the second GPU...

How can I implement this data-parallel task correctly?

【Question Discussion】:

    Tags: python tensorflow gpu inference multiple-gpu


    【Solution 1】:

    I learned that the tensors need to be placed on the GPUs while the graph_def is being imported. The code below returns ops that I can run with sess.run([output1, ..., outputk], feed_dict). It places all ops on the GPUs, which is not ideal, so I pass allow_soft_placement in the session config.

    class MultiGPUNet(object):
    
        def __init__(self, model_path, n_gpu):
    
            self.model_path = model_path
            self.n_gpu = n_gpu
            self.graph = tf.Graph()
    
            # specify device for n_gpu copies of model
            # during graphdef parsing
            for i in range(self.n_gpu):
                self._init_models(i, self.graph)
    
        def _init_models(self, i, graph):
    
            with self.graph.as_default():
                od_graph_def = tf.GraphDef()
    
                with tf.gfile.GFile(self.model_path, 'rb') as fid:
                    serialized_graph = fid.read()
                    od_graph_def.ParseFromString(serialized_graph)
    
                    with tf.device('/device:GPU:{}'.format(i)):
                        tf.import_graph_def(od_graph_def, name='{}'.format(i))
    
        def get_tensors(self):
    
            output_tensors = []
            input_tensors = []
            train_tensors = []
    
            for i in range(self.n_gpu):
                input_tensors.append(
                    self.graph.get_tensor_by_name('{}/<input_name>:0'.format(i)))
                output_tensors.append(
                    self.graph.get_tensor_by_name('{}/<out_name>:0'.format(i)))
                train_tensors.append(
                    self.graph.get_tensor_by_name('{}/<train_name>:0'.format(i)))
    
        def make_feed_dict(x):
            """x is a list of batches, one per GPU."""
            assert len(x) == len(input_tensors)
            # zip objects cannot be concatenated with + in Python 3;
            # build one dict and update it with the other instead
            feed = dict(zip(input_tensors, x))
            feed.update(zip(train_tensors, [False]*len(train_tensors)))
            return feed
    
            return output_tensors, make_feed_dict
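
    Since the class hands back plain tensors plus a feed-dict builder, driving it amounts to giving the builder one batch per GPU and letting it set every training flag to False. The merging logic can be sketched framework-free (the tensor "names" below are plain stand-in strings, not real graph tensors):

```python
def make_feed_dict(input_tensors, train_tensors, batches):
    """Pair each per-GPU input with its batch and set every
    training flag to False (inference mode)."""
    assert len(batches) == len(input_tensors)
    feed = dict(zip(input_tensors, batches))
    feed.update(zip(train_tensors, [False] * len(train_tensors)))
    return feed

# stand-in tensor names for a 2-GPU setup, plus two toy batches
inputs = ['0/input:0', '1/input:0']
trains = ['0/train:0', '1/train:0']
batches = [[1, 2], [3, 4]]

fd = make_feed_dict(inputs, trains, batches)
# fd maps each input name to its batch and each training flag to False
```

    With real tensors, `fd` is exactly the dictionary passed to sess.run(output_tensors, feed_dict=fd).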
    

    【Discussion】:
