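【Posted】: 2021-09-11 13:36:59
【Question】:
I am currently completing my final-year project, which involves developing a multi-stream CNN for action recognition. The final output depends on the outputs produced by two independent streams (spatial and temporal). My goal is to make inference as efficient as possible, so I would like the two streams to run at the same time. By default, the forward functions are run one after the other, so the execution time is long.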
def forward(input1, input2):
    rgb = network1(input1)  # spatial stream
    of = network2(input2)   # temporal stream
    final_output = (rgb + of) / 2
    return final_output
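I have read some material on PyTorch multiprocessing and tried a few examples using torch.multiprocessing.Process, but the execution time turned out to be longer than I expected. The code is shown below.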
import torch
import torchvision
import torch.multiprocessing as mp
import time

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net1 = torchvision.models.quantization.mobilenet_v3_large(pretrained=True, quantize=False)
net2 = torchvision.models.quantization.mobilenet_v3_large(pretrained=True, quantize=False)

if __name__ == "__main__":
    inputs = torch.rand(1, 3, 224, 224)

    start = time.time()
    outputs = net1.forward(inputs)
    end = time.time()
    print('Time taken for forward prop on 1 stream: (sequentially)', end - start)

    start = time.time()
    outputs = net1.forward(inputs)
    outputs = net2.forward(inputs)
    end = time.time()
    print('Time taken for forward prop on 2 stream: (sequentially)', end - start)

    p1 = mp.Process(target=net1.forward, args=(inputs,))
    p2 = mp.Process(target=net2.forward, args=(inputs,))
    start = time.time()
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    end = time.time()
    print('Time taken for forward prop on 2 stream: (parallel)', end - start)
This is the output:
Time taken for forward prop on 1 stream: (sequentially) 0.08776640892028809
Time taken for forward prop on 2 stream: (sequentially) 0.15159368515014648
Time taken for forward prop on 2 stream: (parallel) 3.8684606552124023
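As the timings show, the forward propagation is still executed sequentially. How can I make the forward propagation of the two networks run at the same time?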
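Each figure above comes from a single call timed with time.time(), so one-off warm-up cost can end up in the measurement. A minimal sketch of a steadier measurement follows; time_call is a hypothetical helper, not part of PyTorch, and the warm-up and repeat counts are arbitrary assumptions. If the models were moved to the GPU, torch.cuda.synchronize() would also be needed before reading the clock, which this sketch does not do.

```python
import statistics
import time


def time_call(fn, *args, warmup=2, repeats=10):
    """Return the median wall-clock seconds of fn(*args) over `repeats` runs."""
    for _ in range(warmup):
        fn(*args)  # warm-up runs are executed but not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without PyTorch; with the code
    # in the question one would pass net1.forward and inputs instead.
    print(time_call(sum, range(100_000)))
```

The median is used rather than the mean so that a single slow run does not dominate the reported figure.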
【Discussion】:
Tags: python parallel-processing pytorch cuda multiprocessing
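One thing worth noting about the mp.Process attempt: the return value of a Process target is discarded, so the outputs of net1.forward and net2.forward never reach the parent process, and the measured time includes process start-up. A possible alternative is sketched below using threads from the standard library. fuse_two_streams, fake_spatial and fake_temporal are hypothetical names; the stand-in functions replace network1 and network2 so the sketch runs without PyTorch. Whether the two calls really overlap is an assumption: it depends on the heavy work happening in native code that releases the GIL, so any speed-up has to be measured.

```python
from concurrent.futures import ThreadPoolExecutor


def fuse_two_streams(spatial_fn, temporal_fn, rgb_input, flow_input, pool):
    """Submit both streams to worker threads and average their outputs."""
    rgb_future = pool.submit(spatial_fn, rgb_input)
    flow_future = pool.submit(temporal_fn, flow_input)
    # result() blocks until that stream finishes and re-raises its exception.
    return (rgb_future.result() + flow_future.result()) / 2


# Stand-ins for network1 / network2 (assumptions for illustration only).
def fake_spatial(x):
    return x * 2.0


def fake_temporal(x):
    return x + 1.0


if __name__ == "__main__":
    # The pool is created once and reused, so thread start-up is not paid
    # on every inference call.
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(fuse_two_streams(fake_spatial, fake_temporal, 3.0, 5.0, pool))
```

With real models, the two callables would be the networks themselves. Unlike the process-based version, the results come back through the futures, so the averaging step from the question can be applied directly.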