为什么 Python 的 cProfile 报告的经过时间与使用 PyTorch 时使用 time.time() 增量不同？答案

【问题标题】：Why does Python's cProfile report a different elapsed time than using time.time() deltas when using PyTorch?为什么 Python 的 cProfile 报告的经过时间与使用 PyTorch 时使用 time.time() 增量不同？
【发布时间】：2021-02-25 11:24:03
【问题描述】：

我正在使用 PyTorch 分析一些代码。我知道 CUDA 通常有一些异步执行（请参阅PyTorch docs），但我相信从 GPU 传输到 CPU 通常会强制同步。

出于这个原因，我决定天真地使用cProfile 进行分析，但我注意到Profile.enable() ... Profile.disable() 报告的时间与time.time() 记录的时间不同（作为增量）。

以下是高级代码的外观：

gpu = torch.device("cuda")
cpu = torch.device("cpu")

setup = Setup()
net = make_fcn_resnet50(num_classes=setup.D)
net.eval().to(gpu)

rgb_tensor = setup.sample(device=cpu)

pr = profile.Profile()
pr.enable()
t_start = time.time()

rgb_tensor = rgb_tensor.to(gpu)
y = net(rgb_tensor)
dd_tensor = y["out"]
dd_mean = torch.mean(dd_tensor[[0]]).to(cpu).numpy()
assert dd_mean is not None

dt = time.time() - t_start
pr.disable()
stats = pstats.Stats(pr)

stats.print_stats(5)
print(f"dt: {dt:.4f}s")

这是我看到的差异：

2925 function calls (2734 primitive calls) in 0.009 seconds
...
dt: 0.0355s

我希望 cProfile 报告大约 35 毫秒（与 dt 相同），但它报告了大约 10 毫秒。

为什么会这样？

完整代码 + 复制在这里：
https://github.com/EricCousineau-TRI/repro/tree/bdef8a14/python/cprofile_with_torch

【问题讨论】：

标签： python pytorch python-3.6 ubuntu-18.04

【解决方案1】：

根据经验，如果您不“刷新”所有输出，或者如果您的代码没有完全包装在函数中，cProfile 似乎不会“挂钩”到代码中。

在此处查看 cmets 的更多详细信息：
https://github.com/EricCousineau-TRI/repro/blob/bdef8a14b5/python/cprofile_with_torch/repro.py#L75-L94

所有计时结果记录在：

Ubuntu 18.04
CPython 3.6.9
nvidia-driver-450 (450.102.04-0ubuntu0.18.04.1)
英伟达 Titan RTX

综上所述，使用 PyTorch 提供的机制（:facepalm:) 可能会更好：

【讨论】：