How to improve performance when looping through the results of net.forward(outputLayers) using opencv-python and yolov3

【Question title】: How can I improve performance when looping through results from net.forward(outputLayers) using opencv-python and yolov3
【Posted】: 2021-09-23 09:49:59
【Question description】:

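I am using Python 3.8.10, OpenCV 4.3.0, and CUDA 10.2 on Ubuntu 20.04. I trained YOLOv3 to produce a weights file for the 23 objects I want to detect in images. Everything works, and in Python I can draw nice boxes around objects whose detection confidence is above a given threshold.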

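However, looping through all the outputs returned by

outputs = net.forward(outputLayers)

to keep only the results above a certain confidence level takes more than half a second.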

Here is my loop:

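import numpy as np

# outputs, width and height come from the earlier detection code
boxes = []
confs = []
class_ids = []

for output in outputs:
    for detect in output:
        # detect = [center_x, center_y, w, h, objectness, class scores...]
        scores = detect[5:]
        class_id = np.argmax(scores)
        conf = scores[class_id]
        if conf > 0.7:
            # Box values are relative to the image size; scale to pixels
            center_x = int(detect[0] * width)
            center_y = int(detect[1] * height)
            w = int(detect[2] * width)
            h = int(detect[3] * height)
            # Convert the box center to its top-left corner
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)
            boxes.append([x, y, w, h])
            confs.append(float(conf))
            class_ids.append(class_id)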

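The reason it takes so long is the size of the output. It seems that net.forward(outputLayers) returns every possible detection, regardless of confidence. In my case, that is more than 30,000 elements I have to loop over.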
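For context on where a number like 30,000 comes from: a standard YOLOv3 network predicts boxes at three scales (strides 32, 16 and 8) with three anchors per grid cell, so the row count depends only on the input size. The sketch below just does that arithmetic. The function name is hypothetical, and the 704 input size is an illustrative assumption, since the question does not state the network's input size.

```python
def yolov3_candidate_count(input_size, strides=(32, 16, 8), anchors_per_scale=3):
    """Rows emitted by a standard YOLOv3 head for a square input.

    Assumes the usual three detection scales and three anchors per cell;
    a customized .cfg could differ.
    """
    total = 0
    for stride in strides:
        grid = input_size // stride              # grid cells per side at this scale
        total += grid * grid * anchors_per_scale
    return total


print(yolov3_candidate_count(416))  # 10647
print(yolov3_candidate_count(704))  # 30492 (illustrative size, not from the question)
```

Every one of those rows is returned whether or not it contains an object, which is why the filtering cost scales with the input resolution rather than with the number of objects in the image.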

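Is there a way to exclude detections below a certain confidence level while the model still resides on the GPU? As far as I can tell, net.forward() does not allow any filtering. Any ideas would be greatly appreciated!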

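【Question comments】:

Is outputs a numpy array, a Python list, or something else?
It is a numpy array.
What exactly is its shape? You can remove both loops with an expression that filters everything at once. Then you can also get rid of the append calls and do those calculations on the whole array. scores = outputs[:,:,5:]; mask = (scores.max(axis=2) > 0.7) (perhaps with an argmax in between so it is computed only once, followed by some indexing)
Thank you Christoph, your approach helped me solve my problem.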

Tags: python opencv image-processing yolo opencv-python

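【Solution 1】:

I could not find a way to reduce the number of outputs from net.forward(), but the comment by Christoph Rackwitz gave me a very satisfactory way to speed up my code. Instead of looping over the output numpy array, I applied: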

mask = (outputs[:,5:].max(axis=1) > 0.7)
outputs = outputs[mask]

This reduced my output size from about 30,000 to 33 in 3.8-06 seconds.
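The mask only selects the rows; the box, confidence and class-id calculations from the original loop can be vectorized the same way. The following is a minimal sketch, not the answerer's actual code. It assumes that outputs is a list or tuple with one 2-D array per output layer (each row being center_x, center_y, w, h, objectness, then class scores), and filter_detections is a hypothetical helper name.

```python
import numpy as np


def filter_detections(outputs, width, height, conf_threshold=0.7):
    """Vectorized counterpart of the nested loop in the question (sketch)."""
    # Assumption: one (rows, 5 + num_classes) array per output layer.
    detections = np.vstack(outputs)

    scores = detections[:, 5:]
    class_ids = scores.argmax(axis=1)                    # best class per row
    confs = scores[np.arange(len(scores)), class_ids]    # its score

    mask = confs > conf_threshold
    kept = detections[mask]

    # Scale relative coordinates to pixels; astype(int) truncates like int()
    center_x = (kept[:, 0] * width).astype(int)
    center_y = (kept[:, 1] * height).astype(int)
    w = (kept[:, 2] * width).astype(int)
    h = (kept[:, 3] * height).astype(int)
    x = (center_x - w / 2).astype(int)
    y = (center_y - h / 2).astype(int)

    boxes = np.stack([x, y, w, h], axis=1)
    return boxes, confs[mask], class_ids[mask]
```

A call such as boxes, confs, class_ids = filter_detections(outputs, width, height) returns numpy arrays; use boxes.tolist() and confs.tolist() if later code expects plain Python lists.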

【Comments】:

【Solution 2】:

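To improve performance, you could try having net.forward(..) detect only the 23 objects you want, rather than all 80 objects provided by a YoloV3 detector with coco.names.

If you only want to detect 23 specific objects from the YoloV3 list, there is a specific section of the darkflow repo that explains how to change the output.

Note: you would have to retrain your model. They demonstrate this with a 3-class example.

I believe the answer here would be more helpful; instead of 1 specific class, just adapt the steps to 23 objects.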

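【Comments】:

Thanks for your answer! My model is already custom-trained with only 23 classes, so I think the problem lies in the output of net.forward(). It seems to output every possible box, regardless of confidence.
@Philipp Have you tried reducing the size of the image before this step? Like the second answer here: stackoverflow.com/questions/54488986/…
Yes, I saw that answer and tried it. My problem is a bit different, though. The performance of net.forward() is quite good; the computation takes only about 0.07 seconds. The problem is the lengthy output: afterwards I have to loop over roughly 30,000 boxes just to get the ones with confidence > confidence_level (0.7 in my case).
@Philipp In that case, if speed is what you really care about, I suggest writing an equivalent program in C++. The nested for loops should be much faster there. I don't have another solution right now.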
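Before porting anything to C++, it may be worth measuring how much of the cost is the Python-level loop itself. The sketch below times a per-row loop against a single numpy mask on synthetic data. The array shape (30,000 rows, 28 columns for 23 classes), the random scores and both function names are assumptions for illustration, and the timings will differ from machine to machine.

```python
import time

import numpy as np


def count_kept_loop(detections, threshold=0.7):
    """Per-row filtering, as in the question's loop."""
    kept = 0
    for detect in detections:
        scores = detect[5:]
        if scores[np.argmax(scores)] > threshold:
            kept += 1
    return kept


def count_kept_vectorized(detections, threshold=0.7):
    """The same filter expressed as one boolean mask."""
    return int((detections[:, 5:].max(axis=1) > threshold).sum())


# Synthetic stand-in for the network output: all scores below 0.5,
# except 33 rows that are given one confident class score.
rng = np.random.default_rng(0)
detections = rng.random((30000, 28)) * 0.5
detections[:33, 5] = 0.9

for fn in (count_kept_loop, count_kept_vectorized):
    start = time.perf_counter()
    kept = fn(detections)
    elapsed = time.perf_counter() - start
    print(f"{fn.__name__}: kept {kept} rows in {elapsed:.6f} s")
```

Both functions keep the same rows; only the place where the iteration happens changes, from the Python interpreter to numpy's compiled code.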