Speeding up and understanding Python Keras predict method results analysis: answers

【Question title】: Speeding up and understanding Python Keras predict method results analysis
【Posted】: 2020-04-30 06:00:15
【Question】:

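I'm using Keras and Tensorflow to perform object detection with the standard Yolov3 as well as Yolov3-Tiny (which is about 10x faster). Everything works, but performance is quite poor: I get about one frame every 2 seconds on the GPU and one frame every 4 seconds or so on the CPU. Profiling the code shows that the decode_netout method takes a lot of the time. I was generally following this tutorial as an example.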

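Can someone help me understand what it is doing?
Are there alternative methods in Tensorflow (or other libraries) for doing these calculations? For example, I swapped out some custom Python for tf.image.non_max_suppression, and that helped quite a bit in terms of performance.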

# https://keras.io/models/model/
yhat = model.predict(image, verbose=0, use_multiprocessing=True)
# define the probability threshold for detected objects
class_threshold = 0.6
boxes = list()
for i in range(len(yhat)):
    # decode the output of the network
    boxes += detect.decode_netout(yhat[i][0], anchors[i], class_threshold, input_h, input_w)

def decode_netout(netout, anchors, obj_thresh, net_h, net_w):
    grid_h, grid_w = netout.shape[:2]
    nb_box = 3
    netout = netout.reshape((grid_h, grid_w, nb_box, -1))
    boxes = []
    netout[..., :2]  = _sigmoid(netout[..., :2])
    netout[..., 4:]  = _sigmoid(netout[..., 4:])
    netout[..., 5:]  = netout[..., 4][..., np.newaxis] * netout[..., 5:]
    netout[..., 5:] *= netout[..., 5:] > obj_thresh

    for i in range(grid_h*grid_w):
        row = i / grid_w
        col = i % grid_w
        for b in range(nb_box):
            # 4th element is objectness score
            objectness = netout[int(row)][int(col)][b][4]
            if(objectness.all() <= obj_thresh): continue
            # first 4 elements are x, y, w, and h
            x, y, w, h = netout[int(row)][int(col)][b][:4]
            x = (col + x) / grid_w # center position, unit: image width
            y = (row + y) / grid_h # center position, unit: image height
            w = anchors[2 * b + 0] * np.exp(w) / net_w # unit: image width
            h = anchors[2 * b + 1] * np.exp(h) / net_h # unit: image height
            # last elements are class probabilities
            classes = netout[int(row)][col][b][5:]
            box = BoundBox(x-w/2, y-h/2, x+w/2, y+h/2, objectness, classes)
            boxes.append(box)
    return boxes

【Comments】:

Tags: python-3.x keras object-detection tensorflow2.0 yolo

【Solution 1】:

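I have a similar setup with a GPU and ran into the same problem. I've been working on a YoloV3 Keras project and spent the past 2 weeks chasing this exact issue. After finally timing all of my functions, I narrowed the problem down to "def do_nms", which then led me to the function you posted above, "def decode_netout". The issue is that the non-max suppression is slow.

The solution I found was to change this line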

if(objectness.all() <= obj_thresh): continue

to

if (objectness <= obj_thresh).all(): continue

The performance difference is night and day. I'm now pushing close to 30 FPS and everything runs much better.
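Why a one-line change matters this much can be checked with a small NumPy experiment. This is only a sketch: the score value 0.05 is made up for illustration. In the original line, .all() is applied to the score before the comparison, so the score is reduced to a boolean first; any non-zero score becomes True, and True <= 0.6 is False, so the box is not skipped. Since a sigmoid output is almost never exactly zero, nearly every candidate box gets appended and passed on to do_nms.

```python
import numpy as np

obj_thresh = 0.6
# Hypothetical low-confidence objectness score that should be skipped.
objectness = np.float32(0.05)

# Original check: .all() converts the score to a bool first,
# then True (treated as 1) is compared against the threshold.
original_skips = bool(objectness.all() <= obj_thresh)

# Fixed check: compare against the threshold first, then reduce.
fixed_skips = bool((objectness <= obj_thresh).all())

print("original check skips the box:", original_skips)
print("fixed check skips the box:", fixed_skips)
```

With the fixed check, only boxes whose objectness is above the threshold reach the non-max suppression step, which is why that step stops dominating the runtime.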

Credit goes to this Git issue/solution:

https://github.com/experiencor/keras-yolo3/issues/177

It took me a while to figure this out, so I hope it helps someone else.
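Beyond the one-line fix, the per-cell Python loop itself could probably be replaced with array operations. The following is a minimal sketch under some assumptions, not the code from the tutorial or from the linked issue: the name decode_netout_vectorized is hypothetical, it returns plain arrays instead of BoundBox objects, and it assumes anchors is a flat list of 6 numbers (3 width/height pairs) as in the question.

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_netout_vectorized(netout, anchors, obj_thresh, net_h, net_w):
    """Hypothetical loop-free variant of decode_netout.

    Returns (boxes, objectness, class_scores) for candidates whose
    objectness is above obj_thresh. boxes holds xmin, ymin, xmax, ymax
    as fractions of the image size.
    """
    grid_h, grid_w = netout.shape[:2]
    nb_box = 3
    # astype makes a copy, so the caller's array is not modified.
    netout = netout.reshape((grid_h, grid_w, nb_box, -1)).astype(np.float64)

    xy = _sigmoid(netout[..., :2])
    wh = netout[..., 2:4]
    objectness = _sigmoid(netout[..., 4])
    class_scores = objectness[..., np.newaxis] * _sigmoid(netout[..., 5:])
    class_scores *= class_scores > obj_thresh

    # Grid offsets, shaped so they broadcast over (grid_h, grid_w, nb_box).
    col = np.arange(grid_w).reshape(1, grid_w, 1)
    row = np.arange(grid_h).reshape(grid_h, 1, 1)
    x = (col + xy[..., 0]) / grid_w  # center x, unit: image width
    y = (row + xy[..., 1]) / grid_h  # center y, unit: image height

    anchors = np.asarray(anchors, dtype=np.float64).reshape(nb_box, 2)
    w = anchors[:, 0] * np.exp(wh[..., 0]) / net_w  # unit: image width
    h = anchors[:, 1] * np.exp(wh[..., 1]) / net_h  # unit: image height

    boxes = np.stack([x - w / 2, y - h / 2, x + w / 2, y + h / 2], axis=-1)
    keep = objectness > obj_thresh
    return boxes[keep], objectness[keep], class_scores[keep]

# Tiny synthetic example: a 2x2 grid, 3 boxes per cell, 2 classes.
example = np.zeros((2, 2, 3 * 7))
example[1, 0, 7 * 1 + 4] = 10.0  # strong objectness for box 1 in row 1, col 0
ex_boxes, ex_obj, ex_cls = decode_netout_vectorized(
    example, [10, 14, 23, 27, 37, 58], 0.6, 416, 416)
print(ex_boxes.shape, ex_obj.shape, ex_cls.shape)
```

The filtering here uses the same "objectness above threshold" rule as the corrected line, so the arrays it returns should describe the same candidates the fixed loop would keep. I have not benchmarked this sketch.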

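【Comments】:

Thanks, this really did speed things up. You might also be interested in this implementation. I swapped it in and the performance is pretty good.
I just compared the performance: using the implementation linked above, across both CPU and GPU, I get about a 30% performance improvement.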