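How to speed up a Caffe classifier in Python

【Question title】: How to speed up a Caffe classifier in Python
【Posted】: 2015-08-29 13:57:34
【Question】:

I am using a Caffe classifier from Python. I grab images from my camera and run prediction on them with a trained model. It works well, but it is very slow: I estimate only about 4 frames/second. Can you suggest ways to reduce the computation time in my code? The problem is as follows. I have to load a network model, age_net.caffemodel, of about 80 MB with the following code: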

age_net_pretrained='./age_net.caffemodel'
age_net_model_file='./deploy_age.prototxt'
age_net = caffe.Classifier(age_net_model_file, age_net_pretrained,
           mean=mean,
           channel_swap=(2,1,0),
           raw_scale=255,
           image_dims=(256, 256))

For each input image (caffe_input), I call the predict function:

prediction = age_net.predict([caffe_input])

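I think that because the network is very large, the predict function takes a long time for each image, and that is where the slowness comes from.
Here is my full code for reference. I adapted it from existing code.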

from conv_net import *

import matplotlib.pyplot as plt
import numpy as np
import cv2
import glob
import os
caffe_root = './caffe/'  # trailing slash needed: 'python' is appended to this path below
import sys
sys.path.insert(0, caffe_root + 'python')
import caffe
DATA_PATH = './face/'
cnn_params = './params/gender_5x5_5_5x5_10.param'
face_params = './params/haarcascade_frontalface_alt.xml'
def format_frame(frame):
    img = frame.astype(np.float32)/255.
    img = img[...,::-1]
    return img   

if __name__ == '__main__':    
    files = glob.glob(os.path.join(DATA_PATH, '*.*'))

    # This is the configuration of the full convolutional part of the CNN
    # `d` is a list of dicts, where each dict represents a convolution-maxpooling
    # layer. 
    # Eg c1 - first layer, convolution window size
    # p1 - first layer pooling window size
    # f_in1 - first layer no. of input feature arrays
    # f_out1 - first layer no. of output feature arrays
    d = [{'c1':(5,5),
          'p1':(2,2),
          'f_in1':1, 'f_out1':5},
         {'c2':(5,5),
          'p2':(2,2),
          'f_in2':5, 'f_out2':10}]

    # This is the configuration of the mlp part of the CNN
    # first tuple has the fan_in and fan_out of the input layer
    # of the mlp and so on.
    nnet =  [(800,256),(256,2)]    
    c = ConvNet(d,nnet, (45,45))
    c.load_params(cnn_params)        
    face_cascade = cv2.CascadeClassifier(face_params)
    cap = cv2.VideoCapture(0)
    cv2.namedWindow("Image", cv2.WINDOW_NORMAL)

    plt.rcParams['figure.figsize'] = (10, 10)
    plt.rcParams['image.interpolation'] = 'nearest'
    plt.rcParams['image.cmap'] = 'gray'
    mean_filename='./mean.binaryproto'
    proto_data = open(mean_filename, "rb").read()
    a = caffe.io.caffe_pb2.BlobProto.FromString(proto_data)
    mean  = caffe.io.blobproto_to_array(a)[0]
    age_net_pretrained='./age_net.caffemodel'
    age_net_model_file='./deploy_age.prototxt'
    age_net = caffe.Classifier(age_net_model_file, age_net_pretrained,
               mean=mean,
               channel_swap=(2,1,0),
               raw_scale=255,
               image_dims=(256, 256))
    age_list=['(0, 2)','(4, 6)','(8, 12)','(15, 20)','(25, 32)','(38, 43)','(48, 53)','(60, 100)']
    while(True):

        val, image = cap.read()        
        if image is None:
            break
        image = cv2.resize(image, (320,240))
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, 1.3, 5, minSize=(30,30))

        for f in faces:
            x,y,w,h = f
            cv2.rectangle(image, (x,y), (x+w,y+h), (0,255,255))            
            face_image_rgb = image[y:y+h, x:x+w]
            caffe_input = cv2.resize(face_image_rgb, (256, 256)).astype(np.float32)
            prediction = age_net.predict([caffe_input]) 
            print('predicted age: ' + age_list[prediction[0].argmax()])
        cv2.imshow('Image', image)
        ch = 0xFF & cv2.waitKey(1)
        if ch == 27:
            break
        #break

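【Comments】:

Why not profile your code to pinpoint where the bottleneck is?
OK, I updated it. However, the code I wrote is very long.
@user8430 It is always advisable to find out which part of a long piece of code uses the most resources/time, unless you have made an obvious mistake.
@a-Jays: As boardrider requested, I posted my full code. I will point out my problem in a short update. See you later.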
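
The profiling advice in the comments above can be sketched roughly as follows (Python 3). This is only an illustration, not the asker's code: `StageTimer`, `fake_detect` and `fake_predict` are hypothetical names, and the two `time.sleep` calls stand in for `face_cascade.detectMultiScale` and `age_net.predict` so that the example runs without a camera or Caffe.

```python
import time
from collections import defaultdict


class StageTimer(object):
    """Accumulates wall-clock time per named stage of a processing loop."""

    def __init__(self):
        self.totals = defaultdict(float)  # stage name -> total seconds
        self.counts = defaultdict(int)    # stage name -> number of calls

    def time(self, name, fn, *args, **kwargs):
        """Call fn(*args, **kwargs), record its duration under `name`."""
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        self.totals[name] += time.perf_counter() - start
        self.counts[name] += 1
        return result

    def report(self):
        """Return the mean seconds per call for every stage."""
        return {name: self.totals[name] / self.counts[name]
                for name in self.totals}


def fake_detect():
    # Hypothetical stand-in for face detection: returns one "face".
    time.sleep(0.002)
    return ["face"]


def fake_predict(face):
    # Hypothetical stand-in for the network forward pass.
    time.sleep(0.02)
    return 0


timer = StageTimer()
for _ in range(5):  # five simulated frames
    faces = timer.time("detect", fake_detect)
    for face in faces:
        timer.time("predict", fake_predict, face)

report = timer.report()
# Print the slowest stage first.
for name, mean_s in sorted(report.items(), key=lambda kv: -kv[1]):
    print("%-8s %.1f ms per call" % (name, mean_s * 1000.0))
```

In the real loop, the same wrapper could be placed around `cap.read()`, `detectMultiScale`, `predict` and `imshow` to see which one dominates the time per frame.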

Tags: python computer-vision neural-network deep-learning caffe

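【Solution 1】:

You can also try channel pruning on your network. It is an algorithm that efficiently prunes the channels in each layer and can speed up the network by 2-5x. The GitHub repository is: https://github.com/yihui-he/channel-pruning

【Comments】: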

【Solution 2】:

Try calling age_net.predict([caffe_input]) with oversample=False:

prediction = age_net.predict([caffe_input], oversample=False)

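By default, predict creates 10 slightly different crops of the input image and feeds all of them to the network for classification. By disabling this option, you should get a speedup of about 10x.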
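
To make the cost of oversampling concrete, here is a minimal NumPy sketch of the usual 10-crop scheme (four corners and the center, plus a horizontal mirror of each). It illustrates the idea only and is not Caffe's own implementation: `ten_crops` is a hypothetical helper, and the 227x227 crop size is an assumption, since the network's input size is not given in the question.

```python
import numpy as np


def ten_crops(image, crop_hw):
    """Return 10 crops of an H x W x C image: 4 corners, center, and mirrors."""
    h, w = image.shape[:2]
    ch, cw = crop_hw
    # Top-left (row, col) offsets of the four corner crops and the center crop.
    offsets = [(0, 0), (0, w - cw), (h - ch, 0), (h - ch, w - cw),
               ((h - ch) // 2, (w - cw) // 2)]
    crops = [image[y:y + ch, x:x + cw] for y, x in offsets]
    # Flip each crop left-to-right to obtain the remaining five.
    crops += [c[:, ::-1] for c in crops]
    return np.stack(crops)


# A random array stands in for one 256x256 face image.
frame = np.random.rand(256, 256, 3).astype(np.float32)
crops = ten_crops(frame, (227, 227))  # 227x227 is an assumed input size
print(crops.shape)  # (10, 227, 227, 3)
```

With oversampling on, every face produces a batch of 10 images that all go through the forward pass; with oversample=False only one image per face does, which is where the expected speedup comes from.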

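【Comments】:

@Shai I am doing something similar to the OP, but I have a problem. I run predictions on images from a webcam, with the prediction done in a separate Python process. The program's FPS still drops, even though the prediction runs on the GPU in a separate process. I tried your oversample solution, and although my predictions are faster, the FPS gain is minimal. I have set multiple bounties on my question here: stackoverflow.com/questions/39522693/… Although it has received a lot of attention, there is no concrete answer. I also tried contacting the Caffe developers to explain this oddity, with no luck. I would love to hear your thoughts on my similar situation. Thanks!
@user3543300 I'm afraid I can't help with CPU/GPU multi-process performance. Sorry.
@Shai I think this is more of a Caffe and OpenCV issue than a multiprocessing issue. Any ideas on how to reduce the CPU load when running Caffe in GPU mode?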