Caffe's transformer.preprocessing takes too long to complete

【Question title】: Caffe's transformer.preprocessing takes too long to complete
【Posted】: 2017-07-25 05:39:00
【Question description】:

I wrote a simple script to test a model with PyCaffe, but I noticed it is extremely slow, even on the GPU! My test set has 82K samples of size 256x256, and the code below takes hours to finish.

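I even feed batches of images instead of single images, but nothing changed. It has now been running for 5 hours and has processed only 50K samples! What can I do to make it faster?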

Can I avoid using transformer.preprocess altogether? If so, how?

Here is the snippet:

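import caffe
import lmdb
import numpy as np

# `args` holds the parsed command-line options (not shown)

#run on gpu
caffe.set_mode_gpu()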

#Extract mean from the mean image file
mean_blobproto_new = caffe.proto.caffe_pb2.BlobProto()
f = open(args.mean, 'rb')
mean_blobproto_new.ParseFromString(f.read())
mean_image = caffe.io.blobproto_to_array(mean_blobproto_new)
f.close()

predicted_lables = []
true_labels = []
misclassified =[]
class_names = ['unsafe','safe']
count = 0
correct = 0
batch=[]
plabe_ls = []
batch_size = 50

net1 = caffe.Net(args.proto, args.model, caffe.TEST) 
transformer = caffe.io.Transformer({'data': net1.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1))  
transformer.set_mean('data', mean_image[0].mean(1).mean(1))
transformer.set_raw_scale('data', 255)      
transformer.set_channel_swap('data', (2,1,0)) 
net1.blobs['data'].reshape(batch_size, 3,224, 224)
data_blob_shape = net1.blobs['data'].data.shape
data_blob_shape = list(data_blob_shape)
i=0

mu = np.array([ 104,  117,  123])#imagenet mean

# check whether the database is LMDB or LevelDB
if(args.db_type.lower() == 'lmdb'):
    lmdb_env = lmdb.open(args.db_path)
    lmdb_txn = lmdb_env.begin()
    lmdb_cursor = lmdb_txn.cursor()
    for key, value in lmdb_cursor:
        count += 1 
        datum = caffe.proto.caffe_pb2.Datum()
        datum.ParseFromString(value)
        label = int(datum.label)
        image = caffe.io.datum_to_array(datum).astype(np.uint8)
        if(count % 5000 == 0):
            print('count: ',count)
        if(i < batch_size):
            i+=1
            inf= key,image,label
            batch.append(inf)
        if(i >= batch_size):
            # process the n buffered images
            ims=[]
            for x in range(len(batch)):
                ims.append(transformer.preprocess('data',batch[x][1]))# - mean_image[0].mean(1).mean(1) )
            net1.blobs['data'].data[...] = ims[:]
            out_1 = net1.forward()
            plbl = np.asarray( out_1['pred'])   
            plbl = plbl.argmax(axis=1)
            for j in range(len(batch)):
                if (plbl[j] == batch[j][2]):
                    correct+=1
                else:
                    misclassified.append(batch[j][0])

                predicted_lables.append(plbl[j])
                true_labels.append(batch[j][2]) 
            batch.clear()
            i=0

Update:

By replacing

for x in range(len(batch)):
    ims.append(transformer.preprocess('data',batch[x][1]))
net1.blobs['data'].data[...] = ims[:]

with

for x in range(len(batch)):
    img = batch[x][1]
    ims.append(img[:,0:224,0:224])

all 82K samples were processed in under a minute. The culprit really was the preprocess method; I don't know why it behaves like this!

However, this way I can't use the mean file. I tried doing

ims.append(img[:,0:224,0:224] - mean.mean(1).mean(1))

as well, but ran into this error:

ValueError: operands could not be broadcast together with shapes (3,224,224) (3,)
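The error comes from NumPy's broadcasting rules: shapes are compared starting from the last axis, so a (3,) vector is lined up against the width axis (224) of a (3, 224, 224) image instead of the channel axis. A minimal sketch, assuming a per-channel mean of shape (3,) such as `mean_image[0].mean(1).mean(1)` and images stored as (C, H, W); the helper name `subtract_channel_mean` and the random arrays are illustrative only:

```python
import numpy as np

def subtract_channel_mean(chw_image, channel_mean):
    """Subtract one mean value per channel from a (C, H, W) array."""
    # Reshape (C,) -> (C, 1, 1) so NumPy broadcasts it over H and W.
    mean = np.asarray(channel_mean, dtype=chw_image.dtype).reshape(-1, 1, 1)
    return chw_image - mean

# Synthetic stand-ins for one stored 256x256 image and mean_image[0].
rng = np.random.default_rng(0)
img = rng.uniform(0, 255, size=(3, 256, 256)).astype(np.float32)
mean_chw = rng.uniform(0, 255, size=(3, 256, 256)).astype(np.float32)

channel_mean = mean_chw.mean(1).mean(1)   # shape (3,)
crop = img[:, 0:224, 0:224]               # shape (3, 224, 224)

try:
    crop - channel_mean                   # last axes: 224 vs 3
except ValueError as err:
    print("plain subtraction fails:", err)

centered = subtract_channel_mean(crop, channel_mean)
print(centered.shape)                     # (3, 224, 224)
```

Indexing with `channel_mean[:, np.newaxis, np.newaxis]` is an equivalent way to add the two singleton axes.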

I also need a better way to crop the images. Do I need to resize them to 224, or should I crop them the way Caffe does?
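If the network was trained through Caffe's data layer with `crop_size: 224` and a `mean_file` (an assumption; the training prototxt is not shown), then at TEST time Caffe's `DataTransformer` takes a center crop with offsets `(H - 224) // 2` and `(W - 224) // 2` and subtracts the mean image at that same crop position, rather than resizing. A sketch of that behavior in NumPy; `center_crop_chw` and `preprocess_like_caffe_test` are hypothetical helper names, and the mean image is assumed to have the same 256x256 size as the stored images:

```python
import numpy as np

def center_crop_chw(chw, crop_h, crop_w):
    """Center-crop a (C, H, W) array with offsets (H - crop_h) // 2 and
    (W - crop_w) // 2, the same offsets Caffe's data layer uses at TEST time."""
    _, h, w = chw.shape
    top = (h - crop_h) // 2
    left = (w - crop_w) // 2
    return chw[:, top:top + crop_h, left:left + crop_w]

def preprocess_like_caffe_test(chw_image, mean_chw, crop=224):
    """Crop the image and the full mean image at the same offset, then
    subtract pixel-wise (assumes both have the same (C, H, W) shape)."""
    img = center_crop_chw(chw_image.astype(np.float32), crop, crop)
    mean = center_crop_chw(mean_chw.astype(np.float32), crop, crop)
    return img - mean

# Example with synthetic data shaped like the question's 256x256 images.
image = np.arange(3 * 256 * 256, dtype=np.float32).reshape(3, 256, 256)
mean_chw = np.full((3, 256, 256), 100.0, dtype=np.float32)
out = preprocess_like_caffe_test(image, mean_chw)
print(out.shape)  # (3, 224, 224)
```

In the question's loop, `preprocess_like_caffe_test(batch[x][1], mean_image[0])` could stand in for the `transformer.preprocess` call under those assumptions.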

【Question discussion】:

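Are you sure it's the Transformer's fault? It isn't GPU-accelerated (it's written entirely in Python), but it shouldn't be that slow. Consider using the time module to find the culprit.
Yes, I checked several times. Without transformer.preprocess() it runs much faster!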
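Following the suggestion to use the `time` module, a small context manager can accumulate the time spent in each stage of the loop so the slow step stands out. A sketch; `timed` and the stage labels are hypothetical names, and the demo uses `time.sleep` in place of the real preprocessing and forward calls:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, totals):
    """Add the wall-clock seconds spent inside the `with` block to totals[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[label] = totals.get(label, 0.0) + (time.perf_counter() - start)

totals = {}
for _ in range(3):
    with timed("cheap step", totals):
        sum(range(1000))
    with timed("slow step", totals):
        time.sleep(0.01)   # stands in for an expensive call

# Print the stages from slowest to fastest.
for label, seconds in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label}: {seconds:.3f} s")
```

Inside the batch loop one would wrap, for example, the `transformer.preprocess` calls in `with timed("preprocess", totals):` and `net1.forward()` in `with timed("forward", totals):`.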

Tags: caffe pycaffe

【Solution 1】:

I finally got it working! Here is the code that runs faster:

import caffe
import lmdb
import numpy as np

predicted_lables=[]
true_labels = []
misclassified =[]
class_names = ['unsafe','safe']
count =0
correct = 0
batch = []
plabe_ls = []
batch_size = 50
cropx = 224
cropy = 224
i = 0

# Extract mean from the mean image file
mean_blobproto_new = caffe.proto.caffe_pb2.BlobProto()
f = open(args.mean, 'rb')
mean_blobproto_new.ParseFromString(f.read())
mean_image = caffe.io.blobproto_to_array(mean_blobproto_new)
f.close()

caffe.set_mode_gpu() 
net1 = caffe.Net(args.proto, args.model, caffe.TEST) 
net1.blobs['data'].reshape(batch_size, 3, 224, 224)
data_blob_shape = net1.blobs['data'].data.shape

# check whether the database is LMDB or LevelDB
if(args.db_type.lower() == 'lmdb'):
    lmdb_env = lmdb.open(args.db_path)
    lmdb_txn = lmdb_env.begin()
    lmdb_cursor = lmdb_txn.cursor()
    for key, value in lmdb_cursor:
        count += 1 
        datum = caffe.proto.caffe_pb2.Datum()
        datum.ParseFromString(value)
        label = int(datum.label)
        image = caffe.io.datum_to_array(datum).astype(np.float32)
        #key,image,label
        #buffer n image
        if(count % 5000 == 0):          
            print('{0} samples processed so far'.format(count))
        if(i < batch_size):
            i += 1
            inf= key,image,label
            batch.append(inf)
            #print(key)                 
        if(i >= batch_size):
            # process the n buffered images
            ims = []
            for x in range(len(batch)):
                img = batch[x][1]
                # img has (C, H, W) shape! It already went through transpose
                # and channel swap when it was saved into the LMDB.
                # Method I: crop both the image and the mean file
                #ims.append(img[:,0:224,0:224] - mean_image[0][:,0:224,0:224])
                # Method II: resize the image to the desired size (crop size)
                #img = caffe.io.resize_image(img.transpose(1,2,0), (224, 224))  # returns (H, W, C)
                # Method III: center crop, just like Caffe does at test time
                c, h, w = img.shape
                starty = h//2 - cropy//2
                startx = w//2 - cropx//2
                img = img[:, starty:starty + cropy, startx:startx + cropx]
                # subtract the per-channel mean: (3,) -> (3, 1, 1) broadcasts over H and W
                img -= mean_image[0].mean(1).mean(1)[:, np.newaxis, np.newaxis]
                ims.append(img)

            net1.blobs['data'].data[...] = ims[:]
            out_1 = net1.forward()
            plabe_ls = out_1['pred']
            plbl = np.asarray(plabe_ls)
            plbl = plbl.argmax(axis=1)
            for j in range(len(batch)):
                if (plbl[j] == batch[j][2]):
                    correct += 1
                else:
                    misclassified.append(batch[j][0])

                predicted_lables.append(plbl[j])        
                true_labels.append(batch[j][2]) 
            batch.clear()
            i = 0
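The per-image loop above can also be replaced by a single NumPy operation per batch. A sketch, assuming every image in the batch has the same (C, H, W) shape and the per-channel mean has shape (C,); `preprocess_batch` is a hypothetical helper, not part of pycaffe, and the mean values below are the ImageNet BGR means from the question's `mu`:

```python
import numpy as np

def preprocess_batch(chw_images, channel_mean, crop=224):
    """Center-crop and mean-subtract a whole batch at once.

    chw_images: sequence of (C, H, W) arrays that all share one shape.
    channel_mean: per-channel mean, shape (C,).
    Returns a float32 array of shape (N, C, crop, crop).
    """
    batch = np.stack(chw_images)                       # (N, C, H, W)
    _, _, h, w = batch.shape
    top, left = (h - crop) // 2, (w - crop) // 2
    batch = batch[:, :, top:top + crop, left:left + crop].astype(np.float32)
    # (C,) -> (1, C, 1, 1) broadcasts over the batch, height and width axes.
    mean = np.asarray(channel_mean, dtype=np.float32).reshape(1, -1, 1, 1)
    return batch - mean

# Synthetic batch of four 256x256 uint8 images, like datum_to_array output.
rng = np.random.default_rng(1)
images = [rng.integers(0, 256, size=(3, 256, 256), dtype=np.uint8) for _ in range(4)]
out = preprocess_batch(images, [104.0, 117.0, 123.0])
print(out.shape, out.dtype)  # (4, 3, 224, 224) float32
```

With the answer's variables this would read `net1.blobs['data'].data[...] = preprocess_batch([b[1] for b in batch], mean_image[0].mean(1).mean(1))`.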

Although I don't get exactly the same accuracy, it is very close (I got 98.61% where 98.65% was expected! I don't know what causes the difference!)
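One thing worth checking, though the answer does not identify it as the cause: the loop only runs a batch once it holds `batch_size` items, so if the number of LMDB entries is not a multiple of 50, the last few samples are never evaluated, and depending on how the accuracy is computed the result can shift slightly. Another difference from Caffe's own TEST-time pipeline is that the code subtracts a per-channel mean rather than the cropped mean image. A sketch of a batching helper that also yields the final short batch; `iter_batches` is a hypothetical name:

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def iter_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of up to batch_size items; the last list may be shorter."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # do not drop the final, partial batch
        yield batch

sizes = [len(b) for b in iter_batches(range(123), 50)]
print(sizes)  # [50, 50, 23]
```

With the LMDB cursor this could be `for batch in iter_batches(lmdb_cursor, 50):`, where each item is a (key, value) pair; for the short final batch the data blob would need `net1.blobs['data'].reshape(len(batch), 3, 224, 224)` before it is filled and `net1.forward()` is called.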

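Update:
The reason transformer.preprocess took so long is its resize_image() method. resize_image expects the image in H,W,C form, but in my case the images had already been transposed and channel-swapped (they are in C,H,W form) because I'm reading an LMDB dataset, and this made resize_image() fall back to its slowest resizing method, so each image took 0.6 seconds to process. Knowing this, converting the image to the right layout solves the problem. That is, I had to do:

ims.append(transformer.preprocess('data',img.transpose(1,2,0)))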

Note that it is still slower than the method above, but much faster than before!
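`Transformer.preprocess`, and the `caffe.io.resize_image` call inside it, expect channels on the last axis, (H, W, C), which is why converting the LMDB's (C, H, W) arrays first avoids the slow path. `transpose(1, 2, 0)` is the permutation that does this without touching the spatial axes; `transpose(2, 1, 0)` also puts channels last but yields (W, H, C), i.e. each image mirrored about its diagonal, and because the dataset's images are square the shapes still match, so that mistake is easy to miss. A minimal NumPy check; `chw_to_hwc` is a hypothetical helper name:

```python
import numpy as np

def chw_to_hwc(chw):
    """(C, H, W) -> (H, W, C): channels last, spatial axes kept in order."""
    return chw.transpose(1, 2, 0)

# A non-square array makes a swapped H/W visible; square 256x256 images hide it.
chw = np.arange(3 * 4 * 6).reshape(3, 4, 6)   # C=3, H=4, W=6

hwc = chw_to_hwc(chw)
swapped = chw.transpose(2, 1, 0)

print(hwc.shape)      # (4, 6, 3)  -> H, W, C
print(swapped.shape)  # (6, 4, 3)  -> W, H, C (height and width exchanged)
```

Separately, if the LMDB already holds BGR values in the 0-255 range, the question's `set_channel_swap('data', (2,1,0))` and `set_raw_scale('data', 255)` would swap and scale the data a second time; those settings are worth re-checking against how the database was created.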

【Discussion】:

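Could you explain what you did and why it's better? Cleaning the code of commented-out lines and irrelevant details, and adding some basic formatting, would also help.
Looks like one of those things you see, slap your forehead and think "that's so obvious", but the reason only shows up once you dig deeper :)
@PrzemekD: Exactly :))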