Is SciPy/NumPy's Pooling/Convolution faster than TensorFlow's Convolution/Pooling? Answer

【Question title】: SciPy/Numpy's Pooling/Convolution faster than Tensorflow's Convolution/Pooling?
【Posted】: 2018-06-05 22:17:03
【Question】:

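I'm trying to use the GPU to speed up the convolution and pooling operations in my neural-network application (a spiking network). I wrote a small script to see how much speedup I could get with TensorFlow. Surprisingly, SciPy/NumPy does better. In my application all inputs (images) are stored on disk, but for this example I created a randomly initialized 27x27 image and a 5x5x30 weight kernel, and I made sure no data is copied from the CPU to the GPU. I also increased the input image size to 270x270 and the kernel to 7x7x30, but I still don't see any improvement.

I made sure that all TF operations actually execute on my GPU by setting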

sess =tf.Session(config=tf.ConfigProto(log_device_placement=True))

I have access to 2 GPUs (Tesla K20m) on a cluster.

Here is my code:

import tensorflow as tf
import numpy as np
from scipy import signal
import time
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

image_size = 27
kernel_size = 5
nofMaps = 30

def convolution(Image, weights):
    in_channels = 1 # 1 because the image has a single channel (depth 1 along z)
    out_channels = weights.shape[-1]
    strides_1d = [1, 1, 1, 1]

    #in_2d = tf.constant(Image, dtype=tf.float32)
    in_2d = Image
    #filter_3d = tf.constant(weights, dtype=tf.float32)
    filter_3d =weights

    in_height = int(in_2d.shape[0])   # rows
    in_width = int(in_2d.shape[1])    # columns

    filter_height = int(filter_3d.shape[0])
    filter_width = int(filter_3d.shape[1])

    input_4d   = tf.reshape(in_2d, [1, in_height, in_width, in_channels])
    kernel_4d = tf.reshape(filter_3d, [filter_height, filter_width, in_channels, out_channels])
    inter = tf.nn.conv2d(input_4d, kernel_4d, strides=strides_1d, padding='VALID')
    output_3d = tf.squeeze(inter)
    output_3d= sess.run(output_3d)
    return output_3d


def pooling(Image):
    in_channels = Image.shape[-1]
    Image_3d = tf.constant(Image, dtype = tf.float32)
    in_width = int(Image.shape[0])
    in_height = int(Image.shape[1])
    Image_4d = tf.reshape(Image_3d,[1,in_width,in_height,in_channels])
    pooled_pots4d = tf.layers.max_pooling2d(inputs=Image_4d, pool_size=[2, 2], strides=2)
    pooled_pots3d = tf.squeeze(pooled_pots4d)
    return sess.run(pooled_pots3d)


t1 = time.time()
#with tf.device('/device:GPU:1'):
Image = tf.random_uniform([image_size, image_size], name='Image')
weights = tf.random_uniform([kernel_size,kernel_size,nofMaps], name='Weights')
conv_result = convolution(Image,weights)
pool_result = pooling(conv_result)

print('Time taken:{}'.format(time.time()-t1))
#with tf.device('/device:CPU:0'):
print('Pool_result shape:{}'.format(pool_result.shape))
#print('first map of pool result:\n',pool_result[:,:,0])


def scipy_convolution(Image,weights):
    instant_conv1_pots = np.zeros((image_size-kernel_size+1,image_size-kernel_size+1,nofMaps))
    for i in range(weights.shape[-1]):
        instant_conv1_pots[:,:,i]=signal.correlate(Image,weights[:,:,i],mode='valid',method='fft')
    return instant_conv1_pots

def scipy_pooling(conv1_spikes):
    '''
       Reshape splitting each of the two axes into two each such that the
       latter of the split axes is of the same length as the block size.
       This would give us a 4D array. Then, perform maximum finding along those
       latter axes, which would be the second and fourth axes in that 4D array.
       https://stackoverflow.com/questions/41813722/numpy-array-reshaped-but-how-to-change-axis-for-pooling
    '''
    if(conv1_spikes.shape[0]%2!=0): #if array is odd size then omit the last row and col
        conv1_spikes = conv1_spikes[0:-1,0:-1,:]
    else:
        conv1_spikes = conv1_spikes
    m,n = conv1_spikes[:,:,0].shape
    o   = conv1_spikes.shape[-1]
    # integer division (//) keeps the shapes ints on Python 3 as well as 2.7
    pool1_spikes = np.zeros((m//2,n//2,o))
    for i in range(o):
        pool1_spikes[:,:,i]=conv1_spikes[:,:,i].reshape(m//2,2,n//2,2).max(axis=(1,3))
    return pool1_spikes
t1 = time.time()
Image = np.random.rand(image_size,image_size)
weights = np.random.rand(kernel_size,kernel_size,nofMaps)
conv_result = scipy_convolution(Image,weights)
pool_result = scipy_pooling(conv_result)
print('Time taken:{}'.format(time.time()-t1))
print('Pool_result shape:{}'.format(pool_result.shape))
#print('first map of pool result:\n',pool_result[:,:,0])
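The reshape trick described in the scipy_pooling docstring does not actually need the per-channel Python loop: the channel axis can simply ride along in the reshape. A minimal sketch, assuming an (H, W, C) NumPy array and 2x2/stride-2 max pooling; `pool2x2_vectorized` is a hypothetical helper name, not part of the original code:

```python
import numpy as np


def pool2x2_vectorized(x):
    """2x2, stride-2 max pooling over the first two axes of an (H, W, C) array.

    Like scipy_pooling, an odd trailing row/column is dropped, which matches
    TensorFlow's padding='valid' for a 2x2/stride-2 pool.
    """
    h, w, c = x.shape
    h2, w2 = h // 2, w // 2
    # Trim to an even size, split each spatial axis into (blocks, 2),
    # then take the max over the two size-2 axes for all channels at once.
    x = x[:2 * h2, :2 * w2, :]
    return x.reshape(h2, 2, w2, 2, c).max(axis=(1, 3))


rng = np.random.default_rng(0)
conv = rng.random((23, 23, 30))   # e.g. the 'valid' output of a 27x27 image and 5x5 kernel
pooled = pool2x2_vectorized(conv)
print(pooled.shape)  # (11, 11, 30)
```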

The results (TensorFlow first, then SciPy/NumPy):

Time taken:0.746644973755
Pool_result shape:(11, 11, 30)
Time taken:0.0127348899841
Pool_result shape:(11, 11, 30)
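The Pool_result shapes follow directly from the sizes: a 'VALID' convolution with stride 1 shrinks each side by kernel_size - 1, and 2x2/stride-2 pooling then halves it, dropping an odd leftover row/column. For a 27x27 image and a 5x5 kernel that is 27 - 5 + 1 = 23, and 23 // 2 = 11. A small sketch of that arithmetic, assuming square inputs as in the question (`pooled_shape` is a hypothetical helper):

```python
def pooled_shape(image_size, kernel_size, n_maps):
    """Output shape of a stride-1 'VALID' convolution followed by 2x2/stride-2 max pooling.

    Assumes a square image and kernel, and that an odd-sized feature map loses
    its last row/column before pooling (as scipy_pooling does, and as
    TensorFlow's padding='valid' pooling does).
    """
    conv_size = image_size - kernel_size + 1   # 'VALID': no padding, stride 1
    pooled_size = conv_size // 2               # floor division drops the odd row/col
    return (pooled_size, pooled_size, n_maps)


for size in (27, 270, 500):
    print(size, pooled_shape(size, 5, 30))
```

For the 270x270 and 500x500 runs in the answer below this gives (133, 133, 30) and (248, 248, 30).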

【Comments】:

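Your timing includes not only the convolution itself but also building the graph, setting up variables, etc. Since that takes a considerable amount of time, this isn't a fair comparison.
I increased the input image to 270x270 and the kernel to 7x7x30, and TF still takes longer. Does TF need more time to set up the graph for larger images?
I don't understand why this was downvoted. I explained what I did, and I also looked into other things that might slow the code down; since I didn't find anything, I came to this site for advice. If people don't want to give advice, I'm not sure why this site exists!
I didn't downvote, but I don't have much more to say either. Again, your comparison is unfair because you include the time to build the graph for the TF version (which is significant). Unless you provide a larger-scale example showing that TF is slower than SciPy in a realistic setting (e.g. batched inputs, several batches processed in sequence, several convolutional layers), I don't see the point of this question.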
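A minimal sketch of a fairer timing harness, assuming the work being measured can be wrapped in a Python callable (written for Python 3, since it uses time.perf_counter and keyword-only arguments). It runs a few untimed warm-up calls so one-off costs are not counted, then reports the median of repeated timings. `benchmark` is a hypothetical helper name. Note that warm-up alone cannot exclude graph construction if the callable rebuilds the graph on every call, as convolution() and pooling() in the question do; for the TF version the graph would have to be built once, outside the callable, with only the session run being timed.

```python
import statistics
import time

import numpy as np


def benchmark(fn, *args, warmup=3, repeat=20):
    """Return the median wall-clock seconds per call of fn(*args).

    The first `warmup` calls are executed but not timed, so one-time work
    (allocation, caching, lazy initialization) is left out of the result.
    """
    for _ in range(warmup):
        fn(*args)
    times = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - start)
    return statistics.median(times)


image = np.random.rand(270, 270)
print('median seconds per call: {:.6f}'.format(benchmark(np.fft.fft2, image)))
```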

Tags: python-2.7 tensorflow scipy

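【Answer 1】:

Following the commenters' suggestions, I set image_size=270 and wrapped both the convolution and pooling functions in a for loop; now TF outperforms SciPy for the larger images. Note that I'm using tf.nn.conv2d rather than @987654326 @. I also set use_cudnn_on_gpu=True in tf.nn.conv2d, but it neither helped nor hurt.

Here is the code: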

import tensorflow as tf
import numpy as np
from scipy import signal
import time
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

image_size = 270
kernel_size = 5
nofMaps = 30

def convolution(Image, weights):
    in_channels = 1 # 1 because the image has a single channel (depth 1 along z)
    out_channels = weights.shape[-1]
    strides_1d = [1, 1, 1, 1]

    #in_2d = tf.constant(Image, dtype=tf.float32)
    in_2d = Image
    #filter_3d = tf.constant(weights, dtype=tf.float32)
    filter_3d =weights

    in_height = int(in_2d.shape[0])   # rows
    in_width = int(in_2d.shape[1])    # columns

    filter_height = int(filter_3d.shape[0])
    filter_width = int(filter_3d.shape[1])

    input_4d   = tf.reshape(in_2d, [1, in_height, in_width, in_channels])
    kernel_4d = tf.reshape(filter_3d, [filter_height, filter_width, in_channels, out_channels])
    inter = tf.nn.conv2d(input_4d, kernel_4d, strides=strides_1d, padding='VALID',use_cudnn_on_gpu=True)
    output_3d = tf.squeeze(inter)
    #t1 = time.time()
    output_3d= sess.run(output_3d)
    #print('TF Time for Conv:{}'.format(time.time()-t1))
    return output_3d


def pooling(Image):
    in_channels = Image.shape[-1]
    Image_3d = tf.constant(Image, dtype = tf.float32)
    in_width = int(Image.shape[0])
    in_height = int(Image.shape[1])
    Image_4d = tf.reshape(Image_3d,[1,in_width,in_height,in_channels])
    pooled_pots4d = tf.layers.max_pooling2d(inputs=Image_4d, pool_size=[2, 2], strides=2)
    pooled_pots3d = tf.squeeze(pooled_pots4d)
    #t1 = time.time()
    pool_res = sess.run(pooled_pots3d)
    #print('TF Time for Pool:{}'.format(time.time()-t1))
    return pool_res


#with tf.device('/device:GPU:1'):
Image = tf.random_uniform([image_size, image_size], name='Image')
weights = tf.random_uniform([kernel_size,kernel_size,nofMaps], name='Weights')
#init = tf.global_variables_initializer
#sess.run(init)
t1 = time.time()
for i in range(150):
    #t1 = time.time()
    conv_result = convolution(Image,weights)
    pool_result = pooling(conv_result)
    #print('TF Time taken:{}'.format(time.time()-t1))
print('TF Time taken:{}'.format(time.time()-t1))
#with tf.device('/device:CPU:0'):
print('TF Pool_result shape:{}'.format(pool_result.shape))
#print('first map of pool result:\n',pool_result[:,:,0])


def scipy_convolution(Image,weights):
    instant_conv1_pots = np.zeros((image_size-kernel_size+1,image_size-kernel_size+1,nofMaps))
    for i in range(weights.shape[-1]):
        instant_conv1_pots[:,:,i]=signal.correlate(Image,weights[:,:,i],mode='valid',method='fft')
    return instant_conv1_pots

def scipy_pooling(conv1_spikes):
    '''
       Reshape splitting each of the two axes into two each such that the
       latter of the split axes is of the same length as the block size.
       This would give us a 4D array. Then, perform maximum finding along those
       latter axes, which would be the second and fourth axes in that 4D array.
       https://stackoverflow.com/questions/41813722/numpy-array-reshaped-but-how-to-change-axis-for-pooling
    '''
    if(conv1_spikes.shape[0]%2!=0): #if array is odd size then omit the last row and col
        conv1_spikes = conv1_spikes[0:-1,0:-1,:]
    else:
        conv1_spikes = conv1_spikes
    m,n = conv1_spikes[:,:,0].shape
    o   = conv1_spikes.shape[-1]
    # integer division (//) keeps the shapes ints on Python 3 as well as 2.7
    pool1_spikes = np.zeros((m//2,n//2,o))
    for i in range(o):
        pool1_spikes[:,:,i]=conv1_spikes[:,:,i].reshape(m//2,2,n//2,2).max(axis=(1,3))
    return pool1_spikes
Image = np.random.rand(image_size,image_size)
weights = np.random.rand(kernel_size,kernel_size,nofMaps)
t1 = time.time()
for i in range(150):
    conv_result = scipy_convolution(Image,weights)
    pool_result = scipy_pooling(conv_result)
print('Scipy Time taken:{}'.format(time.time()-t1))
print('Scipy Pool_result shape:{}'.format(pool_result.shape))
#print('first map of pool result:\n',pool_result[:,:,0])
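On the SciPy side, scipy_convolution calls signal.correlate once per feature map inside a Python loop. As a sketch of a loop-free alternative, assuming NumPy >= 1.20 (for sliding_window_view), the same 'valid' cross-correlation for all maps can be written as one einsum over image patches. `correlate_valid_all_maps` is a hypothetical name, and this swaps the FFT method for direct summation, so whether it is faster for a given image/kernel size is something to measure rather than assume:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def correlate_valid_all_maps(image, weights):
    """'valid' 2-D cross-correlation of an (H, W) image with a (kh, kw, n_maps) kernel stack.

    Equivalent to stacking signal.correlate(image, weights[:, :, i], mode='valid')
    over i, but without the Python loop. Returns (H-kh+1, W-kw+1, n_maps).
    """
    kh, kw, _ = weights.shape
    # windows[i, j] is the (kh, kw) patch whose top-left corner is at (i, j).
    windows = sliding_window_view(image, (kh, kw))   # (H-kh+1, W-kw+1, kh, kw)
    # Multiply each patch by each kernel and sum over the patch. There is no
    # kernel flip, i.e. cross-correlation, as in signal.correlate and tf.nn.conv2d.
    return np.einsum('ijkl,klm->ijm', windows, weights)


image = np.random.rand(27, 27)
weights = np.random.rand(5, 5, 30)
print(correlate_valid_all_maps(image, weights).shape)  # (23, 23, 30)
```

With the answer's sizes (a 270x270 image and 5x5x30 weights) it returns a (266, 266, 30) array, the same shape scipy_convolution fills in.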

The results are as follows:

image_size = 27x27
kernel_size = 5x5x30
iterations = 150
TF Time taken:11.0800771713
TF Pool_result shape:(11, 11, 30)
Scipy Time taken:1.4141368866
Scipy Pool_result shape:(11, 11, 30)

image_size = 270x270
kernel_size = 5x5x30
iterations = 150

TF Time taken:26.2359631062
TF Pool_result shape:(133, 133, 30)
Scipy Time taken:31.6651778221
Scipy Pool_result shape:(133, 133, 30)


image_size = 500x500
kernel_size = 5x5x30
iterations = 150

TF Time taken:89.7967050076
TF Pool_result shape:(248, 248, 30)
Scipy Time taken:143.391746044
Scipy Pool_result shape:(248, 248, 30)

In the second case (270x270) the time drops by about 17% relative to SciPy; in the third case (500x500) it drops by about 37%.
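The relative savings can be recomputed from the logged timings: the time reduction is (t_scipy - t_tf) / t_scipy and the speedup is t_scipy / t_tf. A small sketch using the totals from the 270x270 and 500x500 runs (`time_reduction` is just an illustrative helper):

```python
def time_reduction(tf_time, scipy_time):
    """Fraction of the SciPy run time saved by the TF run (positive means TF is faster)."""
    return (scipy_time - tf_time) / scipy_time


# Total seconds for 150 iterations, copied from the results above.
runs = [
    ('270x270', 26.2359631062, 31.6651778221),
    ('500x500', 89.7967050076, 143.391746044),
]

for size, tf_time, scipy_time in runs:
    print('{}: {:.1%} less time, {:.2f}x speedup'.format(
        size, time_reduction(tf_time, scipy_time), scipy_time / tf_time))
```

This prints 17.1% (1.21x) for 270x270 and 37.4% (1.60x) for 500x500.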

【Comments】: