[Posted]: 2018-06-05 22:17:03
[Question]:
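I am trying to use a GPU to speed up the convolution and pooling operations in my neural network application (spiking networks). I wrote a small script to see how much speedup I could get from TensorFlow. Surprisingly, SciPy/NumPy does better. In my application all inputs (images) are stored on disk, but for this example I created a randomly initialized 27x27 image and a 5x5x30 weight kernel. I made sure that nothing is transferred from CPU to GPU. I also increased the input image to 270x270 and the weight kernel to 7x7x30, but I still see no improvement. I checked where the operations are placed by setting
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
I have access to 2 GPUs (Tesla K20m) on a cluster.
Here is my code: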
import tensorflow as tf
import numpy as np
from scipy import signal
import time
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
image_size = 27
kernel_size = 5
nofMaps = 30
def convolution(Image, weights):
    in_channels = 1 # 1 because our image has 1 units in the -z direction.
    out_channels = weights.shape[-1]
    strides_1d = [1, 1, 1, 1]
    #in_2d = tf.constant(Image, dtype=tf.float32)
    in_2d = Image
    #filter_3d = tf.constant(weights, dtype=tf.float32)
    filter_3d = weights
    in_width = int(in_2d.shape[0])
    in_height = int(in_2d.shape[1])
    filter_width = int(filter_3d.shape[0])
    filter_height = int(filter_3d.shape[1])
    input_4d = tf.reshape(in_2d, [1, in_height, in_width, in_channels])
    kernel_4d = tf.reshape(filter_3d, [filter_height, filter_width, in_channels, out_channels])
    inter = tf.nn.conv2d(input_4d, kernel_4d, strides=strides_1d, padding='VALID')
    output_3d = tf.squeeze(inter)
    output_3d = sess.run(output_3d)
    return output_3d
def pooling(Image):
    in_channels = Image.shape[-1]
    Image_3d = tf.constant(Image, dtype=tf.float32)
    in_width = int(Image.shape[0])
    in_height = int(Image.shape[1])
    Image_4d = tf.reshape(Image_3d, [1, in_width, in_height, in_channels])
    pooled_pots4d = tf.layers.max_pooling2d(inputs=Image_4d, pool_size=[2, 2], strides=2)
    pooled_pots3d = tf.squeeze(pooled_pots4d)
    return sess.run(pooled_pots3d)
t1 = time.time()
#with tf.device('/device:GPU:1'):
Image = tf.random_uniform([image_size, image_size], name='Image')
weights = tf.random_uniform([kernel_size,kernel_size,nofMaps], name='Weights')
conv_result = convolution(Image,weights)
pool_result = pooling(conv_result)
print('Time taken:{}'.format(time.time()-t1))
#with tf.device('/device:CPU:0'):
print('Pool_result shape:{}'.format(pool_result.shape))
#print('first map of pool result:\n',pool_result[:,:,0])
def scipy_convolution(Image,weights):
    instant_conv1_pots = np.zeros((image_size-kernel_size+1,image_size-kernel_size+1,nofMaps))
    for i in range(weights.shape[-1]):
        instant_conv1_pots[:,:,i]=signal.correlate(Image,weights[:,:,i],mode='valid',method='fft')
    return instant_conv1_pots
def scipy_pooling(conv1_spikes):
    '''
    Reshape splitting each of the two axes into two each such that the
    latter of the split axes is of the same length as the block size.
    This would give us a 4D array. Then, perform maximum finding along those
    latter axes, which would be the second and fourth axes in that 4D array.
    https://stackoverflow.com/questions/41813722/numpy-array-reshaped-but-how-to-change-axis-for-pooling
    '''
    if(conv1_spikes.shape[0]%2!=0): #if array is odd size then omit the last row and col
        conv1_spikes = conv1_spikes[0:-1,0:-1,:]
    else:
        conv1_spikes = conv1_spikes
    m,n = conv1_spikes[:,:,0].shape
    o = conv1_spikes.shape[-1]
    # Use // so the sizes stay integers on Python 3 as well as Python 2.7
    pool1_spikes = np.zeros((m//2,n//2,o))
    for i in range(o):
        pool1_spikes[:,:,i]=conv1_spikes[:,:,i].reshape(m//2,2,n//2,2).max(axis=(1,3))
    return pool1_spikes
t1 = time.time()
Image = np.random.rand(image_size,image_size)
weights = np.random.rand(kernel_size,kernel_size,nofMaps)
conv_result = scipy_convolution(Image,weights)
pool_result = scipy_pooling(conv_result)
print('Time taken:{}'.format(time.time()-t1))
print('Pool_result shape:{}'.format(pool_result.shape))
#print('first map of pool result:\n',pool_result[:,:,0])
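As a small worked illustration of the reshape trick described in the `scipy_pooling` docstring, the sketch below applies it to a single 4x4 map. The function name `max_pool_2x2` and the sample values are made up for this example; it assumes an even-sized 2D array, which the script guarantees by dropping the last row and column when the size is odd.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on an even-sized 2D array (hypothetical helper)."""
    m, n = feature_map.shape
    # Split each axis into (number of blocks, 2): shape becomes (m//2, 2, n//2, 2).
    blocks = feature_map.reshape(m // 2, 2, n // 2, 2)
    # Axes 1 and 3 index the positions inside each 2x2 block; reduce over them.
    return blocks.max(axis=(1, 3))

# Illustrative input: the integers 0..15 laid out as a 4x4 map.
sample = np.arange(16).reshape(4, 4)
pooled = max_pool_2x2(sample)
print(pooled)
```

Each output entry is the maximum of one non-overlapping 2x2 block of the input, which is the same operation the loop in `scipy_pooling` performs map by map.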
The script prints the following (the TensorFlow timing first, then the SciPy timing):
Time taken:0.746644973755
Pool_result shape:(11, 11, 30)
Time taken:0.0127348899841
Pool_result shape:(11, 11, 30)
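The reported shape can be checked by hand: a VALID 5x5 correlation of a 27x27 image gives 23x23 maps, the odd last row and column are dropped to leave 22x22, and 2x2 pooling halves that to 11x11. The sketch below reproduces this arithmetic and also counts multiply-adds for the two problem sizes mentioned in the question. It assumes a direct (non-FFT) convolution, and the helper names are made up for illustration.

```python
def valid_conv_size(image_size, kernel_size):
    """Output side length of a 'valid' convolution with stride 1."""
    return image_size - kernel_size + 1

def pooled_size(size, pool=2):
    """Side length after non-overlapping pooling; floor division drops an odd remainder."""
    return size // pool

def direct_conv_macs(image_size, kernel_size, n_maps):
    """Multiply-adds for a direct valid convolution of one single-channel image."""
    out = valid_conv_size(image_size, kernel_size)
    return out * out * kernel_size * kernel_size * n_maps

conv_side = valid_conv_size(27, 5)
pool_side = pooled_size(conv_side)
small_macs = direct_conv_macs(27, 5, 30)
large_macs = direct_conv_macs(270, 7, 30)

print('conv output: {0}x{0}, pooled output: {1}x{1}'.format(conv_side, pool_side))
print('multiply-adds, 27x27 with 5x5x30:   {}'.format(small_macs))
print('multiply-adds, 270x270 with 7x7x30: {}'.format(large_macs))
```

With fewer than half a million multiply-adds, the 27x27 case is a very small workload, so fixed per-call overheads could plausibly dominate what the timer measures.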
[Comments]:
-
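Your timing includes not only the convolution itself but also building the graph, setting up variables, and so on. Since that takes a significant amount of time, this is not a fair comparison.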
-
I increased the input image to 270x270 and the kernel to 7x7x30, and TF still takes longer. Does TF need more time to set up the graph for a larger image?
-
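I don't understand why this was downvoted. I explained what I did, and I explored other things that might be slowing the code down. Since I could not find the problem, I came to this site for advice. If people don't want to give advice, I'm not sure why this site exists!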
-
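I did not downvote, but I don't have much more to add. Again, your comparison is unfair because you include the time to build the graph for the TF version (and that time is significant). Unless you provide a larger-scale example showing that TF is slower than SciPy in a realistic setting (e.g. batched input, multiple batches processed in sequence, multiple convolution layers), I don't see the point of this question.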
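One way to act on the comments above is to time one-off setup separately from repeated execution. The sketch below is a generic, standard-library-only harness. `time_steady_state` and `make_fake_op` are hypothetical names, and the fake operation merely stands in for something like a `sess.run(...)` call on a graph that was built once beforehand. It illustrates the measurement pattern only and is not a benchmark of TensorFlow itself.

```python
import time

def time_steady_state(fn, warmup=2, repeats=10):
    """Return (first_call_seconds, mean_seconds_per_call_after_warmup) for a zero-argument callable."""
    start = time.time()
    fn()  # the first call pays any one-time setup cost
    first = time.time() - start

    for _ in range(warmup):
        fn()  # extra untimed calls so caches and lazy initialization settle

    start = time.time()
    for _ in range(repeats):
        fn()
    mean = (time.time() - start) / float(repeats)
    return first, mean

def make_fake_op(setup_seconds=0.05):
    """Build a callable whose first call sleeps to simulate setup (a stand-in, not real TF)."""
    state = {'ready': False}

    def op():
        if not state['ready']:
            time.sleep(setup_seconds)  # simulated one-time initialization
            state['ready'] = True
        return sum(i * i for i in range(1000))  # small amount of repeatable work

    return op

first, mean = time_steady_state(make_fake_op())
print('first call: {:.4f}s, steady state: {:.6f}s per call'.format(first, mean))
```

In the script from the question, this would roughly correspond to building the convolution and pooling ops once and then passing a small function that only calls `sess.run` on the final tensor, so that graph construction and session start-up land in the first-call figure rather than in the per-call average.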
Tags: python-2.7 tensorflow scipy