[Posted]: 2018-02-12 17:05:04
[Question]:
I am very new to OpenCL and am trying my first program. I implemented a simple sinc filtering of waveforms. The code works, but I have two questions:
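1. Once I increase the size of the input matrix (numrows needs to go up to 100 000), I get (clEnqueueReadBuffer failed: OUT_OF_RESOURCES), even though the matrix is relatively small (a few MB). I think this is somehow related to the work-group size, but could someone elaborate on how I can fix it? Could it be a driver issue?
Update:
- Leaving the group size as None crashes.
- Adjusting the group size to (1,600) for the GPU and (1,50) for the Intel HD lets me go up to about 6400 rows. For larger sizes, however, it crashes on the GPU, while the Intel HD just freezes and does nothing (0% in the resource monitor).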
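2. I have an Intel HD4600 and an Nvidia K1100M GPU available, but the Intel one is about 2 times faster. I understand that this is partly because, unlike with the external GPU, I don't need to copy my arrays to a separate memory for the internal Intel GPU. Still, I expected only a marginal difference. Is this normal, or should my code be better optimized for the GPU? (Resolved)
Thanks for your help!!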
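For a sense of scale on the first question, the buffer sizes follow directly from the shapes in the code (600 float32 samples per input row, 1200 per output row). The sketch below computes them for 100 000 rows and splits the rows into pieces of at most 6400, the size reported to work in the update. The helper names `buffer_bytes` and `row_chunks` are hypothetical, and launching one kernel per chunk is only a possible workaround, under the unconfirmed assumption that the failure comes from a single launch being too large or running too long.

```python
def buffer_bytes(rows, cols, itemsize=4):
    """Bytes needed for a rows x cols array; itemsize=4 corresponds to float32."""
    return rows * cols * itemsize


def row_chunks(total_rows, chunk_rows):
    """Yield (start, stop) row ranges covering total_rows in pieces of at most chunk_rows."""
    for start in range(0, total_rows, chunk_rows):
        yield start, min(start + chunk_rows, total_rows)


rows = 100000
in_mb = buffer_bytes(rows, 600) / 1e6    # input buffer: 600 samples per row
out_mb = buffer_bytes(rows, 1200) / 1e6  # output buffer: 1200 samples per row
print("input MB:", in_mb, "output MB:", out_mb)

# 6400 rows per launch is the size reported to work in the update above
chunks = list(row_chunks(rows, 6400))
print("number of chunks:", len(chunks), "last chunk:", chunks[-1])
```

Each (start, stop) pair could then select a slice such as y[start:stop] to process in its own launch, so that no single kernel call covers all rows at once.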
from __future__ import absolute_import, print_function
import numpy as np
import pyopencl as cl
import os
os.environ['PYOPENCL_COMPILER_OUTPUT'] = '1'
import matplotlib.pyplot as plt
def resample_opencl(y,key='GPU'):
    #
    # selecting to run on GPU or CPU
    #
    newlen = 1200
    my_platform = cl.get_platforms()[0]
    device =my_platform.get_devices()[0]

    for found_platform in cl.get_platforms():
        if (key == 'GPU') and (found_platform.name == 'NVIDIA CUDA'):
            my_platform = found_platform
            device =my_platform.get_devices()[0]
            print("using GPU")
    #
    #Create context for GPU/CPU
    #
    ctx = cl.Context([device])
    #
    # Create queue for each kernel execution
    #
    queue = cl.CommandQueue(ctx,properties=cl.command_queue_properties.PROFILING_ENABLE)
    # queue = cl.CommandQueue(ctx)

    prg = cl.Program(ctx, """
    __kernel void resample(
        int M,
        __global const float *y_g,
        __global float *res_g)
    {
        int row = get_global_id(0);
        int col = get_global_id(1);
        int gs = get_global_size(1);
        __private float tmp,tmp2,x;
        __private float t;
        t = (float)(col)/2+1;
        tmp=0;
        tmp2=0;
        for (int i=0; i<M ; i++)
        {
            x = (float)(i+1);
            tmp2 = (t- x)*3.14159;
            if (t == x) {
                tmp += y_g[row*M + i] ;
            }
            else
                tmp += y_g[row*M +i] * sin(tmp2)/tmp2;
        }
        res_g[row*gs + col] = tmp;
    }
    """).build()

    mf = cl.mem_flags
    y_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=y)
    res = np.zeros((np.shape(y)[0],newlen)).astype(np.float32)
    res_g = cl.Buffer(ctx, mf.WRITE_ONLY, res.nbytes)
    M = np.array(600).astype(np.int32)
    prg.resample(queue, res.shape, (1,200),M, y_g, res_g)
    event = cl.enqueue_copy(queue, res, res_g)
    print("success")
    event.wait()
    return res,event

if __name__ == "__main__":
    #
    # this is the number i need to increase ( up to some 100 000)
    numrows = 2000
    Gaussian = lambda t : 10 * np.exp(-(t - 50)**2 / (2. * 2**2))
    x = np.linspace(1, 101, 600, endpoint=False).astype(np.float32)
    t = np.linspace(1, 101, 1200, endpoint=False).astype(np.float32)
    y= np.zeros(( numrows,np.size(x)))
    y[:] = Gaussian(x).astype(np.float32)
    y = y.astype(np.float32)
    res,event = resample_opencl(y,'GPU')
    print ("OpenCl GPU profiler",(event.profile.end-event.profile.start)*1e-9)
    #
    # test plot if it worked
    #
    plt.figure()
    plt.plot(x,y[1,:],'+')
    plt.plot(t,res[1,:])
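
When experimenting with group sizes or row counts, it can help to compare one output row against a plain CPU version. The sketch below mirrors the loop of the resample kernel above in pure Python. The function name `sinc_resample_row` and its `factor` parameter are hypothetical, and it uses math.pi where the kernel uses the truncated constant 3.14159 in single precision, so small differences from the device result are to be expected.

```python
import math


def sinc_resample_row(row, factor=2):
    """Sinc-interpolate one row to factor * len(row) samples, following the kernel's loop."""
    out = []
    for col in range(len(row) * factor):
        t = col / factor + 1.0          # same as t = (float)(col)/2 + 1 when factor == 2
        acc = 0.0
        for i, sample in enumerate(row):
            d = t - (i + 1)             # distance to input sample position x = i + 1
            if d == 0.0:
                acc += sample           # sinc(0) = 1
            else:
                acc += sample * math.sin(math.pi * d) / (math.pi * d)
        out.append(acc)
    return out


row = [1.0, 2.0, 3.0]
out = sinc_resample_row(row)
print(out[::2])  # even output columns fall on the input sample positions
```

For a row of 600 samples this runs 1200 x 600 loop iterations, which is slow but adequate for spot-checking a single row such as res[1,:] with a tolerance suited to float32.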
[Discussion]:
Tags: python-3.x opencl pyopencl