【Posted】: 2014-10-09 04:49:02
【Problem description】:
So I'm having some trouble getting my code to run on certain OpenCL devices. I'm developing on a mid-2013 15" Retina MacBook Pro running OS X 10.9.5 (Mavericks), using Xcode 6.0.1.
After using clGetDeviceIDs to get all available devices and clGetDeviceInfo to inspect each one, I get the following:
Device: Intel(R) Core(TM) i7-3635QM CPU @ 2.40GHz
Hardware version: OpenCL 1.2
Software version: 1.1
OpenCL C version: OpenCL C 1.2
Parallel compute units: 8
Device: HD Graphics 4000
Hardware version: OpenCL 1.2
Software version: 1.2(Aug 17 2014 20:29:07)
OpenCL C version: OpenCL C 1.2
Parallel compute units: 16
Device: GeForce GT 650M
Hardware version: OpenCL 1.2
Software version: 8.26.28 310.40.55b01
OpenCL C version: OpenCL C 1.2
Parallel compute units: 2
So according to this I should have 1 CPU and 2 GPUs available: an HD Graphics 4000 and a GeForce GT 650M.
My problem is that when I call clGetKernelWorkGroupInfo with the device ID of either GPU, it returns a CL_INVALID_DEVICE error; but if I pass in the CPU's ID it works perfectly and computes my kernel code without any problem.
This is strange, because every other call before that point works for all 3 devices. I can create a single context containing all 3 devices, create 3 separate command queues (one per device), and I can build a program and create the kernel just fine. But as soon as I make that call, it says my device is invalid.
If I comment out the call to clGetKernelWorkGroupInfo and specify my own global/local work sizes, I instead get a CL_INVALID_PROGRAM_EXECUTABLE error when I call clEnqueueNDRangeKernel.
Is there something wrong with the graphics cards installed in my machine, or is there something on the code side I'm not aware of? I just don't understand how a device can be valid right up until that call and then suddenly become invalid.
EDIT: Here is my code (CheckError is just a function I wrote that prints a custom error message if an error occurred):
cl_int err;                      //Error catcher
cl_platform_id platform;         //Computer platform
cl_context context;              //Single context for whole platform
cl_uint deviceCount;             //Number of devices (CPU + GPU) available on machine
cl_device_id *devices;           //Array of pointers to devices
cl_program program;              //OpenCL program
cl_command_queue *commandQueues; //One command queue for each device

/*---Definitions---*/
int DATA_SIZE = 16384;
double results[DATA_SIZE];       //Results returned from device
int currDevice = 0;              //Use this to just access the first available device

/*---Get first platform---*/
err = clGetPlatformIDs(1, &platform, NULL);
CheckError(err, "A valid platform could not be found on this machine");

/*---Get device count---*/
err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 0, NULL, &deviceCount);
CheckError(err, "Could not determine the number of devices available on this platform");

/*---Get all devices---*/
devices = new cl_device_id[deviceCount];
err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, deviceCount, devices, NULL);
CheckError(err, "Could not access the devices");

/*---Create a single context for all devices---*/
context = clCreateContext(NULL, deviceCount, devices, NULL, NULL, &err);
CheckError(err, "Could not create a context on this platform");

/*---For each device create a separate command queue---*/
commandQueues = new cl_command_queue[deviceCount];
for (int i = 0; i < deviceCount; i++)
{
    commandQueues[i] = clCreateCommandQueue(context, devices[i], 0, &err);
    string errMsg = "Was unable to successfully set up a command queue for device number " + to_string(i);
    CheckError(err, errMsg);
}

/*---Read in .cl file---*/
char *KernelSource = ReadFile("./Source/Sampling/Sampler.cl");

// Create the compute program from the source buffer
program = clCreateProgramWithSource(context, 1, (const char **)&KernelSource, NULL, &err);
CheckError(err, "Failed to create compute program!");

// Build the program executable
err = clBuildProgram(program, deviceCount, devices, NULL, NULL, NULL);
if (err != CL_SUCCESS)
{
    size_t len;
    char buffer[2048];
    printf("Error: Failed to build program executable!\n");
    clGetProgramBuildInfo(program, devices[currDevice], CL_PROGRAM_BUILD_LOG, sizeof(buffer), buffer, &len);
    printf("%s\n", buffer);
    exit(1);
}

// Create the compute kernel in the program we wish to run
cl_kernel kernel = clCreateKernel(program, "mySampler", &err);
CheckError(err, "Failed to create compute kernel!");

// Create the input array in device memory for our calculation
cl_mem input = clCreateBuffer(context, CL_MEM_READ_ONLY, sizeof(double) * DATA_SIZE, NULL, &err);
CheckError(err, "Failed to allocate device memory");

// Set the arguments to our compute kernel
err = clSetKernelArg(kernel, 0, sizeof(cl_mem), &input);
CheckError(err, "Failed to set kernel arguments");

size_t global, local;

// Get the maximum work group size for executing the kernel on the device
err = clGetKernelWorkGroupInfo(kernel, devices[currDevice], CL_KERNEL_WORK_GROUP_SIZE, sizeof(local), &local, NULL);
CheckError(err, "Failed to retrieve work group info!");

// Execute the kernel over the entire range of our 1d input data set
// using the maximum number of work group items for this device
global = DATA_SIZE;
err = clEnqueueNDRangeKernel(commandQueues[currDevice], kernel, 1, NULL, &global, &local, 0, NULL, NULL);
CheckError(err, "Failed to execute kernel!");

// Wait for the commands to be serviced before reading back results
clFinish(commandQueues[currDevice]);

// Read back the results from the device to verify the output
err = clEnqueueReadBuffer(commandQueues[currDevice], input, CL_TRUE, 0, sizeof(double) * DATA_SIZE, results, 0, NULL, NULL);
CheckError(err, "Failed to read array");

std::cout << "DONE!" << std::endl;
for (int i = 0; i < DATA_SIZE; i++)
{
    std::cout << "RESULT: " << i << " " << results[i] << std::endl;
}

// Shutdown and cleanup
clReleaseMemObject(input);
clReleaseProgram(program);
clReleaseKernel(kernel);
clReleaseCommandQueue(commandQueues[currDevice]);
clReleaseContext(context);
}
【Comments】:
-
It sounds like you've built the kernel for the CPU and then tried to use it on a GPU. Can you show us the host code where you select the platform, build the program, and then perform these queries?
-
OK, I'll update my post in a few seconds.
Tags: xcode macos opencl gpu cpu