[Posted]: 2019-09-02 22:15:32
[Question]:
The code below does not get vectorized. If I use 'istart = n * 1;' instead of 'istart = n * niters;', it does. With 'istart = n * 2;' it again does not.
// Kernel for ERIAS_critical_code.py
__kernel void pi(
    int niters,
    __global float* A_d,
    __global float* S_d,
    __global float* B_d)
{
    int num_wrk_items = get_local_size(0);
    int local_id = get_local_id(0); // work item id
    int group_id = get_group_id(0); // work group id
    float accum = 0.0f;
    int i, istart, iend, n;

    n = group_id * num_wrk_items + local_id;
    istart = n * niters;
    iend = istart + niters;
    for (i = istart; i < iend; i++) {
        accum += A_d[i] * S_d[i];
    }
    B_d[n] = accum;
    barrier(CLK_LOCAL_MEM_FENCE); // test: result is correct without this statement
}
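When the code cannot be vectorized, I get:
Kernel was not vectorized
When it can be:
Kernel was successfully vectorized (8)
Any idea why it is not being vectorized?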
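To make the difference between the three variants concrete, here is a minimal sketch in plain host-side C (not OpenCL) of the index arithmetic the kernel performs. The helper names `element_index`, `lane_stride`, and `check_strides` are hypothetical and do not appear in the question. The sketch only shows that adjacent work items read elements exactly `scale` apart, so `n * 1` gives unit stride while `n * 2` and `n * niters` do not. Whether non-unit stride is what actually stops the vectorizer is an assumption; the question does not establish the cause.

```c
#include <assert.h>
#include <stddef.h>

/* Index read by work item n on loop iteration k, mirroring the kernel's
   i = istart + k with istart = n * scale (scale is 1, 2, or niters). */
size_t element_index(size_t n, size_t scale, size_t k)
{
    return n * scale + k;
}

/* Distance, in elements, between what adjacent work items n and n + 1
   read on the same iteration k. Algebraically this is always `scale`. */
size_t lane_stride(size_t n, size_t scale, size_t k)
{
    return element_index(n + 1, scale, k) - element_index(n, scale, k);
}

/* Spot checks for the three variants from the question. */
void check_strides(void)
{
    assert(lane_stride(0, 1, 0) == 1);  /* istart = n * 1: neighbours are contiguous */
    assert(lane_stride(0, 2, 0) == 2);  /* istart = n * 2: neighbours are 2 apart    */
    assert(lane_stride(0, 8, 0) == 8);  /* istart = n * niters, here with niters = 8 */
}
```

With `scale` equal to 1, eight neighbouring work items touch eight consecutive floats on each iteration; with any other value they touch every `scale`-th float, and with `niters` that distance is only known at run time.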
[Comments]: