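【Posted】: 2020-11-14 08:18:45
【Question】:
I want to speed up these nested loops. The scratch array v is sized for a single row of points (NMAX = MAX(NX1, NX2, NX3)), so I know that sharing it could cause a conflict when the two outer loops are parallelized. I tried using the private clause: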
static double **v;
if (v == NULL) {
  v = ARRAY_2D(NMAX_POINT, NVAR, double);
}
#pragma acc parallel loop present(V, U) private(v[:NMAX_POINT][:NVAR])
for (k = kbeg; k <= kend; k++){ g_k = k;
  #pragma acc loop
  for (j = jbeg; j <= jend; j++){ g_j = j;
    #pragma acc loop collapse(2)
    for (i = ibeg; i <= iend; i++) {
      for (nv = 0; nv < NVAR; nv++){
        v[i][nv] = V[nv][k][j][i];
      }
    }
    #pragma acc routine(PrimToCons) seq
    PrimToCons (v, U[k][j], ibeg, iend);
  }
}
I get the following compiler messages:
Generating present(V[:][:][:][:],U[:][:][:][:])
Generating Tesla code
144, #pragma acc loop seq
146, #pragma acc loop seq
151, #pragma acc loop gang, vector(128) collapse(2) /* blockIdx.x threadIdx.x */
154, /* blockIdx.x threadIdx.x collapsed */
144, Accelerator restriction: induction variable live-out from loop: g_k
Complex loop carried dependence of v->-> prevents parallelization
146, Accelerator restriction: induction variable live-out from loop: g_j
Loop carried dependence due to exposed use of v prevents parallelization
Complex loop carried dependence of V->->->->,v->-> prevents parallelization
g_k and g_j are extern int. I have never seen the message "induction variable live-out from loop" before.
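To illustrate what "live-out" most likely refers to here, below is a minimal sketch, not the code from the question. It assumes the message is triggered by writing the loop counter to a global inside the loop: the global must hold the last counter value after the loop ends, which ties the iterations to a sequential order. The function names sum_with_global and sum_with_local are hypothetical, and the g_k below only stands in for the extern int from the question. Without an OpenACC compiler the pragma is ignored and the code runs serially.

```c
#include <assert.h>

int g_k; /* stand-in for the extern int g_k in the question */

/* Writes the loop counter to a global on every iteration.
   After the loop, g_k must equal kend, so its value is "live-out". */
long sum_with_global(int kbeg, int kend) {
    assert(kbeg <= kend);
    long sum = 0;
    for (int k = kbeg; k <= kend; k++) {
        g_k = k;      /* global write inside the loop body */
        sum += g_k;
    }
    return sum;
}

/* Same arithmetic, but the counter stays local to the loop,
   so no value has to survive past the last iteration. */
long sum_with_local(int kbeg, int kend) {
    assert(kbeg <= kend);
    long sum = 0;
    #pragma acc parallel loop reduction(+:sum)
    for (int k = kbeg; k <= kend; k++) {
        sum += k;
    }
    return sum;
}
```

Calling sum_with_global(1, 4) leaves g_k set to 4 afterwards, which is exactly the dependence a parallelizing compiler has to preserve. If the called routine really needs the current k and j, passing them as arguments is one way to avoid the global writes.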
Edit: I modified the loops as suggested, but it still does not work:
#pragma acc parallel loop collapse(2) present(U, V) private(v[:NMAX_POINT][:NVAR])
for (k = kbeg; k <= kend; k++){
  for (j = jbeg; j <= jend; j++){
    #pragma acc loop collapse(2)
    for (i = ibeg; i <= iend; i++) {
      for (nv = 0; nv < NVAR; nv++){
        v[i][nv] = V[nv][k][j][i];
      }
    }
    PrimToCons (v, U[k][j], ibeg, iend, g_gamma);
  }
}
I get this runtime error:
Failing in Thread:1
call to cuStreamSynchronize returned error 700: Illegal address during kernel execution
It is as if v, U, or V cannot be found on the device, even though I used these directives in the main function:
#pragma acc enter data copyin(data)
#pragma acc enter data copyin(data.Vc[:NVAR][:NX3_TOT][:NX2_TOT][NX1_TOT], data.Uc[:NX3_TOT][:NX2_TOT][NX1_TOT][:NVAR])
data.Vc and data.Uc are the V and U in the routine I want to parallelize.
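For the per-row scratch array v described above, here is a sketch of one alternative to the private clause. It is not a diagnosis of the illegal-address error. It assumes NVAR and NMAX_POINT are compile-time constants (the values below are made up), uses flat 1D arrays instead of the pointer-based V and U from the question, and replaces PrimToCons with a hypothetical stand-in, prim_to_cons_row, that just doubles each value. The point is that an array declared inside the loop body belongs to the iteration that declares it, so no private clause is needed. Without an OpenACC compiler the pragmas are ignored and the code runs serially.

```c
#include <assert.h>
#include <stddef.h>

#define NVAR 4          /* assumed value, for illustration only */
#define NMAX_POINT 64   /* assumed upper bound on points per row */

/* Hypothetical stand-in for PrimToCons: writes 2*v into one row of U. */
#pragma acc routine seq
static void prim_to_cons_row(double v[][NVAR], double *u_row,
                             int ibeg, int iend) {
    for (int i = ibeg; i <= iend; i++)
        for (int nv = 0; nv < NVAR; nv++)
            u_row[i * NVAR + nv] = 2.0 * v[i][nv];
}

/* V is laid out as [nv][k][j][i], U as [k][j][i][nv], both flattened. */
void convert_all(const double *V, double *U, int nk, int nj, int ni) {
    assert(ni <= NMAX_POINT);
    #pragma acc parallel loop collapse(2) copyin(V[0:NVAR*nk*nj*ni]) copyout(U[0:nk*nj*ni*NVAR])
    for (int k = 0; k < nk; k++) {
        for (int j = 0; j < nj; j++) {
            /* Declared inside the loop body: one copy per (k, j) iteration. */
            double v[NMAX_POINT][NVAR];
            for (int i = 0; i < ni; i++)
                for (int nv = 0; nv < NVAR; nv++)
                    v[i][nv] = V[((nv * nk + k) * nj + j) * ni + i];
            prim_to_cons_row(v, &U[((size_t)k * nj + j) * ni * NVAR],
                             0, ni - 1);
        }
    }
}
```

This version runs the inner i and nv loops sequentially within each (k, j) iteration, which differs from the loop nest in the question. Whether a fixed-size per-iteration array performs well on a given GPU is something that would need to be measured.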
【Discussion】:
Tags: openacc