[Posted]: 2017-05-30 15:48:50
[Question]:
When I have a single kernels region on the outermost loop, why can't I use these two directives:
#pragma acc update device(hbias[0:n_hidden],W[0:n_hidden][0:n_visible])
#pragma acc update device(vbias[0:n_visible])
I need to update the variables hbias, vbias, and W in the code below, but it does not work:
void RBM::contrastive_divergence(int train_X[6][6], double learning_rate, int k) {
    double r = rand() / (RAND_MAX + 1.0);
    int *input = new int[n_visible];
    double *ph_mean = new double[n_hidden];
    int *ph_sample = new int[n_hidden];
    double *nv_means = new double[n_visible];
    int *nv_samples = new int[n_visible];
    double *nh_means = new double[n_hidden];
    int *nh_samples = new int[n_hidden];
    #pragma acc kernels
    for (int i = 0; i < train_N; i++) {
        for (int j = 0; j < n_visible; j++) {
            input[j] = train_X[i][j];
        }
        sample_h_given_v(input, ph_mean, ph_sample, r);
        for (int step = 0; step < k; step++) {
            if (step == 0) {
                gibbs_hvh(ph_sample, nv_means, nv_samples, nh_means, nh_samples, r);
            }
            else {
                gibbs_hvh(nh_samples, nv_means, nv_samples, nh_means, nh_samples, r);
            }
        }
        for (int i = 0; i < n_hidden; i++) {
            for (int j = 0; j < n_visible; j++) {
                W[i][j] += learning_rate * (ph_mean[i] * input[j] - nh_means[i] * nv_samples[j]) / N;
            }
            hbias[i] += learning_rate * (ph_sample[i] - nh_means[i]) / N;
        }
        //this directive
        #pragma acc update device(hbias[0:n_hidden],W[0:n_hidden][0:n_visible])
        for (int i = 0; i < n_visible; i++) {
            vbias[i] += learning_rate * (input[i] - nv_samples[i]) / N;
        }
        //and this directive
        #pragma acc update device(vbias[0:n_visible])
    }
    delete[] input;
    delete[] ph_mean;
    delete[] ph_sample;
    delete[] nv_means;
    delete[] nv_samples;
    delete[] nh_means;
    delete[] nh_samples;
}
But when I put a separate kernels region on each nested loop, I can update the variables:
void RBM::contrastive_divergence(int train_X[6][6], double learning_rate, int k) {
    double r = rand() / (RAND_MAX + 1.0);
    int *input = new int[n_visible];
    double *ph_mean = new double[n_hidden];
    int *ph_sample = new int[n_hidden];
    double *nv_means = new double[n_visible];
    int *nv_samples = new int[n_visible];
    double *nh_means = new double[n_hidden];
    int *nh_samples = new int[n_hidden];
    for (int i = 0; i < train_N; i++) {
        #pragma acc kernels
        for (int j = 0; j < n_visible; j++) {
            input[j] = train_X[i][j];
        }
        sample_h_given_v(input, ph_mean, ph_sample, r);
        #pragma acc kernels
        for (int step = 0; step < k; step++) {
            if (step == 0) {
                gibbs_hvh(ph_sample, nv_means, nv_samples, nh_means, nh_samples, r);
            }
            else {
                gibbs_hvh(nh_samples, nv_means, nv_samples, nh_means, nh_samples, r);
            }
        }
        #pragma acc kernels
        {
            for (int i = 0; i < n_hidden; i++) {
                for (int j = 0; j < n_visible; j++) {
                    W[i][j] += learning_rate * (ph_mean[i] * input[j] - nh_means[i] * nv_samples[j]) / N;
                }
                hbias[i] += learning_rate * (ph_sample[i] - nh_means[i]) / N;
            }
            //this directive
            #pragma acc update device(hbias[0:n_hidden],W[0:n_hidden][0:n_visible])
        }
        #pragma acc kernels
        {
            for (int i = 0; i < n_visible; i++) {
                vbias[i] += learning_rate * (input[i] - nv_samples[i]) / N;
            }
            //and this directive
            #pragma acc update device(vbias[0:n_visible])
        }
    }
    delete[] input;
    delete[] ph_mean;
    delete[] ph_sample;
    delete[] nv_means;
    delete[] nv_samples;
    delete[] nh_means;
    delete[] nh_samples;
}
[Discussion]:
-
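Which compiler are you using? If it is PGI, could you post the output of -Minfo=accel? This looks like it should work. What happens if you add a data region immediately outside the kernels region? That should not be necessary, but it might help.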
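The data-region suggestion could look roughly like the sketch below. The function add_to_bias and its parameters are hypothetical and not taken from the question's code. It only shows the shape of a data region that encloses a kernels region. A compiler without OpenACC support ignores the pragmas and runs the loop serially.

```cpp
#include <cassert>

// Hypothetical helper (not from the original post): adds `delta` to each
// of the n elements of `bias`.
void add_to_bias(double* bias, int n, double delta) {
    assert(n >= 0);
    // The data region copies bias to the device on entry and back on exit.
    #pragma acc data copy(bias[0:n])
    {
        // The compute region sits inside the data region, so the array is
        // already present on the device when the loop runs.
        #pragma acc kernels
        for (int i = 0; i < n; ++i) {
            bias[i] += delta;
        }
    }
}
```

With the data region in place, the compiler does not need to infer the data movement for the kernels region on its own.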
-
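Yes, I am using the PGI compiler. Basically, I need to perform a reduction on some variables, but the compiler did not accept that either. I need to synchronize some variable values after every completed iteration; otherwise the results will be incorrect. I will try adding a data region directive and see what I get. Thanks.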
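The comment does not say which reduction was rejected, so the following is only a minimal sketch of what a scalar reduction normally looks like in OpenACC. The function sum_of_differences is a hypothetical name, not part of the RBM code.

```cpp
#include <cassert>

// Hypothetical helper (not from the original post): returns the sum of
// a[i] - b[i] over n elements.
double sum_of_differences(const double* a, const double* b, int n) {
    assert(n >= 0);
    double sum = 0.0;
    // reduction(+:sum) gives each worker a private partial sum and
    // combines the partial sums when the loop finishes.
    #pragma acc parallel loop reduction(+:sum) copyin(a[0:n], b[0:n])
    for (int i = 0; i < n; ++i) {
        sum += a[i] - b[i];
    }
    return sum;
}
```

The reduction clause takes a scalar here. Updates to whole arrays such as W or hbias would need a different approach, which this sketch does not cover.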
-
I used this command: $ pgc++ -fast -acc -ta=tesla:managed -Minfo=accel -o task2 ./RBM.cpp && echo "Compilation successful!" on the kernels without any additional directives, and the output is as follows:
-
If you use -ta=tesla:managed, the update directives are ignored, and data movement is instead triggered as data migration by the CUDA driver.
Tags: c++ parallel-processing directive openacc pgi
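For comparison, here is a minimal sketch of where update directives usually go when managed memory is not used: inside a data region, but between compute regions rather than inside one. The function scale_then_shift is hypothetical and not from the question. It assumes the host and device copies are kept in sync by hand.

```cpp
#include <cassert>

// Hypothetical helper (not from the original post): scales v on the
// device, shifts it on the host, then scales it on the device again.
void scale_then_shift(double* v, int n, double scale, double shift) {
    assert(n >= 0);
    #pragma acc data copy(v[0:n])
    {
        #pragma acc kernels
        for (int i = 0; i < n; ++i) {
            v[i] *= scale;              // runs on the device
        }

        // Bring the device results back before the host touches them.
        #pragma acc update self(v[0:n])
        for (int i = 0; i < n; ++i) {
            v[i] += shift;              // runs on the host
        }
        // Push the host changes to the device. This directive is placed
        // outside any kernels region.
        #pragma acc update device(v[0:n])

        #pragma acc kernels
        for (int i = 0; i < n; ++i) {
            v[i] *= scale;              // runs on the device again
        }
    }
}
```

Per the answer above, building with -ta=tesla:managed makes these update directives unnecessary, since they are ignored and the driver migrates the data instead.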