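【Posted】: 2020-06-24 01:30:04
【Question】:
I'm new to OpenCL and I'm trying to speed up my application. My OpenCL kernel takes more time than the sequential approach. I'm trying to encrypt a 4096 x 4096 image. Here is the kernel I wrote: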
__kernel void image_XOR(
    __constant const unsigned int *inputImage,
    __global unsigned int *outputImage,
    __constant double *serpentineR,
    __constant double *nonce,
    __global unsigned int *signature) {
    unsigned int i = get_global_id(0); // index of the pixel handled by this work-item
    double decimalsPwr = pow(10.0, 15.0), serpentine2Pwr = pow(2.0, (*serpentineR));
    unsigned int aux;
    unsigned long long XORseq;
    unsigned int decimals = floor(decimalsPwr * fabs(*nonce));
    XORseq = decimals ^ (unsigned long long) floor((1.0 / (i + 1)) * decimalsPwr);
    if (i % 2 == 1) {
        aux = floor(decimalsPwr * fabs(atan(1.0 / tan(decimalsPwr * (double) XORseq))));
    } else {
        aux = floor(decimalsPwr * fabs(sin(serpentine2Pwr * (double) XORseq) * cos(serpentine2Pwr * (double) XORseq)));
    }
    // clear the top 8 bits; comment out the two shifts if the alpha channel should be encrypted as well
    aux = aux << 8u;
    aux = aux >> 8u;
    outputImage[i] = inputImage[i] ^ aux;
    *signature = *signature ^ inputImage[i] ^ aux;
}
Note: if I comment out the following lines, the code is much faster (0.5 s instead of 4 s):
if (i % 2 == 1) {
    aux = floor(decimalsPwr * fabs(atan(1.0 / tan(decimalsPwr * (double) XORseq))));
} else {
    aux = floor(decimalsPwr * fabs(sin(serpentine2Pwr * (double) XORseq) * cos(serpentine2Pwr * (double) XORseq)));
}
【Discussion】:
Tags: c encryption parallel-processing opencl