【Question title】: Unrolling nested loops c++
【Posted】: 2020-07-27 23:21:18
【Question description】:

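I am trying to unroll a nested loop that stores data in a dynamically allocated 2D buffer in C++, but I am not sure how to do it. This is the original loop before unrolling: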

int steps[1]; 
Ipp32f* vectx = ippiMalloc_32f_C1(size0, size1, &(steps[0])); 

for (int i = 0; i < size0; i++){
    for (int j = 0; j < size1; j++){
        Ipp32f* pointer = (Ipp32f*)((Ipp8u*)vectx + steps[0]*j + sizeof(Ipp32f)*i); 
        *pointer = datax[i]; 
    }
}

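In my program datax is an array of values, with size0 = 30 and size1 = 10000. I tried the following, but unfortunately the values are not the same at every position. Can anyone help me?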

for (int i = 0; i < size0; i+=4) {
     for (int j = 0; j < size1; j+=4) {
        *((Ipp32f*)((Ipp8u*)vectx+ (steps[0] * j +0)+ (sizeof(Ipp32f) * i ))) = datax[i];
        *((Ipp32f*)((Ipp8u*)vectx + (steps[0] * j + 1) + (sizeof(Ipp32f) * i ))) = datax[i ];
        *((Ipp32f*)((Ipp8u*)vectx + (steps[0] * j + 2) + (sizeof(Ipp32f) * i ))) = datax[i ];
        *((Ipp32f*)((Ipp8u*)vectx + (steps[0] * j + 3) + (sizeof(Ipp32f) * i ))) = datax[i ];
     }
     for (int j = 0; j < size1; j += 4) {
        *((Ipp32f*)((Ipp8u*)vectx + (steps[0] * j + 0) + (sizeof(Ipp32f) * i+1))) = datax[i+1];
        *((Ipp32f*)((Ipp8u*)vectx + (steps[0] * j + 1) + (sizeof(Ipp32f) * i+1))) = datax[i+1];
        *((Ipp32f*)((Ipp8u*)vectx + (steps[0] * j + 2) + (sizeof(Ipp32f) * i+1))) = datax[i+1];
        *((Ipp32f*)((Ipp8u*)vectx + (steps[0] * j + 3) + (sizeof(Ipp32f) * i+1))) = datax[i+1];
     }

     for (int j = 0; j < size1; j += 4) {
         *((Ipp32f*)((Ipp8u*)vectx + (steps[0] * j + 0) + (sizeof(Ipp32f) * i + 2))) = datax[i + 2];
         *((Ipp32f*)((Ipp8u*)vectx + (steps[0] * j + 1) + (sizeof(Ipp32f) * i + 2))) = datax[i + 2];
         *((Ipp32f*)((Ipp8u*)vectx + (steps[0] * j + 2) + (sizeof(Ipp32f) * i + 2))) = datax[i + 2];
         *((Ipp32f*)((Ipp8u*)vectx + (steps[0] * j + 3) + (sizeof(Ipp32f) * i + 2))) = datax[i + 2];
    }
    for (int j = 0; j < size1; j += 4) {
         *((Ipp32f*)((Ipp8u*)vectx + (steps[0] * j + 0) + (sizeof(Ipp32f) * i + 3))) = datax[i + 3];
         *((Ipp32f*)((Ipp8u*)vectx + (steps[0] * j + 1) + (sizeof(Ipp32f) * i + 3))) = datax[i + 3];
         *((Ipp32f*)((Ipp8u*)vectx + (steps[0] * j + 2) + (sizeof(Ipp32f) * i + 3))) = datax[i + 3];
         *((Ipp32f*)((Ipp8u*)vectx + (steps[0] * j + 3) + (sizeof(Ipp32f) * i + 3))) = datax[i + 3];
    }

} 

【Question discussion】:

  • Judging by ssteps[0], the code does not even compile.

Tags: c++ intel-ipp loop-unrolling


【Solution 1】:

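You did not take operator precedence into account: multiplication binds tighter than addition, so the + 1 is added to the byte offset after the multiplication instead of to the index.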

*((Ipp32f*)((Ipp8u*)vectx + (steps[0] * j + 1) + (sizeof(Ipp32f) * i+1))) = datax[i+1];
                                        ^^^^^^--here               ^^^--and here

You should add parentheses:

*((Ipp32f*)((Ipp8u*)vectx + (steps[0] * (j + 1)) + (sizeof(Ipp32f) * (i+1)))) = datax[i+1];
                                        ^^^^^^                        ^^^

Obviously, you should do this on every line.
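To make the precedence issue concrete, here is a small sketch that compares the two byte-offset computations. The function names and the step value of 128 bytes are made up for illustration; the real step comes from ippiMalloc_32f_C1.

```cpp
#include <cstddef>

// Offset as written in the question for the "+1" case: each "+ 1" is added
// after the multiplication, so it shifts the address by one byte only.
constexpr std::size_t offsetAsWritten(std::size_t step, std::size_t i, std::size_t j) {
    return (step * j + 1) + (sizeof(float) * i + 1);
}

// Offset with the parentheses added: the "+ 1" now selects the next row (j)
// and the next float within the row (i).
constexpr std::size_t offsetIntended(std::size_t step, std::size_t i, std::size_t j) {
    return (step * (j + 1)) + (sizeof(float) * (i + 1));
}

// With a hypothetical step of 128 bytes and i = j = 0:
static_assert(offsetAsWritten(128, 0, 0) == 2, "two bytes past the start");
static_assert(offsetIntended(128, 0, 0) == 128 + sizeof(float), "next row, next float");
```

The first version lands in the middle of a float two bytes from the start of the buffer, which is why the stored values looked wrong.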

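By the way, size0 = 30, so if you unroll the loop 4 x 4 you will go out of bounds during the last iteration of the outer loop (i = 28 also touches indices 30 and 31). You should use an unroll factor that divides size0 evenly, such as 5 or 6.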
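If you would rather keep the factor of 4, another common option is to unroll only up to the largest multiple of 4 and finish with plain remainder loops. The following is a sketch under assumptions, not IPP code: elementAt, store4, fillSimple and fillUnrolled are hypothetical helpers, and the buffer is assumed to be float storage whose row step in bytes is a multiple of sizeof(float).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of IPP): pointer to column i of row j in a
// pitched buffer whose rows start stepBytes apart.
inline float* elementAt(float* base, int stepBytes, int i, int j) {
    assert(static_cast<std::size_t>(stepBytes) % sizeof(float) == 0);  // assumption
    unsigned char* bytes = reinterpret_cast<unsigned char*>(base);
    return reinterpret_cast<float*>(bytes + static_cast<std::size_t>(stepBytes) * j
                                          + sizeof(float) * i);
}

// Reference version: the original loop from the question, not unrolled.
inline void fillSimple(float* base, int stepBytes, const float* datax,
                       int size0, int size1) {
    for (int i = 0; i < size0; i++)
        for (int j = 0; j < size1; j++)
            *elementAt(base, stepBytes, i, j) = datax[i];
}

// Writes datax[i..i+3] into columns i..i+3 of row j, unrolled by hand.
inline void store4(float* base, int stepBytes, const float* datax, int i, int j) {
    *elementAt(base, stepBytes, i + 0, j) = datax[i + 0];
    *elementAt(base, stepBytes, i + 1, j) = datax[i + 1];
    *elementAt(base, stepBytes, i + 2, j) = datax[i + 2];
    *elementAt(base, stepBytes, i + 3, j) = datax[i + 3];
}

// 4 x 4 unrolled version with remainder loops for sizes not divisible by 4.
inline void fillUnrolled(float* base, int stepBytes, const float* datax,
                         int size0, int size1) {
    const int i4 = size0 - size0 % 4;  // largest multiple of 4 not above size0
    const int j4 = size1 - size1 % 4;  // largest multiple of 4 not above size1
    for (int i = 0; i < i4; i += 4) {
        for (int j = 0; j < j4; j += 4) {
            store4(base, stepBytes, datax, i, j + 0);
            store4(base, stepBytes, datax, i, j + 1);
            store4(base, stepBytes, datax, i, j + 2);
            store4(base, stepBytes, datax, i, j + 3);
        }
        for (int j = j4; j < size1; j++)  // leftover rows
            store4(base, stepBytes, datax, i, j);
    }
    for (int i = i4; i < size0; i++)      // leftover columns (two when size0 = 30)
        for (int j = 0; j < size1; j++)
            *elementAt(base, stepBytes, i, j) = datax[i];
}
```

Filling one buffer with fillSimple and another with fillUnrolled and comparing them is a quick way to check that the unrolled version writes the same values. It is also worth measuring before keeping the manual unroll, since an optimizing compiler may already unroll the simple loop on its own.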

【Discussion】:

  • @Melissa Please read the note I just added at the bottom.