【Question title】: gcc won't vectorize simple loop
【Posted】: 2016-06-03 23:14:33
【Question】:

I'm trying to vectorize a simplified version of example 4 from the gcc auto-vectorize documentation. For the life of me, I can't figure out how to do it:

typedef int aint __attribute__ ((__aligned__(16)));
void foo1 (int n, aint * restrict px, aint *restrict qx) {

  /* feature: support for (aligned) pointer accesses.  */
  int *__restrict p = __builtin_assume_aligned (px, 16);
  int *__restrict q = __builtin_assume_aligned (qx, 16);

  while (n--){
    //*p++ += *q++; <- this is vectorized
    p[n] += q[n]; // This isn't!
  }
}

I'm running gcc 4.7.2 with: gcc -o apps/craft_dbsplit.o -c -Wall -g -ggdb -O3 -msse2 -funsafe-math-optimizations -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=5 -funsafe-loop-optimizations -std=c99

It reports:

Analyzing loop at apps/craft_dbsplit.c:388

388: dependence distance  = 0.
388: dependence distance == 0 between *D.9363_14 and *D.9363_14
388: dependence distance  = 0.
388: accesses have the same alignment.
388: dependence distance modulo vf == 0 between *D.9363_14 and *D.9363_14
388: vect_model_load_cost: unaligned supported by hardware.
388: vect_get_data_access_cost: inside_cost = 2, outside_cost = 0.
388: vect_model_store_cost: unaligned supported by hardware.
388: vect_get_data_access_cost: inside_cost = 2, outside_cost = 0.
388: Alignment of access forced using peeling.
388: Vectorizing an unaligned access.
388: vect_model_load_cost: aligned.
388: vect_model_load_cost: inside_cost = 1, outside_cost = 0 .
388: vect_model_load_cost: unaligned supported by hardware.
388: vect_model_load_cost: inside_cost = 2, outside_cost = 0 .
388: vect_model_simple_cost: inside_cost = 1, outside_cost = 0 .
388: not vectorized: relevant stmt not supported: *D.9363_14 = D.9367_20;

apps/craft_dbsplit.c:382: note: vectorized 0 loops in function.

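【Comments】:

  • "I'm running gcc 4.7.2": you may want to update it; it is quite old. Newer versions do vectorize the loop.
  • gcc's built-in vectors are not great, but you could try them.
  • For what it's worth: given while(n--), *p++ += *q++; is not equivalent to p[n] += q[n];. The second version iterates backwards.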
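
The point in the last comment can be checked directly. The sketch below is not from the thread, and the helper names `visit_forward` and `visit_backward` are hypothetical. Each one records the order in which element indices are touched by one of the two loop bodies under the same `while (n--)` header.

```c
/* Hypothetical helper: mimics `while (n--) { *p++ += *q++; }`.
   The pointer starts at element 0 and moves up, so indices are
   visited in the order 0, 1, ..., n-1. */
void visit_forward(int n, int *order)
{
    int idx = 0;          /* stands in for the offset of p from its start */
    int k = 0;
    while (n--) {
        order[k++] = idx; /* element touched by *p */
        idx++;            /* effect of p++ */
    }
}

/* Hypothetical helper: mimics `while (n--) { p[n] += q[n]; }`.
   n has already been decremented when the body runs, so indices are
   visited in the order n-1, n-2, ..., 0. */
void visit_backward(int n, int *order)
{
    int k = 0;
    while (n--) {
        order[k++] = n;   /* element touched by p[n] */
    }
}
```

For n = 4, the first helper fills `order` with 0, 1, 2, 3 and the second with 3, 2, 1, 0. The final array contents of the element-wise addition are the same either way; only the direction of memory access differs.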

Tags: c gcc auto-vectorization


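【Answer 1】:

The loop runs from high addresses to low addresses. Your gcc treats vector operations as running from low addresses to high addresses, so it doesn't realize it can vectorize. Your "optimization" of writing the loop as while (n--) is actually preventing the more relevant optimization. Try: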

#include <stddef.h>

void foo1 (size_t n, int *restrict px, int const *restrict qx)
{
  int *restrict p = __builtin_assume_aligned(px, 16);
  int const *restrict q = __builtin_assume_aligned(qx, 16);
  size_t i = 0;
  while (i < n)
    {
      p[i] += q[i];
      i++;
    }
}
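
As a sanity check that reversing the direction does not change the result, here is a minimal sketch comparing the two loop shapes. It is an illustration, not part of the original answer: the names `add_backward`, `add_forward`, and `same_result` are hypothetical, and the alignment hints are left out to keep the code portable. It says nothing about whether a particular gcc release vectorizes either loop.

```c
#include <stddef.h>

/* The question's loop shape: walks from index n-1 down to 0. */
void add_backward(int n, int *restrict p, const int *restrict q)
{
    while (n--) {
        p[n] += q[n];
    }
}

/* The answer's loop shape: walks from index 0 up to n-1. */
void add_forward(size_t n, int *restrict p, const int *restrict q)
{
    size_t i = 0;
    while (i < n) {
        p[i] += q[i];
        i++;
    }
}

/* Hypothetical check: returns 1 if both loop shapes leave identical
   results for p[i] = i and q[i] = 10*i, for n up to 64; returns 0
   on a mismatch or if n is too large for the local buffers. */
int same_result(size_t n)
{
    int a[64], b[64], q[64];
    if (n > 64) {
        return 0;
    }
    for (size_t i = 0; i < n; i++) {
        a[i] = b[i] = (int)i;
        q[i] = (int)(10 * i);
    }
    add_backward((int)n, a, q);
    add_forward(n, b, q);
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i] || a[i] != (int)(11 * i)) {
            return 0;
        }
    }
    return 1;
}
```

To see what the compiler actually did with each loop, the asker's `-ftree-vectorizer-verbose=5` applies to gcc 4.7.2; newer gcc releases report vectorized loops with `-fopt-info-vec` instead.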

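【Comments】:

  • Why not write a loop that isn't obscure: for (size_t i=0; i<n; i++).
  • @Lundin: There's more than one way to skin a cat.
  • Yes, you can use the one cat-skinning knife that is well known and instantly recognized by all skinners, or you can use something else :)
  • Lundin, EOF, thanks for your input. I tried the for loop and gcc said: not vectorized: unsupported data-type. I had noticed the while (--n) pattern in example 3 of the gcc auto-vectorization documentation. I suspect the ancient version is the culprit. I'll try to get a newer one and see what happens.
  • @freddofrog: If you read that example carefully, you'll notice that it does not iterate backwards. It doesn't start at p[n-1] and end at p[0]; it starts at p[0] and ends at p[n-1]. Do you see the difference?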