【Posted】: 2020-02-09 06:19:31
【Question】:
I wrote a strlen function using AVX-512 instructions. Here is my source code:
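#include <immintrin.h>  /* AVX-512 intrinsics; _mm512_cmpeq_epi8_mask needs AVX-512BW */
#include <stddef.h>     /* size_t */

size_t avx512_strlen(const char * s) {
    __m512i vec0, vec1;
    unsigned long long mask;
    const char * ptr = s;
    vec0 = _mm512_setzero_epi32();                 /* 64 zero bytes to compare against */
    while (1) {
        vec1 = _mm512_loadu_si512(s);              /* load the next 64 bytes */
        mask = _mm512_cmpeq_epi8_mask(vec0, vec1); /* bit i is set if byte i is 0x00 */
        if (mask != 0) {
            mask = __builtin_ctz(mask);            /* this call seems to return the wrong value */
            return (s-ptr) + mask;
        }
        s += 64;
    }
    return s-ptr;
}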
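The value of '__builtin_ctz(mask)' is wrong, so the return value is incorrect. Specifically, the function fails to compute the position of the null terminator (0x00) within the last 64-byte block it checks.
For example, I have this string: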
char str[] = "EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE"
"EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE"
"EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE"
"EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE";
The length of this string is 360, but the function returns 352. The problem comes from the '__builtin_ctz' step. Before '__builtin_ctz' is executed, the mask is correct; it is:
0001110100010001000100010000000000000000000000000000000000000000
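In the last iteration, 320 characters have already been checked, so '__builtin_ctz' should return 40 (as you can see in the mask, there are 40 zeros before the first '1', counting from the least significant bit on the right). The mask is correct, but '__builtin_ctz' counts it wrong!
What is the problem?
【Discussion】: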
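One thing worth checking is the width of the builtin. GCC's '__builtin_ctz' takes an 'unsigned int', so a 64-bit mask is converted to its low 32 bits before counting, while '__builtin_ctzll' takes an 'unsigned long long'. The sketch below uses no AVX-512 at all and only looks at the mask value from the question. The helper names 'low32_is_empty' and 'first_match_index' are hypothetical, and the code assumes a 32-bit 'unsigned int', as on GCC for x86-64.

```c
#include <assert.h>

/* Hypothetical helper (not from the question): nonzero when the mask has no
 * set bit in its low 32 bits, i.e. when the value that reaches
 * __builtin_ctz after conversion to unsigned int is 0.
 * Assumes unsigned int is 32 bits wide. */
static int low32_is_empty(unsigned long long mask) {
    return (unsigned int)mask == 0u;
}

/* Hypothetical helper: index of the lowest set bit of a 64-bit compare mask.
 * __builtin_ctzll operates on the full unsigned long long value. */
static unsigned first_match_index(unsigned long long mask) {
    assert(mask != 0);  /* the result of __builtin_ctzll(0) is undefined */
    return (unsigned)__builtin_ctzll(mask);
}
```

The mask in the question is 0x1D11110000000000 in hex. For that value 'low32_is_empty' returns nonzero, so '__builtin_ctz' would be handed 0, for which its result is undefined. 'first_match_index' returns 40, and 320 + 40 gives the expected 360. In 'avx512_strlen' this would correspond to calling '__builtin_ctzll(mask)' instead of '__builtin_ctz(mask)'. The 32 that produces 352 would be consistent with an undefined result on a zero argument, though that part is a guess.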
Tags: c gcc bit-manipulation intrinsics avx512