另一个 slow 版本,用于 uint32:
void str2uint_aux(unsigned& number, unsigned& overflowCtrl, const char*& ch)
{
unsigned digit = *ch - '0';
++ch;
number = number * 10 + digit;
unsigned overflow = (digit + (256 - 10)) >> 8;
// if digit < 10 then overflow == 0
overflowCtrl += overflow;
}
unsigned str2uint(const char* s, size_t n)
{
unsigned number = 0;
unsigned overflowCtrl = 0;
// for VC++10 the Duff's device is faster than loop
switch (n)
{
default:
throw std::invalid_argument(__FUNCTION__ " : `n' too big");
case 10: str2uint_aux(number, overflowCtrl, s);
case 9: str2uint_aux(number, overflowCtrl, s);
case 8: str2uint_aux(number, overflowCtrl, s);
case 7: str2uint_aux(number, overflowCtrl, s);
case 6: str2uint_aux(number, overflowCtrl, s);
case 5: str2uint_aux(number, overflowCtrl, s);
case 4: str2uint_aux(number, overflowCtrl, s);
case 3: str2uint_aux(number, overflowCtrl, s);
case 2: str2uint_aux(number, overflowCtrl, s);
case 1: str2uint_aux(number, overflowCtrl, s);
}
// here we can check that all chars were digits
if (overflowCtrl != 0)
throw std::invalid_argument(__FUNCTION__ " : `s' is not a number");
return number;
}
为什么这么慢?因为它一个接一个地处理字符。如果我们保证可以访问高达s+16 的字节,我们可以对*ch - '0' 和digit + 246 使用矢量化。
就像在这段代码中一样:
uint32_t digitsPack = *(uint32_t*)s - '0000';
overflowCtrl |= digitsPack | (digitsPack + 0x06060606); // if one byte is not in range [0;10), high nibble will be non-zero
number = number * 10 + (digitsPack >> 24) & 0xFF;
number = number * 10 + (digitsPack >> 16) & 0xFF;
number = number * 10 + (digitsPack >> 8) & 0xFF;
number = number * 10 + digitsPack & 0xFF;
s += 4;
范围检查的小更新:
第一个 sn-p 在每次迭代时都有冗余移位(或mov),所以它应该是
unsigned digit = *s - '0';
overflowCtrl |= (digit + 256 - 10);
...
if (overflowCtrl >> 8 != 0) throw ...