What is the fastest method of calculating integer Log10 for 64-bit integers on modern x86-64? Answers

【Question title】: What is the fastest method of calculating integer Log10 for 64-bit integers on modern x86-64?
【Posted】: 2019-03-06 22:10:25
【Question description】:

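The title says it all; I have found plenty of 32-bit examples but no complete 64-bit one. Using this post as a guide, I came up with the following implementation of Log10, but I'm not entirely sure that the translation is accurate or efficient...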

Edit: Supposedly, this Clang example handles the MAX_VALUE case without the final two instructions, but if I remove them I get a result of 20 instead of the expected 19.

...
mov rcx, 0FFFFFFFFFFFFFFFFh               ; put the integer to be tested into rcx

lea r10, qword ptr powersOfTen            ; put pointer to powersOfTen array into r10
lea r9, qword ptr maxDigits               ; put pointer to maxDigits array into r9
bsr rax, rcx                              ; put log2 of rcx into rax
cmovz rax, rcx                            ; if rcx is zero, put zero into rax
mov al, byte ptr [(r9 + rax)]             ; index into maxDigits array using rax; put the result into al
cmp rcx, qword ptr [(r10 + (rax * 8))]    ; index into powersOfTen array using (rax * 8); compare rcx with the result
sbb al, 0h                                ; if the previous operation resulted in a carry, subtract 1 from al
add rcx, 1h                               ; add one to rcx
sbb al, 0h                                ; if the previous operation resulted in a carry, subtract 1 from al
...

align 2

maxDigits:
    byte 00h
    byte 00h
    byte 00h
    byte 01h
    byte 01h
    byte 01h
    byte 02h
    byte 02h
    byte 02h
    byte 03h
    byte 03h
    byte 03h
    byte 03h
    byte 04h
    byte 04h
    byte 04h
    byte 05h
    byte 05h
    byte 05h
    byte 06h
    byte 06h
    byte 06h
    byte 06h
    byte 07h
    byte 07h
    byte 07h
    byte 08h
    byte 08h
    byte 08h
    byte 09h
    byte 09h
    byte 09h
    byte 09h
    byte 0Ah
    byte 0Ah
    byte 0Ah
    byte 0Bh
    byte 0Bh
    byte 0Bh
    byte 0Ch
    byte 0Ch
    byte 0Ch
    byte 0Ch
    byte 0Dh
    byte 0Dh
    byte 0Dh
    byte 0Eh
    byte 0Eh
    byte 0Eh
    byte 0Fh
    byte 0Fh
    byte 0Fh
    byte 0Fh
    byte 11h
    byte 11h
    byte 11h
    byte 12h
    byte 12h
    byte 12h
    byte 13h
    byte 13h
    byte 13h
    byte 13h
    byte 14h

align 2

powersOfTen:
    qword 00000000000000001h
    qword 0000000000000000Ah
    qword 00000000000000064h
    qword 000000000000003E8h
    qword 00000000000002710h
    qword 000000000000186A0h
    qword 000000000000F4240h
    qword 00000000000989680h
    qword 00000000005F5E100h
    qword 0000000003B9ACA00h
    qword 000000002540BE400h
    qword 0000000174876E800h
    qword 0000000E8D4A51000h
    qword 0000009184E72A000h
    qword 000005AF3107A4000h
    qword 000038D7EA4C68000h
    qword 0002386F26FC10000h
    qword 0016345785D8A0000h
    qword 00DE0B6B3A7640000h
    qword 08AC7230489E80000h
    qword 0FFFFFFFFFFFFFFFFh

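【Question discussion】:

Do you care more about Intel or AMD, or are you looking for something that runs efficiently on both? Will your function be called repeatedly in a loop, so that the tables can stay hot in cache? (IDK if there are any good alternatives to the tables, though.)
@PeterCordes As agnostic as possible would be nice; the function will usually be called in a loop, but may occasionally be used just as part of some variable initialization.
mov al, [mem] is worse than movzx on most CPUs. The comment suggesting mov al on the answer you adapted is wrong. I left a correction.
@PeterCordes Thanks for the info, it helps a lot; I actually went with mov al, [mem] because it was the only way I could get it to compile >.<

Tags: math assembly optimization x86-64 masm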

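【Solution 1】:

The fastest way to compute log10 for arbitrary input is a table lookup based on the leading-zero count (an approximation of log2), followed by a possible adjustment based on a second table that records the power of 10 falling within the range covered by that log2 approximation.

That is exactly what you found over here, so I think you're good. If you understand the 32-bit version, the extension to 64 bits is straightforward: double all the table sizes and fill them with the correct values, and change a few instructions to use 64-bit registers and 64-bit loads.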
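
The two-table approach described in this answer can be sketched in C for the 64-bit case. This is only a minimal sketch under stated assumptions, not the code from the linked post: the names `ilog10_u64`, `max_log10` and `powers_of_ten` are made up for illustration, `__builtin_clzll` is a GCC/Clang builtin (MSVC would need something like `_BitScanReverse64` instead), and returning 0 for an input of 0 is an arbitrary choice.

```c
#include <stdint.h>

/* 10^0 .. 10^19; 10^19 is the largest power of ten that fits in 64 bits. */
static const uint64_t powers_of_ten[20] = {
    1ULL,
    10ULL,
    100ULL,
    1000ULL,
    10000ULL,
    100000ULL,
    1000000ULL,
    10000000ULL,
    100000000ULL,
    1000000000ULL,
    10000000000ULL,
    100000000000ULL,
    1000000000000ULL,
    10000000000000ULL,
    100000000000000ULL,
    1000000000000000ULL,
    10000000000000000ULL,
    100000000000000000ULL,
    1000000000000000000ULL,
    10000000000000000000ULL
};

/* max_log10[k] = floor(log10(2^(k+1) - 1)): the largest possible result
   for any value whose highest set bit is bit k. */
static const uint8_t max_log10[64] = {
     0,  0,  0,  1,  1,  1,  2,  2,  2,  3,  3,  3,  3,  4,  4,  4,
     5,  5,  5,  6,  6,  6,  6,  7,  7,  7,  8,  8,  8,  9,  9,  9,
     9, 10, 10, 10, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 14, 14,
    14, 15, 15, 15, 15, 16, 16, 16, 17, 17, 17, 18, 18, 18, 18, 19
};

/* Hypothetical helper: floor(log10(x)) for x > 0.
   Assumption: an input of 0 returns 0. */
static unsigned ilog10_u64(uint64_t x)
{
    if (x == 0)
        return 0;
    /* Index of the highest set bit, i.e. floor(log2(x)). */
    unsigned log2 = 63u - (unsigned)__builtin_clzll(x);
    /* Upper bound on the answer for this bit position. */
    unsigned guess = max_log10[log2];
    /* The bound is too high by one exactly when x is below 10^guess. */
    return guess - (unsigned)(x < powers_of_ten[guess]);
}
```

Note that the byte table runs 15, 16, 17, 18, 19 at the top end. The maxDigits table in the question goes from 0Fh straight to 11h, skipping 10h, which is probably why the MAX_VALUE case comes out as 20 there unless the extra add/sbb pair is present; with a 10h row in place, the single compare-and-subtract should be enough.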

【Discussion】: