What is faster, 'bool' or an integer type?

【Question title】: What is faster, 'bool' or an integer type?
【Posted】: 2022-01-18 19:45:39
【Question】:

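When sending a patch to a widely known open-source project (known for its performance and simplicity), I received a review comment that surprised me a bit: 'using the "bool" type from C99 is a bad idea'. They reasoned it well and showed me a simple example program in which the unoptimized code clearly had more instructions when using bool than when using an integer type.

So they basically use something like typedef unsigned int bool_t; and make sure they only ever assign 0 or 1 to that type.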

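I would like a convincing and definitive answer. I would also like to know what kind of performance difference we are talking about (i.e., is it worth it?), and whether compilers can do better with optimizations enabled.

There is a C++ question closely related to this one, but (apart from being C++) it restricts itself to the selection statement, while here I am concerned with both aspects of bool: assignment and selection. That related question is Which is faster : if (bool) or if(int)?

So, what is faster, bool or an integer type? And how important is the performance difference?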

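【Comments】:

bool (which in C is an alias for _Bool) is an integer type. But it does have associated semantics that other integer types do not have.
"How important is the performance difference?" - That depends on what the code is doing. Does it happen once in the lifetime of the program, or thousands of times in a tight loop? If the former, it is not worth worrying about. If the latter, it can make a difference, but is the difference worth it? Correctness, clarity and maintainability matter more than raw speed. That said, if they already have a convention for handling Boolean values with non-bool types, then use their convention.
Also related: Boolean values as 8 bit in compilers. Are operations on them inefficient? - There are cases where compilers do not do a great job, but there is no general rule.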

Tags: c gcc boolean clang micro-optimization

【Solution 1】:

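EDIT 2021-12-16 19:07: Show comparisons against both uint and uchar, and show both GCC and Clang. Add -march=native to the compiler flags. The results now seem to show that bool is as good as the other integer types, but some compilers produce sub-optimal code.

EDIT 2022-01-11 18:56: After some tests, slightly changing the code can expose significant performance problems, which are more likely to appear with _Bool than with uint.

For my tests, I chose unsigned types, since that is what the project was using instead of bool, but I expect signed types to behave similarly.

I will show here the tests with unsigned char, since bool is 1 byte on my system and that reduces the differences in the assembly output, and also with unsigned int, to compare different widths.

I tested storing an integer into one of these types (bool, unsigned char, and unsigned int), using one of these types to control a selection statement, and using one of these types as a function parameter.

Source code: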

// repeat.h:

#pragma once

#define repeat2(e)     (e);(e)
#define repeat4(e)     repeat2(e);repeat2(e)
#define repeat8(e)     repeat4(e);repeat4(e)
#define repeat16(e)    repeat8(e);repeat8(e)
#define repeat32(e)    repeat16(e);repeat16(e)
#define repeat64(e)    repeat32(e);repeat32(e)
#define repeat128(e)   repeat64(e);repeat64(e)
#define repeat256(e)   repeat128(e);repeat128(e)
#define repeat512(e)   repeat256(e);repeat256(e)
#define repeat1024(e)  repeat512(e);repeat512(e)

#define repeat(e)  do                           \
{                                   \
    repeat16(e);                            \
} while (0)

// store_bool.h:

#pragma once

_Bool store_bool(long n, int x);

// store_bool.c:

#include "store_bool.h"
#include "repeat.h"

_Bool store_bool(long n, volatile int x)
{
    volatile _Bool  b;

    for (long i = 0; i < n; i++)
        repeat(b = x);
    return b;
}

// store_uchar.h:

#pragma once

unsigned char store_uchar(long n, int x);

// store_uchar.c:

#include "store_uchar.h"
#include "repeat.h"

unsigned char store_uchar(long n, volatile int x)
{
    volatile unsigned char  c;

    for (long i = 0; i < n; i++)
        repeat(c = x);
    return c;
}

// store_uint.h:

#pragma once

unsigned int store_uint(long n, int x);

// store_uint.c:

#include "store_uint.h"
#include "repeat.h"

unsigned int store_uint(long n, volatile int x)
{
    volatile unsigned int  u;

    for (long i = 0; i < n; i++)
        repeat(u = x);
    return u;
}

// consume_bool.h:

#pragma once

int consume_bool(long n, _Bool b);

// consume_bool.c:

#include "consume_bool.h"
#include "repeat.h"

int consume_bool(long n, volatile _Bool b)
{
    volatile int  x = 5;

    for (long i = 0; i < n; i++)
        repeat({if (b) x = 3;});
    return x;
}

// consume_uchar.h:

#pragma once

int consume_uchar(long n, unsigned char c);

// consume_uchar.c:

#include "consume_uchar.h"
#include "repeat.h"

int consume_uchar(long n, volatile unsigned char c)
{
    volatile int  x = 5;

    for (long i = 0; i < n; i++)
        repeat({if (c) x = 3;});
    return x;
}

// consume_uint.h:

#pragma once

int consume_uint(long n, unsigned int u);

// consume_uint.c:

#include "consume_uint.h"
#include "repeat.h"

int consume_uint(long n, volatile unsigned int u)
{
    volatile int  x = 5;

    for (long i = 0; i < n; i++)
        repeat({if (u) x = 3;});
    return x;
}

// param_bool_.h:

#pragma once

int param_bool_(_Bool b);

// param_bool_.c:

#include "param_bool_.h"

int param_bool_(_Bool b)
{
    return b ? 3 : 5;
}

// param_bool.h:

#pragma once

void param_bool(long n, _Bool b);

// param_bool.c:

#include "param_bool.h"
#include "param_bool_.h"
#include "repeat.h"

void param_bool(long n, volatile _Bool b)
{
    for (long i = 0; i < n; i++)
        repeat(param_bool_(b));
}

// param_uchar_.h:

#pragma once

int param_uchar_(unsigned char c);

// param_uchar_.c:

#include "param_uchar_.h"

int param_uchar_(unsigned char c)
{
    return c ? 3 : 5;
}

// param_uchar.h:

#pragma once

void param_uchar(long n, unsigned char c);

// param_uchar.c:

#include "param_uchar.h"
#include "param_uchar_.h"
#include "repeat.h"

void param_uchar(long n, volatile unsigned char c)
{
    for (long i = 0; i < n; i++)
        repeat(param_uchar_(c));
}

// param_uint_.h:

#pragma once

int param_uint_(unsigned int u);

// param_uint_.c:

#include "param_uint_.h"

int param_uint_(unsigned int u)
{
    return u ? 3 : 5;
}

// param_uint.h:

#pragma once

void param_uint(long n, unsigned int u);

// param_uint.c:

#include "param_uint.h"
#include "param_uint_.h"
#include "repeat.h"

void param_uint(long n, volatile unsigned int u)
{
    for (long i = 0; i < n; i++)
        repeat(param_uint_(u));
}

// main.c:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#include "store_bool.h"
#include "store_uchar.h"
#include "store_uint.h"
#include "consume_bool.h"
#include "consume_uchar.h"
#include "consume_uint.h"
#include "param_bool.h"
#include "param_uchar.h"
#include "param_uint.h"


#define measure(e)                          \
({                                  \
    clock_t  t0, t1;                        \
    double   t;                         \
                                    \
    t0 = clock();                           \
    e;                              \
    t1 = clock();                           \
                                    \
    t = (double) (t1 - t0) / CLOCKS_PER_SEC;            \
    t;                              \
})


int main(int argc, char *argv[])
{
    double  sb, sc, su;
    double  cb, cc, cu;
    double  pb, pc, pu;
    long    n;

    if (argc != 2)
        exit(2);
    n = atol(argv[1]);

    sb = measure(store_bool(n, 1));
    sc = measure(store_uchar(n, 1));
    su = measure(store_uint(n, 1));

    cb = measure(consume_bool(n, 1));
    cc = measure(consume_uchar(n, 1));
    cu = measure(consume_uint(n, 1));

    pb = measure(param_bool(n, 1));
    pc = measure(param_uchar(n, 1));
    pu = measure(param_uint(n, 1));

    printf("n: %li\n", n);
    putchar('\n');
    printf("store bool:    %lf\n", sb);
    printf("store uchar:   %lf\n", sc);
    printf("store uint:    %lf\n", su);
    putchar('\n');
    printf("consume bool:  %lf\n", cb);
    printf("consume uchar: %lf\n", cc);
    printf("consume uint:  %lf\n", cu);
    putchar('\n');
    printf("param bool:    %lf\n", pb);
    printf("param uchar:   %lf\n", pc);
    printf("param uint:    %lf\n", pu);
}

I used volatile on some variables to keep the compiler from optimizing away the repeated assignments and tests.

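Since the compilers will not unroll the loops (they are large), I used many (16) repeated expressions in each loop (see the repeat() macro) to reduce the impact of the loop overhead (jump instructions) on the total benchmark time.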

Compiling:

$ cc -Wall -Wextra -O3 -march=native -S *.c
$ cc -O3 -march=native *.s
$

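Assembly:

To simplify, I will show only one of the 16 repetitions. If you want to see the full assembly files, you can compile them yourself (I have given enough instructions here).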

// store_bool.s (GCC):

    movl    -20(%rsp), %edx
    testl   %edx, %edx
    setne   %dl
    movb    %dl, -1(%rsp)

// store_bool.s (Clang):

    cmpl    $0, -4(%rsp)
    setne   -5(%rsp)

// store_uchar.s (GCC):

    movl    -20(%rsp), %edx
    movb    %dl, -1(%rsp)

// store_uchar.s (Clang):

    movl    -4(%rsp), %ecx
    movb    %cl, -5(%rsp)

// store_uint.s (GCC):

    movl    -20(%rsp), %edx
    movl    %edx, -4(%rsp)

// store_uint.s (Clang):

    movl    -4(%rsp), %ecx
    movl    %ecx, -8(%rsp)

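From the above, uchar and uint are likely to perform the same. bool also takes two instructions on Clang, but they are different ones; that may or may not make a difference. On GCC, it clearly has 2 extra instructions compared to uchar, which makes it slower.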

// consume_bool.s (GCC):

    movzbl  -20(%rsp), %edx
    testb   %dl, %dl
    je  .L2
    movl    $3, -4(%rsp)
.L2:

// consume_bool.s (Clang):

.LBB0_5:                                #   in Loop: Header=BB0_1 Depth=1
    testb   $1, -5(%rsp)
    jne .LBB0_6

    [...]

.LBB0_6:                                #   in Loop: Header=BB0_1 Depth=1
    movl    $3, -4(%rsp)
    testb   $1, -5(%rsp)
    je  .LBB0_9

(LBB0_9 is similar to LBB0_5)

// consume_uchar.s (GCC):

    movzbl  -20(%rsp), %edx
    testb   %dl, %dl
    je  .L2
    movl    $3, -4(%rsp)
.L2:

// consume_uchar.s (Clang):

    cmpb    $0, -5(%rsp)
    je  .LBB0_3
# %bb.2:                                #   in Loop: Header=BB0_1 Depth=1
    movl    $3, -4(%rsp)
.LBB0_3:                                #   in Loop: Header=BB0_1 Depth=1

// consume_uint.s (GCC):

    movl    -20(%rsp), %edx
    testl   %edx, %edx
    je  .L2
    movl    $3, -4(%rsp)
.L2:

// consume_uint.s (Clang):

    cmpl    $0, -4(%rsp)
    je  .LBB0_3
# %bb.2:                                #   in Loop: Header=BB0_1 Depth=1
    movl    $3, -8(%rsp)
.LBB0_3:                                #   in Loop: Header=BB0_1 Depth=1

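In these cases, the assembly produced by GCC is almost identical for the 3 types, so I do not expect any difference. With Clang, bool gets different code, but since it is very different, it is hard to predict whether it will be faster or slower than the integers.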

// param_bool_.s (GCC):

param_bool_:
.LFB0:
    .cfi_startproc
    cmpb    $1, %dil
    sbbl    %eax, %eax
    andl    $2, %eax
    addl    $3, %eax
    ret
    .cfi_endproc
.LFE0:

// param_bool_.s (Clang):

param_bool_:                            # @param_bool_
    .cfi_startproc
# %bb.0:
    xorb    $1, %dil
    movzbl  %dil, %eax
    addl    %eax, %eax
    addl    $3, %eax
    retq
.Lfunc_end0:

// param_bool.s (GCC):

    movzbl  12(%rsp), %edi
    call    param_bool_@PLT

// param_bool.s (Clang):

    movzbl  15(%rsp), %edi
    andl    $1, %edi
    callq   param_bool_

// param_uchar_.s (GCC):

param_uchar_:
.LFB0:
    .cfi_startproc
    cmpb    $1, %dil
    sbbl    %eax, %eax
    andl    $2, %eax
    addl    $3, %eax
    ret
    .cfi_endproc
.LFE0:

// param_uchar_.s (Clang):

param_uchar_:                           # @param_uchar_
    .cfi_startproc
# %bb.0:
    xorl    %eax, %eax
    testl   %edi, %edi
    sete    %al
    addl    %eax, %eax
    addl    $3, %eax
    retq
.Lfunc_end0:

// param_uchar.s (GCC):

    movzbl  12(%rsp), %edi
    call    param_uchar_@PLT

// param_uchar.s (Clang):

    movzbl  15(%rsp), %edi
    callq   param_uchar_

// param_uint_.s (GCC):

param_uint_:
.LFB0:
    .cfi_startproc
    cmpl    $1, %edi
    sbbl    %eax, %eax
    andl    $2, %eax
    addl    $3, %eax
    ret
    .cfi_endproc
.LFE0:

// param_uint_.s (Clang):

param_uint_:                            # @param_uint_
    .cfi_startproc
# %bb.0:
    xorl    %eax, %eax
    testl   %edi, %edi
    sete    %al
    addl    %eax, %eax
    addl    $3, %eax
    retq
.Lfunc_end0:

// param_uint.s (GCC):

    movl    12(%rsp), %edi
    call    param_uint_@PLT

// param_uint.s (Clang):

    movl    12(%rsp), %edi
    callq   param_uint_

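In this case, bool should behave the same as uchar, since the only thing that should matter is the width, and we might (or might not) see a difference with uint. Apart from the zero extension, there is not much difference. There are slight differences between GCC and Clang; however, Clang produces larger code, so I expect Clang to run slightly slower than GCC.

Timing: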

// amd64, gcc-11, i5-5675C:

$ ./a.out 1073741824
store bool:    4.928789
store uchar:   4.795028
store uint:    4.803893

consume bool:  4.795776
consume uchar: 4.794873
consume uint:  4.794079

param bool:    17.713958
param uchar:   17.611229
param uint:    17.688909

// amd64, clang-13, i5-5675C:

$ ./a.out 1073741824
store bool:    4.806418
store uchar:   4.802943
store uint:    4.800172

consume bool:  4.805537
consume uchar: 4.799858
consume uint:  4.799462

param bool:    19.095543
param uchar:   17.708014
param uint:    17.782490

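In 'store', as we expected, bool is slower than the other types with GCC (around 1~10%). With Clang, there is no significant difference (I have seen bool be consistently a bit slower than the others, but by less than 0.5%).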

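In 'consume', we see no difference between types or compilers.

In 'param', the time varies a lot between runs, and there is no consistency: sometimes bool is slower, and sometimes it is faster. However, GCC is consistently faster than Clang.

Slight changes in the code can cause compilers to miss important optimizations. Using the following code in consume_<type>.c leads to a significant performance loss: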

        repeat(x = b ? 3 : x);

Note that just changing the if to a ternary operator slows the generated code down to the following times:

GCC:

$ ./a.out 1073741824
n: 1073741824

...

consume bool:  8.684662
consume uchar: 8.683915
consume uint:  8.086806

...

Clang:

$ ./a.out 1073741824
n: 1073741824

...

consume bool:  8.161896
consume uchar: 5.422896
consume uint:  5.127165

...

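Clang slows down considerably for _Bool, while keeping a reasonable speed for the other types. GCC seems to generate quite bad code for all the types.

Conclusion:

Programmers should consider a few things:

Performance: Even though _Bool may in theory be as fast as unsigned int, compilers are far from ideal, and it is likely that your compiler will miss some optimizations, which in some cases may matter a lot.

Maintainability/readability/correctness: Some may argue that _Bool is safer because of its automatic normalization; others may argue that it is less safe because of its automatic normalization. Just know what you are using, and form your own opinion.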

Supporting pre-C99 code: If that is your case, you have no choice but to use unsigned int.

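【Comments】:

I wonder how much value there is in measurements made using volatile. Real code that does not use volatile will probably look very different.
I agree with Ted. This looks more like cargo cult than reality, since the requirements on _Bool are quite lenient and favor performance. The only real requirement is that, from the abstract machine's perspective, it only holds 1 or 0. The compiler is allowed to do a lot of "as-if" with them.
Your question says they use typedef unsigned int bool_t; and make sure to only assign 1 or 0 to it, but by definition this means they are writing by hand the same code that bool would have generated for them; using bool_t b = somenonboolinteger != 0; ends up producing the same testl + setne anyway. And using a typedef for unsigned int as in the question (vs. unsigned char as in your answer) means all your bools will likely occupy 4x the memory on most systems (32x the memory for std::vector<bool_t> vs. std::vector<bool>, though std::vector<bool> has performance issues).
If you want clear code, you should not assign non-bool values to a bool anyway. You always end up assigning the result of a comparison (like step == 0 or pass < 5), which already returns a Boolean value. So in practice there is no assignment overhead.
Even if some of the automatic normalizations are "unnecessary", their share in real-world code would be well below 1% of all operations (the benchmark makes them ~50% of all operations), so a 1-5% change in the microbenchmark would translate to well below a 0.02-0.1% change in any real-world code. [...] in cases where the normalization is omitted?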