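TL;DR: It appears that C++11 static variable initialization can be used in a thread-safe manner, with the same performance characteristics as dispatch_once.
Following Stephan Lechner's answer, I wrote the simplest code that exercises the C++ static initialization flow: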
class Object {
};

static Object *GetObjectCppStatic() {
    static Object *object = new Object();
    return object;
}

int main() {
    GetObjectCppStatic();
}
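Compiling this to assembly with clang++ test.cpp -O0 -fno-exceptions -S (-O0 to avoid inlining, although -Os generates the same general code; -fno-exceptions to simplify the generated code) shows that GetObjectCppStatic compiles to: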
__ZL18GetObjectCppStaticv:              ## @_ZL18GetObjectCppStaticv
    .cfi_startproc
## BB#0:
    pushq   %rbp
Lcfi6:
    .cfi_def_cfa_offset 16
Lcfi7:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
Lcfi8:
    .cfi_def_cfa_register %rbp
    cmpb    $0, __ZGVZL18GetObjectCppStaticvE6object(%rip)
    jne     LBB2_3
## BB#1:
    leaq    __ZGVZL18GetObjectCppStaticvE6object(%rip), %rdi
    callq   ___cxa_guard_acquire
    cmpl    $0, %eax
    je      LBB2_3
## BB#2:
    movl    $1, %eax
    movl    %eax, %edi
    callq   __Znwm
    leaq    __ZGVZL18GetObjectCppStaticvE6object(%rip), %rdi
    movq    %rax, __ZZL18GetObjectCppStaticvE6object(%rip)
    callq   ___cxa_guard_release
LBB2_3:
    movq    __ZZL18GetObjectCppStaticvE6object(%rip), %rax
    popq    %rbp
    retq
    .cfi_endproc
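We can clearly see the calls to ___cxa_guard_acquire and ___cxa_guard_release, which are implemented by the libc++ ABI library (libc++abi). Note that we didn't even have to tell clang that we are using C++11, since this was apparently supported by default even before that.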
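As a quick sanity check of the guarded initialization, here is a minimal sketch that calls a function of the same shape as GetObjectCppStatic from several threads and counts how often the constructor runs. The names CountedObject, GetCountedObject, g_constructions and ConstructionsAfterContendedCalls are hypothetical and introduced only for this illustration. A passing run is merely consistent with thread-safe initialization; it does not prove it.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical instrumentation: counts how many times the constructor runs.
static std::atomic<int> g_constructions{0};

struct CountedObject {
    CountedObject() { g_constructions.fetch_add(1, std::memory_order_relaxed); }
};

// Same shape as GetObjectCppStatic, but using the instrumented type.
static CountedObject *GetCountedObject() {
    static CountedObject *object = new CountedObject();
    return object;
}

// Starts `threadCount` threads, releases them at roughly the same time so
// they race on the first call, and returns the number of constructions seen.
static int ConstructionsAfterContendedCalls(int threadCount) {
    std::atomic<bool> go{false};
    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i) {
        threads.emplace_back([&go] {
            // Spin until the main thread opens the gate.
            while (!go.load(std::memory_order_acquire)) {
                std::this_thread::yield();
            }
            GetCountedObject();
        });
    }
    go.store(true, std::memory_order_release);
    for (std::thread &t : threads) {
        t.join();
    }
    return g_constructions.load(std::memory_order_relaxed);
}
```

Calling ConstructionsAfterContendedCalls(8) from main should return 1 on a conforming C++11 implementation, no matter how the threads interleave.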
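So we know that both forms ensure thread-safe initialization of the local static. But what about performance? The following test code measures both methods with no contention (single-threaded) and with heavy contention (multi-threaded):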
#include <cstdio>
#include <dispatch/dispatch.h>
#include <mach/mach_time.h>

class Object {
};

// Runs executionBlock `times` times, then finallyBlock, and returns the
// elapsed time in seconds.
static double Measure(int times, void(^executionBlock)(), void(^finallyBlock)()) {
    struct mach_timebase_info timebaseInfo;
    mach_timebase_info(&timebaseInfo);

    uint64_t start = mach_absolute_time();
    for (int i = 0; i < times; ++i) {
        executionBlock();
    }
    finallyBlock();
    uint64_t end = mach_absolute_time();

    // Convert mach ticks to nanoseconds, then to seconds.
    uint64_t timeTook = end - start;
    return ((double)timeTook * timebaseInfo.numer / timebaseInfo.denom) /
        NSEC_PER_SEC;
}

static Object *GetObjectDispatchOnce() {
    static Object *object;
    static dispatch_once_t onceToken;

    dispatch_once(&onceToken, ^{
        object = new Object();
    });

    return object;
}

static Object *GetObjectCppStatic() {
    static Object *object = new Object();
    return object;
}

int main() {
    // No contention: every call comes from the main thread.
    printf("Single thread statistics:\n");
    printf("DispatchOnce took %g\n", Measure(10000000, ^{
        GetObjectDispatchOnce();
    }, ^{}));
    printf("CppStatic took %g\n", Measure(10000000, ^{
        GetObjectCppStatic();
    }, ^{}));

    printf("\n");

    dispatch_queue_t queue = dispatch_queue_create("queue",
        DISPATCH_QUEUE_CONCURRENT);
    dispatch_group_t group = dispatch_group_create();

    // Heavy contention: calls are dispatched onto a concurrent queue, and the
    // measurement ends once the whole group has finished.
    printf("Multi thread statistics:\n");
    printf("DispatchOnce took %g\n", Measure(1000000, ^{
        dispatch_group_async(group, queue, ^{
            GetObjectDispatchOnce();
        });
    }, ^{
        dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
    }));
    printf("CppStatic took %g\n", Measure(1000000, ^{
        dispatch_group_async(group, queue, ^{
            GetObjectCppStatic();
        });
    }, ^{
        dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
    }));
}
This yields the following results on x64 (times in seconds):
Single thread statistics:
DispatchOnce took 0.025486
CppStatic took 0.0232348
Multi thread statistics:
DispatchOnce took 0.285058
CppStatic took 0.32596
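So, up to measurement error, the performance characteristics of the two methods appear to be similar, mainly because both perform double-checked locking. For dispatch_once, this happens in the _dispatch_once function: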
void
_dispatch_once(dispatch_once_t *predicate,
        DISPATCH_NOESCAPE dispatch_block_t block)
{
    if (DISPATCH_EXPECT(*predicate, ~0l) != ~0l) {
        // ...
    } else {
        // ...
    }
}
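In the C++ static initialization flow, it happens right before the call to ___cxa_guard_acquire: the cmpb/jne pair on the guard variable in the assembly listing above is the lock-free first check.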
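To make the pattern concrete, here is a hand-written sketch of double-checked locking in portable C++11. It only emulates the shape of the two mechanisms discussed here; it is not the actual libc++abi or libdispatch implementation, and the names g_initialized, g_initMutex, g_object and GetObjectDoubleChecked are hypothetical.

```cpp
#include <atomic>
#include <mutex>

class Object {
};

// Hypothetical stand-ins for the compiler-generated guard variable and for
// whatever lock the runtime takes on the slow path.
static std::atomic<bool> g_initialized{false};
static std::mutex g_initMutex;
static Object *g_object = nullptr;

static Object *GetObjectDoubleChecked() {
    // First check (fast path): no lock is taken. This plays the role of the
    // cmpb/jne on the guard variable before ___cxa_guard_acquire.
    if (!g_initialized.load(std::memory_order_acquire)) {
        std::lock_guard<std::mutex> lock(g_initMutex);
        // Second check, now under the lock: another thread may have finished
        // the initialization while this one was waiting.
        if (!g_initialized.load(std::memory_order_relaxed)) {
            g_object = new Object();
            // Publish the object; pairs with the acquire load above.
            g_initialized.store(true, std::memory_order_release);
        }
    }
    return g_object;
}
```

Once initialization has completed, every later call costs a single atomic load and a branch, which is consistent with the two approaches measuring about the same in the uncontended case.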