Is there a more precise way to have a cbrt()?

【Question title】: Is there a more precise way to have a cbrt()?
【Posted】: 2022-11-01 13:18:14
【Question description】:

I wanted to know whether the C99 cbrt() function is implemented by redirecting to pow( x, 1.0 / 3.0 ). So I wrote a small benchmark in C++20:

#include <iostream>
#include <cmath>
#include <vector>
#include <random>
#include <chrono>
#include <atomic>
#include <functional>
#include <limits>    // numeric_limits
#include <concepts>  // same_as

using namespace std;
using namespace chrono;

atomic<double> aSum;

int main()
{
    constexpr size_t
        N = 1'000,
        ROUNDS = 10'000;
    vector<double> vd;
    vd.resize( N );
    mt19937_64 mt;
    uniform_real_distribution<double> urd( 0, numeric_limits<double>::max() );
    for( double &d : vd )
        d = urd( mt );
    auto bench = [&]<typename CbrtFn>( CbrtFn cbrtFn )
        requires requires( CbrtFn cbrtFn ) { { cbrtFn( 1.0 ) } -> same_as<double>; }
    {
        double sum = 0.0;
        auto start = high_resolution_clock::now();
        for( size_t r = ROUNDS; r--; )
            for( double d : vd )
                sum += cbrtFn( d );
        double ns = duration_cast<nanoseconds>(high_resolution_clock::now() - start).count() / ((double)N * ROUNDS);
        ::aSum = sum;
        cout << ns << endl;
    };
    bench( []( double d ) -> double { return cbrt( d ); } );
    bench( bind( []( double d, double e ) -> double { return pow( d, e ); }, placeholders::_1, 1.0 / 3.0 ) );
}

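On my Phenom II computer under Linux, both functions have nearly the same throughput, but on my Windows machine with current MSVC the pow()-based version takes about 40% less time. So I asked myself whether there is a more precise way to compute cbrt() than pow(). pow() performs a series of multiplications of d ^ (2 ^ N), with only negative Ns in the case of 1.0 / 3.0.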
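The benchmark above measures only time, so it cannot show whether the two variants return the same values. A minimal sketch of how the results could be compared, assuming two hypothetical helpers that are not part of the original code: `cbrt_via_pow`, which wraps the pow() call, and `ulp_distance`, which counts the representable doubles between two finite, non-negative values:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: the cube root computed by raising to the double
// nearest to 1/3 (1.0 / 3.0 is not exactly one third in binary).
double cbrt_via_pow(double x) {
    return std::pow(x, 1.0 / 3.0);
}

// Hypothetical helper: number of representable doubles between a and b.
// Valid only for finite inputs with a clear sign bit, whose bit patterns
// are ordered the same way as their values.
std::int64_t ulp_distance(double a, double b) {
    assert(std::isfinite(a) && std::isfinite(b));
    assert(!std::signbit(a) && !std::signbit(b));
    std::int64_t ia, ib;
    std::memcpy(&ia, &a, sizeof a);  // reinterpret the bits without UB
    std::memcpy(&ib, &b, sizeof b);
    return ia > ib ? ia - ib : ib - ia;
}
```

Tracking the maximum of `ulp_distance( cbrt( d ), cbrt_via_pow( d ) )` over the values in vd would show how far apart the two results get on a given platform. One difference needs no measurement: for a negative argument such as -8.0, pow() with a non-integer exponent returns NaN, whereas cbrt() returns a value at or very near -2.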

【Question discussion】:

Tags: math

【Solution 1】:

Check the assembly. Given this:

#include <cmath>

float func(float f) {
    return std::pow(f, 1.0f / 3.0f);
}

clang produces:

func(float):                               # @func(float)
        jmp     cbrtf@PLT                       # TAILCALL

MSVC produces:

float func(float) PROC                                  ; func, COMDAT
        movss   xmm0, DWORD PTR _f$[esp-4]
        movss   xmm1, DWORD PTR __real@3eaaaaab
        call    ___libm_sse2_powf
        movss   DWORD PTR tv71[esp-4], xmm0
        fld     DWORD PTR tv71[esp-4]
        ret     0

Changing the code to:

#include <cmath> 

float func(float f) {
    return std::cbrt(f);
}

produces:

float func(float) PROC                                  ; func, COMDAT
        movss   xmm0, DWORD PTR _f$[esp-4]
        push    ecx
        movss   DWORD PTR [esp], xmm0
        call    _cbrtf
        add     esp, 4
        ret     0

But yes, call cbrt if it is available. pow typically boils down to a log2 + exp2 call. cbrt is going to perform far better. In general, avoiding pointless calls to pow is a good thing....
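The log2 + exp2 remark rests on the identity x^y = 2^(y * log2(x)) for x > 0. A naive sketch of that identity follows; the function name `pow_via_log2_exp2` is hypothetical, and this is not how any particular libm implements pow. Production implementations generally carry extra precision through the intermediate product, which this version does not:

```cpp
#include <cassert>
#include <cmath>

// Naive illustration of x^y = 2^(y * log2(x)); only defined here for x > 0.
// The rounding error of y * log2(x) is amplified by exp2, so this is less
// accurate than a real pow() for large results.
double pow_via_log2_exp2(double x, double y) {
    assert(x > 0.0);
    return std::exp2(y * std::log2(x));
}
```

For example, `pow_via_log2_exp2( 8.0, 1.0 / 3.0 )` lands at or very near 2.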

【Discussion】:

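That's not what I asked. I asked whether, for precision reasons, there is a better way to do cbrt() than redirecting to pow(). That would be the only reason why my MSVC implementation of cbrt() is slower than pow(d, 1.0 / 3.0). As I showed, the pow() solution takes 40% less time on my Windows PC, so that's not "better performing".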
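One possible direction for the precision question, offered as a sketch rather than as what MSVC or any other library actually does: treat pow() as an initial estimate only and polish it with a Newton step on f(y) = y^3 - a. The function name `cbrt_refined` is hypothetical. The argument is first scaled so that the power of two can be cube-rooted exactly, which keeps y * y * y away from overflow and underflow:

```cpp
#include <cassert>
#include <cmath>

double cbrt_refined(double x) {
    if (x == 0.0 || !std::isfinite(x))
        return x;                          // zeros, infinities and NaN pass through

    // Split |x| = m * 2^e with m in [0.5, 1), then leave a multiple of 3
    // in the exponent so that cbrt(2^(3q)) = 2^q is exact.
    int e = 0;
    double m = std::frexp(std::fabs(x), &e);
    int q = e / 3;                         // integer division truncates toward zero
    double a = std::ldexp(m, e - 3 * q);   // a in [0.125, 4)

    double y = std::pow(a, 1.0 / 3.0);     // initial estimate
    assert(y > 0.0);                       // we divide by y * y below
    y -= (y * y * y - a) / (3.0 * y * y);  // one Newton step on y^3 - a

    return std::copysign(std::ldexp(y, q), x);
}
```

Unlike a bare pow() call, this also handles negative arguments. The result is not guaranteed to be correctly rounded, because the residual y * y * y - a is itself computed in double precision; evaluating it with std::fma or a wider type would be the next refinement to try.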