Why is functional-styled code so much faster than imperative code in C? (Answers)

【Question title】: Why is functional-styled code so much faster than imperative code in C?
【Posted】: 2020-10-15 19:31:21
【Question】:

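Just for fun and practice, I have been implementing an algorithm in various languages that computes the maximum depth of an expression (i.e. how many levels of nested parentheses it contains).

I noticed a huge performance difference between the functional-style C code and the imperative C code, and I would like to know why.

Given the string "(1+(2*3)+((8)/4))+1", the imperative code consistently finishes in about 10-13 µs, while the functional code takes 2-3 µs, more than twice as fast. Both versions were compiled with gcc at -O2, so I find this very surprising, but I don't know enough about compiler internals to understand why.

Can anyone tell me why the functional code is so much faster?

Functional code (note that the _ERR names are just #defines with integer values):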

const int max_depth_functional(
        const char *expr, const int sum, const int max) {
    switch(*expr) {
        case '\0':
            return sum == 0 ? max : UNTERM_PARENTH_ERR;
        case '0': case '1': case '2': case '3': case '4':
        case '5': case '6': case '7': case '8': case '9':
        case '+': case '-': case '*': case '/': case '^':
            return max_depth_functional(expr + 1, sum, max);
        case '(':
            return max_depth_functional(
                expr + 1, sum + 1, sum + 1 > max ? sum + 1 : max
            );
        case ')':
            return max_depth_functional(expr + 1, sum - 1, max);
        default:
            return INVALID_EXPR_ERR;
    }
}

Imperative code:

const int max_depth_imperative(const char *expr) {
    int curr_sum = 0, curr_max = 0;
    while(*expr != '\0') {
        switch(*expr++) {
            case '0': case '1': case '2': case '3': case '4':
            case '5': case '6': case '7': case '8': case '9':
            case '+': case '-': case '*': case '/': case '^':
                break;
            case '(':
                curr_sum++;
                curr_max = curr_sum > curr_max ? curr_sum : curr_max;
                break;
            case ')':
                curr_sum--;
                break;
            default:
                return INVALID_EXPR_ERR;
        }
    }
    return curr_sum == 0 ? curr_max : UNTERM_PARENTH_ERR;
}

Both are called like this:

const clock_t start = clock();
const int func_result = max_depth_func(args[1]);
const clock_t end = clock();

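Also, I am building and running on Linux x86_64.

【Comments】:

0) It doesn't matter. 1) Locality of reference and cache thrashing. x) Generalize the program to also accept {} and []. xx) Maybe even recognize "" and '' strings?
const adds nothing to anything returned by value. Also, watch this talk.
A single-run time measurement is really not representative. Run it thousands of times (on different inputs) and time that.
Also: check the generated code. Possibly (I suspect) the tail recursion was detected and eliminated. [And please don't rely on microsecond benchmarks, or at least run them repeatedly.]
1) Over-generalizing your specific example into "imperative vs. functional style" is simply wrong. 2) Per Eugene Sh: a single run really isn't representative. 3) If you still see a significant difference over many runs, then: a) try different compilers (e.g. gcc vs. MSVS), b) try different optimization levels (e.g. -O0 vs. -O3), and c) generate assembly output (e.g. gcc -S or MSVS /Fa).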
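
To illustrate the tail-recursion point raised in the comments: every recursive call in the functional version is the last thing the function does, so an optimizing compiler may reuse the current stack frame and replace the call with a jump (gcc enables sibling-call optimization at -O2). The sketch below puts the recursive form next to a hand-written loop that is roughly what such a transformation would yield. The function names and the two error values are placeholders chosen for illustration, since the original #defines are not shown, and whether the compiler really did this for the code in the question can only be confirmed from the assembly output.

```c
/* Placeholder values: the question only says these are integer #defines. */
#define UNTERM_PARENTH_ERR (-1)
#define INVALID_EXPR_ERR   (-2)

/* Recursive form, as in the question: each recursive call is a tail call. */
int depth_recursive(const char *expr, int sum, int max) {
    switch (*expr) {
        case '\0':
            return sum == 0 ? max : UNTERM_PARENTH_ERR;
        case '0': case '1': case '2': case '3': case '4':
        case '5': case '6': case '7': case '8': case '9':
        case '+': case '-': case '*': case '/': case '^':
            return depth_recursive(expr + 1, sum, max);
        case '(':
            return depth_recursive(expr + 1, sum + 1,
                                   sum + 1 > max ? sum + 1 : max);
        case ')':
            return depth_recursive(expr + 1, sum - 1, max);
        default:
            return INVALID_EXPR_ERR;
    }
}

/* Conceptual result of tail-call elimination: the "call" becomes an
   update of the parameters followed by a jump back to the top. */
int depth_as_loop(const char *expr, int sum, int max) {
    for (;;) {
        switch (*expr) {
            case '\0':
                return sum == 0 ? max : UNTERM_PARENTH_ERR;
            case '0': case '1': case '2': case '3': case '4':
            case '5': case '6': case '7': case '8': case '9':
            case '+': case '-': case '*': case '/': case '^':
                break;                  /* parameters unchanged */
            case '(':
                sum++;
                if (sum > max) max = sum;
                break;
            case ')':
                sum--;
                break;
            default:
                return INVALID_EXPR_ERR;
        }
        expr++;                         /* stands in for "expr + 1" in the call */
    }
}
```

If the transformation is applied, the two versions compile to very similar loops, which is consistent with them having similar run times.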

Tags: c performance functional-programming imperative-programming

【Solution 1】:

Following the comments, I ran the code using the following:

double imperative_time_sum = 0, functional_sum_time = 0;
for(int i = 0; i < 100000; i++) {
    const clock_t start_imp = clock();
    max_depth(args[1]);
    const clock_t end_imp = clock();
    max_depth_functional_fast(args[1], 0, 0);
    const clock_t end_func = clock();

    imperative_time_sum +=
        1000 * (double) (end_imp - start_imp) / CLOCKS_PER_SEC;
    functional_sum_time +=
        1000 * (double) (end_func - end_imp) / CLOCKS_PER_SEC;
}
printf("Average imperative: %fms\n", imperative_time_sum / 100000);
printf("Average functional: %fms\n", functional_sum_time / 100000);

Which produced:

Average imperative: 0.002412ms
Average functional: 0.002421ms

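Although I had previously rerun the program more than 100 times, I had never run it anywhere near 100,000 times. With that many runs, the two times are very close.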
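
A call that lasts only a few microseconds is close to the resolution of clock(), whose ticks are 1/CLOCKS_PER_SEC seconds (one microsecond on POSIX systems), so individual readings are dominated by noise. One possible variation on the loop above, offered only as a sketch, is to read the clock once before and once after a whole batch of calls and divide afterwards. The names average_call_us, depth_fn, and count_depth are hypothetical and not part of the original code.

```c
#include <time.h>

/* Hypothetical function-pointer type for a one-argument depth function. */
typedef int (*depth_fn)(const char *expr);

/* Small stand-in workload so the sketch is self-contained; it returns the
   maximum nesting depth, or -1 if the parentheses do not balance out. */
int count_depth(const char *expr) {
    int sum = 0, max = 0;
    for (; *expr != '\0'; expr++) {
        if (*expr == '(') {
            sum++;
            if (sum > max) max = sum;
        } else if (*expr == ')') {
            sum--;
        }
    }
    return sum == 0 ? max : -1;
}

/* Average CPU time per call in microseconds, measured over the whole
   batch with a single pair of clock() readings.
   Assumes iterations > 0. */
double average_call_us(depth_fn fn, const char *expr, long iterations) {
    volatile int sink = 0;  /* stores each result so the calls are not discarded */
    const clock_t start = clock();
    for (long i = 0; i < iterations; i++) {
        sink = fn(expr);
    }
    const clock_t end = clock();
    (void)sink;
    return 1e6 * (double)(end - start) / CLOCKS_PER_SEC / (double)iterations;
}
```

With this helper, something like average_call_us(max_depth, args[1], 100000) would time the imperative version; the three-argument functional version would need a small one-argument wrapper that passes 0, 0 as the initial sum and max.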

【Comments】:

I will accept this answer in two days.