Performance issue with boost transform_iterator and counting_iterator

【Question title】: Performance issue with boost transform_iterator and counting_iterator
【Posted】: 2016-07-07 04:09:51
【Question description】:

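I am currently trying to benchmark various implementations of a large loop that performs an arbitrary job, and I found that my version using boost transform iterators and boost counting_iterators is very slow.

I wrote a small program that benchmarks two loops, each of which sums the products of all integers between 0 and SIZE-1 with an arbitrary integer (which I chose to be 1 in my example to avoid overflow).

Here is my code: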

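//STL
#include <iostream>
#include <algorithm>
#include <functional>
#include <chrono>
#include <numeric>  // std::accumulate
#include <limits>   // std::numeric_limits
#include <string>   // std::stoi
#include <cstdlib>  // EXIT_SUCCESS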

//Boost
#include <boost/iterator/transform_iterator.hpp>
#include <boost/iterator/counting_iterator.hpp>

//Compile using
// g++ ./main.cpp -o test -std=c++11

//Launch using
// ./test 1

#define NRUN 10
#define SIZE (128*1024*1024) // parenthesized so the macro expands safely inside larger expressions

struct MultiplyByN
{
    MultiplyByN( size_t N ): m_N(N){};
    size_t operator()(int i) const { return i*m_N; }
    const size_t m_N;
};

int main(int argc, char* argv[] )
{
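    if( argc < 2 ) //The multiplier N is a required argument
    {
        std::cerr << "Usage: " << argv[0] << " N" << std::endl;
        return EXIT_FAILURE;
    }
    int N = std::stoi( argv[1] );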
    size_t sum = 0;
    //Initialize chrono helpers
    auto start = std::chrono::steady_clock::now();
    auto stop = std::chrono::steady_clock::now();
    auto diff = stop - start;
    double msec=std::numeric_limits<double>::max(); //Set min runtime to ridiculously high value
    MultiplyByN op(N);


    //Perform multiple run in order to get minimal runtime
    for(int k = 0; k< NRUN; k++)
    {
        sum = 0;
        start = std::chrono::steady_clock::now();
        for(int i=0;i<SIZE;i++)
        {
            sum += op(i);
        }
        stop = std::chrono::steady_clock::now();
        diff = stop - start;
        //Compute minimum runtime
        msec = std::min( msec, std::chrono::duration<double, std::milli>(diff).count() );
    }
    std::cout << "First version : Sum of values is "<< sum << std::endl;
    std::cout << "First version : Minimal Runtime was "<< msec << " msec "<< std::endl;
    msec=std::numeric_limits<double>::max(); //Reset min runtime to ridiculously high value

    //Perform multiple run in order to get minimal runtime
    for(int k = 0; k< NRUN; k++)
    {
        start = std::chrono::steady_clock::now();

        //Functional way to express the summation
        sum = std::accumulate(  boost::make_transform_iterator(boost::make_counting_iterator(0), op ),
                        boost::make_transform_iterator(boost::make_counting_iterator(SIZE), op ),
                        (size_t)0, std::plus<size_t>() );

        stop = std::chrono::steady_clock::now();
        diff = stop - start;
        //Compute minimum runtime
        msec = std::min( msec, std::chrono::duration<double, std::milli>(diff).count() );
    }
    std::cout << "Second version : Sum of values is "<< sum << std::endl;
    std::cout << "Second version version : Minimal Runtime was "<< msec << " msec "<< std::endl;
    return EXIT_SUCCESS;
}

The output I get:

./test 1
First version : Sum of values is 9007199187632128
First version : Minimal Runtime was 433.142 msec 
Second version : Sum of values is 9007199187632128
Second version version : Minimal Runtime was 10910.7 msec

The "functional" version of the loop, using std::accumulate, is 25 times slower than the plain loop version. Why is that?

Thanks in advance for your help.

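【Comments】:

Are you compiling with compiler optimizations enabled? (-O2 for gcc and clang, Release build for MSVC.) Otherwise the results are meaningless.
I did try with gcc (C++14), boost 1.60 and "-O2"; the second version is a little faster on every run (121 ms and 118 ms).

Tags: c++ performance boost functional-programming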

【Answer 1】:

According to the comment in your code, you compiled it with

g++ ./main.cpp -o test -std=c++11

Since you did not specify an optimization level, g++ uses its default, -O0, i.e. no optimization.

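This means the compiler did not inline anything. Template libraries such as the standard library or boost rely on inlining for performance. In addition, the compiler emits a lot of extra code that is far from optimal, so comparing performance on such binaries makes no sense.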

Recompile with optimizations enabled, then run the test again to get meaningful results.

【Comments】:

Performance with the -O3 flag is indeed much better; with this flag I get about 100 ms for both methods. Thanks.