Short runtime of a function - Why do subsequent threads need more time than the first thread? (Answers)

【Question title】: Short runtime of a function - Why do subsequent threads need more time than the first thread?
【Posted】: 2025-12-03 09:55:02
【Question】:

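I ran the following program on a Windows PC with 8 cores. The runtime of the function ("FUNCTION") is very short (a few hundred microseconds). However, the measured runtime for thread 3 is usually about 1.5 times longer than for thread 1. What is the explanation for this?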

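#include <chrono>    // steady_clock, duration_cast
#include <cmath>     // std::sqrt
#include <future>    // std::async
#include <iostream>  // std::cout

using namespace std::chrono;

// Short-running workload: sums the square roots of 0..19999.
double FUNCTION(){

    double result=0.0;
    for(int i=0; i<20'000; i++){
        result=result+std::sqrt(i);
    }
    return result;
}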

int main()
{
    auto start1 = steady_clock::now();
    auto start2 = steady_clock::now();
    auto start3 = steady_clock::now();
   
    auto thread1= std::async( std::launch::async, FUNCTION);
    auto thread2= std::async( std::launch::async, FUNCTION);
    auto thread3= std::async( std::launch::async, FUNCTION);


    double res1 = thread1.get();
    auto stop1 = steady_clock::now();    
    double res2 = thread2.get(); 
    auto stop2 = steady_clock::now();    
    double res3 = thread3.get(); 
    auto stop3 = steady_clock::now();    


    auto duration1 = std::chrono::duration_cast<std::chrono::microseconds>(stop1 - start1); 
    std::cout << "Duration Thread 1:  "<<duration1.count() << std::endl;

    auto duration2 = duration_cast<microseconds>(stop2 - start2); 
    std::cout << "Duration Thread 2:  "<<duration2.count()<<std::endl;

    auto duration3 = duration_cast<microseconds>(stop3 - start3); 
    std::cout << "Duration Thread 3:  "<<duration3.count()<<std::endl;

    return 0;
}

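Is it because handling the threads takes time?

If so, is there a rough estimate of the function runtime above which parallelizing the call makes sense?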

【Question comments】:

Tags: c++ multithreading parallel-processing

【Solution 1】:

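You are measuring all the calls to std::async, and by the time you take the end time for the last thread you have waited for all the results. All three start times are taken before any thread is launched, and the futures are collected in order, so each duration also covers launching all three threads and waiting for every earlier result.

I suggest you time only one thread at a time, store the times, and report them separately afterwards.

Something like this: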

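// Fragment: goes inside main(); needs <chrono>, <future>, <iostream>, <utility>, <vector>
// and FUNCTION from the question.
using clock = std::chrono::high_resolution_clock;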

constexpr size_t number_of_threads = 3;
std::vector<std::pair<clock::time_point, clock::time_point>> times(number_of_threads);

for (size_t t = 0; t < number_of_threads; ++t)
{
    auto start = clock::now();

    // Start the thread and wait for it to finish
    auto thread = std::async(std::launch::async, FUNCTION);
    (void) thread.get();

    auto end = clock::now();

    // Store the times
    times[t] = std::make_pair(start, end);        
}

// All threads are now finished, report the times
for (size_t t = 0; t < number_of_threads; ++t)
{
    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(times[t].second - times[t].first);

    std::cout << "Duration thread #" << (t + 1) << ": " << duration.count() << " us\n";
}
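
On the question's follow-up about when parallelizing a call is worthwhile, a rough and machine-specific estimate can be had by timing how long it takes to launch an empty std::async task and wait for it. The following is only a minimal sketch: the helper name `average_launch_overhead_us` and the choice to average over several repetitions are assumptions for illustration. As a rule of thumb, if FUNCTION does not run clearly longer than this figure, the thread handling is likely to eat most of the gain.

```cpp
#include <chrono>
#include <future>
#include <ratio>

// Hypothetical helper: average cost, in microseconds, of launching an empty
// std::async task and waiting for it to finish.
double average_launch_overhead_us(int repetitions)
{
    using steady = std::chrono::steady_clock;

    const auto start = steady::now();
    for (int r = 0; r < repetitions; ++r)
    {
        // The task does no work, so the elapsed time is launch + join cost.
        auto task = std::async(std::launch::async, [] {});
        task.get();
    }
    const auto stop = steady::now();

    // Convert to fractional microseconds and average over all repetitions.
    const std::chrono::duration<double, std::micro> total = stop - start;
    return total.count() / repetitions;
}
```

The number varies between runs and between machines, so it is best treated as an order of magnitude and not as a fixed threshold.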

【Comments】:

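Thank you for the comment and suggestion. I have switched to the more accurate time measurement (although I used steady_clock). With your code I get the following output: Duration thread #1: 298 ms, Duration thread #2: 233 ms, Duration thread #3: 141 ms. The total runtime of the program is reported as 673 ms. That seems strange, though, since the three threads should run in parallel? Shouldn't the total runtime be about 300 ms?
@D.B. The total program runtime also includes the output, which is relatively slow. Also, the threads do not actually run in parallel, because my code executes them serially. You could write one loop to start the threads and another to wait for them (but then you hit the same problem: the end time of the second also includes waiting for the first, and the end time of the third includes waiting for the first and the second). Measuring parallel code reliably and correctly is very hard.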
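
A minimal sketch of one way around the problem described in the last comment, assuming it is acceptable to move the timing into the task itself: each task reads the clock at its own start and end, so the reported duration no longer includes waiting on earlier futures or the cost of launching the threads. `TimedResult`, `timed_call` and `run_parallel` are hypothetical names, and `FUNCTION` is repeated from the question so that the block stands alone.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <future>
#include <vector>

// Same workload as the question's FUNCTION: sums the square roots of 0..19999.
double FUNCTION()
{
    double result = 0.0;
    for (int i = 0; i < 20'000; ++i)
        result += std::sqrt(i);
    return result;
}

// Hypothetical type: what each task reports back to the caller.
struct TimedResult
{
    double value;                        // return value of FUNCTION
    std::chrono::microseconds elapsed;   // time spent inside FUNCTION only
};

// Runs FUNCTION and times it from inside the thread that executes it.
TimedResult timed_call()
{
    using steady = std::chrono::steady_clock;

    const auto start = steady::now();
    const double value = FUNCTION();
    const auto stop = steady::now();

    return {value, std::chrono::duration_cast<std::chrono::microseconds>(stop - start)};
}

// Launches `count` tasks in one loop, then collects them in a second loop.
std::vector<TimedResult> run_parallel(std::size_t count)
{
    std::vector<std::future<TimedResult>> futures;
    futures.reserve(count);
    for (std::size_t t = 0; t < count; ++t)
        futures.push_back(std::async(std::launch::async, timed_call));

    std::vector<TimedResult> results;
    results.reserve(count);
    for (auto& f : futures)
        results.push_back(f.get());   // waiting here does not affect `elapsed`
    return results;
}
```

Calling `run_parallel(3)` from `main` and printing `elapsed.count()` for each entry gives per-task times. Thread startup cost is deliberately excluded, so the total wall-clock time still has to be measured separately around the call.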