Measuring context switch time for threads: answers

【Question title】: Measuring context switch time for threads
【Posted】: 2016-07-04 17:58:39
【Question】:

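I want to measure context switch time, and I am thinking of using a mutex and a condition variable to signal between 2 threads so that only one thread runs at a time. I can use CLOCK_MONOTONIC to measure the total execution time and CLOCK_THREAD_CPUTIME_ID to measure how long each thread runs.
The context switch time is then (total_time - thread_1_time - thread_2_time). To get a more accurate result, I can loop over it and take the average.

Is this a correct way to approximate the context switch time? I can't think of anything that might go wrong, but I am getting answers below 1 nanosecond.

I forgot to mention that the more iterations I loop over and average, the smaller the result gets.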

Edit

Here is a snippet of my code:

    typedef struct
    {
      struct timespec start;
      struct timespec end;
    }thread_time;

    ...


    // each thread function looks similar to this
    void* thread_1_func(void* time)
    {
      // named tt so the variable does not shadow the thread_time typedef
      thread_time* tt = (thread_time*) time;

      clock_gettime(CLOCK_THREAD_CPUTIME_ID, &(tt->start));
      for (int x = 0; x < loop; ++x)
      {
        // where it switches to another thread
      }
      clock_gettime(CLOCK_THREAD_CPUTIME_ID, &(tt->end));

      return NULL;
    }

    void* thread_2_func(void* time)
    {
      // similar to the above
    }

    int main()
    {
      ...
      pthread_t thread_1;
      pthread_t thread_2;

      thread_time thread_1_time;
      thread_time thread_2_time;

      struct timespec start, end;

      // stamps the start time
      clock_gettime(CLOCK_MONOTONIC, &start);

      // create two threads with the time structs as the arguments
      pthread_create(&thread_1, NULL, &thread_1_func, (void*) &thread_1_time);
      pthread_create(&thread_2, NULL, &thread_2_func, (void*) &thread_2_time);
      // waits for the two threads to terminate
      pthread_join(thread_1, NULL);
      pthread_join(thread_2, NULL);

      // stamps the end time
      clock_gettime(CLOCK_MONOTONIC, &end);

      // then I calculate the difference between the total execution time and the combined execution time of the two threads
    }
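
The final subtraction is elided in the snippet above. A minimal sketch of how it could be done, assuming all intervals are first converted to 64-bit nanoseconds; the helper names `elapsed_ns` and `residual_per_switch_ns` are hypothetical and do not come from the original post:

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* end - start in nanoseconds. Working in 64-bit nanoseconds avoids
   handling the tv_nsec borrow by hand. (Hypothetical helper.) */
static int64_t elapsed_ns(struct timespec start, struct timespec end)
{
    return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
         + (end.tv_nsec - start.tv_nsec);
}

/* (total - thread_1 - thread_2) / switches, the estimate described in
   the question. (Hypothetical helper.) */
static double residual_per_switch_ns(int64_t total_ns, int64_t t1_ns,
                                     int64_t t2_ns, long switches)
{
    assert(switches > 0);
    return (double)(total_ns - t1_ns - t2_ns) / (double)switches;
}
```

Note that the residual also contains thread creation and join time and any time both threads spend blocked, so it is not the switch cost alone.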

【Comments】:

Did you make sure they run on the same processor core, perhaps with sched_setaffinity?

Tags: c multithreading time

【Answer 1】:

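First of all, using CLOCK_THREAD_CPUTIME_ID is probably very wrong: that clock reports the CPU time consumed by that thread, and the switch itself does not happen in user mode, so subtracting the per-thread values from the total does not isolate it; you need a different clock. Also, on multiprocessor systems these clocks can give different values from one processor to another! I therefore suggest you use CLOCK_REALTIME or CLOCK_MONOTONIC instead. Note, however, that even if you read either of them twice in quick succession, the timestamps will usually be tens of nanoseconds apart.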
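
To get a feel for the gap between consecutive clock reads mentioned above, here is a small sketch that reads CLOCK_MONOTONIC twice in a row and keeps the smallest gap seen. The helper names `ts_to_ns` and `min_clock_gap_ns` are hypothetical, and the actual value depends on the machine and kernel:

```c
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime under strict -std=c11 */
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Convert a timespec to nanoseconds. */
static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Smallest observed gap between two consecutive CLOCK_MONOTONIC reads,
   over `rounds` attempts. (Hypothetical helper.) */
static int64_t min_clock_gap_ns(int rounds)
{
    int64_t best = INT64_MAX;
    for (int i = 0; i < rounds; ++i) {
        struct timespec a, b;
        int rc = clock_gettime(CLOCK_MONOTONIC, &a);
        assert(rc == 0);
        rc = clock_gettime(CLOCK_MONOTONIC, &b);
        assert(rc == 0);
        int64_t gap = ts_to_ns(&b) - ts_to_ns(&a);
        if (gap < best)
            best = gap;
    }
    return best;
}
```

Calling `min_clock_gap_ns(100000)` and printing the result gives a floor for the timer overhead; any measured interval close to that floor is mostly noise.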

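As for context switches: there are many kinds. The fastest is a switch from one thread to another done entirely in software. That just means you push the old registers onto the stack, set the task-switched flag so that the SSE/FP registers are saved lazily, save the stack pointer, load the new stack pointer, and return from that function; because the other thread did the same thing, the return from that function happens in the other thread.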
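
The save-registers, swap-stack-pointer, return-elsewhere pattern can be tried from user space with the `<ucontext.h>` functions. This is only a sketch of the idea, not the kernel's switch path: on glibc, `swapcontext` also saves and restores the signal mask, which I believe costs a system call, so it is heavier than the minimal switch described above. The names `worker` and `run_user_level_switches` are hypothetical.

```c
#include <assert.h>
#include <ucontext.h>

static ucontext_t main_ctx, worker_ctx;
static char worker_stack[64 * 1024];
static int worker_steps;

/* Runs on its own stack and hands control back after every step. */
static void worker(void)
{
    for (;;) {
        ++worker_steps;
        swapcontext(&worker_ctx, &main_ctx);   /* save our state, resume main */
    }
}

/* Switch into the worker n times and return how many steps it ran. */
static int run_user_level_switches(int n)
{
    worker_steps = 0;
    int rc = getcontext(&worker_ctx);
    assert(rc == 0);
    worker_ctx.uc_stack.ss_sp = worker_stack;
    worker_ctx.uc_stack.ss_size = sizeof worker_stack;
    worker_ctx.uc_link = &main_ctx;            /* used only if worker returns */
    makecontext(&worker_ctx, worker, 0);

    for (int i = 0; i < n; ++i)
        swapcontext(&main_ctx, &worker_ctx);   /* save main, resume worker */
    return worker_steps;
}
```

Each call to `swapcontext` returns in the other context, which is the "return happens in the other thread" effect; `run_user_level_switches(n)` returns `n` because the worker runs exactly one step per switch.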

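This thread-to-thread switch is quite fast; its overhead is about the same as that of any system call. Switching from one process to another is much slower, because the user-space page tables must be switched, and the TLB flushed, by loading the CR3 register (on x86); this causes misses in the TLB, which caches the mapping from virtual to physical addresses.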

However, if you pin both threads to a single core, for example:

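    #define _GNU_SOURCE   /* glibc needs this for cpu_set_t and the CPU_* macros */
    #include <sched.h>

    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);    /* allow CPU 0 only */
    int result = sched_setaffinity(0, sizeof(mask), &mask);   /* pid 0 means the caller */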

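then you can be fairly sure that the time you measure comes from a real context switch. Also, to measure the time for switching the floating-point/SSE state (which happens lazily), you should have some floating-point variables and do calculations on them before the context switch, then add 0.1 to some volatile floating-point variable after the context switch and see whether it affects the switching time.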
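
Putting the pieces of this answer together, the sketch below pins the process to one core and lets two threads hand a turn back and forth through a mutex and condition variable, timing the whole run with CLOCK_MONOTONIC. It is an illustration under assumptions, not a definitive benchmark: the result includes thread creation, joining, and the locking overhead, so it is only an upper bound on the switch cost. The names `ping_pong`, `pin_to_one_cpu`, and `ns_per_handoff` are hypothetical, and the code picks the first CPU the process is allowed to use rather than hard-coding CPU 0.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for sched_getaffinity/sched_setaffinity, CPU_* */
#endif
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <time.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int turn;               /* whose turn it is: 0 or 1 */
static int rounds;             /* hand-offs each thread performs */

/* Wait for our turn, then pass the turn to the other thread. */
static void *ping_pong(void *arg)
{
    int me = *(int *)arg;
    for (int i = 0; i < rounds; ++i) {
        pthread_mutex_lock(&lock);
        while (turn != me)
            pthread_cond_wait(&cond, &lock);
        turn = 1 - me;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Pin the caller to the first CPU it is allowed to run on. */
static int pin_to_one_cpu(void)
{
    cpu_set_t allowed, one;
    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) {
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            return sched_setaffinity(0, sizeof one, &one);
        }
    }
    return -1;
}

/* Average wall-clock nanoseconds per hand-off, or -1.0 if pinning failed. */
static double ns_per_handoff(int n)
{
    pthread_t t[2];
    int id[2] = {0, 1};
    struct timespec a, b;

    rounds = n;
    turn = 0;
    if (pin_to_one_cpu() != 0)  /* threads created later inherit the affinity */
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &a);
    for (int i = 0; i < 2; ++i) {
        int rc = pthread_create(&t[i], NULL, ping_pong, &id[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < 2; ++i)
        pthread_join(t[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &b);

    int64_t ns = (int64_t)(b.tv_sec - a.tv_sec) * 1000000000LL
               + (b.tv_nsec - a.tv_nsec);
    return (double)ns / (2.0 * n);
}
```

Because both threads share one core, every hand-off forces the waiting thread to be scheduled in, so the per-hand-off figure cannot drop toward zero as the iteration count grows.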

【Comments】:

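I still don't understand why CLOCK_THREAD_CPUTIME_ID doesn't work. If I call clock_gettime(CLOCK_THREAD_CPUTIME_ID, &start) at the start of the thread and clock_gettime(CLOCK_THREAD_CPUTIME_ID, &end) at the end of the thread, shouldn't the difference be the total elapsed time?
I have tried setting the CPU to 0, but the results I get are still very small. I have included a snippet of my code, and I wonder whether something is wrong with it...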
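
One way to see that a CPU-time clock is not an elapsed-time clock is to sleep and compare how far each clock advances. This is a small sketch under the assumption of a Linux/glibc system; the names `now_ns` and `sleep_and_compare` are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime and nanosleep */
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Current value of the given clock in nanoseconds. */
static int64_t now_ns(clockid_t clk)
{
    struct timespec ts;
    int rc = clock_gettime(clk, &ts);
    assert(rc == 0);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Sleep for `ms` milliseconds (ms >= 0) and report how far the wall clock
   and the calling thread's CPU-time clock advanced meanwhile. */
static void sleep_and_compare(long ms, int64_t *wall_ns, int64_t *cpu_ns)
{
    struct timespec req = { ms / 1000, (ms % 1000) * 1000000L };
    int64_t wall0 = now_ns(CLOCK_MONOTONIC);
    int64_t cpu0  = now_ns(CLOCK_THREAD_CPUTIME_ID);

    /* Restart with the remaining time if a signal interrupts the sleep. */
    while (nanosleep(&req, &req) == -1 && errno == EINTR)
        ;

    *wall_ns = now_ns(CLOCK_MONOTONIC) - wall0;
    *cpu_ns  = now_ns(CLOCK_THREAD_CPUTIME_ID) - cpu0;
}
```

After `sleep_and_compare(50, &wall, &cpu)`, `wall` is at least 50 ms while `cpu` stays far smaller, because a blocked thread consumes no CPU time. The same applies to a thread waiting on a condition variable.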

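【Answer 2】:

This is not straightforward, but as usual someone has already done a lot of the work. (I have not included the source code here because I cannot see any license mentioned.)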

https://github.com/tsuna/contextswitch/blob/master/timetctxsw.c

If you copy that file to a Linux machine as context_switch_time.c, you can compile and run it with:

    gcc -D_GNU_SOURCE -Wall -O3 -std=c11 context_switch_time.c -lpthread
    ./a.out

On a small virtual machine I get the following result:

    2000000  thread context switches in 2178645536ns (1089.3ns/ctxsw)

This question has come up before. For Linux, you can find some material here:

Write a C program to measure time spent in context switch in Linux OS

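Note that when the user in the link above ran their test, they were also hammering the machine with games and compilation, which is why the context switches took so long. There is more information here: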

how can you measure the time spent in a context switch under java platform

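【Comments】:

The first link has an accepted answer with 23 votes. It suggests a time of around 10-20 milliseconds. That is just absurdly high, and typical of the rubbish that seems to infect the multithreading tag :(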