了解性能报告答案

【问题标题】：Understanding the perf report了解性能报告
【发布时间】：2018-08-29 00:39:05
【问题描述】：

我一直在从事一些对时间敏感的项目。由于时间安排上出现了一些不受欢迎的尖峰，我不得不走得更深一些。

场景：

我有一个内核模块，它固定在 CPU 内核上。这个 CPU 内核也在内核启动参数中的 isolcpus 中列出。这是我在 cmdline 中对内核启动参数所做的操作

intel_iommu=on iommu=pt default_hugepagesz=1G hugepagesz=1G hugepages=1 intel_idle.max_cstate=0 processor.max_cstate=0 nohz_full=7-11 isolcpus=7-11 mce=off rcu_nocbs=7-11 nosoftlockup idle=poll cpuidle.off=1 powersave=off nonmi_ipi nowatchdog

我运行了以下命令（此时我正在尝试仅分析 CPU 8）

sudo ./perf record -e context-switches -a -g --cpu=8 taskset -c 9 ./test.sh

**编辑 1 - 附加信息 **

内核版本：4.15.12

我的内核模块每 X 个时间单位发送一次同步数据包。目前，我已将其配置为每 50 毫秒发送一次。

在这种情况下，我简化了 test.sh。它需要几个参数，但是，这个脚本的重要一点是它调用了内核模块。

例如，我的 KM 有一个 proc fs。当此 proc fs 上触发写入事件时，它会创建一个新的 Kthread，将其绑定到 CPU (8)，并开始每 50ms 生成一次数据包。

为了避免冲突和上下文切换，我把这个东西移到了内核空间。此外，我已将脚本的亲和性设置为与内核模块不同的 CPU。

因此，我观察到的是，发送时间有一点抖动，可能是因为这些上下文切换。

这是我输入perf report后的输出

# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 8  of event 'context-switches'
# Event count (approx.): 39
#
# Children      Self  Command      Shared Object     Symbol
# ........  ........  ...........  ................  .................
#
    69.23%    69.23%  :-1          [kernel.vmlinux]  [k] do_task_dead
            |
            ---do_task_dead

    25.64%    25.64%  swapper      [kernel.vmlinux]  [k] schedule_idle
            |
            ---schedule_idle

     2.56%     2.56%  :2100        [kernel.vmlinux]  [k] _cond_resched
            |
            ---_cond_resched

     2.56%     2.56%  kworker/8:1  [kernel.vmlinux]  [k] schedule
            |
            ---schedule

它说有 8 次上下文切换。另外，我无法理解 :-1 在第一行的 Command 列中的实际含义是什么 do_task_dead() 。如果有人能给我一些深入研究这个问题的方向，那就太好了。

编辑 2 - perf 脚本报告和 cpu_idle 分析结果

swapper     0 [008] 64409.434193:          1 context-switches:
                  aceea8 schedule_idle (/lib/modules/4.15.12/build/vmlinux)

:-1    -1 [008] 64410.434267:          1 context-switches:
                  2ac066 do_task_dead (/lib/modules/4.15.12/build/vmlinux)

swapper     0 [008] 64410.442240:          1 context-switches:
                  aceea8 schedule_idle (/lib/modules/4.15.12/build/vmlinux)

:29026 29026 [008] 64411.442313:          1 context-switches:
                  acee0d _cond_resched (/lib/modules/4.15.12/build/vmlinux)

kworker/8:1   181 [008] 64411.442318:          1 context-switches:
                  acebf2 schedule (/lib/modules/4.15.12/build/vmlinux)

:-1    -1 [008] 64411.442327:          1 context-switches:
                  2ac066 do_task_dead (/lib/modules/4.15.12/build/vmlinux)

swapper     0 [008] 64411.466238:          8 context-switches:
                  aceea8 schedule_idle (/lib/modules/4.15.12/build/vmlinux)

swapper     0 [008] 64414.538207:         31 context-switches:
                  aceea8 schedule_idle (/lib/modules/4.15.12/build/vmlinux)

使用 power:cpu_idle 事件运行，这里是 perf 脚本的输出

swapper     0 [008] 65787.514565: power:cpu_idle: state=4294967295 cpu_id=8
                  ad3a2f cpu_idle_poll (/lib/modules/4.15.12/build/vmlinux)

swapper     0 [008] 65788.514653: power:cpu_idle: state=0 cpu_id=8
                  ad39d0 cpu_idle_poll (/lib/modules/4.15.12/build/vmlinux)

swapper     0 [008] 65788.522618: power:cpu_idle: state=4294967295 cpu_id=8
                  ad3a2f cpu_idle_poll (/lib/modules/4.15.12/build/vmlinux)

swapper     0 [008] 65789.522693: power:cpu_idle: state=0 cpu_id=8
                  ad39d0 cpu_idle_poll (/lib/modules/4.15.12/build/vmlinux)

swapper     0 [008] 65789.546577: power:cpu_idle: state=4294967295 cpu_id=8
                  ad3a2f cpu_idle_poll (/lib/modules/4.15.12/build/vmlinux)

swapper     0 [008] 65790.546648: power:cpu_idle: state=0 cpu_id=8
                  ad39d0 cpu_idle_poll (/lib/modules/4.15.12/build/vmlinux)

swapper     0 [008] 65790.570574: power:cpu_idle: state=4294967295 cpu_id=8
                  ad3a2f cpu_idle_poll (/lib/modules/4.15.12/build/vmlinux)
....

和perf report 显示

# Samples: 22  of event 'power:cpu_idle'
# Event count (approx.): 22
#
# Children      Self  Trace output
# ........  ........  .........................
#
    50.00%    50.00%  state=0 cpu_id=8
            |
            ---cpu_idle_poll

    50.00%    50.00%  state=4294967295 cpu_id=8
            |
            ---cpu_idle_poll

谢谢，

酷酷的。

【问题讨论】：

一些想法：1) 使用perf script 查看实际事件及其顺序 2) 使用-e power:cpu_idle（需要调试文件/跟踪点），也可以查看 cpuidle 状态更改。关于:-1 - 我相信这意味着tid 被记录为-1，但我不知道为什么会这样。关于您的问题：尚不清楚您认为自己有什么问题以及什么 - 为什么您的期望是。 test.sh 是做什么的？你的内核模块是做什么的？你的内核版本是多少？
嗨！感谢您的意见。我现在要试试这些东西。关于更多信息，我已经更新了我原来的帖子。

标签： linux-kernel scheduling perf

【解决方案1】：

祖蓝很亲近，他的所有建议都应该被采纳。

来自perf report的手册页：

perf report 输出中的 command 列指的是从中收集样本的过程。在每线程/每进程模式下，这始终是被监视命令的名称。但在 cpu-wide 模式下，命令可能会有所不同。

因为您正在测量 per-cpu 上下文切换事件，从您的 perf record ... -cpu=8... 命令可以看出，perf report 命令将开始报告下一个/上一个任务的 PID/TID。你可以在这里看到 -

per-cpu context switch records pid/tid

现在值 -1 指的是一个已死的进程，即该进程已经超出了僵尸进程的状态。这意味着task_struct 成员现在指向已释放的内存，不应允许取消引用。do_task_dead 方法应清楚地反映这一点。 This 是进程的 pid 作为 -1 返回并同时在 perf report 中报告的位置。

关于这个问题有一个广泛的discussion。最初，值 0 用于在 perf report 输出中引用此类进程状态，但您可能已经猜到，pid=0 引用 空闲线程，因此使用值 -1 .

【讨论】：

嗨，非常感谢您澄清这一点。现在这似乎更清楚了；至少在代码部分。但是，我不确定这件事的来源。将进一步检查。按照祖蓝的建议，我确实为那些事件运行了 perf，我无法理解它。我已经粘贴了问题中的转储。可能是因为内核引导参数idle=poll，我看到很多cpu_idle_poll 和schedule_idle 的调用。但是，非常感谢您的意见。