是什么导致 DWARF perf 调用堆栈无法展开？答案

【问题标题】：What causes failure to unwind in a DWARF perf call stack?是什么导致 DWARF perf 调用堆栈无法展开？
【发布时间】：2021-06-18 14:15:03
【问题描述】：

在perf record --call-graph dwarf 生成并由perf script 打印的回溯中，我一直在为大约 5% 的调用堆栈获取错误的地址，即展开失败。一个例子是

my_bin 770395 705462.825887:    3560360 cycles: 
        7f0398b9b7e2 __vsnprintf_internal+0x12 (/usr/lib/x86_64-linux-gnu/libc-2.32.so)
        7ffcdb6fbfdf [unknown] ([stack])

my_bin 770395 705462.827040:    3447195 cycles: 
        7f0398ba1624 __GI__IO_default_xsputn+0x104 (inlined)
        7ffcdb6fb7af [unknown] ([stack])

它是由这段代码产生的（用g++ -O3 -g -fno-omit-frame-pointer my_bin.cpp -o my_bin编译）：

#include <cstdio>
#include <iostream>

int __attribute__ ((noinline)) libc_string(int x) {
    char buf[64] = {0};
    // Some nonsense workload using libc
    int res = 0;
    for (int i = 0; i < x; ++i) {
        res += snprintf(buf, 64, "%d %d %d Can I bring my friends to tea?", (i%10), (i%3)*10, i+2);
        res = res % 128;
    }
    return res;
}

int main() {
    int result = libc_string(20000000);
    std::cout << result << "\n";
}

我很确定我的程序在堆栈中不应该有可执行代码，所以这些地址似乎是错误的。这不仅是一个程序，而且我分析过的大多数程序都有大约 5% 的错误调用堆栈。这些调用栈大多只有两个栈帧，最里面的栈帧有时在 Eigen 之类的库中（尽管它们通常有正确的调用栈），有时在 C++ STL 或 libc 中。我在unknown、[stack]、[heap]、anon、//anon、libstdc++.so.6.0.28 或<my_bin> 中看到了结束。

我在 Ubuntu 18.04、20.04 和 20.10 上看到过这个。

这只发生在 DWARF 展开时。如何解决这个问题？

【问题讨论】：

您是否尝试过在其中一个函数中使用 gdb 并查看回溯的样子？
@JorgeBellon 是的，它看起来总是正确的。

标签： c++ callstack perf backtrace dwarf

【解决方案1】：

您还测试了哪些其他类型的展开？

在示例中，我 disabled kernel ASLR feature 和 setarch x86_64 -R 以获得更稳定的地址和更小的 perf.data 文件。

使用perf record 选项-e cycles:u 命令可能会有所帮助，因为它不包含内核样本。

对于使用__GI__IO_default_xsputn (inlined) 函数的默认性能记录事件（'-e 周期：u'；使用来自 libc6-prof package 的 libc-2.31）生成的文件，我重现了类似的 dwarf unwind 问题：

env LD_LIBRARY_PATH=/lib/libc6-prof/x86_64-linux-gnu setarch `uname -m` -R perf record --call-graph dwarf -o perf.data.dwarf.u -e cycles:u ./my_bin
perf script -i perf.data.dwarf.u |less

不正确的样本：

my_bin 28100 760107.271010:     461418 cycles:u: 
            7ffff7c74f06 __GI__IO_default_xsputn+0x106 (inlined)
            7ffff7c59c6c __vfprintf_internal+0xf4c (/usr/lib/libc6-prof/x86_64-linux-gnu/libc-2.31.so)

正确示例：

my_bin 28100 760107.257283:     267268 cycles:u: 
            7ffff7c74eff __GI__IO_default_xsputn+0xff (inlined)
            7ffff7c59c6c __vfprintf_internal+0xf4c (/usr/lib/libc6-prof/x86_64-linux-gnu/libc-2.31.so)
            7ffff7c6e9f6 __vsnprintf_internal+0xb6 (/usr/lib/libc6-prof/x86_64-linux-gnu/libc-2.31.so)
            7ffff7d14a2c ___snprintf_chk+0x9c (inlined)
            555555555314 libc_string+0xb4 (/home/user/so/my_bin)
            555555555314 libc_string+0xb4 (/home/user/so/my_bin)
            555555555111 main+0x11 (/home/user/so/my_bin)
            7ffff7c040fa __libc_start_main+0x10a (/usr/lib/libc6-prof/x86_64-linux-gnu/libc-2.31.so)
            55555555519d _start+0x2d (/home/user/so/my_bin)

为了正确展开，我在__GI__IO_default_xsputn+ 中有许多不同的偏移量（+ 之后的数字）：

perf script -i perf.data.dwarf.u ||grep vsnprintf_internal -B3 |grep _GI__IO_default_xsputn|cut -d + -f 2|sort | uniq -c
...
    208 0x0 (inlined)
     45 0x101 (inlined)
      2 0x105 (inlined)
     91 0x10 (inlined)
    294 0x110 (inlined)
      2 0x117 (inlined)
      2 0x11d (inlined)
    326 0x121 (inlined)

但我没有正确展开的 +0x106 地址。并且所有不正确的展开都有 +0x106 地址。让我们用 gdb 检查一下（禁用 ASLR 后更容易；+262 为 +0x106）：

env LD_LIBRARY_PATH=/lib/libc6-prof/x86_64-linux-gnu setarch `uname -m` -R  gdb -q ./my_bin
(gdb) start
(gdb) x/i 0x7ffff7c74f06
   0x7ffff7c74f06 <__GI__IO_default_xsputn+262>:    retq  
(gdb) disassemble __GI__IO_default_xsputn
Dump of assembler code for function __GI__IO_default_xsputn:
...
   0x00007ffff7c74f05 <+261>:   pop    %rbp
   0x00007ffff7c74f06 <+262>:   retq

展开问题似乎与在retq 指令或pop %rbp 之后采样的内联（？）函数有关？

【讨论】：

要调试解析，您可以检查perf script -D 的输出，它将转储原始样本数据，包括默认8192 字节大小的寄存器和堆栈数据副本。正确和错误的展开样本可能会有一些差异。
谢谢，这对我有用，是朝着理解问题迈出的一大步。理想情况下，我不需要排除内核样本，而且我注意到奇怪的是，cycles:k 也产生了良好的调用堆栈（在组合的cycles 中不可见）。所以对我来说神秘的是为什么cycles != cycles:u + cycles:k。
cycles:k 应该没有用户空间堆栈，据我所知。即使使用cycles:u，我现在的调用堆栈解析也不正确，它们看起来像cycles:u: / __GI__IO_default_xsputn+0x106 (inlined) / __vfprintf_internal(libc6-prof) / NOTHING，所以问题出在其他地方。还有更容易重新编译的 perf 源代码（make perf-src-tar-gz 从 linux 内核源代码目录或从 mirrors.edge.kernel.org/pub/linux/kernel/tools/perf 下载）和调试堆栈副本的 dwarf 解析。
@nnnmmm，我更新了答案，但我仍然有展开失败。您能否与 gdb 核对一下，IP 样本中是否总是有retq 指令用于矮人展开失败或pop %rbp 就在采样 IP 之前？重新编译 libc6 并保存相关目标文件的临时 asm 文件 (-save-temps) 可能有助于找到涉及 %rbp 的指令的侏儒问题。
我目前的状态是，只有当我没有指定事件或指定cycles:pp 时，我才会得到不正确的样本。使用cycles 或cycles:u 我得到了干净的堆栈，而使用cycles:k 则完全不同。所以我不确定我们是否在这里看到了相同的东西，因为你对:u 有问题，而我没有。