Segmentation fault on printf - NASM 64bit Linux

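[Question title]: Segmentation fault on printf - NASM 64bit Linux
[Posted]: 2014-09-05 20:53:37
[Question description]:

I'm trying to read four floating-point numbers with scanf, store them on the stack, and then use vmovupd to copy them into a register for use. My problem is that when I try to output those 4 numbers, the program segfaults at printf.

I assume it has something to do with the stack, but I've tried popping multiple times (several instructions at once) to no avail. I'm still new to assembly coding, so using gdb is a bit too advanced for me.

You'll notice that I include a file called debug. It lets me look at the registers and the stack (which is why there is a dumpstack instruction). It was provided by my professor, and it helped some, but evidently not enough (or maybe I'm just missing something).

Here is the .cpp: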

#include <iostream>

using namespace std;

extern "C" double ComputeElectricity();

int main()
{
    cout << "Welcome to electric circuit processing by Chris Tarazi." << endl;
    double returnValue = ComputeElectricity();
    cout << "The driver received this number: " << returnValue << endl; 
    return 0;
}

Here is the ASM code:

%include "debug.inc"
extern printf
extern scanf
global ComputeElectricity

;---------------------------------Declare variables-------------------------------------------

segment .data

greet db "This progam will help you analyze direct current circuits configured in parallel.", 10, 0
voltage db "Please enter the voltage of the entire circuit in volts: ", 0
first db "Enter the power consumption of device 1 (watts): ", 0
second db "Enter the power consumption of device 2 (watts): ", 0
third db "Enter the power consumption of device 3 (watts): ", 0
fourth db "Enter the power consumption of device 4 (watts): ", 0
thankyou db "Thank you. The computations have completed with the following results.", 10, 0
circuitV db "Curcuit total voltage: %1.18lf v", 10, 0
deviceNum db "Device number:                1                    2                    3                    4", 10, 0
power db "Power (watts): %1.18lf %1.18lf %1.18lf %1.18lf", 10, 0
current db "Current (amps): %1.18lf %1.18lf %1.18lf %1.18lf", 10, 0
totalCurrent db "Total current in the circuit is %1.18lf amps.", 10, 0
totalPower db "Total power in the circuit is %1.18lf watts.", 10, 0

bye db "The analyzer program will now return total power to the driver.", 10, 0

string db "%s", 0
floatfmt db "%lf", 0
fourfloat db "%1.18lf %1.18lf %1.18lf %1.18lf", 0

;---------------------------------Begin segment of executable code------------------------------

segment .text

dumpstack 20, 10, 10

ComputeElectricity:

;dumpstack 30, 10, 10

;---------------------------------Output greet message------------------------------------------

    mov qword rax, 0
    mov rdi, string 
    mov rsi, greet
    call printf

;---------------------------------Prompt for voltage--------------------------------------------

    mov qword rax, 0
    mov rdi, string
    mov rsi, voltage
    call printf

;---------------------------------Get  voltage--------------------------------------------------

    push qword 0
    mov qword rax, 0
    mov rdi, floatfmt
    mov rsi, rsp
    call scanf
    vbroadcastsd ymm15, [rsp]
    pop rax

;---------------------------------Prompt for watts 1--------------------------------------------

    mov qword rax, 0
    mov rdi, string
    mov rsi, first
    call printf

;---------------------------------Get watts 1---------------------------------------------------

    push qword 0
    mov qword rax, 0
    mov rdi, floatfmt
    mov rsi, rsp
    call scanf

;---------------------------------Prompt for watts 2--------------------------------------------

    mov qword rax, 0
    mov rdi, string
    mov rsi, second         
    call printf 

;---------------------------------Get watts 2---------------------------------------------------

    push qword 0
    mov qword rax, 0
    mov rdi, floatfmt
    mov rsi, rsp
    call scanf

;---------------------------------Prompt for watts 3--------------------------------------------

    mov qword rax, 0
    mov rdi, string
    mov rsi, third      
    call printf 

;---------------------------------Get watts 3---------------------------------------------------

    push qword 0
    mov qword rax, 0
    mov rdi, floatfmt
    mov rsi, rsp
    call scanf

;---------------------------------Prompt for watts 4--------------------------------------------

    mov qword rax, 0
    mov rdi, string
    mov rsi, fourth 
    call printf 

;---------------------------------Get watts 4---------------------------------------------------

    push qword 0
    mov qword rax, 0
    mov rdi, floatfmt
    mov rsi, rsp
    call scanf

    ;dumpstack 50, 10, 10

;---------------------------------Move data into correct registers------------------------------

    vmovupd ymm14, [rsp]                ; move all 4 numbers from the stack to ymm14

    pop rax
    pop rax
    pop rax
    pop rax

    ;dumpstack 55, 10, 10       

    vextractf128 xmm10, ymm14, 0        ; get lower half
    vextractf128 xmm11, ymm14, 1        ; get upper half

;---------------------------------Move data into low xmm registers------------------------------

    movsd xmm1, xmm11                   ; move ymm[128-191] (3rd value) into xmm1
    movhlps xmm0, xmm11                 ; move from highest value from xmm11 to xmm0

    movsd xmm3, xmm10
    movhlps xmm2, xmm10

    ;showymmregisters 999

;---------------------------------Output results-------------------------------------------------

    ;dumpstack 60, 10, 10

    mov rax, 4
    mov rdi, fourfloat
    push qword 0
    call printf
    pop rax

ret

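[Question comments]:

Why not simply write a "C" program that calls printf, generate an assembly listing, and learn from that listing how the compiler does it?
@PaulMcKenzie I generated this, if you could take a look at it and point me in the right direction. I still don't understand why it does that.
I'm puzzled why people keep suggesting studying compiler-generated code. Reverse engineering is about the hardest and most complicated task there is, and you can quickly draw the wrong conclusions, especially from compiler-generated code. Instead, read the relevant documentation (the ABI and the instruction set reference) and use a debugger.
FYI, shuffles are usually more expensive than loads, and this is a very inefficient way to get 4 doubles loaded into xmm0..3. Perhaps vmovaps xmm0, [rsp] / vmovaps xmm2, [rsp+16], then movhlps or vpunpckhpd to move their high halves into xmm1 and xmm3, would be a good mix of loads and shuffles. (You can leave "garbage" in the high half of an XMM register when passing a double to a function.) You never want movsd xmm,xmm unless you actually want to merge with the old high half.

Tags: linux assembly x86-64 nasm calling-convention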

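[Solution 1]:

The problem is with your stack usage.

First, the ABI documentation requires rsp to be 16-byte aligned before a call.

Since a call pushes an 8-byte return address onto the stack, RSP is 8 bytes away from a 16-byte boundary when your function starts, so you need to adjust rsp by a multiple of 16 plus 8 to get back to 16-byte alignment. That 16 * n + 8 total includes any push instructions and any other changes to RSP, not just something like sub rsp, 24. This is the immediate cause of the segfault, because printf uses aligned SSE instructions, which fault on unaligned addresses.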
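As a rough illustration of that arithmetic, here is a minimal sketch of a function that realigns the stack with a single 8-byte adjustment before calling printf. The names print_value, fmt and value are hypothetical (they are not from the question), and the sketch assumes the same non-PIE build style as the question's code (absolute addresses and a plain call printf).

```nasm
; Minimal sketch, not the asker's code: keep RSP 16-byte aligned at the call.
; print_value, fmt and value are made-up names for illustration.
extern printf
global print_value

section .data
fmt:    db "%f", 10, 0
value:  dq 3.25

section .text
print_value:
    ; On entry RSP = 16n + 8: the caller was aligned and its call pushed 8 bytes.
    sub     rsp, 8              ; total adjustment 8 = 16*0 + 8, RSP is aligned again
    movsd   xmm0, [value]       ; first floating-point argument
    mov     rdi, fmt            ; format string
    mov     eax, 1              ; AL = number of vector registers used for arguments
    call    printf
    add     rsp, 8              ; undo the adjustment before returning
    ret
```

A single push (as in push qword 0 before the call and a pop after it) has the same effect as the sub/add pair here, because it also moves RSP by 8 bytes.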

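Even if you fix that, your stack is still unbalanced, because you keep pushing values but never pop them. It's hard to understand what you want to do with the stack.

The usual approach is to allocate space for locals at the beginning of the function (the prologue) and free it at the end (the epilogue). As discussed above, this amount (including any pushes) should be a multiple of 16 plus 8, because RSP on function entry (after the caller's call) is 8 bytes away from a 16-byte boundary.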
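The prologue/epilogue pattern might look like the following sketch, which reserves space for four doubles once, fills it with scanf, and loads the values straight into xmm0..xmm3 for printf. The names read_four, in_fmt and out_fmt are hypothetical, the output format is simplified, and the code again assumes a non-PIE build like the question's; it is one possible layout, not the only correct one.

```nasm
; Sketch only: read four doubles into locals, then print them.
; read_four, in_fmt and out_fmt are made-up names for illustration.
extern printf
extern scanf
global read_four

section .data
in_fmt:  db "%lf", 0
out_fmt: db "%f %f %f %f", 10, 0

section .text
read_four:
    ; Prologue: entry RSP = 16n + 8.
    push    rbx                 ; save callee-saved register (8 bytes)
    sub     rsp, 32             ; room for 4 doubles; total 40 = 16*2 + 8
    xor     ebx, ebx            ; rbx = byte offset of the next local

.read:
    mov     rdi, in_fmt
    lea     rsi, [rsp + rbx]    ; address of the local to fill
    xor     eax, eax            ; AL = 0: no vector-register arguments
    call    scanf               ; RSP is 16-byte aligned here
    add     rbx, 8
    cmp     rbx, 32
    jb      .read               ; repeat until all four locals are filled

    movsd   xmm0, [rsp]         ; load each double directly, no shuffles
    movsd   xmm1, [rsp + 8]
    movsd   xmm2, [rsp + 16]
    movsd   xmm3, [rsp + 24]
    mov     rdi, out_fmt
    mov     eax, 4              ; AL = 4 vector registers carry arguments
    call    printf

    ; Epilogue: release the locals and restore rbx.
    add     rsp, 32
    pop     rbx
    ret
```

Because RSP changes only in the prologue and epilogue, every call in the body sees the same aligned stack, and nothing is left on the stack when the function returns.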

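In most builds of glibc, printf only cares about 16-byte stack alignment when AL != 0. (A nonzero AL means there are FP arguments, so printf dumps all the XMM registers to the stack so that it can index them for %f conversions.)

Calling it with a misaligned stack is still a bug, even if it happens to work on your system; a future glibc version could include code that depends on 16-byte stack alignment even without FP args. For example, scanf already crashes on a misaligned stack, even with AL=0, on most GNU/Linux distributions.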

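[Comments]:

Should I do align 16 in the prologue and then adjust rsp by a multiple of 16 plus 8? I understand the alignment part but not the adjustment. Can you elaborate? Also, what I'm doing with the stack is putting 4 values on it and having vmovupd pick up those 4 values and put them into ymm14. From there I don't know what to do.
align is for aligning code and data at assembly/link time. What I meant is keeping rsp aligned, which you achieve simply by adjusting it by a multiple of 16 plus 8. You have put things on the stack but haven't removed them.
Never mind. It looks like my professor's code was causing problems with the registers. However, I did pop 4 times after moving the 4 values off the stack. Then I added push qword 0 before the printf and pop rax after it.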