Why does this generated assembly code seem to contain nonsense? [duplicate]

【Question title】: Why does this generated assembly code seem to contain nonsense? [duplicate]
【Posted】: 2019-12-28 02:58:12
【Question】:

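I used https://godbolt.org/ with "x86-64 gcc 9.1" to compile the following C code to assembly, in order to understand why passing a pointer to a local variable as a function argument works. Now I am having trouble understanding some of the steps.

I added comments to the lines I am having trouble with.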

#include <stdio.h>

void printStr(char* cpStr) {
    printf("str: %s", cpStr);
}


int main(void) {
    char str[] = "abc";
    printStr(str);
    return 0;
}

.LC0:
        .string "str: %s"
printStr:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16                     ; why allocate 16 bytes when using it just for the pointer to str[0] which is 4 bytes long?
        mov     QWORD PTR [rbp-8], rdi      ; why copy rdi to the stack...
        mov     rax, QWORD PTR [rbp-8]      ; ... just to copy it into rax again? Also rax seems to already contain the pointer to str[0] (see *)
        mov     rsi, rax
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        nop
        leave
        ret
main:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16                     ; why allocate 16 bytes when "abc" is just 4 bytes long?
        mov     DWORD PTR [rbp-4], 6513249
        lea     rax, [rbp-4]                ; pointer to str[0] copied into rax (*)
        mov     rdi, rax                    ; why copy the pointer to str[0] to rdi?
        call    printStr
        mov     eax, 0
        leave
        ret

【Comments】:

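The 16 bytes are for alignment. You are looking at unoptimized code, so don't be surprised to see nonsense. Add -O3 to the compiler options. rdi is used to pass the first argument, as per the standard calling convention.
@Jester Thanks. That answers my question.
In general, unoptimized code is compiled in a way that makes it behave well when single-stepping in a debugger.
Possible duplicate of "Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?", which explains why -O0 does this.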

Tags: c assembly x86-64

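【Solution 1】:

Thanks to Jester's help, I was able to clear up my confusion. The following code was compiled with GCC's "-O1" flag (for me, the best optimization level for understanding what is going on):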

.LC0:
    .string "str: %s"
printStr:
    sub     rsp, 8
                                            ; now the call to printf gets prepared, rdi = first argument, rsi = second argument
    mov     rsi, rdi                        ; move the pointer to str[0] to rsi
    mov     edi, OFFSET FLAT:.LC0           ; move address of static string literal "str: %s" to edi
    mov     eax, 0                          ; set eax to the number of vector registers used, because printf is a varargs function
    call    printf
    add     rsp, 8
    ret
main:
    sub     rsp, 24
    mov     DWORD PTR [rsp+12], 6513249     ; create string "abc" on the stack
    lea     rdi, [rsp+12]                   ; move address of str[0] (pointer to 'a') to rdi (first argument for printStr)
    call    printStr
    mov     eax, 0
    add     rsp, 24
    ret

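As Jester said, the 16 bytes are allocated for alignment: the x86-64 System V calling convention requires rsp to be 16-byte aligned at each call instruction. There is a good post on Stack Overflow explaining this here.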

Edit:

There is a post on Stack Overflow explaining why al is zeroed before calling a variadic function here.

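【Discussion】:

"set eax to 0 because printStr is of type void" - wrong. eax (actually al) holds the number of vector registers used to pass arguments to a variadic function, in this case printf. It has nothing to do with the caller printStr, nor with the return type being void.
@Jester Thanks again - will edit my answer
PS: If you want to see it in action, print a floating-point value.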