【Question title】: How to write x64 machine code into virtual memory and execute it for Windows in C++
【Posted】: 2017-06-11 10:28:27
【Question】:

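I have been wondering how the V8 JavaScript engine and other JIT compilers execute the code they generate.

These are the articles I read while trying to write a small demo.

I know very little about assembly, so I first wrote a function on http://gcc.godbolt.org/ and took the disassembled output, but that code does not work on Windows.

Then I wrote a small C++ program, compiled it with -g -Og, and got the disassembly with gdb.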

#include <stdio.h>

int square(int num) {
    return num * num;
}

int main() {
    printf("%d\n", square(10));
    return 0;
}

Output:

Dump of assembler code for function square(int):
=> 0x00000000004015b0 <+0>:     imul   %ecx,%ecx
   0x00000000004015b3 <+3>:     mov    %ecx,%eax
   0x00000000004015b5 <+5>:     retq

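I copy-pasted the output (with the '%' removed) into an online x86 assembler and got { 0x0F, 0xAF, 0xC9, 0x89, 0xC1, 0xC3 }.

This is my final code. If I compile it with gcc, I always get 1. If I compile it with VC++, I get random numbers. What is going on?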

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <windows.h>

typedef unsigned char byte;
typedef int (*int0_int)(int);

const byte square_code[] = {
    0x0f, 0xaf, 0xc9,
    0x89, 0xc1,
    0xc3
};

int main() {
    byte* buf = reinterpret_cast<byte*>(VirtualAlloc(0, 1 << 8, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE));
    if (buf == nullptr) return 0;
    memcpy(buf, square_code, sizeof(square_code));
    {
        DWORD old;
        VirtualProtect(buf, 1 << 8, PAGE_EXECUTE_READ, &old);
    }
    int0_int square = reinterpret_cast<int0_int>(buf);
    int ans = square(100);
    printf("%d\n", ans);
    VirtualFree(buf, 0, MEM_RELEASE);
    return 0;
}

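Note

I am trying to learn how JITs work, so please do not suggest LLVM or any other library. I promise I will use a proper JIT library in a real project rather than writing one from scratch.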

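【Question comments】:

Good point, I have edited the title as suggested. I do hope my question helps other readers, since most of the JIT articles I could find online target POSIX.
Note that this is not "into heap memory": you are allocating a page outside of any heap (which is for the best, so that your VirtualProtect call does not affect any other objects).
Show the assembly generated for int ans = square(100);, the call through the function pointer.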

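Tags: c++ windows 64-bit jit

【Answer 1】:

Note: as Ben Voigt pointed out in the comments, this is really only valid for x86, not x86_64. For x86_64 you just have a few errors in your assembly (which are still errors on x86), as Ben Voigt points out in his answer.

This happens because your compiler can see both sides of the function call when it generates the assembly. Since the compiler controls code generation for both the caller and the callee, it does not have to follow the cdecl calling convention, and it did not.

The default calling convention for MSVC is cdecl. Basically, function arguments are pushed onto the stack in the reverse of the order in which they are listed, so a call to foo(10, 100) might produce the assembly: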

push 100
push 10
call foo(int, int)

In your case, the compiler will generate something like the following at the call site:

push 100
call esi ; assuming the address of your code is in the register esi

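This is not what your code expects. Your code expects its argument to be passed in the register ecx, not on the stack.

The compiler used something resembling the fastcall calling convention. If I compile a similar program (the assembly I get is slightly different), I get the expected result: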

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <windows.h>

typedef unsigned char byte;
typedef int (_fastcall *int0_int)(int);

const byte square_code[] = {
    0x8b, 0xc1,
    0x0f, 0xaf, 0xc0,
    0xc3
};

int main() {
    byte* buf = reinterpret_cast<byte*>(VirtualAlloc(0, 1 << 8, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE));
    if (buf == nullptr) return 0;
    memcpy(buf, square_code, sizeof(square_code));
    {
        DWORD old;
        VirtualProtect(buf, 1 << 8, PAGE_EXECUTE_READ, &old);
    }
    int0_int square = reinterpret_cast<int0_int>(buf);
    int ans = square(100);
    printf("%d\n", ans);
    VirtualFree(buf, 0, MEM_RELEASE);
    return 0;
}

Notice that I have told the compiler to use the _fastcall calling convention. If you want to use cdecl, the assembly needs to look more like this:

push ebp
mov  ebp, esp
mov  eax, DWORD PTR _n$[ebp]
imul eax, eax
pop  ebp
ret  0

(Disclaimer: I am not good at assembly; this was generated by Visual Studio)

【Comments】:

The default calling convention for x86_64 does use registers for argument passing. I think you mistakenly did your analysis for x86.
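
For an x64 build, the point of this comment can be sketched as follows (an illustration, not code from the thread). Under the Microsoft x64 convention the first integer argument arrives in ecx and the result is returned in eax, so the body only has to square ecx and copy it to eax. The names kSquareX64, int_to_int and run_copied_code are hypothetical, the bytes were assembled by hand, and the Windows part assumes a 64-bit x86 target. Unlike the programs above, the sketch also checks the return value of VirtualProtect and calls FlushInstructionCache.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#ifdef _WIN32
#include <windows.h>
#endif

// Hand-assembled body (assumption: Microsoft x64 convention, argument in
// ecx, result in eax):
//   0f af c9   imul ecx, ecx
//   89 c8      mov  eax, ecx
//   c3         ret
const unsigned char kSquareX64[] = {0x0f, 0xaf, 0xc9, 0x89, 0xc8, 0xc3};

typedef int (*int_to_int)(int);

#ifdef _WIN32
// Copies `code` into freshly allocated pages, switches them to
// execute+read, calls the code with `arg` and stores the return value in
// *result. Returns false if VirtualAlloc or VirtualProtect fails.
bool run_copied_code(const unsigned char* code, std::size_t size,
                     int arg, int* result) {
    assert(code != nullptr && result != nullptr);

    void* buf = VirtualAlloc(nullptr, size, MEM_RESERVE | MEM_COMMIT,
                             PAGE_READWRITE);
    if (buf == nullptr) return false;
    std::memcpy(buf, code, size);

    DWORD old_protect = 0;
    bool ok = VirtualProtect(buf, size, PAGE_EXECUTE_READ, &old_protect) != 0;
    if (ok) {
        // Make sure the CPU does not execute stale instruction bytes.
        FlushInstructionCache(GetCurrentProcess(), buf, size);
        int_to_int fn = reinterpret_cast<int_to_int>(buf);
        *result = fn(arg);
    }
    VirtualFree(buf, 0, MEM_RELEASE);
    return ok;
}
#endif
```

On such a build, run_copied_code(kSquareX64, sizeof(kSquareX64), 100, &out) should leave 10000 in out.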

【Answer 2】:

I copy-pasted the output (with the '%' removed)

Well, that means your second instruction is

mov ecx, eax

which makes no sense at all (it overwrites the result of the multiplication with the uninitialized return value).

On the other hand

mov eax, foo
ret

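is a very common pattern for ending a function that has a non-void return type.

The difference between your two assembly syntaxes (AT&T style versus Intel style) is more than just the % markers: the operand order is reversed, and pointers and offsets are written very differently as well.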

You need to issue the set disassembly-flavor intel command in gdb.
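
To make the operand-order point concrete, here is a minimal sketch (not part of the original answer) that hand-encodes the two register-to-register instructions involved. The helper names modrm_reg_reg, mov_r32, imul_r32, describe_mov and square_body are made up for illustration; the opcodes 0x89 (MOV r/m32, r32) and 0x0F 0xAF (IMUL r32, r/m32) are the standard encodings. It suggests that the bytes 0x89, 0xC1 from the question decode to mov ecx, eax, while the intended mov eax, ecx is 0x89, 0xC8.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Register numbers as they appear in the ModRM byte (32-bit registers).
enum Reg32 : std::uint8_t { EAX = 0, ECX = 1, EDX = 2, EBX = 3 };

// ModRM byte for a register-to-register form: mod = 11, then reg, then rm.
std::uint8_t modrm_reg_reg(std::uint8_t reg, std::uint8_t rm) {
    assert(reg < 8 && rm < 8);
    return static_cast<std::uint8_t>(0xC0 | (reg << 3) | rm);
}

// Intel syntax "mov dst, src" using opcode 0x89 (MOV r/m32, r32):
// the source goes in the reg field, the destination in the rm field.
std::vector<std::uint8_t> mov_r32(Reg32 dst, Reg32 src) {
    return {0x89, modrm_reg_reg(src, dst)};
}

// Intel syntax "imul dst, src" using opcode 0x0F 0xAF (IMUL r32, r/m32):
// here the destination is the reg field and the source is the rm field.
std::vector<std::uint8_t> imul_r32(Reg32 dst, Reg32 src) {
    return {0x0F, 0xAF, modrm_reg_reg(dst, src)};
}

// Renders a two-byte opcode-0x89 mov in Intel or AT&T syntax, which makes
// the reversed operand order visible.
std::string describe_mov(const std::vector<std::uint8_t>& bytes, bool att) {
    static const char* names[] = {"eax", "ecx", "edx", "ebx",
                                  "esp", "ebp", "esi", "edi"};
    assert(bytes.size() == 2 && bytes[0] == 0x89 && (bytes[1] & 0xC0) == 0xC0);
    std::string src = names[(bytes[1] >> 3) & 7];
    std::string dst = names[bytes[1] & 7];
    return att ? "mov %" + src + ",%" + dst : "mov " + dst + ", " + src;
}

// Body for square() as intended: imul ecx, ecx ; mov eax, ecx ; ret
std::vector<std::uint8_t> square_body() {
    std::vector<std::uint8_t> code = imul_r32(ECX, ECX);
    const std::vector<std::uint8_t> mov = mov_r32(EAX, ECX);
    code.insert(code.end(), mov.begin(), mov.end());
    code.push_back(0xC3);  // ret
    return code;
}
```

With these helpers, mov_r32(ECX, EAX) gives { 0x89, 0xC1 } (the question's bytes) and square_body() gives { 0x0F, 0xAF, 0xC9, 0x89, 0xC8, 0xC3 }, so changing the fifth byte of square_code from 0xC1 to 0xC8 should be the only fix needed for the x64 build.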

【Comments】: