Implementing a direct-threaded interpreter in a functional language like OCaml: answers

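【Question title】: Implementing a direct-threaded interpreter in a functional language like OCaml
【Posted】: 2010-08-31 00:22:59
【Question】:

In C/C++ you can implement a direct-threaded interpreter with an array of function pointers. The array represents your program: an array of operations. Each operation function must end by calling the next function in the array, for example: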

void op_plus(size_t pc, uint8_t* data) {
  *data += 1;
  BytecodeArray[pc+1](pc+1, data); //call the next operation in the array
}

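BytecodeArray is an array of function pointers. If we had an array of these op_plus operations, the length of the array would determine how many times we increment the contents of data. (Of course, you would need to add some kind of terminating operation as the last operation in the array.)

How would one go about implementing something like this in OCaml? I may be trying to translate this code too literally: I was using an OCaml Array of functions, as in the C++. The problem with this is that I keep ending up with something like: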

let op_plus pc data = Printf.printf "pc: %d, data_i: %d \n" pc data;
                        let f = (op_array.(pc+1)) in         
                        f (pc+1) (data+1) ;;

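where op_array is an Array defined in the scope above, which I then redefine later to fill it with a bunch of op_plus functions... however, the op_plus function uses the previous definition of op_array. It's a chicken-and-egg problem.

【Comments on the question】:

If you implement a direct-threaded interpreter this way you will soon run into a stack overflow :-) A direct-threaded interpreter cannot be implemented in standard C, which is why GNU invented computed gotos as a compiler extension.
@Lothar "stack overflow" -> not in the OCaml version. The call to f in the question is compiled as a tail call. I almost commented on it, then decided it was not the subject of the question.

Tags: functional-programming ocaml interpreter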

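【Solution 1】:

Another option is to use CPS and avoid the explicit function array altogether. Tail call optimization still applies in this case.

I don't know how you generate the code, but let's make the not unreasonable assumption that at some point you have an array of VM instructions that you want to prepare for execution. Each instruction is still represented as a function, but instead of a program counter it receives a continuation function.

Here is the simplest example: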

type opcode = Add of int | Sub of int

let make_instr opcode cont =
    match opcode with
    | Add x -> fun data -> Printf.printf "add %d %d\n" data x; cont (data + x)
    | Sub x -> fun data -> Printf.printf "sub %d %d\n" data x; cont (data - x)

let compile opcodes =
    Array.fold_right make_instr opcodes (fun x -> x)

Usage (look at the inferred types):

# #use "cpsvm.ml";;
type opcode = Add of int | Sub of int
val make_instr : opcode -> (int -> 'a) -> int -> 'a = <fun>
val compile : opcode array -> int -> int = <fun>
# let code = [| Add 13; Add 42; Sub 7 |];;
val code : opcode array = [|Add 13; Add 42; Sub 7|]
# let fn = compile code;;
val fn : int -> int = <fun>
# fn 0;;
add 0 13
add 13 42
sub 55 7
- : int = 48

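Update:

It is easy to introduce [conditional] branching in this model. The if continuation is constructed from two arguments, an iftrue-continuation and an iffalse-continuation, but has the same type as every other continuation function. The problem is that we don't know what constitutes these continuations in the case of backward branches (backward, because we compile from tail to head). That is easily overcome with destructive updates (though a more elegant solution may be possible if you are compiling from a high-level language): just leave "holes" and fill them in later, when the compiler reaches the branch target.

Sample implementation (I have used string labels instead of integer instruction pointers, but that hardly matters):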

type label = string

type opcode =
      Add of int | Sub of int
    | Label of label | Jmp of label | Phi of (int -> bool) * label * label

let make_instr labels opcode cont =
    match opcode with
    | Add x -> fun data -> Printf.printf "add %d %d\n" data x; cont (data + x)
    | Sub x -> fun data -> Printf.printf "sub %d %d\n" data x; cont (data - x)
    | Label label -> (Hashtbl.find labels label) := cont; cont
    | Jmp label ->
        let target = Hashtbl.find labels label in
        (fun data -> Printf.printf "jmp %s\n" label; !target data)
    | Phi (cond, tlabel, flabel) ->
        let tcont = Hashtbl.find labels tlabel
        and fcont = Hashtbl.find labels flabel in
        (fun data ->
            let b = cond data in
            Printf.printf "branch on %d to %s\n"
                data (if b then tlabel else flabel);
            (if b then !tcont else !fcont) data)

let compile opcodes =
    let id = fun x -> x in
    let labels = Hashtbl.create 17 in
    Array.iter (function
        | Label label -> Hashtbl.add labels label (ref id)
        | _ -> ())
        opcodes;
    Array.fold_right (make_instr labels) opcodes id

I have used two passes for clarity, but it is easy to see that it can be done in one pass.
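A single-pass variant might look like the sketch below. It is not code from the answer: the helper name hole is hypothetical, the debug traces are left out, and it assumes OCaml 4.05 or later for Hashtbl.find_opt. The idea is that a label's reference cell is created the first time anything mentions the label, whether that is the Label itself or a jump to it.

```ocaml
type label = string

type opcode =
  | Add of int | Sub of int
  | Label of label | Jmp of label | Phi of (int -> bool) * label * label

let compile opcodes =
  let id = fun x -> x in
  let labels : (label, (int -> int) ref) Hashtbl.t = Hashtbl.create 17 in
  (* Hypothetical helper: return the hole for a label, creating an
     unfilled one (pointing at [id]) on first use. *)
  let hole l =
    match Hashtbl.find_opt labels l with
    | Some r -> r
    | None ->
      let r = ref id in
      Hashtbl.add labels l r;
      r
  in
  let make_instr opcode cont =
    match opcode with
    | Add x -> fun data -> cont (data + x)
    | Sub x -> fun data -> cont (data - x)
    (* Reaching the label fills the hole with the real continuation. *)
    | Label l -> hole l := cont; cont
    (* Jumps dereference the hole at run time, so it may be filled later. *)
    | Jmp l -> let target = hole l in fun data -> !target data
    | Phi (cond, t, f) ->
      let tc = hole t and fc = hole f in
      fun data -> (if cond data then !tc else !fc) data
  in
  Array.fold_right make_instr opcodes id
```

A jump to a label that never appears in the program silently falls through to id in this sketch; a real compiler would probably want to report that as an error.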

Here is a simple loop that can be compiled and executed by the code above:

let code = [|
    Label "entry";
    Phi (((<) 0), "body", "exit");
    Label "body";
    Sub 1;
    Jmp "entry";
    Label "exit" |]

Execution trace:

# let fn = compile code;;
val fn : int -> int = <fun>
# fn 3;;
branch on 3 to body
sub 3 1
jmp entry
branch on 2 to body
sub 2 1
jmp entry
branch on 1 to body
sub 1 1
jmp entry
branch on 0 to exit
- : int = 0

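Update 2:

Performance-wise, the CPS representation is likely to be faster than the array-based one, because there is no indirection in the case of linear execution. The continuation function is stored directly in the instruction closure. In the array-based implementation, each instruction first has to increment the program counter and perform an array access (with the extra overhead of a bounds check).

I have made some benchmarks to demonstrate this. Here is an implementation of an array-based interpreter: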

type opcode =
      Add of int | Sub of int
    | Jmp of int | Phi of (int -> bool) * int * int
    | Ret

let compile opcodes =
    let instr_array = Array.make (Array.length opcodes) (fun _ data -> data)
    in Array.iteri (fun i opcode ->
        instr_array.(i) <- match opcode with
        | Add x -> (fun pc data ->
            let cont = instr_array.(pc + 1) in cont (pc + 1) (data + x))
        | Sub x -> (fun pc data ->
            let cont = instr_array.(pc + 1) in cont (pc + 1) (data - x))
        | Jmp pc -> (fun _ data ->
            (* hand the jump target to the callee as its program counter *)
            let cont = instr_array.(pc) in cont pc data)
        | Phi (cond, tbranch, fbranch) ->
            (fun _ data ->
                let pc = (if cond data then tbranch else fbranch) in
                let cont = instr_array.(pc) in
                cont pc data)
        | Ret -> fun _ data -> data)
        opcodes;
    instr_array

let code = [|
    Phi (((<) 0), 1, 3);
    Sub 1;
    Jmp 0;
    Ret
    |]

let () =
    let fn = compile code in
    let result = fn.(0) 0 500_000_000 in
    Printf.printf "%d\n" result

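Let's see how it compares with the CPS-based interpreter above (with all debug tracing stripped, of course). I used the OCaml 3.12.0 native compiler on Linux/amd64. Each program was run 5 times.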

array: mean = 13.7 s, stddev = 0.24
CPS: mean = 11.4 s, stddev = 0.20

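So even in a tight loop, CPS performs noticeably better than the array version (about 17% less time here). If we unroll the loop and replace the single sub instruction with five, the figures change: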

array: mean = 5.28 s, stddev = 0.065
CPS: mean = 4.14 s, stddev = 0.309

Interestingly, both implementations actually beat the OCaml bytecode interpreter. The following loop takes 17 seconds to execute on my machine:

for i = 500_000_000 downto 0 do () done
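
The answer does not show how the timings were taken. A minimal harness for reproducing this kind of measurement could look like the sketch below. The names time and bench are hypothetical, it assumes OCaml 4.06 or later for List.init, and it uses Sys.time, which reports processor time rather than wall-clock time. The standard deviation is the population form, which may differ from what the answer used.

```ocaml
(* Hypothetical helper: run [f x] once and return its result together
   with the processor time it consumed, in seconds. *)
let time f x =
  let start = Sys.time () in
  let result = f x in
  (result, Sys.time () -. start)

(* Hypothetical helper: run [f x] [n] times and return the mean and the
   population standard deviation of the measured times. *)
let bench n f x =
  let samples = List.init n (fun _ -> snd (time f x)) in
  let count = float_of_int n in
  let mean = List.fold_left (+.) 0.0 samples /. count in
  let variance =
    List.fold_left (fun acc s -> acc +. (s -. mean) ** 2.0) 0.0 samples
    /. count
  in
  (mean, sqrt variance)
```

With the array-based interpreter above, a call such as bench 5 (fun n -> fn.(0) 0 n) 500_000_000 would correspond to the five runs described.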

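【Comments】:

Interesting. How would this work with some kind of conditional jump or "if" opcode?
See the update. CPS transformation and CPS-based interpreters have been studied extensively, so you can find better solutions than my naive approach, but it works nonetheless.

【Solution 2】:

You should not redefine op_array; you should fill in the instructions by modifying it in place, so that it is the same op_array your functions already refer to. Unfortunately, you cannot change the size of an array dynamically in OCaml.

I see two solutions:

1) If you don't need to change the order of the "instructions", define them in a mutual recursion with the array op_array. OCaml allows mutually recursive definitions of functions together with values whose definition begins with the application of a constructor. Something like: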

let rec op_plus pc data = ...
and op_array = [| ... |]
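
Filled in, the mutually recursive definition might read as follows. This is only a sketch: op_halt is a hypothetical terminating instruction that is not in the answer, and the program is fixed at three increments.

```ocaml
(* The instructions refer to op_array, and op_array refers back to them;
   [let rec ... and ...] ties the knot. *)
let rec op_plus pc data = op_array.(pc + 1) (pc + 1) (data + 1)
and op_halt _pc data = data   (* hypothetical terminating instruction *)
and op_array = [| op_plus; op_plus; op_plus; op_halt |]

(* Run the program from instruction 0 with initial data 0. *)
let result = op_array.(0) 0 0
```

Here result is 3, one increment per op_plus. The drawback is that the program has to be written out as a literal inside the recursive definition.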

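2) Or use an additional indirection: make op_array a reference to an array of instructions, and refer to (!op_array).(pc+1) in the functions. Later, once you have defined all the instructions, you can make op_array point to an array of the right size, filled with the instructions you intend.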

let op_array = ref [| |] ;;
let op_plus pc data = ... ;;
op_array := [| ... |] ;;
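
A complete version of this second approach could look like the sketch below. As an assumption, it gives op_array an explicit type so that the empty array does not leave a weak type variable behind, and it adds a hypothetical op_halt instruction to terminate the program.

```ocaml
(* Start with an empty program; the annotation fixes the instruction type. *)
let op_array : (int -> int -> int) array ref = ref [||]

(* Instructions look the array up through the reference at call time. *)
let op_plus pc data = (!op_array).(pc + 1) (pc + 1) (data + 1)
let op_halt _pc data = data   (* hypothetical terminating instruction *)

(* Once every instruction exists, install the real program. *)
let () = op_array := [| op_plus; op_plus; op_halt |]

let result = (!op_array).(0) 0 10
```

Here result is 12. The extra dereference on every dispatch is the price of being able to swap in an array of any size later.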

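【Comments】:

For resizable arrays one can use ExtLib.DynArray or res (hg.ocaml.info/release/res/raw-file/release-3.2.0/README.txt)

【Solution 3】:

Another option (if the size is known beforehand) is to initially fill the array with dummy instructions: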

let op_array = Array.create size (fun _ _ -> assert false)
let op_plus = ...
let () = op_array.(0) <- op_plus; ...
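
Spelled out, this might look like the following sketch. Two things here are assumptions rather than part of the answer: the explicit type annotation, which keeps the element type from being inferred as a weak type variable such as '_a -> '_b -> '_c, and the use of Array.make, since Array.create is deprecated in later OCaml releases. The names size and op_halt are illustrative.

```ocaml
let size = 3

(* Every slot starts out as a placeholder that fails loudly if executed. *)
let op_array : (int -> int -> int) array =
  Array.make size (fun _ _ -> failwith "uninitialised instruction")

let op_plus pc data = op_array.(pc + 1) (pc + 1) (data + 1)
let op_halt _pc data = data   (* hypothetical terminating instruction *)

(* Fill in the real program in place. *)
let () =
  op_array.(0) <- op_plus;
  op_array.(1) <- op_plus;
  op_array.(2) <- op_halt

let result = op_array.(0) 0 0
```

Here result is 2. Because the array is mutated in place, the instructions and the caller all see the same op_array, which is what the original question was missing.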

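【Comments】:

This is the approach I ended up taking, since the size of the array is the number of instructions in the program and is known beforehand. I can also fill the array programmatically during parsing, which is an advantage of this approach.
Actually, while this worked in the REPL, it does not work when I try to compile with ocamlc. I get: Error: The type of this expression, ('_a -> '_b -> '_c) array, contains type variables that cannot be generalized, from this line: let op_array = Array.create code_size (fun _ _ -> assert false) ;;
I had to change it to: let op_array = Array.create code_size (fun (x:int) (y:int) -> Printf.printf "Done.\n" ) ;; Interesting that the other version worked in the REPL.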