How to collapse a recursive tree in OCaml: answers

【Question title】: How to collapse a recursive tree in OCaml
【Posted】: 2020-03-19 17:22:47
【Question】:

I have a tree type:

type tree = Vertex of int * tree list;;

My recursive definition of equality is that two trees are equal if their integers are equal and all of their children are equal.

How can I build the function

topo: tree -> tree list

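that creates a list of all the trees in depth-first search order, with each tree appearing once and only once (according to the equality definition)? I would like to do this in a computationally efficient way. Perhaps using laziness or a hash map?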

Here is my attempt; the running time blows up when the length gets too large:

type tree = Vertex of int * (tree list)

let rec base = function
    | 0 -> Vertex (0, [])
    | i -> Vertex (i, [base (i - 1)])

let rec range = function
    | 0 -> [0]
    | i -> i :: range (i - 1)

let agg i = Vertex (-1, List.map base (range i))

let rec equals (a: tree) (b: tree) : bool =
    let rec deep_match a_dep b_dep = match a_dep, b_dep with
        | [], []       -> true
        | [], _
        | _, []        -> false
        | x::xs, y::ys -> equals x y && deep_match xs ys
    in
    let Vertex (ai, al) = a in
    let Vertex (bi, bl) = b in
    ai = bi && deep_match al bl

let rec in_list (a: tree) (l: tree list) : bool = match l with
    | [] -> false
    | hd::tl -> equals a hd || in_list a tl

let rec topological (pool: tree list) (t: tree) : tree list =
    if in_list t pool then pool else
        t::match t with
            | Vertex(_, []) -> pool
            | Vertex(_, deps) -> List.fold_left topological pool deps

let big_agg = agg 100_000
let topo_ordered = topological [] big_agg;;
Printf.printf "len %i\n" (List.length topo_ordered)

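【Comments】:

What have you tried yourself, and where did you get stuck? Stack Overflow is not a code-writing service but a place to get help with specific programming problems. See How to Ask and the topics in the help center.
I tried to work out how to do this lazily, and how to do it with a hash map, but I got stuck. In C this seems like a simple problem: return pointers and memoize the most recursive tree generator, then collapse with a dictionary keyed on pointers. I can't see how to do something similar in OCaml.
So suppose my DAG has n "base" nodes and one "parent" node. Suppose there is an edge from base n to base n - 1, and so on down to base 0, which has no outgoing edges. Then suppose there is an edge from the parent node to each base node. Even though this graph has only n + 1 vertices and 2n - 1 edges, the number of paths grows as n^2. When I try to write this function using the plain equality operator, the time the program takes to finish blows up, and you cannot evaluate it with n = 100000.
I have added my attempt!
The question I am asking has nothing to do with parallelism. The problem is that, as the code is written, crawling the tree takes far more than linear time. The goal is to find a way to write the code in OCaml so that it takes only linear time.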

Tags: functional-programming ocaml

【Solution 1】:

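For the best performance, one possibility is to use hash-consing. However, in your current example, both the generation and the uniqueness test are quadratic in n. Fixing these two points already seems to improve performance a lot.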

First, we can avoid the quadratic tree generation by adding a lot of sharing:

let range max =
  let rec range elt l n =
    if n > max then elt::l
    else
      let next = Vertex(n,[elt]) in
      range next (elt::l) (n+1) in
  range (Vertex(0,[])) [] 1

let agg i = Vertex (-1, range i)

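With this change, it becomes reasonable to generate a tree with 10¹⁰ elements (but only 10⁵ unique elements). The uniqueness test can then be done with a set (or a hash table):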

module S = Set.Make(struct type t = tree let compare = compare end)

let rec topological (set, pool) t =
    if S.mem t set then (set, pool) else
      let set = S.add t set in
      let set, pool =
        match t with
        | Vertex(_, []) -> set, pool
        | Vertex(_, deps) -> List.fold_left topological (set,pool) deps in
      set, t::pool
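The answer mentions a hash table as an alternative to the set. A minimal sketch of that variant follows, under the assumption that the polymorphic `Hashtbl` (structural hashing and equality) is acceptable; the helper name `go` and the initial table size are illustrative choices, not part of the original answer. Note that `Hashtbl.hash` only inspects a bounded part of each tree, so this relies on the labels near the root differing, as they do in this example.

```ocaml
type tree = Vertex of int * tree list

(* Shared construction, as in the answer: each chain node reuses the
   previous one instead of rebuilding it. *)
let range max =
  let rec go elt l n =
    if n > max then elt :: l
    else go (Vertex (n, [elt])) (elt :: l) (n + 1)
  in
  go (Vertex (0, [])) [] 1

let agg i = Vertex (-1, range i)

(* Depth-first traversal that records every tree already seen in a
   hash table keyed by the tree itself (structural equality). *)
let topological t =
  let seen : (tree, unit) Hashtbl.t = Hashtbl.create 1024 in
  let rec go pool t =
    if Hashtbl.mem seen t then pool
    else begin
      Hashtbl.add seen t ();
      let Vertex (_, deps) = t in
      t :: List.fold_left go pool deps
    end
  in
  go [] t

let () = Printf.printf "len %d\n" (List.length (topological (agg 1000)))
```

As with the set version, each distinct tree ends up in the result exactly once, and a tree is placed in front of the trees reachable from it.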

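【Comments】:

Just to be clear: since compare is O(N) in the size of the tree, the Set.add function will be O(N^2) and topological will be O(N^3). You might be better off with the original in_list. It is also O(N^2), but with a smaller constant factor (and a better memory representation).
The comparison function is certainly unusual in this case, but since the height of the set's tree is in ln(N), shouldn't an insertion with Set.add be in O(N ln N)? And empirically, using Set does speed up execution (by a factor of 200).
The speed-up is due to sharing, because polymorphic compare checks for physical identity before recursing deeper, and to an accidental ordering of the data :) The insertion in your case is N * N * log N, i.e. O(N^2). To insert an element into a list you need O(N*logN) comparisons, each of which takes O(N), hence O(N) * O(N*logN) = O(N^2). Set.mem gives us O(logN) comparisons at O(N) each, so O(N*logN) in total, which is subsumed by the insertion. Finally, topological does N insertions, so it is O(N)*O(N^2) = O(N^3).
Ah right, in this case the cost is dominated by recognizing equality, so the physical-equality shortcut really does matter. However, I am still not sure where the N ln N comparisons for an insertion come from. Wouldn't you have to sort the whole tree from scratch to need that many comparisons? For readers: the answer proposed by @ivg is much better.
Oh sorry, of course an insertion is only O(logN). Inserting all the trees from the list takes O(NlogN) comparisons, each of which is O(N) itself, so the complexity of both topological and S.add is bounded by O(N^2), not O(N^3).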

【Solution 2】:

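To make it efficient you need to implement ordering and hash-consing. With a total ordering, you can store your trees in a balanced tree or even a hash table, turning your in_list into O(logN) or even O(1). Adding hash-consing enables O(1) comparison of your trees (at the cost of less efficient tree construction).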

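Depending on your design constraints, you can have just one of them rather than both. For didactic purposes, let's implement hash-consing for your particular representation.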

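To implement hash-consing you need to make your constructor private and hide the data constructors behind an abstraction wall (to prevent users from breaking your hash-consing properties):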

module Tree : sig
  type t = private Vertex of int * t list

  val create : int -> t list -> t
  val equal : t -> t -> bool
end  = struct
  type t = Vertex of int * t list

  let repository = Hashtbl.create 64

  let create n children =
    let node = Vertex (n,children) in
    try Hashtbl.find repository node
    with Not_found -> Hashtbl.add repository node node; node
  let equal x y = x == y
end

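Since we guarantee at creation time that structurally equal trees are physically equal (i.e., if an equal tree already exists in our repository, we return it), we can now replace structural equality with physical equality, i.e., a pointer comparison.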

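We got a fast comparison at a price: we now leak memory, since we need to store every tree ever created, and the create function is now O(N). We can alleviate the first problem by using ephemerons but, of course, the second problem remains.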
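One way to sketch the memory-leak mitigation is with the standard library's weak hash sets. This is a swapped-in, closely related technique (`Weak.Make`) rather than the ephemerons the answer names, and the names `Repo` and `repository` are illustrative. It assumes all trees are built through `create`, so children can be compared by pointer; in a real module the constructor would be private, as above.

```ocaml
type tree = Vertex of int * tree list

(* Weak hash set of trees: an entry can be reclaimed by the GC once
   no other reference to that tree remains. *)
module Repo = Weak.Make (struct
  type t = tree

  (* Children were produced by create, so they are already canonical
     and can be compared with physical equality. *)
  let equal (Vertex (a, xs)) (Vertex (b, ys)) =
    a = b
    && List.compare_lengths xs ys = 0
    && List.for_all2 ( == ) xs ys

  (* Structural hash; equal nodes are structurally equal, so they
     always get the same hash. *)
  let hash = Hashtbl.hash
end)

let repository = Repo.create 64

(* merge returns the stored instance if an equal one exists,
   otherwise it stores the new node and returns it. *)
let create n children = Repo.merge repository (Vertex (n, children))

let a = create 1 [create 0 []]
let b = create 1 [create 0 []]
let () = Printf.printf "shared: %b\n" (a == b)
```

The cost of `create` is still proportional to the number of children (plus the bounded hash), so this addresses only the leak, not the construction cost.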

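Another issue is that we cannot put our trees into ordered structures such as maps or sets. We could of course use the regular polymorphic compare, but since it would be O(N), insertion into such structures would become quadratic. That is not an option for us. Therefore, we need to add a total ordering on our trees. In theory we could do this without changing the representation (using ephemerons), but it is easier to add an order parameter to our tree representation, e.g.,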

module Tree : sig
  type order (* = private int *) (* add this for debugging *)
  type t = private Vertex of order * int * t list

  val create : int -> t list -> t
  val equal : t -> t -> bool
  val compare : t -> t -> int
end = struct
  type order = int
  type t = Vertex of order * int * t list
  type tree = t

  module Repository = Hashtbl.Make(struct
      type t = tree
      let max_hash = 16

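      (* Structural equality that ignores the order field. *)
      let rec equal (Vertex (_,p1,x)) (Vertex (_,p2,y)) =
        match compare p1 p2 with
        | 0 -> equal_trees x y
        | _ -> false
      and equal_trees xs ys = match xs, ys with
        | [],[] -> true
        | [],_ | _,[] -> false
        | x :: xs, y::ys -> equal x y && equal_trees xs ys
      (* Hash the label and at most max_hash children, ignoring the order
         field of the node itself. Each child's hash is mixed into the
         accumulator so that the label is not discarded. *)
      let rec hash (Vertex (_,p,xs)) =
        hash_trees (Hashtbl.hash p) max_hash xs
      and hash_trees hash depth = function
        | x :: xs when depth > 0 ->
          hash_trees (Hashtbl.hash (hash, x)) (depth-1) xs
        | _ -> hash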
    end)

  let repository = Repository.create 64

  let create n children =
    try Repository.find repository (Vertex (0,n,children))
    with Not_found ->
      let order = Repository.length repository + 1 in
      let node = Vertex (order,n,children) in
      Repository.add repository node node; node

  let equal x y = x == y
  let order (Vertex (order,_,_)) = order
  let compare x y = compare (order x) (order y)

end

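We had to implement the structural variants of equal and hash for our trees by hand, because we need to ignore the order field when we store a new tree in the repository. It looks like a bit of work, but in real life you can do this with derivers.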

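Anyway, we now have a comparable version of the tree with an O(1) comparison function, so we can put our trees into sets and maps and implement your topological efficiently.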
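To make that last step concrete, here is a compact sketch of the traversal. It is a simplified variant, not the module above: nodes are keyed in the repository by `(label, ids of children)`, which is only valid under the assumption that every child was itself built by `create`. The names `id`, `topo`, `go`, and `chain` are illustrative.

```ocaml
(* Hash-consed tree: id (the "order"), label, children. *)
type tree = Vertex of int * int * tree list

let id (Vertex (i, _, _)) = i

(* Assumption: children are already hash-consed, so a node is fully
   identified by its label and the ids of its children. *)
let repository : (int * int list, tree) Hashtbl.t = Hashtbl.create 64

let create label children =
  let key = (label, List.map id children) in
  match Hashtbl.find_opt repository key with
  | Some node -> node
  | None ->
    let node = Vertex (Hashtbl.length repository + 1, label, children) in
    Hashtbl.add repository key node;
    node

module S = Set.Make (Int)

(* Depth-first traversal: each distinct tree is listed exactly once,
   a parent before the trees below it. Membership tests compare ids. *)
let topo t =
  let rec go (seen, acc) t =
    if S.mem (id t) seen then (seen, acc)
    else
      let Vertex (_, _, deps) = t in
      List.fold_left go (S.add (id t) seen, t :: acc) deps
  in
  List.rev (snd (go (S.empty, []) t))

(* The example from the question, built with sharing. *)
let agg n =
  let rec chain elt acc i =
    if i > n then elt :: acc
    else chain (create i [elt]) (elt :: acc) (i + 1)
  in
  create (-1) (chain (create 0 []) [] 1)

let () = Printf.printf "len %d\n" (List.length (topo (agg 1000)))
```

Because the set stores integer ids, each membership test and insertion costs O(log N) integer comparisons, independent of the size of the trees involved.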

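A nice feature of both implementations is the tight representation of the tree, since sharing is guaranteed by the create function. For example,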

# let t1 = Tree.create 42 [];;
val t1 : Tree.t = Tree.Vertex (1, 42, [])
# let t3 = Tree.create 42 [t1; t1];;
val t3 : Tree.t =
  Tree.Vertex (2, 42, [Tree.Vertex (1, 42, []); Tree.Vertex (1, 42, [])])
# let t5 = Tree.create 42 [t1; t3; t1];;
val t5 : Tree.t =
  Tree.Vertex (3, 42,
   [Tree.Vertex (1, 42, []);
    Tree.Vertex (2, 42, [Tree.Vertex (1, 42, []); Tree.Vertex (1, 42, [])]);
    Tree.Vertex (1, 42, [])])
#

In this example, the t1 inside t5 and the t1 inside t3 are the same pointer.

【Comments】: