【Question title】: How to represent a mapping between two trees in Haskell?
【Posted】: 2019-09-04 21:37:05
【Question】:

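I'm trying to implement a tree-processing algorithm in Haskell and (since this is my first Haskell program!) I'm struggling with the design of the data structures. Can any of the FP gurus out there lend a hand?

I'll begin by describing the important characteristics of the algorithm, then sketch how I would approach it in an imperative language, and finish with the stumbling baby steps I have taken so far in Haskell.

The problem

I won't describe the full algorithm in detail, but the salient points are as follows:

The algorithm operates on two rose trees, X and Y.
The first phase of the algorithm computes some derived properties for each node, based on the labels and attributes of that node and of its descendants.
These derived properties are used to compute a partial mapping between the two trees, such that a node in X may be associated with a node in Y and vice versa. Because the mapping is partial, any node in X or Y may either be mapped (i.e. have a partner in the other tree) or be unmapped.
The final phase of the algorithm refines these mappings through a sequence of operations that inspect the parents, children and siblings of mapped nodes.

The data structures must therefore have the following characteristics:

Given a reference to a node, provide access to that node's parent, its siblings and its children.
Given a node in an input tree, allow that node to be annotated with additional information (the derived properties, and an optional reference to a node in the other tree).

Imperative solution sketch

If I were to implement this algorithm in an imperative language, the solution would look something like the following.

Assume the starting point is the following definition of the input tree: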

struct node {
    // Identifier for this node, unique within the containing tree
    size_t id;

    // Label of this node
    enum label label;

    // Attributes of this node
    // An attribute can be assumed to be a key-value pair
    // Details of the attributes themselves aren't material to this
    // discussion, so the "attribute" type is left opaque
    struct attribute **attributes;
    size_t n_attributes;

    // Pointer to parent of this node
    // NULL iff this node is root
    struct node *parent;

    // Pointer to first child of this node
    // NULL iff this node is leaf
    struct node *child;

    // Double-linked list of siblings of this node
    struct node *prev;
    struct node *next;
};

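The pointers embedded in each node directly support the up/down/left/right traversals that the algorithm requires.

Annotation can be implemented by defining the following structure: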

struct algo_node {
    // Pointer to input node which has been wrapped
    struct node *node;

    // Derived properties computed by first phase of the algorithm
    // Details of the properties themselves aren't material to this
    // discussion, so the "derived" type is left opaque
    struct derived props;

    // Pointer to corresponding node in the other tree
    // NULL iff this node is unmatched
    struct node *match;
};

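The first phase of the algorithm constructs an algo_node for each node in each input tree.

Mapping from an algo_node to a node is trivial: follow the embedded *node pointer. Mapping in the other direction can be supported by storing the algo_nodes in an array indexed by the id of the input node.

This is of course just one possible implementation. Many variations are possible, including:

Abstracting the linked list of children behind a list or queue interface, rather than storing three raw pointers
Instead of associating the input tree with the algorithm tree through indices, encoding the parent/child/sibling relationships directly in struct algo_node

Moving to Haskell

Let's start with the following definition of the input tree: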

data Tree = Leaf Label Attributes
          | Node Label Attributes [Tree]

Augmenting each node with an id could be achieved as follows:

data AnnotatedTree = Tree Int

addIndex :: Int -> AnnotatedTree -> (AnnotatedTree, Int)

indexedTree = addIndex 0 tree
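
One way to make the indexing step concrete is sketched below. It assumes the input has been converted to Data.Tree from the containers package (a swap for the hand-written Tree type above) and that ids are handed out in pre-order. The result comes back as (next free id, tree), the reverse of the signature above, because that is the order mapAccumL uses.

```haskell
import Data.Traversable (mapAccumL)
import Data.Tree (Tree (..))

-- Pair every label with a fresh id, counting up from the given start value.
-- The Traversable instance of Tree visits a node before its children,
-- so the ids come out in pre-order.
addIndex :: Int -> Tree a -> (Int, Tree (Int, a))
addIndex = mapAccumL (\n x -> (n + 1, (n, x)))
```

The first component of the result is the next unused id, so it can be passed as the start value when indexing the second tree if ids should be unique across both trees.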

Similarly, we can write a function that computes the derived properties:

data AnnotatedTree = Tree DerivedProperties

computeDerived :: DerivedProperties -> AnnotatedTree -> (AnnotatedTree, DerivedProperties)

derivedTree = computeDerived DefaultDerived tree
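
Because the derived properties of a node depend on its descendants, a bottom-up fold is a natural fit. The sketch below again assumes Data.Tree from containers. The question leaves the derived properties opaque, so the Derived record with its size and height fields is a hypothetical stand-in.

```haskell
import Data.Tree (Tree (..), foldTree)

-- Hypothetical derived properties: node count and height of the subtree.
data Derived = Derived { size :: Int, height :: Int }
  deriving (Eq, Show)

-- Annotate every node with properties computed from its own subtree.
-- foldTree hands each node the already-annotated results for its children.
annotate :: Tree a -> Tree (a, Derived)
annotate = foldTree step
  where
    step x kids = Node (x, d) kids
      where
        ds = [d' | Node (_, d') _ <- kids]
        d = Derived
              { size = 1 + sum (map size ds)
              , height = 1 + maximum (0 : map height ds)
              }
```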

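The snippets above can be adapted with little work so that AnnotatedTree contains both the index and the derived properties.

However, I don't know where to begin with representing the mapping between the two trees. Based on some reading, I have a few half-baked ideas...

Define AnnotatedTree to contain a path from the root of the other tree to the mapped node, encoded as a list of indices into each successive list of children, [Integer]
- Use a zipper (of which I currently have a fairly loose understanding) to access the mapped node (or its parent/children/siblings) via the path
- Or perhaps use a lens (... of which I have an even less clear understanding!) to do the same
Define AnnotatedTree to contain a reference to the mapped node directly, as a Maybe Tree
- But then I can't see a way to walk to the parent or siblings of the mapped node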
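
The path idea in the list above can be tried without a zipper or lens library. The sketch below assumes Data.Tree from containers and uses [Int] rather than [Integer] so that the indices work directly with drop. The names followPath and parentPath are hypothetical.

```haskell
import Data.Tree (Tree (..))

-- Follow a path of child indices down from the root.
-- Returns Nothing if any index is out of range.
followPath :: [Int] -> Tree a -> Maybe (Tree a)
followPath [] t = Just t
followPath (i : is) (Node _ kids) =
  case drop i kids of
    (k : _) | i >= 0 -> followPath is k
    _ -> Nothing

-- Path of the parent node, or Nothing for the root (the empty path).
parentPath :: [Int] -> Maybe [Int]
parentPath [] = Nothing
parentPath p = Just (init p)
```

With this encoding the parent and siblings of a mapped node are reachable by editing the path, at the cost of re-walking from the root on every access.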

... but I could really do with some guidance on which of these (if any) are worth pursuing.

Any help would be much appreciated!

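【Comments】:

If a node x in X has a corresponding node y in Y, are all the nodes in Y that correspond to descendants of x also descendants of y?
@danidiaz No, not necessarily.
I think zippers are indeed what you want.
Is it worth converting my data to a Data.Tree so that I can use Data.Tree.Zipper? Or should I just implement my own zipper? Are there any gotchas on either route that I should be aware of?

Tags: algorithm haskell tree mapping abstract-syntax-tree

【Solution 1】:

You can label the tree nodes with Int ids and walk around them with a zipper (using Data.Tree and Data.Tree.Zipper is a good idea, since there is no need to reinvent the wheel). You can then attach auxiliary attributes to the nodes by using a Data.IntMap to map node ids to whatever you want. In particular, you can build an IntMap from node id to the TreePos Full Int for that node, so that you can explore the node's parent, siblings and children.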
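
The sketch below illustrates this layout under a few assumptions. To stay within the containers package it uses a small hand-rolled zipper type, Pos, as a stand-in for TreePos Full Int. The partial mapping is held as an IntMap from X ids to Y ids. Pos, Matching, positions, indexById and partnerParent are all hypothetical names.

```haskell
import qualified Data.IntMap.Strict as IM
import Data.Tree (Tree (..))

-- Hand-rolled zipper position (a stand-in, not a library type).
data Pos a = Pos
  { focus  :: Tree a                     -- subtree under the cursor
  , lefts  :: [Tree a]                   -- left siblings, nearest first
  , rights :: [Tree a]                   -- right siblings, nearest first
  , above  :: [([Tree a], a, [Tree a])]  -- per ancestor: lefts, label, rights
  }

fromTree :: Tree a -> Pos a
fromTree t = Pos t [] [] []

label :: Pos a -> a
label = rootLabel . focus

parent, firstChild, next, prev :: Pos a -> Maybe (Pos a)
parent (Pos t ls rs ((pls, a, prs) : up)) =
  Just (Pos (Node a (reverse ls ++ t : rs)) pls prs up)
parent _ = Nothing

firstChild (Pos (Node a (c : cs)) ls rs up) =
  Just (Pos c [] cs ((ls, a, rs) : up))
firstChild _ = Nothing

next (Pos t ls (r : rs) up) = Just (Pos r (t : ls) rs up)
next _ = Nothing

prev (Pos t (l : ls) rs up) = Just (Pos l ls (t : rs) up)
prev _ = Nothing

-- Every position in the tree, in pre-order.
positions :: Pos a -> [Pos a]
positions p = p : concatMap positions (children p)
  where
    children q = maybe [] siblingsFrom (firstChild q)
    siblingsFrom c = c : maybe [] siblingsFrom (next c)

-- Index from node id to the zipper position of that node.
indexById :: Tree Int -> IM.IntMap (Pos Int)
indexById = IM.fromList . map (\p -> (label p, p)) . positions . fromTree

-- Partial mapping from node ids in X to node ids in Y.
type Matching = IM.IntMap Int

-- Id of the parent of x's partner in Y.
-- Nothing if x is unmatched or its partner is the root of Y.
partnerParent :: Matching -> IM.IntMap (Pos Int) -> Int -> Maybe Int
partnerParent m ys x = do
  y <- IM.lookup x m
  p <- IM.lookup y ys
  label <$> parent p
```

The rosezipper package's Data.Tree.Zipper exports functions named fromTree, parent, firstChild, next, prev and label over TreePos Full a, so the same index should look much the same with the library type; check its documentation for the exact signatures. A second IntMap in the opposite direction would cover lookups from Y back to X.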

【Discussion】: