XSLT: Merge a set of tree hierarchies (answers)

【Question title】: XSLT: Merge a set of tree hierarchies
【Posted】: 2010-10-16 16:01:13
【Question】:

I have an XML document based on what Excel produces when saving as "XML Spreadsheet 2003 (*.xml)".

The spreadsheet itself contains a header section with a hierarchy of labels:

 | A     B     C     D     E     F     G     H     I
-+------------------------------------------------------
1| a1                                  a2
2| a11         a12         a13         a21   a22
3| a111  a112  a121  a122  a131  a132        a221  a222

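This hierarchy is present on every sheet in the workbook and looks more or less the same everywhere.

Excel XML works exactly like an ordinary HTML table (<row>s contain <cell>s). I have been able to transform everything into a tree structure like this: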

<node title="a1" col="1">
  <node title="a11" col="1">
    <node title="a111" col="1"/>
    <node title="a112" col="2"/>
  </node>
  <node title="a12" col="3">
    <node title="a121" col="3" />
    <node title="a122" col="4" />
  </node>
  <!-- and so on -->
</node>

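But here is the complicated part:

- There is more than one worksheet, so there is one tree per sheet
- The hierarchy may differ slightly from sheet to sheet, so the trees will not be equal (e.g. sheet 2 might have "a113" while the others do not)
- There is no explicit limit on the tree depth
- The labels, however, are the same across all sheets, which means they can be used for grouping

I would like to merge these separate trees into a single one that looks like this: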

<node title="a1">
  <col on="sheet1">1</col>
  <col on="sheet2">1</col>
  <node title="a11">
    <col on="sheet1">1</col>
    <col on="sheet2">1</col>
    <node title="a111">
      <col on="sheet1">1</col>
      <col on="sheet2">1</col>
    </node>
    <node title="a112">
      <col on="sheet1">2</col>
      <col on="sheet2">2</col>
    </node>
    <node title="a113"><!-- different here -->
      <col on="sheet2">3</col>
    </node>
  </node>
  <node title="a12">
    <col on="sheet1">3</col>
    <col on="sheet2">4</col>
    <node title="a121">
      <col on="sheet1">3</col>
      <col on="sheet2">4</col>
    </node>
    <node title="a122">
      <col on="sheet1">4</col>
      <col on="sheet2">5</col>
    </node>
  </node>
  <!-- and so on -->
</node>

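Ideally, I would like to do the merging before I even build the tree structure from the Excel XML (if you can get me started on this, that would be great). But since I have no idea how to do that, merging after the trees have been built (i.e. the situation described above) would be fine.

Thanks for your time. :)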

【Comments】:

Could you please also provide the second XML file so that I can look into this? Thanks.

Tags: xslt grouping hierarchy merge

【Solution 1】:

Here is one possible solution in XSLT 1.0:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

    <xsl:template match="/*">
      <t>
        <xsl:apply-templates
           select="node[@title='a1'][1]">
          <xsl:with-param name="pOther"
            select="node[@title='a1'][2]"/>
        </xsl:apply-templates>
      </t>
    </xsl:template>

    <xsl:template match="node">
      <xsl:param name="pOther"/>

      <node title="{@title}">
        <col on="sheet1">
          <xsl:value-of select="@col"/>
        </col>
          <xsl:choose>
            <xsl:when test="not($pOther)">
              <xsl:apply-templates select="node" mode="copy">
                <xsl:with-param name="pSheet" select="'sheet1'"/>
              </xsl:apply-templates>
            </xsl:when>
            <xsl:otherwise>
              <col on="sheet2">
                <xsl:value-of select="$pOther/@col"/>
              </col>
              <xsl:for-each select=
                "node[@title = $pOther/node/@title]">

                <xsl:apply-templates select=".">
                  <xsl:with-param name="pOther" select=
                   "$pOther/node[@title = current()/@title]"/>
                </xsl:apply-templates>
              </xsl:for-each>

              <xsl:apply-templates mode="copy" select=
                "node[not(@title = $pOther/node/@title)]">
                <xsl:with-param name="pSheet" select="'sheet1'"/>
              </xsl:apply-templates>

              <xsl:apply-templates mode="copy" select=
                "$pOther/node[not(@title = current()/node/@title)]">
                <xsl:with-param name="pSheet" select="'sheet2'"/>
              </xsl:apply-templates>
            </xsl:otherwise>
          </xsl:choose>
      </node>
    </xsl:template>

    <xsl:template match="node" mode="copy">
      <xsl:param name="pSheet"/>

      <node title="{@title}">
        <col on="{$pSheet}">
          <xsl:value-of select="@col"/>
        </col>

        <xsl:apply-templates select="node" mode="copy">
          <xsl:with-param name="pSheet" select="$pSheet"/>
        </xsl:apply-templates>
      </node>
    </xsl:template>
</xsl:stylesheet>

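When the above transformation is applied to this XML document (the concatenation of the two XML documents under a common top node, which is left as an exercise for the reader :)):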

<t>
    <node title="a1" col="1">
        <node title="a11" col="1">
            <node title="a111" col="1"/>
            <node title="a112" col="2"/>
        </node>
        <node title="a12" col="3">
            <node title="a121" col="3" />
            <node title="a122" col="4" />
        </node>
        <!-- and so on -->
    </node>
    <node title="a1" col="1">
        <node title="a11" col="1">
            <node title="a111" col="1"/>
            <node title="a112" col="2"/>
            <node title="a113" col="3"/>
        </node>
        <node title="a12" col="4">
            <node title="a121" col="4" />
            <node title="a122" col="5" />
        </node>
        <!-- and so on -->
    </node>
</t>

the wanted result is produced:

<t>
    <node title="a1">
        <col on="sheet1">1</col>
        <col on="sheet2">1</col>
        <node title="a11">
            <col on="sheet1">1</col>
            <col on="sheet2">1</col>
            <node title="a111">
                <col on="sheet1">1</col>
                <col on="sheet2">1</col>
            </node>
            <node title="a112">
                <col on="sheet1">2</col>
                <col on="sheet2">2</col>
            </node>
            <node title="a113">
                <col on="sheet2">3</col>
            </node>
        </node>
        <node title="a12">
            <col on="sheet1">3</col>
            <col on="sheet2">4</col>
            <node title="a121">
                <col on="sheet1">3</col>
                <col on="sheet2">4</col>
            </node>
            <node title="a122">
                <col on="sheet1">4</col>
                <col on="sheet2">5</col>
            </node>
        </node>
    </node>
</t>

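Do note the following:

- We assume that both top node elements have "a1" as the value of their title attribute. This can easily be generalized.
- The template matching node has a parameter named pOther, which is the corresponding element named node from the other document. This template is applied only if $pOther exists.
- When there is no corresponding element named node, another template is applied, also matching node but in copy mode. This template has a parameter named pSheet, whose value is the name (a string) of the sheet this element belongs to.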
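
The generalization mentioned in the first point could look roughly like the following. This is only a sketch under assumptions: merge.xsl is a hypothetical file name for the stylesheet above, and the wrapper element is assumed to hold the sheet 1 tree first and the sheet 2 tree second, with both top nodes sharing the same title (whatever that title is).

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- 'merge.xsl' is a hypothetical file name for the stylesheet above -->
  <xsl:import href="merge.xsl"/>

  <!-- Overrides the imported match="/*" template: the two top-level
       node elements are paired by position, not by the literal 'a1'.
       Assumption: both top nodes carry the same title. -->
  <xsl:template match="/*">
    <t>
      <xsl:apply-templates select="node[1]">
        <xsl:with-param name="pOther" select="node[2]"/>
      </xsl:apply-templates>
    </t>
  </xsl:template>
</xsl:stylesheet>
```

Because templates in the importing stylesheet take precedence over imported ones, the other two templates are reused unchanged. If the second tree is missing, $pOther is empty and the first tree is simply copied as sheet1.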

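【Comments】:

@Dimitre: I'm a bit short on time today; I'll get back to your solution as soon as I can. Don't be impatient. :)

【Solution 2】:

How about a callable template that takes the sheet number as a parameter, checks the input, and returns the correct "col" node if it is present in that sheet's XML, or nothing if it is not? On each node, call it once for each sheet.

To merge the trees, perhaps use a template that finds all the children of the current node in any of the sheets and recurses for each of them.

Sorry for not having sample code; I find writing XSLT quite slow, probably because I don't do it very often, so I may well have missed something important. But putting that together gives something like: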

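Take the title of "/node". With this title:
- search all sheets for this title and emit a "col" node for each sheet
- search all sheets for the children of the nodes with this title (discarding duplicates)
- recurse on each title.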
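
The steps above can be sketched in XSLT 1.0 roughly as follows. This is an illustration, not a tested solution from the thread. It assumes a hypothetical input layout in which each sheet's tree is wrapped as `<sheets><sheet name="sheet1">…</sheet><sheet name="sheet2">…</sheet></sheets>`; the names sheets, sheet, mergeChildren, pParents, vKids, vTitle and vSame are all made up for the example.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <!-- Assumed (hypothetical) wrapper: /sheets/sheet[@name]/node... -->
  <xsl:template match="/sheets">
    <t>
      <xsl:call-template name="mergeChildren">
        <xsl:with-param name="pParents" select="sheet"/>
      </xsl:call-template>
    </t>
  </xsl:template>

  <!-- pParents: the same-titled elements, at most one per sheet -->
  <xsl:template name="mergeChildren">
    <xsl:param name="pParents"/>
    <xsl:variable name="vKids" select="$pParents/node"/>

    <xsl:for-each select="$vKids">
      <xsl:variable name="vTitle" select="@title"/>
      <!-- discard duplicates: handle a title only at its first occurrence -->
      <xsl:if test="generate-id() = generate-id($vKids[@title = $vTitle][1])">
        <xsl:variable name="vSame" select="$vKids[@title = $vTitle]"/>
        <node title="{$vTitle}">
          <!-- one col element per sheet that has this title here -->
          <xsl:for-each select="$vSame">
            <col on="{ancestor::sheet/@name}">
              <xsl:value-of select="@col"/>
            </col>
          </xsl:for-each>
          <!-- recurse into the children found on any sheet -->
          <xsl:call-template name="mergeChildren">
            <xsl:with-param name="pParents" select="$vSame"/>
          </xsl:call-template>
        </node>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Two caveats: merged children come out in order of first appearance in the input, so a label that exists only on a later sheet ends up after all labels of the earlier sheets; and the lookup is quadratic in the number of siblings, which is probably fine for spreadsheet headers but could be replaced by a key-based grouping for large inputs.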

Here are some snippets for removing duplicates in various ways:

http://www.dpawson.co.uk/xsl/sect2/N2696.html

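Reading multiple documents is processor-dependent, but if all else fails, some cutting and pasting with any old scripting language would probably do, provided you know that the documents all have the same encoding, no clashing ids, and so on.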
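
For what it is worth, XSLT 1.0 itself defines a document() function, so joining two trees under a common top node may not need an external script. A minimal sketch, assuming the first sheet's tree is the main input and the second lives in a file with the hypothetical name sheet2.xml; how URIs are resolved and whether file access is permitted can still vary with the processor and its configuration:

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <!-- Hypothetical file name; a relative URI given as a string is
       resolved against the location of this stylesheet -->
  <xsl:param name="pSecond" select="'sheet2.xml'"/>

  <xsl:template match="/">
    <t>
      <!-- tree of the first sheet: the main input document -->
      <xsl:copy-of select="/*"/>
      <!-- tree of the second sheet, loaded with document() -->
      <xsl:copy-of select="document($pSecond)/*"/>
    </t>
  </xsl:template>
</xsl:stylesheet>
```

The result has the two trees as siblings under a single t element, which is the input shape Solution 1 expects.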
