XPath last occurrence of each element

【Question title】: XPath last occurrence of each element
【Posted】: 2011-07-05 09:57:44
【Question description】:

I have XML similar to this:

<root>
    <a>One</a>
    <a>Two</a>
    <b>Three</b>
    <c>Four</c>
    <a>Five</a>
    <b>
        <a>Six</a>
    </b>
</root>

and I need to select the last occurrence of each child element name under the root. In this case, the desired result list would be:

<c>Four</c>
<a>Five</a>
<b>
    <a>Six</a>
</b>

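Thanks for any help!

【Question discussion】:

I don't think this is possible with a single XPath 1.0 one-liner.
Good question, +1. See my answer for a solution that is more complete, shorter, and much more efficient than the currently selected one. An explanation is also provided.
Also added a very short XPath 2.0 one-liner.
This question has been answered very well, +1.

Tags: xml xslt xpath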

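【Solution 1】:

I. Both the XPath 2.0 solution and the currently accepted answer are very inefficient (O(N^2)).

This key-based solution scales roughly linearly on processors that index keys: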

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:key name="kElemsByName" match="/*/*"
  use="name()"/>

 <xsl:template match="/">
  <xsl:copy-of select=
    "/*/*[generate-id()
         =
          generate-id(key('kElemsByName', name())[last()])
         ]"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<root>
    <a>One</a>
    <a>Two</a>
    <b>Three</b>
    <c>Four</c>
    <a>Five</a>
    <b>
        <a>Six</a>
    </b>
</root>

the wanted, correct result is produced:

<c>Four</c>
<a>Five</a>
<b>
   <a>Six</a>
</b>

Explanation: This is a modified variant of Muenchian grouping. Instead of the first node in each group, the last node in each group is processed.
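For readers who think more easily outside XSLT, the same idea can be sketched in Python. This is only an illustration under stated assumptions, not part of the stylesheet: the function name `last_per_name` is made up here, the dictionary plays the role of `xsl:key`, and the `is` comparison plays the role of the `generate-id()` equality test.

```python
import xml.etree.ElementTree as ET

XML = """<root>
    <a>One</a>
    <a>Two</a>
    <b>Three</b>
    <c>Four</c>
    <a>Five</a>
    <b>
        <a>Six</a>
    </b>
</root>"""


def last_per_name(root):
    """Return the last child of `root` for each distinct tag, in document order."""
    # Pass 1 (stand-in for xsl:key): later children overwrite earlier ones,
    # so every tag ends up mapped to its last occurrence.
    last_by_name = {}
    for child in root:
        last_by_name[child.tag] = child
    # Pass 2 (stand-in for the generate-id() test): keep a child only if it
    # is the very node recorded for its name.
    return [child for child in root if last_by_name[child.tag] is child]


root = ET.fromstring(XML)
selected = last_per_name(root)
print([el.tag for el in selected])  # ['c', 'a', 'b']
```

Both passes touch each child once, which is where the roughly linear behaviour comes from.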

II. XPath 2.0 one-liner:

Use:

/*/*[index-of(/*/*/name(), name())[last()]]

Verification using XSLT 2.0 as the XPath 2.0 host:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:sequence select=
    "/*/*[index-of(/*/*/name(), name())[last()]]"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the same XML document (provided above), the same correct result is produced:

<c>Four</c>
<a>Five</a>
<b>
    <a>Six</a>
</b>
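How the one-liner works may be easier to see step by step. In XPath 2.0, `index-of()` returns the 1-based positions at which a value occurs in a sequence, and a numeric predicate `[n]` is shorthand for `[position() = n]`. The Python sketch below mimics that reading; the helper `index_of` is a hypothetical stand-in written for this illustration, not a library function.

```python
import xml.etree.ElementTree as ET

XML = ("<root><a>One</a><a>Two</a><b>Three</b><c>Four</c>"
       "<a>Five</a><b><a>Six</a></b></root>")


def index_of(seq, value):
    """1-based positions of `value` in `seq`, mimicking XPath 2.0 index-of()."""
    return [i for i, item in enumerate(seq, start=1) if item == value]


root = ET.fromstring(XML)
children = list(root)
names = [child.tag for child in children]  # counterpart of /*/*/name()

selected = [
    child
    for position, child in enumerate(children, start=1)
    # Keep the child whose own position equals the last position of its name.
    if index_of(names, child.tag)[-1] == position
]
print([el.tag for el in selected])  # ['c', 'a', 'b']
```

Written this way, the name sequence is rescanned for every child, so this reading of the expression is itself quadratic unless the processor optimizes it.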

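【Discussion】:

+1. Compared with the others, this answer deserves the upvotes.
Apart from reasoning about the computational complexity of the expressions, can you recommend any other technique for measuring performance in XSLT?
@empo: Timing cannot be measured in XSLT alone. The standard XPath 2.0 functions for the current time and the current date-time always return the same result whenever they are referenced within a single transformation. This is because they are stable functions, like all functions in a functional language. Of course, timing is easy to measure in the application that invokes the XSLT transformation.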
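Measuring from the host application could look roughly like the Python sketch below. Everything in it is an assumption made for illustration: `run_transformation` is a hypothetical stand-in for whatever call the host uses to invoke its XSLT processor (here it is a pure-Python selection so the example runs with the standard library alone), and the input is synthetic.

```python
import time
import xml.etree.ElementTree as ET


def run_transformation(root):
    # Hypothetical stand-in for the host's call into an XSLT processor.
    last = {}
    for child in root:
        last[child.tag] = child
    return [child for child in root if last[child.tag] is child]


def timed(func, *args, repeats=5):
    """Return (best wall-clock seconds, last result) over several runs."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return best, result


# Synthetic input: 10,000 children cycling through 26 element names.
root = ET.Element("root")
for i in range(10_000):
    ET.SubElement(root, chr(ord("a") + i % 26)).text = str(i)

seconds, result = timed(run_transformation, root)
print(f"{len(result)} nodes selected, best of 5 runs: {seconds:.6f} s")
```

Taking the best of several runs reduces noise from warm-up and other processes; repeating the measurement at several input sizes shows how the cost grows.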

【Solution 2】:

If you can use XPath 2.0, this will work:

/root//*[not(name() = following-sibling::*/name())]
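To make the predicate concrete, here is a small Python sketch of what "no following sibling has the same name" means. The function name `last_among_siblings` is made up for this illustration. It also compares two candidate sets: only the direct children of the root (as in `/*/*`), and every descendant element (as in `/root//*`). On the sample document the second also picks up the nested `<a>Six</a>`, which has no following siblings at all.

```python
import xml.etree.ElementTree as ET

XML = ("<root><a>One</a><a>Two</a><b>Three</b><c>Four</c>"
       "<a>Five</a><b><a>Six</a></b></root>")


def last_among_siblings(parent):
    """Children of `parent` that have no following sibling with the same name."""
    children = list(parent)
    return [
        child
        for i, child in enumerate(children)
        # Scans the remaining siblings for every child: O(N^2) overall.
        if all(later.tag != child.tag for later in children[i + 1:])
    ]


root = ET.fromstring(XML)

# Counterpart of /*/*[...]: only direct children of the root are candidates.
direct = last_among_siblings(root)

# Counterpart of /root//*[...]: every descendant element is a candidate.
deep = [el for parent in root.iter() for el in last_among_siblings(parent)]

print([el.tag for el in direct])  # ['c', 'a', 'b']
print([el.tag for el in deep])    # ['c', 'a', 'b', 'a']
```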

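【Discussion】:

Correct for XPath 2.0, +1. In this case you could write it better as /*/*[not(name() = following-sibling::*/name())].
The library doesn't support XPath 2.0, otherwise I'd gladly use it! (I should have specified that.)

【Solution 3】:

An XSLT-based solution: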

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
    <xsl:strip-space elements="*"/>

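    <xsl:template match="root/*">
        <xsl:variable name="n" select="name()"/>
        <!-- Copy this element only if no later sibling has the same name.
             self::* is used because XPath 1.0 does not allow a predicate on "." -->
        <xsl:copy-of
            select="self::*[not(following-sibling::*[name()=$n])]"/>
    </xsl:template>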
</xsl:stylesheet>

Output produced:

<c>Four</c>
<a>Five</a>
<b>
   <a>Six</a>
</b>

A second solution (which you can use as a single XPath expression):

<xsl:template match="/root">
    <xsl:copy-of select="a[not(./following-sibling::a)]
        | b[not(./following-sibling::b)]
        | c[not(./following-sibling::c)]"/>
</xsl:template>

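【Discussion】:

The proposed single XPath does not work when the child names are unknown or when there are many of them.
Thanks. As per @empo's comment, the second XPath doesn't work in the general case, but I (reluctantly) used the XSLT solution and it works well. A single XPath query would have been nice, though :)
Good for you. Next time you might also consider adding the XSLT tag to the question ;-) ... so that people might consider adding alternative answers.
@empo: You are right. I couldn't find a general expression, so I posted the above.
@Grzegorz Szpetkowski: You may be interested to see another solution that is hundreds or even thousands of times faster than yours on large XML documents.

【Solution 4】:

Nowadays, XSLT 2.0 provides grouping techniques for problems like this: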

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes" />
    <xsl:strip-space elements="*" />

    <xsl:template match="/root">
        <xsl:for-each-group select="*" group-by="name()">
            <!-- <xsl:sort select="index-of(/root/*, current-group()[last()])" order="ascending"/> -->
            <xsl:copy-of select="current-group()[last()]" />
        </xsl:for-each-group>
    </xsl:template>
</xsl:stylesheet>

will produce:

<a>Five</a>
<b>
  <a>Six</a>
</b>
<c>Four</c>

Groups are processed in document order (order of first appearance) unless explicitly influenced by <xsl:sort>!
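The difference in output order (a, b, c here versus c, a, b in the other answers) comes from that rule: each group takes the position of its first member, not of its last. A Python dictionary keeps insertion order, so the sketch below reproduces the same effect. It is only an analogy for `xsl:for-each-group`, and the variable names are chosen for this illustration.

```python
import xml.etree.ElementTree as ET

XML = ("<root><a>One</a><a>Two</a><b>Three</b><c>Four</c>"
       "<a>Five</a><b><a>Six</a></b></root>")

root = ET.fromstring(XML)

# Group children by name; dict keys stay in order of first appearance.
groups = {}
for child in root:
    groups.setdefault(child.tag, []).append(child)

# Counterpart of current-group()[last()] for each group.
last_of_each_group = [members[-1] for members in groups.values()]
print([el.tag for el in last_of_each_group])  # ['a', 'b', 'c']

# Re-sorting by the position of the selected node restores document order.
position = {child: i for i, child in enumerate(root)}
in_document_order = sorted(last_of_each_group, key=position.__getitem__)
print([el.tag for el in in_document_order])   # ['c', 'a', 'b']
```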

【Discussion】: