XPath: fetch specific nodes without their child nodes from XML

【Question title】: Xpath fetch specific nodes without their child nodes from XML
【Posted】: 2015-05-06 09:52:58
【Question description】:

I have XML data like this:

<priceData>
  <div class='price'>
    <div class='price-old'>20.00</div>
    <div class='price-new'>10.00</div>
    <div class='price-tax'>8.00</div>
  </div>
  <div class='price'>
    40.00 <div class='price-tax'>25.00</div>
  </div>
 </priceData>

Using XPath, I want to extract the "price-new" value from the first price div and the value 40.00 from the second price div. This has to be done with a single expression.

I have tried expressions like this:

//div[contains(@class, 'price') and not(contains(@class, 'tax')) and not(contains(@class, '-old'))]

and

//div[contains(@class, 'price') and not(contains(@class, 'tax')) and not(descendant::div[contains(@class, '-old') and not(contains(@class, '-tax'))]) and not(contains(@class, '-old'))]

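and a few others, but I could not get any of them to work as intended. I always get extra nodes from the first case, whereas I need only a single node per case (price-new, or price itself if it contains no further nodes).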

【Question comments】:

What result do you need exactly? A node set with two text nodes?
Yes, two text nodes. One for each case.

Tags: xpath web-crawler

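【Solution 1】:

You can try the XPath union operator (|) to combine the two queries into one. Given the markup in the question as XML input, the following XPath (formatted for readability):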

//div[@class='price']/div[@class='price-new']/text()
    | 
//div[@class='price']/text()[normalize-space()]

returns the "expected" result in an XPath tester:

Text='10.00'
Text='40.00'
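
To check this outside an online tester, here is a minimal Python sketch. It assumes the third-party lxml library is installed (the answer itself does not name any library), and the names `XML`, `EXPR` and `extract_prices` are made up for illustration. The `[normalize-space()]` predicate is what discards the whitespace-only text nodes that sit between the child divs, so only the text node holding 40.00 survives from the second branch.

```python
from lxml import etree  # assumption: lxml is available (pip install lxml)

# The markup from the question, unchanged.
XML = """<priceData>
  <div class='price'>
    <div class='price-old'>20.00</div>
    <div class='price-new'>10.00</div>
    <div class='price-tax'>8.00</div>
  </div>
  <div class='price'>
    40.00 <div class='price-tax'>25.00</div>
  </div>
</priceData>"""

# The union expression from the answer, joined into a single string.
EXPR = (
    "//div[@class='price']/div[@class='price-new']/text()"
    " | "
    "//div[@class='price']/text()[normalize-space()]"
)


def extract_prices(xml_text):
    """Return the selected text nodes with surrounding whitespace stripped."""
    root = etree.fromstring(xml_text)
    # The text node in the second div still carries its indentation and
    # trailing space, so strip each result.
    return [text.strip() for text in root.xpath(EXPR)]


prices = extract_prices(XML)
print(prices)  # ['10.00', '40.00']
```

Note that the match on `@class='price'` is exact, so unlike the `contains(@class, 'price')` attempts in the question it does not also select the price-old, price-new and price-tax divs.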
