XPath for a specific line of text when the node contains multiple lines of text: answers

【Question title】: XPath for a specific line of text when the node contains multiple lines of text
【Posted】: 2021-12-16 14:39:22
【Question】:

I have some HTML that looks like this:

<div class='textContainer'>
    <div class='textLabel'> </div>
    <div class='text'> 
    "First Line of text" 
    "Second Line of text" 
    "Third line of text" 
    </div>
</div>

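I can easily create a locator that finds the node containing the text, but I need to run an assertion specifically against the first and third lines of the text, so I need locators for those specific lines. Something like: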

//div[@class='text']/text[1]
//div[@class='text']/text[3]

Is this possible?

Any help would be much appreciated.

Thanks!

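【Comments】:

Possibly, yes. But extracting individual lines of text is tedious. I believe you can use substring-before() and substring-after() around the newline characters to extract each line.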
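To illustrate why the substring-before() / substring-after() route is tedious, here is a minimal sketch in plain JavaScript. The helper names (`substringBefore`, `substringAfter`, `normalizeSpace`, `lineAt`) are hypothetical stand-ins written for this example that mirror the XPath 1.0 string functions; the `text` constant assumes the div's text content is exactly what the HTML in the question shows, including the literal double quotes.

```javascript
// Stand-ins for the XPath 1.0 string functions (hypothetical helper names).
// Like their XPath counterparts, they return "" when the separator is absent.
function substringBefore(s, sep) {
  const i = s.indexOf(sep);
  return i === -1 ? "" : s.slice(0, i);
}

function substringAfter(s, sep) {
  const i = s.indexOf(sep);
  return i === -1 ? "" : s.slice(i + sep.length);
}

// Mirrors XPath normalize-space(): collapse runs of space, tab, CR and LF
// into one space, then drop a leading or trailing space.
function normalizeSpace(s) {
  return s.replace(/[ \t\r\n]+/g, " ").replace(/^ | $/g, "");
}

// Assumed text content of <div class='text'> from the question.
const text =
  ' \n    "First Line of text" \n    "Second Line of text" \n    "Third line of text" \n    ';

// Return the n-th (1-based) newline-separated segment, normalized.
// Each loop step corresponds to one more nested substring-after() call
// in an XPath 1.0 expression.
function lineAt(s, n) {
  let rest = s;
  for (let i = 1; i < n; i++) {
    rest = substringAfter(rest, "\n");
  }
  return normalizeSpace(substringBefore(rest, "\n"));
}

// Segment 1 is the blank remainder of the line holding the opening tag,
// so the first visible line is segment 2 and the third is segment 4.
const firstLine = lineAt(text, 2);
const thirdLine = lineAt(text, 4);
console.log(firstLine);
console.log(thirdLine);
```

In XPath 1.0 itself the same idea means nesting one substring-after() call per skipped line, which is why the approach gets unwieldy quickly.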

Tags: html css xpath

【Solution 1】:

You can do this with XPath 2 or 3, for example in the browser or in Node.js with Saxon-JS 2, which gives you XPath 3.1 support:

const lines = SaxonJS.XPath.evaluate(`//div[@class = 'text']/tokenize(., '\n')[normalize-space()]!normalize-space()`, document, { xpathDefaultNamespace : 'http://www.w3.org/1999/xhtml' });

console.log(lines);
console.log(lines[0]);

<script src="https://www.saxonica.com/saxon-js/documentation/SaxonJS/SaxonJS2.rt.js"></script>

<div class='textContainer'>
    <div class='textLabel'> </div>
    <div class='text'> 
    "First Line of text" 
    "Second Line of text" 
    "Third line of text" 
    </div>
</div>

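Note that in any version of XPath or the DOM, the normalized tree contains a single text node here, but in XPath 2 or later you can split (tokenize) the string value of that text node into a sequence of strings and process each string in the sequence. The Saxon-JS 2 JavaScript API conveniently returns the XPath 3.1 sequence of strings as a JavaScript array of strings.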

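In terms of the XPath 2 or 3 data model, the path expression //div[@class = 'text']/tokenize(., '\n')[normalize-space()]!normalize-space() yields a sequence of strings, which you can index by position with an integer as usual in XPath. For example, let $lines := //div[@class = 'text']/tokenize(., '\n')[normalize-space()]!normalize-space() return $lines[2] returns the second item in the sequence, that is, the second string (the second line of the text node's text, whitespace-normalized).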
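As a rough illustration of what that expression does step by step, the sketch below mirrors it in plain JavaScript without Saxon-JS. The function names `normalizeSpace` and `textLines` are hypothetical, and the `text` constant assumes the div's text content is exactly what the HTML above shows, including the literal double quotes.

```javascript
// Mirrors XPath normalize-space(): collapse runs of space, tab, CR and LF
// into one space, then drop a leading or trailing space.
function normalizeSpace(s) {
  return s.replace(/[ \t\r\n]+/g, " ").replace(/^ | $/g, "");
}

// Plain-JavaScript analogue of
//   tokenize(., '\n')[normalize-space()]!normalize-space()
function textLines(text) {
  return text
    .split("\n")                              // tokenize(., '\n')
    .filter((s) => normalizeSpace(s) !== "")  // [normalize-space()] keeps non-blank items
    .map(normalizeSpace);                     // !normalize-space()
}

// Assumed text content of <div class='text'> from the snippet above.
const text =
  ' \n    "First Line of text" \n    "Second Line of text" \n    "Third line of text" \n    ';

const lines = textLines(text);

// JavaScript arrays are 0-based, XPath sequences are 1-based:
// lines[0] corresponds to $lines[1], lines[2] to $lines[3].
console.log(lines[0]);
console.log(lines[2]);
```

For the assertions in the question, this means the first and third lines are `lines[0]` and `lines[2]` in the array returned to JavaScript, while inside XPath they would be `$lines[1]` and `$lines[3]`.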

【Comments】: