How can I find the exact position in an XML file given an XPath and an offset?

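【Question title】: How can I go to exact position of xml file having its XPath and Offset?
【Posted】: 2016-09-13 19:53:58
【Question】:

I'm using lxml to parse an XML file into an ElementTree object. I'm building an annotation application and need to get to an exact position in the file. I have a relative XPath and a startOffset that says where the expected text begins. For example, in this code: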

<section role="doc-abstract">
    <h1>Abstract</h1>
    <p>The creation and use of knowledge graphs for information discovery, question answering, and task completion has exploded in recent years, but their application has often been limited to the most common user scenarios.</p>
</section>

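I want to get the "knowledge graphs for information discovery" part. With the XPath ".//section[2]/p[1]" I can access that <p> element. I also have a startOffset variable equal to "26", which means the text starts 26 characters from the beginning of the element. My question is: how can I get to that exact position using lxml?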

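【Comments】:

Isn't it just a slice: p.text[startOffset:]? ...where p is the element you located?
Exactly. But that returns a string. I need to create an element at that position, so is there a way to get back an Element object so that I can use the insert() and insert_before() methods?
OK, I created a new element, set my text as Element.text, and then replaced the actual text with etree.tostring(Element.text) to wrap that text in a tag. Thanks for the hint.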

Tags: python xml xpath lxml

【Solution 1】:

Assume your XML is stored in a string, xml_string.

from lxml import etree

# Initialize a parser that drops whitespace-only text nodes
parser = etree.XMLParser(remove_blank_text=True)
# Parse the string; etree.XML() returns the root element
root = etree.XML(xml_string, parser)
# find() on an element needs a relative path, so the expression starts with ".//"
node = root.find('.//section[2]/p[1]')

Now you can work with this node. You can also loop over several matching elements with root.findall().
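The answer stops at locating the node, while the comments go on to wrap the text at the offset in a new tag. Below is a minimal sketch of that second step, assuming the offsets fall inside the element's leading text (p.text, before any child element). The helper name wrap_text_range, the "mark" tag, and the two-section sample document are hypothetical and chosen only for illustration. The offset is computed with str.index instead of being hard-coded, since how an annotation tool counts offsets may differ.

```python
from lxml import etree

# Hypothetical document: two sections, so that ".//section[2]/p[1]" matches.
xml_string = (
    "<body>"
    "<section><p>Intro</p></section>"
    '<section role="doc-abstract"><h1>Abstract</h1>'
    "<p>The creation and use of knowledge graphs for information discovery, "
    "question answering, and task completion has exploded in recent years.</p>"
    "</section>"
    "</body>"
)


def wrap_text_range(elem, start, end, tag="mark"):
    """Wrap elem.text[start:end] in a new child element and return it."""
    text = elem.text or ""
    if not 0 <= start <= end <= len(text):
        raise ValueError("offsets fall outside the element's leading text")
    wrapper = etree.Element(tag)
    wrapper.text = text[start:end]
    wrapper.tail = text[end:]   # text after the range follows the new element
    elem.text = text[:start]    # text before the range stays on the parent
    elem.insert(0, wrapper)     # first child, so it sits right after elem.text
    return wrapper


root = etree.XML(xml_string)
p = root.find(".//section[2]/p[1]")

phrase = "knowledge graphs for information discovery"
start = p.text.index(phrase)    # offset computed here for the demo
mark = wrap_text_range(p, start, start + len(phrase))

print(etree.tostring(p, encoding="unicode"))
```

In lxml, the text that follows an element is stored in that element's tail, so splitting the string into elem.text, wrapper.text, and wrapper.tail keeps the concatenated text of the paragraph unchanged. The returned wrapper is a regular Element, so methods such as addprevious() and addnext() can be called on it.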

For more on lxml, see: https://lxml.de/tutorial.html

【Comments】: