Parsing an XML file with lxml based on a text string

【Question title】: Parsed a XML file using lxml based of text string
【Posted】: 2016-06-08 14:14:27
【Question description】:

I have an XML file, and I would like to retrieve the text of elements based on a string they contain.

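In the example below, I want to find all subject elements that contain the string home (two elements). Once I have the elements, I can retrieve their text values.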

<?xml version="1.0" ?>
<zAppointments reminder="15">
    <appointment>
        <subject>Bring pizza home</subject>
        <shape>circule</shape>
    </appointment>
    <appointment>
        <subject>Bring hamburger home</subject>
        <shape>box</shape>
    </appointment>
    <appointment>
        <subject>Bring banana homes</subject>
    </appointment>
    <appointment>
        <subject>Check MS Office website for updates</subject>
    </appointment>
</zAppointments>

【Question comments】:

Tags: python python-3.x lxml

【Solution 1】:

Use the contains() XPath function:

//subject[contains(., 'home')]/text()

Demo:

>>> import lxml.etree as ET
>>>
>>> data = """<?xml version="1.0" ?>
... <zAppointments reminder="15">
...     <appointment>
...         <subject>Bring pizza home</subject>
...     </appointment>
...     <appointment>
...         <subject>Bring hamburger home</subject>
...     </appointment>
...     <appointment>
...         <subject>Check MS Office website for updates</subject>
...     </appointment>
... </zAppointments>"""
>>> root = ET.fromstring(data)
>>> root.xpath("//subject[contains(., 'home')]/text()")
['Bring pizza home', 'Bring hamburger home']
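
One caveat: contains() does plain substring matching, so against the full sample from the question it would also match "Bring banana homes". Below is a minimal sketch of one way to match home only as a whole word, using the XPath 1.0 concat() and normalize-space() functions. It assumes words are separated by whitespace with no punctuation attached; the variable names are illustrative and not from the original answer.

```python
import lxml.etree as ET

data = """<zAppointments reminder="15">
    <appointment>
        <subject>Bring pizza home</subject>
    </appointment>
    <appointment>
        <subject>Bring hamburger home</subject>
    </appointment>
    <appointment>
        <subject>Bring banana homes</subject>
    </appointment>
</zAppointments>"""

root = ET.fromstring(data)

# Plain substring match: also picks up "homes".
substring_matches = root.xpath("//subject[contains(., 'home')]/text()")

# Whole-word match: pad the normalized text with spaces on both sides,
# then look for ' home ' surrounded by spaces.
whole_word_xpath = (
    "//subject[contains(concat(' ', normalize-space(.), ' '), ' home ')]/text()"
)
whole_word_matches = root.xpath(whole_word_xpath)

print(substring_matches)
print(whole_word_matches)
```

If punctuation can follow the word (for example "home."), this padding trick is not enough and a regular-expression based approach would probably be a better fit.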

【Comments】:

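Thanks for your answer. Is it possible to return the element itself rather than just its text? I also want to check the value of shape whenever I find the string home inside an appointment element.
@Eagle Yes, you can iterate over the elements matched by the //subject[contains(., 'home')] expression and then read the text from each element's .text attribute.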
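
The suggestion in the last comment could be sketched roughly as follows. Dropping the trailing /text() makes xpath() return the subject elements themselves, and from each one the enclosing appointment is reachable with lxml's getparent(). The results list and the choice of findtext() to read shape are illustrative assumptions, not part of the original answer.

```python
import lxml.etree as ET

data = """<zAppointments reminder="15">
    <appointment>
        <subject>Bring pizza home</subject>
        <shape>circule</shape>
    </appointment>
    <appointment>
        <subject>Bring hamburger home</subject>
        <shape>box</shape>
    </appointment>
    <appointment>
        <subject>Check MS Office website for updates</subject>
    </appointment>
</zAppointments>"""

root = ET.fromstring(data)

results = []
# Without /text(), the expression yields <subject> elements.
for subject in root.xpath("//subject[contains(., 'home')]"):
    appointment = subject.getparent()       # the enclosing <appointment>
    shape = appointment.findtext("shape")   # None when there is no <shape> child
    results.append((subject.text, shape))

print(results)
```

An alternative would be to select the appointment elements directly with //appointment[subject[contains(., 'home')]] and read both children from there.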