【Question title】: BeautifulSoup XML: finding elements by sibling element's text
【Posted】: 2018-12-25 23:35:43
【Question】:

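In the following example, I want to find the titles of all books whose price is 8.99. In other words, I want to find an element's text based on the text of its sibling element.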

from bs4 import BeautifulSoup
XML = """<?xml version="1.0"?>
<library>
    <book>
        <title>The Cat in the Hat</title>
        <author>Dr. Seuss</author>
        <price>7.35</price>
    </book>
    <book>
        <title>Ender's Game</title>
        <author>Orson Scott Card</author>
        <price>8.99</price>
    </book>
    <book>
        <title>Prey</title>
        <author>Michael Crichton</author>
        <price>8.99</price>
    </book>
</library>
"""
soup = BeautifulSoup(XML, "xml")

Surprisingly, the query soup.find({"price": 8.99}).parent returns the wrong book:

<book>
<title>The Cat in the Hat</title>
<author>Dr. Seuss</author>
<price>7.35</price>
</book>
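The result is most likely explained by how bs4 interprets its arguments: the first positional argument of find() is the tag-name filter, and a dict passed there appears to be iterated like any other iterable, so only its key "price" is used and the value 8.99 is ignored. The call therefore returns the first <price> in the document (7.35), whose parent is the Dr. Seuss book. A minimal sketch contrasting that call with a text filter passed through string= (the newer name for the text= argument), reusing the sample XML with its declaration closed as ?>:

```python
from bs4 import BeautifulSoup

XML = """<?xml version="1.0"?>
<library>
    <book>
        <title>The Cat in the Hat</title>
        <author>Dr. Seuss</author>
        <price>7.35</price>
    </book>
    <book>
        <title>Ender's Game</title>
        <author>Orson Scott Card</author>
        <price>8.99</price>
    </book>
    <book>
        <title>Prey</title>
        <author>Michael Crichton</author>
        <price>8.99</price>
    </book>
</library>
"""
soup = BeautifulSoup(XML, "xml")  # the "xml" parser requires lxml

# The dict only acts as a tag-name filter ("price"), so this is simply
# the first <price> in the document, whatever its text is.
first_price = soup.find({"price": 8.99})
print(first_price.text)  # 7.35

# Filtering on the text: string= is compared against the tag's .string.
match = soup.find("price", string="8.99")
print(match.parent.title.text)  # Ender's Game
```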

Update

The query [x.parent.find("title").text for x in soup.find_all("price", text = 8.99)] returns the list ["Ender's Game", "Prey"], which is exactly what I want. But is this the best way?
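Whether any approach is "best" is a judgment call. One arguably more direct alternative is to iterate over the <book> elements and test each book's own <price> child, so the title is read straight off the matching record instead of navigating back up from the price. A minimal sketch, assuming each <book> has at most one <price> and that prices are compared as exact strings after stripping surrounding whitespace:

```python
from bs4 import BeautifulSoup

XML = """<?xml version="1.0"?>
<library>
    <book>
        <title>The Cat in the Hat</title>
        <author>Dr. Seuss</author>
        <price>7.35</price>
    </book>
    <book>
        <title>Ender's Game</title>
        <author>Orson Scott Card</author>
        <price>8.99</price>
    </book>
    <book>
        <title>Prey</title>
        <author>Michael Crichton</author>
        <price>8.99</price>
    </book>
</library>
"""
soup = BeautifulSoup(XML, "xml")

# Keep each <book> whose <price> child reads exactly "8.99",
# then take that book's <title> text.
titles = [
    book.title.get_text(strip=True)
    for book in soup.find_all("book")
    if book.price is not None and book.price.get_text(strip=True) == "8.99"
]
print(titles)  # ["Ender's Game", 'Prey']
```

Starting from the book also makes it easy to add further conditions (author, title) to the same if clause.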

【Question details】:

    Tags: python xml beautifulsoup xml-parsing


    【Solution 1】:

    You can use find_previous_sibling():

    from bs4 import BeautifulSoup
    XML = """<?xml version="1.0"?>
    <library>
        <book>
            <title>The Cat in the Hat</title>
            <author>Dr. Seuss</author>
            <price>7.35</price>
        </book>
        <book>
            <title>Ender's Game</title>
            <author>Orson Scott Card</author>
            <price>8.99</price>
        </book>
        <book>
            <title>Prey</title>
            <author>Michael Crichton</author>
            <price>8.99</price>
        </book>
    </library>
    """
    soup = BeautifulSoup(XML, "xml")
    
    prices = soup.find_all("price", text=8.99)
    for price in prices:
        title = price.find_previous_sibling('title')
        print(title)
    
    # and with list comprehension
    titles = [price.find_previous_sibling('title').text for price in prices]
    print(titles)
    

    Output

    <title>Ender's Game</title>
    <title>Prey</title>
    
    # List comprehension
    ["Ender's Game", 'Prey']
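find_previous_sibling('title') only searches backwards, so it relies on <title> appearing before <price> inside each <book>. If that order is not guaranteed, one order-independent variant is to climb to the enclosing <book> with find_parent() and search down from there. The sketch below also compares prices numerically with decimal.Decimal, on the assumption that a value written as 8.990 should count as a match. The sample data is a hypothetical modified copy (price first, one price written 8.990, authors dropped), and the helper name price_is is likewise hypothetical:

```python
from decimal import Decimal, InvalidOperation

from bs4 import BeautifulSoup

# Hypothetical variant of the sample: <price> precedes <title>.
XML = """<?xml version="1.0"?>
<library>
    <book>
        <price>7.35</price>
        <title>The Cat in the Hat</title>
    </book>
    <book>
        <price>8.990</price>
        <title>Ender's Game</title>
    </book>
    <book>
        <price>8.99</price>
        <title>Prey</title>
    </book>
</library>
"""
soup = BeautifulSoup(XML, "xml")


def price_is(target):
    """Hypothetical helper: build a string= filter that matches numerically."""
    wanted = Decimal(target)

    def matches(text):
        # bs4 passes the tag's .string, which is None for mixed content.
        if text is None:
            return False
        try:
            return Decimal(text.strip()) == wanted
        except InvalidOperation:
            return False

    return matches


# Go up to the enclosing <book>, then down to its <title>,
# so the relative order of <title> and <price> does not matter.
titles = [
    price.find_parent("book").find("title").text
    for price in soup.find_all("price", string=price_is("8.99"))
]
print(titles)
```

With this data, find_previous_sibling('title') would return None for every price, while the find_parent() route still finds both titles.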
    
