【Question title】: How can I get the next child in BeautifulSoup?
【Posted】: 2020-12-15 05:03:03
【Question】:

Here are the HTML and the code I have:

<a class="card__article-link" href="linktoarticle" title="articletitle">
<span class="card__egida">TEXT</span>
<span class="card__title ">TITLE</span>
<span class="card__subtitle">SUBTITLE</span>
</a>

import requests
from bs4 import BeautifulSoup
r = requests.get("link").text
soup = BeautifulSoup(r, "html.parser")

for span in soup.find_all("span", {"class": "card__egida"}):
    print(span.get_text())

The code prints TEXT correctly, but I would like it to print TITLE and SUBTITLE as well. I tried using nextSibling, without success. How can I do this?

【Comments】:

    Tags: python python-3.x web-scraping beautifulsoup python-requests


    【Solution 1】:

    You can use .find_next() to get the next element that matches a given filter:

    from bs4 import BeautifulSoup
    
    
    txt = '''<a class="card__article-link" href="linktoarticle" title="articletitle">
    <span class="card__egida">TEXT</span>
    <span class="card__title ">TITLE</span>
    <span class="card__subtitle">SUBTITLE</span>
    </a>'''
    
    soup = BeautifulSoup(txt, 'html.parser')    
    
    for span in soup.find_all("span", {"class": "card__egida"}):
        egida = span.get_text()
        # find_next() returns the first matching element that appears
        # after `span` in document order
        title = span.find_next(class_='card__title').get_text()
        subtitle = span.find_next(class_='card__subtitle').get_text()
    
        print(egida)
        print(title)
        print(subtitle)
    

    Prints:

    TEXT
    TITLE
    SUBTITLE
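
As for the nextSibling attempt mentioned in the question: in this HTML the node that directly follows each </span> is the newline between the tags, so .next_sibling (the current spelling of nextSibling) returns a text node rather than the next <span>. The sketch below shows this on the same markup and uses find_next_sibling(), which skips text nodes. The variable names are only illustrative.

```python
from bs4 import BeautifulSoup

html = '''<a class="card__article-link" href="linktoarticle" title="articletitle">
<span class="card__egida">TEXT</span>
<span class="card__title ">TITLE</span>
<span class="card__subtitle">SUBTITLE</span>
</a>'''

soup = BeautifulSoup(html, "html.parser")
egida = soup.find("span", class_="card__egida")

# The direct next sibling is the newline text node, not a tag.
print(repr(egida.next_sibling))

# find_next_sibling() ignores text nodes and returns the next matching tag.
title = egida.find_next_sibling("span")
subtitle = title.find_next_sibling("span")
print(title.get_text())
print(subtitle.get_text())

# find_next_siblings() collects every following sibling tag in one call.
following = [s.get_text() for s in egida.find_next_siblings("span")]
print(following)
```

Unlike find_next(), the sibling methods only look at nodes that share the same parent, so they stay inside the enclosing <a>.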
    

    Alternatively, select the parent <a> and then search inside it for the title, subtitle, and so on:

    # reuses `soup` from the example above
    for a in soup.select('a.card__article-link'):
        egida = a.select_one('.card__egida').get_text()
        title = a.select_one('.card__title').get_text()
        subtitle = a.select_one('.card__subtitle').get_text()
        print(egida, title, subtitle)
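
One thing to watch on a real page: select_one() returns None when nothing matches, so calling .get_text() on the result raises AttributeError for any card that lacks one of the spans. Searching inside the parent also keeps a lookup from drifting into the next card, which find_next() can do because it searches the rest of the document. A minimal sketch of a more defensive version follows; the helper text_or_none and the second, incomplete card are made up for illustration.

```python
from bs4 import BeautifulSoup

# Second card is hypothetical: it has no egida and no subtitle.
html = '''<a class="card__article-link" href="first">
<span class="card__egida">TEXT</span>
<span class="card__title ">TITLE</span>
<span class="card__subtitle">SUBTITLE</span>
</a>
<a class="card__article-link" href="second">
<span class="card__title">ONLY TITLE</span>
</a>'''


def text_or_none(parent, selector):
    """Return the stripped text of the first match inside parent, or None."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


soup = BeautifulSoup(html, "html.parser")

cards = []
for a in soup.select("a.card__article-link"):
    cards.append({
        "href": a.get("href"),
        "egida": text_or_none(a, ".card__egida"),
        "title": text_or_none(a, ".card__title"),
        "subtitle": text_or_none(a, ".card__subtitle"),
    })

for card in cards:
    print(card)
```

Missing fields come back as None instead of raising, so the caller can decide whether to skip the card or keep it with gaps.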
    

    【Comments】:
