【Question title】: Extract Link and Title Within a Heading Tag with bs4
【Posted】: 2021-05-25 09:50:02
【Question】:

I used the following code:

from bs4 import BeautifulSoup
import requests
page = requests.get(
    "https://www.olivemagazine.com/recipes/entertain/best-ever-starter-recipes/")

soup = BeautifulSoup(page.content, 'html.parser')


for i in soup.find_all('h3')[1:-3]:
    print(i)

to get this output:

<h3 class="p1"><a href="https://www.olivemagazine.com/recipes/meat-and-poultry/summer-deli-board/" rel="noopener" target="_blank">Summer deli board</a></h3>
<h3 class="p1"><a href="https://www.olivemagazine.com/recipes/entertain/marinated-figs-with-mozzarella-and-serrano-ham/" rel="noopener" target="_blank">Marinated figs with mozzarella and serrano ham</a></h3>
<h3><a href="http://www.olivemagazine.com/recipes/meat-and-poultry/tomato-salad-with-burrata-and-warm-nduja-dressing/">Tomato salad with burrata and warm 'nduja dressing</a></h3>
<h3 class="p1"><a href="https://www.olivemagazine.com/recipes/quick-and-easy/griddled-avocados-with-crab-and-chorizo/" rel="noopener" target="_blank">Griddled avocados with crab and chorizo</a></h3>
<h3><a href="http://www.olivemagazine.com/recipes/meat-and-poultry/duck-chicken-and-sour-cherry-terrine/">Duck, chicken and sour cherry terrine</a></h3>
<h3><a href="http://www.olivemagazine.com/recipes/steak-tartare/3000.html" target="_self">Steak tartare</a></h3>
<h3><a href="http://www.olivemagazine.com/recipes/meat-and-poultry/tomatoes-and-lardo-on-toast-with-basil-oil/">Tomatoes and lardo on toast with basil oil</a></h3>

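From this output, I want to extract the link in each anchor tag along with its display name, e.g. Summer deli board.

I don't know how to extract these two elements from what I have so far.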

【Comments】:

    Tags: python python-3.x web-scraping beautifulsoup


    【Answer 1】:

    You can nest a second loop inside the for loop to get each anchor's href and text, and append them to lists.

    from bs4 import BeautifulSoup
    import requests
    page = requests.get(
        "https://www.olivemagazine.com/recipes/entertain/best-ever-starter-recipes/")
    
    soup = BeautifulSoup(page.content, 'html.parser')
    
    link = []
    title = []
    for heading in soup.find_all('h3')[1:-3]:
        # Collect the URL and visible text of every <a> inside this <h3>.
        for a_tag in heading.find_all("a"):
            link.append(a_tag.attrs['href'])
            title.append(a_tag.text)
    

    Output:

    link:
    
    ['https://www.olivemagazine.com/recipes/family/giant-champagne-and-lemon-prawn-vol-au-vents/',
     'https://www.olivemagazine.com/recipes/fish-and-seafood/grilled-scallops-with-nduja-butter/',
     'https://www.olivemagazine.com/recipes/quick-and-easy/herb-and-chilli-calamari/',.......]
    
    title:
    ['Giant champagne and lemon prawn vol-au-vents',
     'Grilled scallops with ’nduja butter',
     'Herb and chilli calamari',....]
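
As a variation on the nested loop above, the following is a minimal sketch that keeps each title paired with its link by using a CSS selector. It assumes bs4's `select()` is available (it relies on the soupsieve package that ships with recent bs4 releases). The markup is copied from the question's output so the sketch runs without a network request; the last `<h3>`, which has no link, is a hypothetical addition to show that such headings are skipped.

```python
from bs4 import BeautifulSoup

# Two headings copied from the question's output, plus one hypothetical
# heading without an anchor (an assumption added for illustration).
html = """
<h3 class="p1"><a href="https://www.olivemagazine.com/recipes/meat-and-poultry/summer-deli-board/" rel="noopener" target="_blank">Summer deli board</a></h3>
<h3><a href="http://www.olivemagazine.com/recipes/steak-tartare/3000.html" target="_self">Steak tartare</a></h3>
<h3>Heading without a link</h3>
"""

soup = BeautifulSoup(html, "html.parser")

# "h3 a[href]" matches anchors nested inside an <h3> that carry an href,
# so headings without a link contribute nothing to the result.
recipes = [(a.get_text(strip=True), a["href"]) for a in soup.select("h3 a[href]")]

for title, link in recipes:
    print(title, link)
```

Storing `(title, link)` tuples in one list, rather than two parallel lists, keeps each name next to its URL if the list is later filtered or sorted. On the live page the `[1:-3]` slice from the answer may still be needed to drop unrelated headings.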
    

    【Comments】:
