Extract Link and Title Within a Heading Tag with bs4

【Question title】: Extract Link and Title Within a Heading Tag with bs4
【Posted】: 2021-05-25 09:50:02
【Question description】:

I used the following code:

from bs4 import BeautifulSoup
import requests
page = requests.get(
    "https://www.olivemagazine.com/recipes/entertain/best-ever-starter-recipes/")

soup = BeautifulSoup(page.content, 'html.parser')


for i in soup.find_all('h3')[1:-3]:
    print(i)

It produces this output:

<h3 class="p1"><a href="https://www.olivemagazine.com/recipes/meat-and-poultry/summer-deli-board/" rel="noopener" target="_blank">Summer deli board</a></h3>
<h3 class="p1"><a href="https://www.olivemagazine.com/recipes/entertain/marinated-figs-with-mozzarella-and-serrano-ham/" rel="noopener" target="_blank">Marinated figs with mozzarella and serrano ham</a></h3>
<h3><a href="http://www.olivemagazine.com/recipes/meat-and-poultry/tomato-salad-with-burrata-and-warm-nduja-dressing/">Tomato salad with burrata and warm 'nduja dressing</a></h3>
<h3 class="p1"><a href="https://www.olivemagazine.com/recipes/quick-and-easy/griddled-avocados-with-crab-and-chorizo/" rel="noopener" target="_blank">Griddled avocados with crab and chorizo</a></h3>
<h3><a href="http://www.olivemagazine.com/recipes/meat-and-poultry/duck-chicken-and-sour-cherry-terrine/">Duck, chicken and sour cherry terrine</a></h3>
<h3><a href="http://www.olivemagazine.com/recipes/steak-tartare/3000.html" target="_self">Steak tartare</a></h3>
<h3><a href="http://www.olivemagazine.com/recipes/meat-and-poultry/tomatoes-and-lardo-on-toast-with-basil-oil/">Tomatoes and lardo on toast with basil oil</a></h3>

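From this, I want to extract the link in each anchor tag along with its display text, e.g. Summer deli board.

I don't know how to extract these two elements from what I have so far.

【Question comments】:

Tags: python python-3.x web-scraping beautifulsoup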

【Solution 1】:

You can use a nested loop inside the for loop to read each anchor's href and text, and append them to two lists:

from bs4 import BeautifulSoup
import requests
page = requests.get(
    "https://www.olivemagazine.com/recipes/entertain/best-ever-starter-recipes/")

soup = BeautifulSoup(page.content, 'html.parser')

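link = []
title = []
# Skip the first and the last three <h3> tags, as in the question.
for h3 in soup.find_all('h3')[1:-3]:
    # A heading may contain zero or more anchors; collect each one.
    for a in h3.find_all('a'):
        link.append(a.attrs['href'])
        title.append(a.text)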

Output:

link:

['https://www.olivemagazine.com/recipes/family/giant-champagne-and-lemon-prawn-vol-au-vents/',
 'https://www.olivemagazine.com/recipes/fish-and-seafood/grilled-scallops-with-nduja-butter/',
 'https://www.olivemagazine.com/recipes/quick-and-easy/herb-and-chilli-calamari/',.......]

title:
['Giant champagne and lemon prawn vol-au-vents',
 'Grilled scallops with ’nduja butter',
 'Herb and chilli calamari',....]
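
As a possible alternative to the nested loop, a CSS selector can pick out the anchors directly and keep each link paired with its title. The sketch below is only an illustration: it runs against a small inline HTML sample copied from the question's output (plus one made-up heading without a link) instead of the live page, and the names `html` and `recipes` are hypothetical. It assumes a Beautiful Soup version with `select()` support (4.7 or later with soupsieve installed).

```python
from bs4 import BeautifulSoup

# Sample markup taken from the question's output, so no network access is needed.
# The last heading is an invented example of an <h3> that contains no anchor.
html = """
<h3 class="p1"><a href="https://www.olivemagazine.com/recipes/meat-and-poultry/summer-deli-board/" rel="noopener" target="_blank">Summer deli board</a></h3>
<h3><a href="http://www.olivemagazine.com/recipes/steak-tartare/3000.html" target="_self">Steak tartare</a></h3>
<h3>Heading without a link</h3>
"""

soup = BeautifulSoup(html, "html.parser")

# "h3 a[href]" matches every <a> that has an href attribute and sits inside an <h3>.
# Each result is stored as a (link, title) tuple so the two values stay together.
recipes = [(a["href"], a.get_text(strip=True)) for a in soup.select("h3 a[href]")]

for link, title in recipes:
    print(title, "->", link)
```

Keeping the pairs in one list avoids the two parallel lists drifting out of step, and headings without an anchor are skipped automatically. On the real page the `[1:-3]` slice from the question would still have to be applied separately if those headings need to be excluded.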

【Comments】: