【Posted】: 2021-05-25 09:50:02
【Problem description】:
I used the following code:
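from bs4 import BeautifulSoup
import requests

# Download the page and parse it with the built-in HTML parser
page = requests.get(
    "https://www.olivemagazine.com/recipes/entertain/best-ever-starter-recipes/")
soup = BeautifulSoup(page.content, 'html.parser')

# Skip the first <h3> and the last three, which are not recipe headings
for i in soup.find_all('h3')[1:-3]:
    print(i)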
This produces output like the following:
<h3 class="p1"><a href="https://www.olivemagazine.com/recipes/meat-and-poultry/summer-deli-board/" rel="noopener" target="_blank">Summer deli board</a></h3>
<h3 class="p1"><a href="https://www.olivemagazine.com/recipes/entertain/marinated-figs-with-mozzarella-and-serrano-ham/" rel="noopener" target="_blank">Marinated figs with mozzarella and serrano ham</a></h3>
<h3><a href="http://www.olivemagazine.com/recipes/meat-and-poultry/tomato-salad-with-burrata-and-warm-nduja-dressing/">Tomato salad with burrata and warm 'nduja dressing</a></h3>
<h3 class="p1"><a href="https://www.olivemagazine.com/recipes/quick-and-easy/griddled-avocados-with-crab-and-chorizo/" rel="noopener" target="_blank">Griddled avocados with crab and chorizo</a></h3>
<h3><a href="http://www.olivemagazine.com/recipes/meat-and-poultry/duck-chicken-and-sour-cherry-terrine/">Duck, chicken and sour cherry terrine</a></h3>
<h3><a href="http://www.olivemagazine.com/recipes/steak-tartare/3000.html" target="_self">Steak tartare</a></h3>
<h3><a href="http://www.olivemagazine.com/recipes/meat-and-poultry/tomatoes-and-lardo-on-toast-with-basil-oil/">Tomatoes and lardo on toast with basil oil</a></h3>
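From this, I want to extract the link inside each anchor tag together with its display name, for example "Summer deli board".
I don't know how to extract these two pieces from what I have so far.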
【Discussion】:
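One possible way to get both pieces is to look up the <a> tag inside each <h3>, then read its text and its href attribute. The sketch below is a minimal illustration, not a definitive answer. The helper name extract_links is hypothetical, and so is the link-less "<h3>No link here</h3>" row, which is there only to show the guard against headings without an anchor. The other sample rows are copied from the output shown in the question so the example runs without a network request.

```python
from bs4 import BeautifulSoup

# Sample markup: the first two rows come from the question's output;
# the last row is a made-up heading with no link (an assumption for illustration).
SAMPLE_HTML = """
<h3 class="p1"><a href="https://www.olivemagazine.com/recipes/meat-and-poultry/summer-deli-board/" rel="noopener" target="_blank">Summer deli board</a></h3>
<h3><a href="http://www.olivemagazine.com/recipes/steak-tartare/3000.html" target="_self">Steak tartare</a></h3>
<h3>No link here</h3>
"""


def extract_links(html):
    """Return (display name, href) pairs for every <h3> that wraps an <a href=...>."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for heading in soup.find_all("h3"):
        anchor = heading.find("a", href=True)  # None when the <h3> has no link
        if anchor is None:
            continue
        # get_text() gives the visible name; ["href"] gives the link target
        results.append((anchor.get_text(strip=True), anchor["href"]))
    return results


for name, url in extract_links(SAMPLE_HTML):
    print(name, url)
```

Inside the original loop, the same two expressions should work on each i directly, assuming every heading left after the [1:-3] slice contains a link: i.a.get_text() for the display name and i.a["href"] for the URL.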
Tags: python python-3.x web-scraping beautifulsoup