How to get the text and URL from a link using beautifulsoup

【Question title】: How to get the text and URL from a link using beautifulsoup
【Posted】: 2020-05-19 13:44:11
【Question description】:

I have the following code, which prints a list of the links for each team in a table:

import requests
from bs4 import BeautifulSoup

# Get all teams in Big Sky standings table
URL = 'https://www.espn.com/college-football/standings/_/group/20/view/fcs-i-aa'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
standings = soup.find_all('table', 'Table Table--align-right Table--fixed Table--fixed-left')

for team in standings:
    team_data = team.find_all('span', 'hide-mobile')
    print(team_data)

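The code prints the whole list, and if I pick an index, e.g. "print(team_data[0])", it prints that specific link from the page.

How can I then go into that link and get the URL string as well as the link's text?

For example, my code prints the following for the first index in the list.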

<span class="hide-mobile"><a class="AnchorLink" data-clubhouse-uid="s:20~l:23~t:2692" href="/college-football/team/_/id/2692/weber-state-wildcats" tabindex="0">Weber State Wildcats</a></span>

How do I pull

/college-football/team/_/id/2692/weber-state-wildcats

and

Weber State Wildcats

from the link?

Thanks for your time, and if there is anything I can clarify, please feel free to ask.

【Comments】:

Tags: python beautifulsoup

【Solution 1】:

Provided team_data is a single tag (for example, team_data[0] from the question) holding HTML like:

<span class="hide-mobile"><a class="AnchorLink" data-clubhouse-uid="s:20~l:23~t:2692" href="/college-football/team/_/id/2692/weber-state-wildcats" tabindex="0">Weber State Wildcats</a></span>

To get /college-football/team/_/id/2692/weber-state-wildcats:

>>> team_data.find_all('a')[0]['href']
'/college-football/team/_/id/2692/weber-state-wildcats'

To get Weber State Wildcats:

>>> team_data.find_all('a')[0].text
'Weber State Wildcats'
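
The two lookups above can be applied to every span in the list from the question rather than to a single element. What follows is only a minimal sketch, assuming the markup matches the sample shown earlier; the inline `html` string stands in for the fetched page, and the `links` name is a hypothetical choice, not part of the original answer:

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; it stands in for the fetched page.
html = (
    '<span class="hide-mobile"><a class="AnchorLink" '
    'data-clubhouse-uid="s:20~l:23~t:2692" '
    'href="/college-football/team/_/id/2692/weber-state-wildcats" '
    'tabindex="0">Weber State Wildcats</a></span>'
)

soup = BeautifulSoup(html, 'html.parser')

# find_all returns a list-like ResultSet, so each span is handled on its own.
team_data = soup.find_all('span', 'hide-mobile')

links = []  # hypothetical name: collects (href, text) pairs
for span in team_data:
    anchor = span.find('a')  # first <a> inside the span, or None
    if anchor is None:
        continue  # skip spans that contain no link
    links.append((anchor['href'], anchor.text))

print(links)
```

Looping over the spans avoids calling find_all on the ResultSet itself, which is a list of tags and not a tag.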

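【Discussion】:

【Solution 2】:

For the href/URL, you can do something like this.

For the link text, you can do something like this.

Both amount to filtering down to the target element and then extracting the attribute you need.

【Discussion】: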
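
The "filter, then extract" pattern described here might look roughly like the following. This is a sketch under assumptions and not the code the answer linked to: the CSS selector, the `PAGE_URL` constant (taken from the question's URL) and the use of `urljoin` to build an absolute address are illustrative choices.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Page address from the question, used only to resolve the relative href.
PAGE_URL = 'https://www.espn.com/college-football/standings/_/group/20/view/fcs-i-aa'

# Sample markup copied from the question; it stands in for the fetched page.
html = (
    '<span class="hide-mobile"><a class="AnchorLink" '
    'data-clubhouse-uid="s:20~l:23~t:2692" '
    'href="/college-football/team/_/id/2692/weber-state-wildcats" '
    'tabindex="0">Weber State Wildcats</a></span>'
)

soup = BeautifulSoup(html, 'html.parser')

# Step 1: filter down to the target element with a CSS selector.
anchor = soup.select_one('span.hide-mobile a[href]')

# Step 2: extract the attribute and the text from that element.
href = anchor.get('href')
text = anchor.get_text(strip=True)

# Optional: turn the site-relative href into a full URL.
absolute_url = urljoin(PAGE_URL, href)

print(href)
print(text)
print(absolute_url)
```

On a live page, select_one returns None when nothing matches, so a None check before step 2 is advisable.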