BeautifulSoup getting href [duplicate]

【Question title】: BeautifulSoup getting href [duplicate]
【Posted】: 2011-08-14 12:25:18
【Question】:

I have the following soup:

<a href="some_url">next</a>
<span class="class">...</span>

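I want to extract the href, "some_url", from it.

I can do this when there is only one tag, but here there are two. I can also get the text 'next', but that is not what I want.

Also, is there a good description of the API somewhere, with examples? I am using the standard documentation, but I am looking for something a bit more organized.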

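【Comments】:

Please post a code example showing what you have tried.
OK, I figured it out: soup.find('a')['href']. What confused me was that I was viewing the result through Django (HTML), which stripped the href before rendering, so soup.find('a') showed up as just 'next'.
True, this question is a duplicate. Even years later, though, the elegance of @MarkLongair's answer makes it worth keeping.

Tags: python tags beautifulsoup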

【Solution 1】:

You can use find_all as follows to find every a element that has an href attribute and print each one:

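from bs4 import BeautifulSoup  # BeautifulSoup 4 (package: beautifulsoup4)

html = '''<a href="some_url">next</a>
<span class="class"><a href="another_url">later</a></span>'''

# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(html, 'html.parser')

# href=True matches only <a> tags that actually have an href attribute
for a in soup.find_all('a', href=True):
    print("Found the URL:", a['href'])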

The output would be:

Found the URL: some_url
Found the URL: another_url

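Note that if you are using an older version of BeautifulSoup (before version 4), this method is called findAll. In version 4, BeautifulSoup's method names were changed to be PEP 8 compliant, so you should use find_all instead.

If you want every tag that has an href, regardless of tag name, you can omit the name parameter: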

href_tags = soup.find_all(href=True)
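
To make the difference between these calls concrete, here is a minimal sketch, assuming BeautifulSoup 4 and the built-in html.parser. The <link> tag, the id="no-link" anchor, and the variable names are made up for illustration and are not part of the original question.

```python
from bs4 import BeautifulSoup

# The question's markup plus two hypothetical extras:
# a <link> tag with an href, and an <a> tag without one
html = '''<link rel="stylesheet" href="style.css">
<a href="some_url">next</a>
<span class="class"><a href="another_url">later</a></span>
<a id="no-link">not a link</a>'''

soup = BeautifulSoup(html, 'html.parser')

# find() returns only the first matching tag (or None if nothing matches)
first_href = soup.find('a', href=True)['href']

# Without a tag name, any element carrying an href attribute is matched
all_hrefs = [tag['href'] for tag in soup.find_all(href=True)]

# .get() returns None instead of raising KeyError when the attribute is absent
missing = soup.find('a', id='no-link').get('href')

print(first_href)
print(all_hrefs)
print(missing)
```

Using tag.get('href') rather than tag['href'] is probably the safer choice whenever the search was not restricted with href=True.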

【Comments】:

Can you get the href of a single element with class="class"?
@yoshiserry soup.find('a', {'class': 'class'})['href']
How can I reduce false positives and unwanted results (e.g. javascript:void(0), /en/support/index.html, #smp-navigationList)?
Hi, how can I get the "NEXT" text of the link? <a href="some_url">NEXT</a>
@abdoulsn soup.find('a').contents[0]
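
The question above about unwanted results did not get a reply in this thread. One possible approach, sketched here with the standard library only, is to filter the href strings after collecting them. The helper name is_external_link is hypothetical, and the rule it applies (keep only absolute http/https URLs) is an assumption about what counts as "wanted"; adjust it if relative links should be kept.

```python
from urllib.parse import urlparse

def is_external_link(href):
    """Hypothetical helper: keep only absolute http(s) URLs."""
    parsed = urlparse(href.strip())
    return parsed.scheme in ('http', 'https') and bool(parsed.netloc)

# Example values taken from the comment, plus one made-up absolute URL
hrefs = [
    'javascript:void(0)',
    '/en/support/index.html',
    '#smp-navigationList',
    'https://example.com/page',
]

wanted = [h for h in hrefs if is_external_link(h)]
print(wanted)
```

In practice the hrefs list would come from something like [a['href'] for a in soup.find_all('a', href=True)].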