BeautifulSoup: get multiple hrefs [duplicate]

【Question title】: Beautifulsoup get multiple hrefs [duplicate]
【Posted】: 2016-10-09 00:47:53
【Question description】:

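I am trying to get the hrefs from a URL, put them in a list, and print one item from that list, for example the third one. Instead, all I get is the third character of each href.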

import urllib
from bs4 import BeautifulSoup

newlist=[]
page = urllib.urlopen("http://python-data.drchuck.net/known_by_Kamran.html").read()
soup = BeautifulSoup(page, "html.parser")
tags = soup.find_all('a')
for tag in tags:
    newlist=tag.get("href", None)
    print newlist[2]

The output is: t t t t t t t...

【Comments on the question】:

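You are reassigning newlist with newlist=tag.get("href", None), which returns a string or None, not a list. So newlist[2] indexes into that string and gives its third character. This is very basic stuff; you should consider reading some tutorials.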
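
The point of this comment can be shown without any network access. The following is a minimal sketch, assuming a few made-up href strings (the example.com URLs are placeholders, not values from the original page); it contrasts rebinding newlist with appending to it.

```python
# Hypothetical href values standing in for what tag.get("href") returns.
hrefs = [
    "http://example.com/a.html",
    "http://example.com/b.html",
    "http://example.com/c.html",
]

# Buggy pattern from the question: each iteration rebinds newlist to a
# string, so newlist[2] is the third character of that string.
third_chars = []
for href in hrefs:
    newlist = href
    third_chars.append(newlist[2])

# Fixed pattern: keep newlist as a list and append each href to it.
newlist = []
for href in hrefs:
    newlist.append(href)

print(third_chars)  # ['t', 't', 't']
print(newlist[2])   # http://example.com/c.html
```

Every href here starts with "http", so the third character is always "t", which matches the output reported in the question.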

Tags: python beautifulsoup href

【Solution 1】:

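The following prints all the hrefs correctly. Passing href=True to find_all skips <a> tags that have no href attribute.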

import urllib
from bs4 import BeautifulSoup

newlist=[]
page = urllib.urlopen("http://www.django-rest-framework.org/api-guide/throttling/#how-clients-are-identified").read()
soup = BeautifulSoup(page, "html.parser")
tags = soup.find_all('a', href=True)
for tag in tags:
    print tag['href']

PS: The web page you mentioned is not accessible, so I used a different one.
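
The code above prints every href but does not build the list the question asks for. A possible way to collect the hrefs and pick the third one is sketched below in Python 3 syntax. It parses an inline HTML string instead of downloading a page; the markup and the example.com URLs are assumptions made for illustration only.

```python
from bs4 import BeautifulSoup

# Placeholder HTML standing in for the downloaded page (an assumption,
# not the content of the page from the question).
html = """
<a href="http://example.com/one.html">One</a>
<a name="no-href">Anchor without href</a>
<a href="http://example.com/two.html">Two</a>
<a href="http://example.com/three.html">Three</a>
"""

soup = BeautifulSoup(html, "html.parser")

# href=True skips <a> tags that have no href attribute.
hrefs = [tag["href"] for tag in soup.find_all("a", href=True)]

# Indexing the list (not an individual string) returns a whole href.
print(hrefs[2])  # http://example.com/three.html
```

To run this against a live page in Python 3, the HTML would come from urllib.request.urlopen(url).read() rather than urllib.urlopen, which exists only in Python 2.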

【Comments】: