Beautiful Soup - urllib.error.HTTPError: HTTP Error 403: Forbidden

【Question Title】: Beautiful Soup - urllib.error.HTTPError: HTTP Error 403: Forbidden
【Posted】: 2021-01-24 04:25:08
【Question】:

I'm trying to download a GIF file with urllib, but it throws this error:

urllib.error.HTTPError: HTTP Error 403: Forbidden

This doesn't happen when I download from other blog sites. Here is my code:

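import requests
import urllib.request

url_1 = 'https://goodlogo.com/images/logos/small/nike_classic_logo_2355.gif'

# This request sends a browser-like User-Agent, but its response is never used
source_code = requests.get(url_1, headers={'User-Agent': 'Mozilla/5.0'})

path = 'C:/Users/roysu/Desktop/src_code/Python_projects/python/web_scrap/myPath/'

full_name = path + ".gif"  # note: this produces a file named just ".gif" inside myPath/
# urlretrieve() makes a second, separate request with urllib's default User-Agent; this is the call that fails with 403
urllib.request.urlretrieve(url_1, full_name)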
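To see what that second request identifies itself as, the sketch below (assuming CPython's standard urllib) inspects the headers of a freshly built default opener, which is the kind of opener urlopen() and urlretrieve() create when none has been installed. It makes no network request; the helper name default_user_agent is hypothetical.

```python
import sys
import urllib.request


def default_user_agent() -> str:
    """Return the User-Agent a fresh urllib opener would send (no request is made)."""
    opener = urllib.request.build_opener()
    # addheaders holds the (name, value) pairs this opener adds to every request
    return dict(opener.addheaders)["User-agent"]


# Prints something like "Python-urllib/3.x", depending on your interpreter version
print(default_user_agent())
```

A server that filters on this string would answer urlretrieve() with 403 while accepting the requests.get() call that sent Mozilla/5.0. That fits the behavior described, although the site's actual rule is not known.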


Tags: python beautifulsoup urllib

【Solution 1】:

Don't use urllib.request.urlretrieve. Use the requests library instead, like this:

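import requests

url = 'https://goodlogo.com/images/logos/small/nike_classic_logo_2355.gif'

path = "D:\\Test.gif"  # adjust to a writable location on your machine

# Send a browser-like User-Agent; the server appears to reject urllib's default one
response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
response.raise_for_status()  # fail loudly on 403/404 instead of saving an error page

# Write the raw bytes; "with" closes the file even if writing fails
with open(path, "wb") as file:
    file.write(response.content)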
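For comparison, here is a standard-library-only sketch of the same fix, in case you'd rather not add a dependency: it builds a urllib.request.Request with an explicit User-Agent and streams the body to disk. The 'Mozilla/5.0' value and the helper names build_request and download are illustrative assumptions, not something the server is documented to require.

```python
import shutil
import urllib.request

# Assumed header value; any browser-like string may work, depending on the server
BROWSER_UA = "Mozilla/5.0"


def build_request(url: str, user_agent: str = BROWSER_UA) -> urllib.request.Request:
    """Create a GET request that overrides urllib's default Python-urllib/x.y User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url: str, path: str, user_agent: str = BROWSER_UA) -> None:
    """Stream the response body to `path`; raises urllib.error.HTTPError on 4xx/5xx."""
    req = build_request(url, user_agent)
    with urllib.request.urlopen(req, timeout=30) as resp, open(path, "wb") as out:
        # Copy in chunks instead of loading the whole file into memory
        shutil.copyfileobj(resp, out)
```

Usage, with the path adjusted for your machine: download('https://goodlogo.com/images/logos/small/nike_classic_logo_2355.gif', 'D:\\Test.gif'). If the server still refuses the request, urlopen() raises urllib.error.HTTPError rather than writing an error page to the file.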


Hope this helps!


【Solution 2】:

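Solution:
The remote server is apparently checking the User-Agent header and rejecting requests from Python's urllib.
urllib.request.urlretrieve() has no parameter for setting HTTP headers, but you can use
urllib.request.URLopener.retrieve() instead: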

import urllib.request

url_1='https://goodlogo.com/images/logos/small/nike_classic_logo_2355.gif'

path='/home/piyushsambhi/Downloads/'

full_name= path + "testimg.gif"

opener = urllib.request.URLopener()  # legacy class; emits a DeprecationWarning
opener.addheader('User-Agent', 'Mozilla/5.0')  # replaces the default Python-urllib User-Agent
filename, headers = opener.retrieve(url_1, full_name)

print(filename)

Note: Since you are using Python 3, these functions are considered part of the "legacy interface", and URLopener is deprecated, so you should not use them in new code.
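If you want to keep calling urlretrieve() itself, one non-deprecated option (a sketch, not part of the original answer) is to install a global opener whose addheaders replaces the default User-Agent: urlretrieve() fetches through urlopen(), which uses the installed opener. The helper name install_user_agent is hypothetical, and install_opener() changes the behavior of every later urllib call in the process.

```python
import urllib.request


def install_user_agent(user_agent: str = "Mozilla/5.0") -> urllib.request.OpenerDirector:
    """Make urlopen()/urlretrieve() send `user_agent` instead of Python-urllib/x.y."""
    opener = urllib.request.build_opener()
    # Replace (rather than append to) the default headers so only one User-Agent is sent
    opener.addheaders = [("User-Agent", user_agent)]
    # Process-wide: every later urlopen()/urlretrieve() call uses this opener
    urllib.request.install_opener(opener)
    return opener
```

After calling install_user_agent(), the question's original urllib.request.urlretrieve(url_1, full_name) line would go out with the new header.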

Your code imports requests but never uses the response it fetches. You should use it, because it is much easier than urllib. The following snippet worked for me:

import requests

url = 'https://goodlogo.com/images/logos/small/nike_classic_logo_2355.gif'
path='/home/piyushsambhi/Downloads/'
full_name= path + "testimg1.gif"

r = requests.get(url)
r.raise_for_status()  # stop on 403/404 instead of writing an error page to disk
with open(full_name, 'wb') as outfile:
    outfile.write(r.content)

Note: Change the path variable to suit your machine and environment.
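Rather than concatenating strings by hand (the question's path + ".gif" produces a file named just ".gif"), you could derive the file name from the URL and join it with pathlib. This is a minimal sketch; the helper name target_path is hypothetical, and it assumes the URL path ends in a file name.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def target_path(url: str, directory: str) -> Path:
    """Join `directory` with the last segment of the URL path (query string ignored)."""
    name = PurePosixPath(urlparse(url).path).name
    if not name:
        raise ValueError(f"URL has no file name: {url!r}")
    return Path(directory) / name
```

For example, target_path(url, '/home/piyushsambhi/Downloads/') points at nike_classic_logo_2355.gif inside that directory, which you can pass straight to open(..., 'wb').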
