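【Posted】: 2016-01-22 23:38:03
【Question】:
I am trying to download a PDF, but I get the following error: HTTP Error 403: Forbidden
I understand that the server is blocking the request for some reason, but I can't seem to find a solution.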
import urllib.request
import urllib.parse
import requests


def download_pdf(url):
    # Save the resource at `url` as Test.pdf in the current directory.
    full_name = "Test.pdf"
    urllib.request.urlretrieve(url, full_name)


try:
    url = 'http://papers.xtremepapers.com/CIE/Cambridge%20IGCSE/Mathematics%20(0580)/0580_s03_qp_1.pdf'
    print('initialized')
    # Browser-like headers attached to the request object below.
    hdr = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36',
        'Content-Length': '136963',
    }
    print('HDR received')
    req = urllib.request.Request(url, headers=hdr)
    print('Header sent')
    resp = urllib.request.urlopen(req)
    print('Request sent')
    respData = resp.read()
    download_pdf(url)
    print('Complete')
except Exception as e:
    print(str(e))
【Discussion】:
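- If the server is blocking you, there may be no simple way around it. 403 Forbidden means you are not allowed to access the resource.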
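One possible explanation, offered as a guess rather than a confirmed diagnosis: in the question's code, `urlopen(req)` sends the custom headers, but `download_pdf(url)` then calls `urlretrieve(url, ...)` with the bare URL. That issues a second request with Python's default `Python-urllib` User-Agent, which some servers reject with 403. The `Content-Length` header is also unusual on a GET request that has no body. A minimal sketch that makes a single request and writes the bytes it receives is below. The helper names `build_request` and `download_pdf(url, dest)` and the `timeout` parameter are hypothetical, chosen for illustration, and there is no guarantee this particular server will accept the request.

```python
import urllib.request

# Assumption: the server filters on User-Agent. This value is copied from the
# question; it may or may not be enough to get past the 403.
DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"
)


def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Return a GET request that carries only a User-Agent header."""
    # No Content-Length here: a GET request sends no body.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download_pdf(url, dest, timeout=30):
    """Fetch `url` once and write the response body to `dest`.

    Returns the number of bytes written.
    """
    req = build_request(url)
    # Reuse this one response instead of calling urlretrieve(), which would
    # send a second request without the custom header.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = resp.read()
    with open(dest, "wb") as f:
        f.write(data)
    return len(data)
```

With the URL from the question, the call would be `download_pdf(url, "Test.pdf")`. If the server still answers 403 with a browser-like User-Agent, it is probably checking something else (cookies, referrer, or IP), and the comment above applies.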
Tags: python http python-requests urllib