【Question title】: Python3: urllib.error.HTTPError: HTTP Error 403: Forbidden
【Posted】: 2014-05-17 15:24:25
【Question】:

Please help me!

I am using Python 3.3 and this code:

import urllib.request
import sys
Open_Page = urllib.request.urlopen(
        "http://wowcircle.com"
    ).read().decode().encode('utf-8')

I get this:

Traceback (most recent call last):
  File "C:\Users\1\Desktop\WCLauncer\reg.py", line 5, in <module>
    "http://forum.wowcircle.com"
  File "C:\Python33\lib\urllib\request.py", line 156, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python33\lib\urllib\request.py", line 475, in open
    response = meth(req, response)
  File "C:\Python33\lib\urllib\request.py", line 587, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python33\lib\urllib\request.py", line 507, in error
    result = self._call_chain(*args)
  File "C:\Python33\lib\urllib\request.py", line 447, in _call_chain
    result = func(*args)
  File "C:\Python33\lib\urllib\request.py", line 692, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Python33\lib\urllib\request.py", line 475, in open
    response = meth(req, response)
  File "C:\Python33\lib\urllib\request.py", line 587, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python33\lib\urllib\request.py", line 507, in error
    result = self._call_chain(*args)
  File "C:\Python33\lib\urllib\request.py", line 447, in _call_chain
    result = func(*args)
  File "C:\Python33\lib\urllib\request.py", line 692, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Python33\lib\urllib\request.py", line 475, in open
    response = meth(req, response)
  File "C:\Python33\lib\urllib\request.py", line 587, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python33\lib\urllib\request.py", line 507, in error
    result = self._call_chain(*args)
  File "C:\Python33\lib\urllib\request.py", line 447, in _call_chain
    result = func(*args)
  File "C:\Python33\lib\urllib\request.py", line 692, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Python33\lib\urllib\request.py", line 475, in open
    response = meth(req, response)
  File "C:\Python33\lib\urllib\request.py", line 587, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python33\lib\urllib\request.py", line 513, in error
    return self._call_chain(*args)
  File "C:\Python33\lib\urllib\request.py", line 447, in _call_chain
    result = func(*args)
  File "C:\Python33\lib\urllib\request.py", line 595, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

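I understand that I am not allowed to access the wowcircle.com site. But I only want to get the page source! I believe this can be done without access, but how?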

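【Comments】:

    Tags: python-3.x urllib


    【Solution 1】:

    I suggest you set the headers accordingly. Look at what your browser sends (for example with an HTTP header plugin).

    The function could look like this: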

    import urllib  # Python 2 module layout

    def openAsOpera(url):
        # Python 2 API. In Python 3 the class is urllib.request.URLopener
        # (lowercase "o"), deprecated since Python 3.3.
        u = urllib.URLopener()
        u.addheaders = []  # drop the default headers before adding our own
        u.addheader('User-Agent', 'Opera/9.80 (Windows NT 6.1; WOW64; U; de) Presto/2.10.289 Version/12.01')
        u.addheader('Accept-Language', 'de-DE,de;q=0.9,en;q=0.8')
        u.addheader('Accept', 'text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/webp, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1')
        f = u.open(url)
        content = f.read()
        f.close()
        return content
    

    This can help you get past errors on web pages that expect more from the client than a bare-bones request.
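For Python 3, the same idea can be sketched with `urllib.request.Request`, which accepts a `headers` dictionary. This is a minimal sketch, not a confirmed fix for this site: the header values are copied from the function above, and the helper names `build_browser_request` and `open_as_browser` are hypothetical.

```python
import urllib.request

# Header values taken from the answer above; other browser-like values
# may work just as well (assumption).
BROWSER_HEADERS = {
    "User-Agent": "Opera/9.80 (Windows NT 6.1; WOW64; U; de) Presto/2.10.289 Version/12.01",
    "Accept-Language": "de-DE,de;q=0.9,en;q=0.8",
    "Accept": "text/html, application/xml;q=0.9, application/xhtml+xml, */*;q=0.1",
}


def build_browser_request(url):
    """Return a Request that carries browser-like headers (hypothetical helper)."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def open_as_browser(url, timeout=10):
    """Fetch url with the headers above and return the raw body as bytes."""
    request = build_browser_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling `open_as_browser('http://wowcircle.com/')` would send the request with these headers; whether the server then answers with the page depends on its other checks, such as the cookie discussed further down.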

    Now I get this error:

    Traceback (most recent call last):
      File "<pyshell#0>", line 1, in <module>
        s = openAsOpera('http://wowcircle.com/')
      File "C:....pyw", line 522, in openAsOpera
        f = u.open(url)
      File "C:\Python27\lib\urllib.py", line 208, in open
        return getattr(self, name)(url)
      File "C:\Python27\lib\urllib.py", line 359, in open_http
        return self.http_error(url, fp, errcode, errmsg, headers)
      File "C:\Python27\lib\urllib.py", line 376, in http_error
        return self.http_error_default(url, fp, errcode, errmsg, headers)
      File "C:\Python27\lib\urllib.py", line 381, in http_error_default
        raise IOError, ('http error', errcode, errmsg, headers)
    IOError: ('http error', 302, 'Moved Temporarily', <httplib.HTTPMessage instance at 0x02C8F1C0>)
    

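    This means you now get access, because you imitated a real browser's request: the server answers with a 302 redirect instead of 403.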

    >>> try: s = openAsOpera('http://wowcircle.com/?pmtry=1')
    except: import sys; ty, err, tb = sys.exc_info()
    
    >>> err.args[3].headers
    ['Server: nginx\r\n', 'Date: Sat, 05 Apr 2014 07:42:00 GMT\r\n', 'Content-Type: text/html\r\n', 'Content-Length: 154\r\n', 'Connection: close\r\n', 'Set-Cookie: PMBC=9979187990a58a5bfdaa6d1380ad6156; path=/\r\n', 'Location: http://wowcircle.com/?pmtry=1\r\n']
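The session above reads the response headers from a Python 2 `IOError`. In Python 3, a comparable inspection could be sketched with `urllib.error.HTTPError`, which exposes `code`, `reason`, and `headers`. The helper names `describe_http_error` and `probe` are hypothetical. Note that `urlopen` in Python 3 follows redirects on its own, so the error it finally raises may carry a different status than the 302 shown here.

```python
import urllib.error
import urllib.request


def describe_http_error(err):
    """Summarize an HTTPError: status, reason, redirect target, cookies set."""
    return {
        "code": err.code,
        "reason": err.reason,
        "location": err.headers.get("Location"),
        "set_cookie": err.headers.get_all("Set-Cookie") or [],
    }


def probe(url, timeout=10):
    """Open url and report the status, or the details of the HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return {
                "code": response.status,
                "reason": response.reason,
                "location": None,
                "set_cookie": response.headers.get_all("Set-Cookie") or [],
            }
    except urllib.error.HTTPError as err:
        return describe_http_error(err)
```

A `Location` header together with a `Set-Cookie` header, as in the output above, suggests the server expects the cookie to come back on the next request.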
    

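    You will notice a redirect to this location: http://wowcircle.com/?pmtry=1, and then to this one: http://wowcircle.com/?pmtry=2, and so on. The server seems to be waiting for a cookie.

    So the result of my analysis is: do not forget to send the cookie every time you visit the site.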
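One way to send the cookie back on every request in Python 3 is an opener built with `urllib.request.HTTPCookieProcessor` and an `http.cookiejar.CookieJar`. The following is a minimal sketch under the assumption that the site only needs the cookie it sets during the redirect plus a browser-like User-Agent; the thread does not confirm that this is sufficient. The helper names `build_cookie_opener` and `fetch_with_cookies` are hypothetical.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener(user_agent):
    """Return (opener, jar); the opener stores cookies in jar and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    # Replace the default Python-urllib User-Agent with a browser-like one.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def fetch_with_cookies(url, user_agent, timeout=10):
    """Fetch url, keeping cookies across redirects; return (body, cookie names)."""
    opener, jar = build_cookie_opener(user_agent)
    with opener.open(url, timeout=timeout) as response:
        body = response.read()
    return body, [cookie.name for cookie in jar]
```

Because the cookie processor is part of the opener, a cookie received on the first redirect is attached to the follow-up request automatically, which is the behavior the analysis above asks for.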

    【Comments】:

    • I get an AttributeError at: [u = urllib.URLopener()]. I imported urllib, but that did not help. Please help me again.
    • I posted Python 2 code. I think you will need urllib.request.URLopener in Python 3.