【Question title】: What are these errors and how do I handle them?
【Posted】: 2009-11-30 04:16:14
【Question】:

I am using this simple code

for l in bios:
    OpenThisLink = url + l
    response = urllib2.urlopen(OpenThisLink)

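to open about 200 URLs and search them with regular expressions (and BeautifulSoup), but after a dozen or so I get these errors and IDLE exits. What do they mean, and how do I handle them?

Thanks.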

Traceback (most recent call last):
  File "\PROJECTS\JD\jd10.py", line 15, in <module>
    response = urllib2.urlopen(OpenThisLink)
  File "C:\Python26\lib\urllib2.py", line 124, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python26\lib\urllib2.py", line 389, in open
    response = meth(req, response)
  File "C:\Python26\lib\urllib2.py", line 502, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python26\lib\urllib2.py", line 421, in error
    result = self._call_chain(*args)
  File "C:\Python26\lib\urllib2.py", line 361, in _call_chain
    result = func(*args)
  File "C:\Python26\lib\urllib2.py", line 597, in http_error_302
    return self.parent.open(new)
  File "C:\Python26\lib\urllib2.py", line 389, in open
    response = meth(req, response)
  File "C:\Python26\lib\urllib2.py", line 502, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python26\lib\urllib2.py", line 421, in error
    result = self._call_chain(*args)
  File "C:\Python26\lib\urllib2.py", line 361, in _call_chain
    result = func(*args)
  File "C:\Python26\lib\urllib2.py", line 597, in http_error_302
    return self.parent.open(new)
  File "C:\Python26\lib\urllib2.py", line 389, in open
    response = meth(req, response)
  File "C:\Python26\lib\urllib2.py", line 502, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python26\lib\urllib2.py", line 427, in error
    return self._call_chain(*args)
  File "C:\Python26\lib\urllib2.py", line 361, in _call_chain
    result = func(*args)
  File "C:\Python26\lib\urllib2.py", line 510, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 404: Not Found

【Question comments】:

    Tags: python beautifulsoup


    【Solution 1】:

    The error being raised is an HTTPError; specifically, one of your URLs returns a 404. You can ignore it:

    for l in bios:
        OpenThisLink = url + l
        try:
            response = urllib2.urlopen(OpenThisLink)
        except urllib2.HTTPError:
            pass
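
One caveat with a bare `pass`: any code placed after the `try` block would still run, using a `response` left over from the previous iteration (or none at all on the first). A minimal sketch of a skip-and-record variant follows. It is written for Python 3, where `urllib2` was split into `urllib.request` and `urllib.error`; the function name `fetch_pages` and its `opener` parameter are hypothetical, introduced here only so the loop can be exercised without a network connection.

```python
import urllib.error
import urllib.request


def fetch_pages(base_url, paths, opener=urllib.request.urlopen):
    """Open base_url + path for each path, skipping URLs that raise HTTPError.

    Returns (pages, failures): pages maps each URL to its body bytes,
    failures maps each failed URL to its HTTP status code.
    `opener` is a hypothetical hook; by default the real urlopen is used.
    """
    pages = {}
    failures = {}
    for path in paths:
        link = base_url + path
        try:
            response = opener(link)
        except urllib.error.HTTPError as e:
            # Record the status code and move on to the next URL,
            # so nothing below runs with a stale or missing response.
            failures[link] = e.code
            continue
        pages[link] = response.read()
    return pages, failures
```

With the names from the question this could be called as `pages, failures = fetch_pages(url, bios)`, after which `failures` lists the links that returned an error.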
    

    Alternatively, you can re-raise the error with a (slightly) more meaningful message:

    for l in bios:
        OpenThisLink = url + l
        try:
            response = urllib2.urlopen(OpenThisLink)
        except urllib2.HTTPError as e:
            raise Exception('Error opening %s: %s' % (e.geturl(), e))
    

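    【Comments】:

      【Solution 2】:

      I don't know anything about the specific library you are using. However, this looks to me like one large stack trace that ultimately leads back to this original error:

      HTTPError: HTTP Error 404: Not Found

      I think one of the links is bad, which triggers an uncaught exception.

      Edit: by "bad" I mean that the server could not retrieve the page, hence the 404 error.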
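
To tell a missing page apart from other HTTP failures, the status code is available on the exception as `e.code`. The sketch below assumes Python 3 (`urllib.request` and `urllib.error` in place of `urllib2`); the helper name `open_or_none` and its `opener` parameter are hypothetical and exist only for illustration.

```python
import urllib.error
import urllib.request


def open_or_none(link, opener=urllib.request.urlopen):
    """Return the opened response, or None if the server answers 404.

    Any other HTTP error is re-raised unchanged. `opener` is a
    hypothetical hook; by default the real urlopen is used.
    """
    try:
        return opener(link)
    except urllib.error.HTTPError as e:
        if e.code == 404:
            # The server could not find the page: treat as "no result".
            return None
        # Other statuses (for example 500) may indicate a different problem.
        raise
```

Treating only 404 as a skippable case keeps other failures visible instead of silently dropping them.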

      【Comments】:
