【Question title】: Handling errors in Python requests [duplicate]
【Posted】: 2021-12-30 12:31:01
【Question】:

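I am learning to use requests in Python, and I need a way to get meaningful output when a site does not exist at all.

I looked at this question, but it is unclear whether its OP actually wants to check that the site exists or only whether it returns an error. The problem with all the answers there is that if the site does not exist at all, we cannot rely on the HTTP response headers, because a nonexistent server returns no response.

Here is an example.

If I use this code, I get no error, because the site exists.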

import requests
r = requests.get('https://duckduckgo.com')

However, if I enter a web page that I know does not exist, I get an error:

import requests
r = requests.get('https://thissitedoesnotexist.com')

if r.status_code == requests.codes.ok:
    print('Site good')
else:
    print('Site bad')

The error is very long. I would like a shorter, more meaningful error when the site does not exist.

Traceback (most recent call last):
  File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\connectionpool.py", line 382, in _make_request
    self._validate_conn(conn)
  File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\connectionpool.py", line 1010, in _validate_conn
    conn.connect()
  File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\connection.py", line 416, in connect
    self.sock = ssl_wrap_socket(
  File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\util\ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\util\ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\ssl.py", line 512, in wrap_socket
    return self.sslsocket_class._create(
  File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\ssl.py", line 1070, in _create
    self.do_handshake()
  File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\ssl.py", line 1341, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:997)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\adapters.py", line 439, in send
    resp = conn.urlopen(
  File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\util\retry.py", line 574, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='234876.com', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:997)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\ADMIN\Desktop\tetst.py", line 2, in <module>
    r = requests.get('https://234876.com')
  File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\adapters.py", line 514, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='234876.com', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:997)')))

Is it possible to write a function that returns something like print('The site probably does not exist'), or that at least does not produce the EOF error?

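【Comments】:

  • You should use a Python try/except block, because what you are getting is an exception.
  • The traceback in your question corresponds to a site that does exist but does not have a valid SSL certificate. For anything important, you may want to develop a small test suite that covers the different failure scenarios.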
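
The test suite suggested in the second comment could look roughly like the sketch below. It is only an illustration: probe and the URL https://example.invalid are hypothetical names, and requests.get is replaced with a mock so that each failure scenario can be simulated without real network traffic. Exceptions other than the two handled here (a read timeout, for instance) would simply propagate.

```python
import unittest
from unittest import mock

import requests


def probe(url):
    """Hypothetical helper: classify the outcome of a GET request."""
    try:
        requests.get(url, timeout=5)
    except requests.exceptions.SSLError:
        # Typically a connection was made, but the TLS handshake or
        # certificate check failed. Must come before ConnectionError,
        # because SSLError is a subclass of it.
        return "ssl-problem"
    except requests.exceptions.ConnectionError:
        # DNS failure, refused connection, unreachable host, and similar.
        return "unreachable"
    return "reachable"


class ProbeTests(unittest.TestCase):
    def run_probe(self, **patch_kwargs):
        # Swap out requests.get for the duration of one call to probe().
        with mock.patch("requests.get", **patch_kwargs):
            return probe("https://example.invalid")

    def test_ssl_failure(self):
        error = requests.exceptions.SSLError("simulated handshake failure")
        self.assertEqual(self.run_probe(side_effect=error), "ssl-problem")

    def test_connection_failure(self):
        error = requests.exceptions.ConnectionError("simulated DNS failure")
        self.assertEqual(self.run_probe(side_effect=error), "unreachable")

    def test_success(self):
        fake_response = mock.Mock(status_code=200)
        self.assertEqual(self.run_probe(return_value=fake_response), "reachable")
```

Saved in a file of its own, the tests can be run with python -m unittest.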

Tags: python http web screen-scraping


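【Answer 1】:

The ideal approach is usually to catch the exceptions raised by the request.

You can also call .raise_for_status() on the Response to get a meaningful Exception for non-OK responses.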
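
As a minimal offline sketch of what .raise_for_status() does, the snippet below builds a bare Response by hand instead of fetching one. The helper name status_error and the example URL are assumptions made for illustration. In real code the Response would come from requests.get().

```python
import requests


def status_error(status_code, reason="", url="https://example.com/missing"):
    """Return the HTTPError message for a status code, or None if no error is raised."""
    response = requests.Response()   # normally returned by requests.get()
    response.status_code = status_code
    response.reason = reason
    response.url = url
    try:
        response.raise_for_status()  # raises only for 4xx and 5xx codes
    except requests.exceptions.HTTPError as ex:
        return str(ex)
    return None


print(status_error(200, "OK"))          # no exception raised, so None
print(status_error(404, "Not Found"))   # message mentions the code, reason, and URL
```

Note that this only helps once a server has actually answered. When no response arrives at all, requests.get() raises before there is any Response to inspect.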

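However, you need to decide where you want to handle the exception:

  • Immediately? Can your program handle it meaningfully, or should it exit?
  • Should the caller handle a specific exception (for example requests.exceptions.Timeout) or a more general one?
  • Do you have many functions that call each other? Should each handle some subset of the possible Exceptions? Which ones?

See the Python Exception Hierarchy for how the first-party (built-in) Exceptions inherit from one another.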
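
To make the inheritance structure concrete, here is a small sketch that lists the ancestors of an exception class. The helper name ancestry is an assumption for illustration; it relies only on the standard __mro__ attribute.

```python
import requests


def ancestry(exc_type):
    """List the classes exc_type inherits from, most specific first."""
    return [cls.__name__ for cls in exc_type.__mro__ if cls is not object]


# A built-in exception: each name is a subclass of the one after it.
print(ancestry(ConnectionRefusedError))
# ['ConnectionRefusedError', 'ConnectionError', 'OSError', 'Exception', 'BaseException']

# requests has its own tree, rooted at requests.exceptions.RequestException.
print(ancestry(requests.exceptions.ConnectTimeout))

# An except clause for a parent class also catches its children, so the
# more specific class has to be listed first.
print(issubclass(requests.exceptions.SSLError, requests.exceptions.ConnectionError))
```

One thing to watch for: requests.exceptions.ConnectionError is a different class from the built-in ConnectionError, even though the names match.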

import sys
import requests

def some_function_which_makes_requests():
    r = requests.get("https://example.com", timeout=(2,10))
    r.raise_for_status()  # raise for non-OK
    return r.json()       # interpret response via some method (for example as JSON)

def main():
    ...
    try:
        result_json = some_function_which_makes_requests()  # call the function
    except requests.exceptions.Timeout:
        print("WARNING: request timed out")
        result_json = None  # still effectively handled for later program?
    except requests.exceptions.RequestException as ex:
        sys.exit(f"something wrong with Request: {repr(ex)}")
    except Exception as ex:  # bind the exception so the message below can use it
        sys.exit(f"something wrong around Request: {repr(ex)}")
    # now you can use result_json


    【Answer 2】:

    After some more research, I learned that I need to use Python try/except, as @Anand Sowmithiran mentioned. Here is a video that explains it for beginners: https://www.youtube.com/watch?v=NIWwJbo-9_8

    import requests
    
    try:
        r = requests.get("http://www.duckduckgo.com")
    except requests.exceptions.ConnectionError:
        print('\n\tSorry. There was a network problem getting the URL. Perhaps it does not exist?\n\tCheck the URL, DNS issues or if you are being rejected by the server.')
    else:
        print(r)
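
Building on the try/except above, the function asked for in the question might be sketched as follows. The name describe_site, the message texts, and the get parameter are all assumptions for illustration. The get parameter defaults to requests.get and exists only so the function can be tried with a stand-in that raises, without touching the network.

```python
import requests


def describe_site(url, get=requests.get, timeout=5):
    """Return a short, human-readable verdict for url."""
    try:
        response = get(url, timeout=timeout)
        response.raise_for_status()
    except requests.exceptions.SSLError:
        # Listed before ConnectionError because SSLError is a subclass of it.
        return "A server was reached, but the TLS/SSL handshake failed"
    except requests.exceptions.Timeout:
        return "The request timed out"
    except requests.exceptions.ConnectionError:
        return "The site probably does not exist or is unreachable"
    except requests.exceptions.HTTPError as ex:
        return f"The site answered with an error: {ex}"
    except requests.exceptions.RequestException as ex:
        # Catch-all for anything else raised by requests.
        return f"The request failed: {ex!r}"
    return "Site good"


def always_unreachable(url, timeout=None):
    """Stand-in for requests.get that simulates a failed DNS lookup."""
    raise requests.exceptions.ConnectionError("simulated DNS failure")


print(describe_site("https://example.invalid", get=always_unreachable))
# The site probably does not exist or is unreachable
```

Called as describe_site("https://duckduckgo.com") it performs a real request. A connection failure alone cannot prove that a site does not exist, which is why the message says "probably".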
    
