[Posted]: 2021-12-30 12:31:01
[Question]:
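I'm learning to use requests in Python, and I need a way to get meaningful output when a site genuinely does not exist at all.
I looked at this question, but it isn't clear whether its OP actually wants to check that the site exists or only whether it returns an error. The problem with all of its answers is that when a site doesn't exist at all, we can't rely on HTTP response headers, because a nonexistent server never sends a response.
Here is an example.
With this code I get no error, because the site exists: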
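Because a nonexistent site never sends an HTTP response, one way to test "does it exist at all?" is to check DNS before making any HTTP request. The sketch below uses only the standard library; the helper name `hostname_resolves` is hypothetical. It assumes a DNS failure means "no such site". Two caveats: a successful lookup does not prove an HTTP server is listening, and some ISP resolvers answer for nonexistent names, which would produce a false positive.

```python
import socket
from urllib.parse import urlparse


def hostname_resolves(url: str) -> bool:
    """Return True if the URL's hostname resolves, False otherwise.

    Hypothetical helper: a DNS failure is a strong hint that no server
    exists, so no HTTP response (and no status code) could ever come back.
    """
    host = urlparse(url).hostname
    if not host:
        return False  # nothing that looks like a hostname in the URL
    try:
        socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False  # name lookup failed (e.g. no such domain)
    return True


# The .invalid top-level domain is reserved and should never resolve.
print(hostname_resolves("https://thissitedoesnotexist.invalid"))
```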
import requests
r = requests.get('https://duckduckgo.com')
However, if I request a site that I know doesn't exist, I get an error:
import requests
r = requests.get('https://thissitedoesnotexist.com')
if r.status_code == requests.codes.ok:
    print('Site good')
else:
    print('Site bad')
The error is very long. When the site doesn't exist, I'd like a shorter, more meaningful message instead:
Traceback (most recent call last):
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\connectionpool.py", line 382, in _make_request
self._validate_conn(conn)
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\connectionpool.py", line 1010, in _validate_conn
conn.connect()
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\connection.py", line 416, in connect
self.sock = ssl_wrap_socket(
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\util\ssl_.py", line 449, in ssl_wrap_socket
ssl_sock = _ssl_wrap_socket_impl(
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\util\ssl_.py", line 493, in _ssl_wrap_socket_impl
return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\ssl.py", line 512, in wrap_socket
return self.sslsocket_class._create(
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\ssl.py", line 1070, in _create
self.do_handshake()
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\ssl.py", line 1341, in do_handshake
self._sslobj.do_handshake()
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:997)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\adapters.py", line 439, in send
resp = conn.urlopen(
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\connectionpool.py", line 755, in urlopen
retries = retries.increment(
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\util\retry.py", line 574, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='234876.com', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:997)')))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\ADMIN\Desktop\tetst.py", line 2, in <module>
r = requests.get('https://234876.com')
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\adapters.py", line 514, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='234876.com', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:997)')))
Is it possible to write a function that outputs something like print('The site probably does not exist'), or that at least doesn't raise the EOF error?
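A minimal sketch of such a function, assuming a short verdict string is all that is needed: `requests.get` raises before any status code exists, so the `if r.status_code ...` check never runs, and the exception has to be caught instead. The function name `check_site` and the message strings are hypothetical. Order of the `except` clauses matters: `requests.exceptions.SSLError` is a subclass of `ConnectionError`, and `ConnectTimeout` is a subclass of both `ConnectionError` and `Timeout`, so the more specific handlers come first.

```python
import requests


def check_site(url: str, timeout: float = 5.0) -> str:
    """Return a one-line verdict instead of letting a long traceback escape."""
    try:
        response = requests.get(url, timeout=timeout)
    except requests.exceptions.SSLError:
        # A connection was made, but the TLS handshake failed
        # (the SSLEOFError in the traceback above ends up here).
        return "A server was reached, but the SSL/TLS handshake failed"
    except requests.exceptions.Timeout:
        # Covers ConnectTimeout and ReadTimeout.
        return "The site did not respond in time"
    except requests.exceptions.ConnectionError:
        # DNS lookup failed or nothing accepted the connection.
        return "The site probably does not exist"
    except requests.exceptions.RequestException as exc:
        # Anything else requests can raise (e.g. a malformed URL).
        return f"Request failed: {type(exc).__name__}"
    if response.ok:
        return "Site good"
    return f"Site responded with HTTP {response.status_code}"


print(check_site("https://thissitedoesnotexist.com"))
```

Catching the specific exceptions instead of a bare `except` keeps real bugs (such as a `NameError`) visible while turning network failures into one-line messages.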
[Comments]:
-
You should use a Python try/except block, since you're getting an exception.
-
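The traceback in your question corresponds to a site that does exist but doesn't have a valid SSL certificate. For anything important, you'll probably want to build a small test suite that covers the different failure scenarios.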
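One hedged way to check the diagnosis that the host exists and only TLS is broken is to send the same request over plain HTTP: any HTTP response at all, even a 404 or 500, shows a server is there. The sketch below assumes the host also listens on port 80; many sites don't, so `False` here does not prove the site is missing. The helper names `to_plain_http` and `responds_over_http` are hypothetical.

```python
import requests
from urllib.parse import urlsplit


def to_plain_http(url: str) -> str:
    """Rewrite an https:// URL to http://, keeping host, path and query."""
    return urlsplit(url)._replace(scheme="http").geturl()


def responds_over_http(url: str, timeout: float = 5.0) -> bool:
    """Return True if the host answers a plain-HTTP HEAD request at all.

    Any status code counts as "a server exists"; only transport-level
    failures (DNS, refused connection, timeout) return False.
    """
    try:
        requests.head(to_plain_http(url), timeout=timeout, allow_redirects=False)
    except requests.exceptions.RequestException:
        return False
    return True


# Host taken from the traceback in the question.
print(responds_over_http("https://234876.com"))
```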
Tags: python http web screen-scraping