【Posted】:2009-07-16 22:27:54
【Question】:
I'm looking for a quick way to get the HTTP response code (such as 200 or 404) for a URL. I'm not sure which library to use.
【Comments】:
Tags: python
Update using the wonderful requests library. Note that this uses a HEAD request, which should be faster than a full GET or POST request because no response body is transferred.
import requests

try:
    r = requests.head("https://stackoverflow.com")
    print(r.status_code)
    # prints the int of the status code. Find more at httpstatusrappers.com :)
except requests.ConnectionError:
    print("failed to connect")
【Comments】:
requests returns 403 for your link, although the link still works in a browser.
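One possible reason for a 403 that a browser does not see is that some servers reject requests carrying a library's default User-Agent. That is an assumption here, not something confirmed for the link above. A minimal standard-library sketch sends the HEAD request with a caller-supplied User-Agent; the name `head_status` and the header string `BROWSER_UA` are hypothetical.

```python
import urllib.request
from urllib.error import HTTPError

# Hypothetical browser-like value; substitute whatever your client should announce.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) status-check-example"

def head_status(url, user_agent=BROWSER_UA, timeout=10):
    """Return the status code of a HEAD request sent with a custom User-Agent."""
    req = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": user_agent}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except HTTPError as e:
        # 4xx/5xx responses are raised as HTTPError; the code is still available.
        return e.code
```

If the status differs between two User-Agent values for the same URL, the server is probably filtering on that header.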
Here is a solution that uses httplib.
# Python 2 (httplib was renamed http.client in Python 3)
import httplib

def get_status_code(host, path="/"):
    """ This function retrieves the status code of a website by requesting
        HEAD data from the host. This means that it only requests the headers.
        If the host cannot be reached or something else goes wrong, it returns
        None instead.
    """
    try:
        conn = httplib.HTTPConnection(host)
        conn.request("HEAD", path)
        return conn.getresponse().status
    except StandardError:
        return None

print get_status_code("stackoverflow.com") # prints 200
print get_status_code("stackoverflow.com", "/nonexistant") # prints 404
【Comments】:
Limit the except block to at least StandardError so that you do not mistakenly catch things like KeyboardInterrupt.
curl -I http://www.amazon.com/.
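In Python 3, `httplib` is named `http.client` and `StandardError` no longer exists, so the advice about narrowing the except clause needs different exception types. A minimal sketch, assuming that `OSError` (which covers DNS failures, refused connections, and timeouts) and `http.client.HTTPException` are the failures worth catching. The `port` and `timeout` parameters are additions for illustration.

```python
import http.client

def get_status_code(host, path="/", port=None, timeout=10):
    """Return the status of a HEAD request to host, or None if it cannot be reached."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Network-level and protocol-level failures only; KeyboardInterrupt
        # and programming errors such as TypeError still propagate.
        return None
    finally:
        conn.close()
```

Called as `get_status_code("stackoverflow.com")`, it mirrors the Python 2 function above but lets unrelated errors surface instead of silently returning None.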
You should use urllib2, like this:
import urllib2

for url in ["http://entrian.com/", "http://entrian.com/does-not-exist/"]:
    try:
        connection = urllib2.urlopen(url)
        print connection.getcode()
        connection.close()
    except urllib2.HTTPError, e:
        print e.getcode()

# Prints:
# 200 [from the try block]
# 404 [from the except block]
【Comments】:
Changing http://entrian.com/ to http://entrian.com/blog still yields a correct 200, even though that involves a redirect to http://entrian.com/blog/ (note the trailing slash).
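`urlopen` follows redirects, which is why the final 200 is reported rather than the redirect itself. If the redirect's own status code is what you want, one option is a low-level request that follows nothing. A minimal Python 3 sketch with `http.client`; the function name `status_without_redirects` is hypothetical.

```python
import http.client
from urllib.parse import urlsplit

def status_without_redirects(url, timeout=10):
    """Return (status, Location header) for one HEAD request, following nothing."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn_cls = http.client.HTTPSConnection
    elif parts.scheme == "http":
        conn_cls = http.client.HTTPConnection
    else:
        raise ValueError("unsupported scheme: " + parts.scheme)
    # Rebuild the request target: default to "/" and keep any query string.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        conn.request("HEAD", target)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

For a URL that redirects, this returns the 3xx code together with the redirect target; for any other URL the second element is None.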
For future readers using Python 3 and later, here is another snippet that finds the response code.
import urllib.request

def getResponseCode(url):
    conn = urllib.request.urlopen(url)
    return conn.getcode()
【Comments】:
The urllib2.HTTPError exception does not have a getcode() method. Use the code attribute instead.
【Comments】:
Addressing @Niklas R's comment on @nickanor's answer:
from urllib.error import HTTPError
import urllib.request

def getResponseCode(url):
    try:
        conn = urllib.request.urlopen(url)
        return conn.getcode()
    except HTTPError as e:
        return e.code
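The function above still raises if the host cannot be reached at all, because `urlopen` reports that case as `urllib.error.URLError` rather than `HTTPError`. A minimal sketch that returns None in that situation; the name `get_response_code_or_none` and the `timeout` parameter are hypothetical additions.

```python
import urllib.request
from urllib.error import HTTPError

def get_response_code_or_none(url, timeout=10):
    """Return the HTTP status code, or None if no HTTP response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as conn:
            return conn.getcode()
    except HTTPError as e:
        # Must be handled first: HTTPError is a subclass of URLError.
        return e.code
    except OSError:
        # URLError is a subclass of OSError, so this covers DNS failures,
        # refused connections, and socket timeouts alike.
        return None
```

Returning None keeps "the server answered with an error code" distinct from "there was no answer at all".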
【Comments】:
Here is an httplib solution that behaves like urllib2. You can give it a URL and it just works. There is no need to split your URL into hostname and path; the function does that for you.
# Python 2
import httplib
import re
import socket

def get_link_status(url):
    """
    Gets the HTTP status of the url or returns an error associated with it. Always returns a string.
    """
    https=False
    # Strip any #fragment, then split into scheme, '', host[:port], path.
    url=re.sub(r'(.*)#.*$',r'\1',url)
    url=url.split('/',3)
    if len(url) > 3:
        path='/'+url[3]
    else:
        path='/'
    if url[0] == 'http:':
        port=80
    elif url[0] == 'https:':
        port=443
        https=True
    if ':' in url[2]:
        host=url[2].split(':')[0]
        port=int(url[2].split(':')[1])
    else:
        host=url[2]
    try:
        headers={'User-Agent':'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0',
                 'Host':host
                 }
        if https:
            conn=httplib.HTTPSConnection(host=host,port=port,timeout=10)
        else:
            conn=httplib.HTTPConnection(host=host,port=port,timeout=10)
        conn.request(method="HEAD",url=path,headers=headers)
        response=str(conn.getresponse().status)
        conn.close()
    except socket.gaierror,e:
        response="Socket Error (%d): %s" % (e[0],e[1])
    except StandardError,e:
        # getcode() returns an int, so test its truth value rather than len().
        if hasattr(e,'getcode') and e.getcode():
            response=str(e.getcode())
        elif hasattr(e, 'message') and len(e.message) > 0:
            response=str(e.message)
        elif hasattr(e, 'msg') and len(e.msg) > 0:
            response=str(e.msg)
        elif type('') == type(e):
            response=e
        else:
            response="Exception occurred without a good error message. Manually check the URL to see the status. If it is believed this URL is 100% good then file an issue for a potential bug."
    return response
【Comments】:
It depends on several factors, but try testing these methods:
import requests

def url_code_status(url):
    try:
        response = requests.head(url, allow_redirects=False)
        return response.status_code
    except Exception as e:
        print(f'[ERROR]: {e}')
Or:
import http.client as httplib
import urllib.parse

def url_code_status(url):
    try:
        protocol, host, path, query, fragment = urllib.parse.urlsplit(url)
        if protocol == "http":
            conntype = httplib.HTTPConnection
        elif protocol == "https":
            conntype = httplib.HTTPSConnection
        else:
            raise ValueError("unsupported protocol: " + protocol)
        # An empty path is not a valid request target, and the query
        # string is part of the resource being requested.
        target = (path or "/") + ("?" + query if query else "")
        conn = conntype(host)
        conn.request("HEAD", target)
        resp = conn.getresponse()
        conn.close()
        return resp.status
    except Exception as e:
        print(f'[ERROR]: {e}')
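To compare the two functions on your own list of URLs, a small timing harness may help. This is a sketch under assumptions: `time_checker` is a hypothetical helper, and absolute timings depend heavily on the network and the servers involved.

```python
import time

def time_checker(check, urls):
    """Call check(url) for each URL in order; return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = {url: check(url) for url in urls}
    elapsed = time.perf_counter() - start
    return results, elapsed

# Example call, assuming url_code_status and a list my_urls are defined:
#     codes, seconds = time_checker(url_code_status, my_urls)
```

Running each candidate over the same list several times and comparing the elapsed values gives a rough picture; a single run is easily skewed by one slow server.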
Benchmark results for 100 URLs:
【Comments】: