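Checking if a website is up via Python

【Question title】: Checking if a website is up via Python
【Posted】: 2010-12-29 06:58:50
【Question】:

Using Python, how can I check whether a website is up? From what I have read, I need to send an HTTP HEAD request and look for the status code "200 OK", but how do I do that?

Cheers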

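Related

How do you send a HEAD HTTP request in Python?

【Comments】:

Duplicate: stackoverflow.com/questions/107405/…

Tags: python http scripting httprequest http-head

【Answer 1】:

If by "up" you simply mean "the server is serving", you could use cURL: if you get a response, it is up.

I can't give you specific advice because I am not a Python programmer, but here is a link to pycurl: http://pycurl.sourceforge.net/

【Comments】:

【Answer 2】:

The HTTPConnection object in the standard library's httplib module (renamed http.client in Python 3) will probably do the trick for you. By the way, if you start doing anything advanced with HTTP in Python, be sure to check out httplib2; it's a great library.

【Comments】:

【Answer 3】:

You can try doing this with getcode() from urllib: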

import urllib.request

print(urllib.request.urlopen("https://www.stackoverflow.com").getcode())

For Python 2, use

print urllib.urlopen("http://www.stackoverflow.com").getcode()

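【Comments】:

A follow-up question: does urlopen.getcode fetch the entire page or not?
As far as I know, getcode retrieves the status from the response that is sent back.
@Oscar, nothing in urllib indicates that it uses HEAD rather than GET, but the duplicate question referenced by Daniel above shows how to do the former.
It seems there is no urlopen method in Python 3.x any more. All I keep getting is ImportError: cannot import name 'urlopen'. How do I work around this?
@l1zard Like this: req = urllib.request.Request(url, headers=headers); resp = urllib.request.urlopen(req)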
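
On the HEAD-versus-GET point raised in these comments: urlopen sends a GET unless told otherwise, but in Python 3.3 and later urllib.request.Request accepts a method argument. The following is only a sketch under that assumption; the function name head_status and the 5-second default timeout are illustrative and do not come from the answer.

```python
import urllib.error
import urllib.request


def head_status(url, timeout=5):
    """Return the status code of a HEAD request, or None if the server is unreachable.

    head_status and the default timeout are hypothetical choices for this sketch.
    """
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError, but the server did answer.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar problems.
        return None
```

A call such as head_status("https://www.stackoverflow.com") would then return an integer status when the server answers and None when it cannot be reached. Note that urlopen follows redirects, so the status reported is that of the final response.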

【Answer 4】:

You can use httplib:

import httplib
conn = httplib.HTTPConnection("www.python.org")
conn.request("HEAD", "/")
r1 = conn.getresponse()
print r1.status, r1.reason

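This prints

200 OK

provided, of course, that www.python.org is up.

【Comments】:

This only checks the domain; something this efficient is needed for individual web pages too.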
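
To address the comment about checking a specific page rather than just the domain, here is a rough Python 3 counterpart of the snippet above, using http.client (the Python 3 name of httplib). The function name head_page and its parameters are assumptions made for this sketch, not part of the answer.

```python
import http.client


def head_page(host, path="/", port=None, use_https=False, timeout=5):
    """Send a HEAD request for one path and return (status, reason).

    head_page and its parameters are hypothetical names for this sketch.
    Connection errors are not caught here; they propagate to the caller.
    """
    if use_https:
        conn = http.client.HTTPSConnection(host, port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.reason
    finally:
        # Always release the socket, even if the request failed.
        conn.close()
```

For example, head_page("www.python.org", "/about/", use_https=True) asks about one page rather than the site root. Unlike urlopen, http.client does not follow redirects, so a 301 or 302 is returned as-is.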

【Answer 5】:

import httplib
import socket
import re

def is_website_online(host):
    """ This function checks to see if a host name has a DNS entry by checking
        for socket info. If the website gets something in return, 
        we know it's available to DNS.
    """
    try:
        socket.gethostbyname(host)
    except socket.gaierror:
        return False
    else:
        return True


def is_page_available(host, path="/"):
    """ This function retrieves the status code of a website by requesting
        HEAD data from the host. This means that it only requests the headers.
        If the host cannot be reached or something else goes wrong, it returns
        False.
    """
    try:
        conn = httplib.HTTPConnection(host)
        conn.request("HEAD", path)
        # Treat any 2xx or 3xx status as "available"
        if re.match(r"^[23]\d\d$", str(conn.getresponse().status)):
            return True
        return False
    except StandardError:
        return False

【Comments】:

is_website_online only tells you whether the host name has a DNS entry, not whether the website is online.
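
The distinction drawn in this comment can be made concrete: resolving a name and opening a connection are separate checks. The sketch below is written for Python 3 with hypothetical function names (resolves, accepts_connections); it only shows that a TCP port accepts connections, which is still weaker than a successful HTTP response.

```python
import socket


def resolves(host):
    """True if the host name resolves (the same check is_website_online performs)."""
    try:
        socket.gethostbyname(host)
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
    return True


def accepts_connections(host, port=80, timeout=5):
    """True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, timed out, or the name did not resolve.
        return False
```

A host can pass resolves and still fail accepts_connections, which is exactly the case the comment describes.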

【Answer 6】:

I think the easiest way to do it is with the Requests module.

import requests

def url_ok(url):
    r = requests.head(url)
    return r.status_code == 200

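【Comments】:

This does not work for url = "http://foo.example.org/". I would expect a 404, but I get a crash.
This returns False for any response code other than 200 (OK), so you won't know whether it was a 404. It only checks whether the site is up and available to the public.
@caisah, did you test it? Jonas is right; I get an exception: raise ConnectionError(e) requests.exceptions.ConnectionError: HTTPConnectionPool(host='nosuch.org2', port=80): Max retries exceeded with url: / (Caused by : [Errno 8] nodename nor servname provided, or not known)
I tested it before posting. The thing is, this checks whether a site is up; it does not handle the case where the host name is invalid or something else goes wrong. You should think about those exceptions and catch them.
In my view, this does not test whether a website is up, because it crashes (as the commenters before me said). This is my attempt at a short, Pythonic implementation: stackoverflow.com/a/57999194/5712053

【Answer 7】: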

from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError
req = Request("http://stackoverflow.com")
try:
    response = urlopen(req)
except HTTPError as e:
    print('The server couldn\'t fulfill the request.')
    print('Error code: ', e.code)
except URLError as e:
    print('We failed to reach a server.')
    print('Reason: ', e.reason)
else:
    print ('Website is working fine')

Works on Python 3

【Comments】:

【Answer 8】:

Here is my solution using PycURL and validators:

import pycurl, validators


def url_exists(url):
    """
    Check if the given URL really exists
    :param url: str
    :return: bool
    """
    if validators.url(url):
        c = pycurl.Curl()
        c.setopt(pycurl.NOBODY, True)
        c.setopt(pycurl.FOLLOWLOCATION, False)
        c.setopt(pycurl.CONNECTTIMEOUT, 10)
        c.setopt(pycurl.TIMEOUT, 10)
        c.setopt(pycurl.COOKIEFILE, '')
        c.setopt(pycurl.URL, url)
        try:
            c.perform()
            response_code = c.getinfo(pycurl.RESPONSE_CODE)
            return response_code < 400
        except pycurl.error as err:
            # pycurl.error carries (errno, errstr) in err.args
            errno, errstr = err.args
            raise OSError('An error occurred: {}'.format(errstr))
        finally:
            # Release the handle whether or not the transfer succeeded
            c.close()
    else:
        raise ValueError('"{}" is not a valid url'.format(url))

【Comments】:

【Answer 9】:

Hi, this code can run speed and up tests on your web page:

from urllib.request import urlopen
from socket import socket
import time


def tcp_test(server_info):
    # server_info is expected in the form "host:port"
    cpos = server_info.find(':')
    try:
        sock = socket()
        sock.connect((server_info[:cpos], int(server_info[cpos+1:])))
        sock.close()
        return True
    except Exception:
        return False


def http_test(server_info):
    try:
        # TODO : we can use this data after to find sub urls up or down    results
        startTime = time.time()
        data = urlopen(server_info).read()
        endTime = time.time()
        speed = endTime - startTime
        return {'status' : 'up', 'speed' : str(speed)}
    except Exception:
        return {'status' : 'down', 'speed' : str(-1)}


def server_test(test_type, server_info):
    if test_type.lower() == 'tcp':
        return tcp_test(server_info)
    elif test_type.lower() == 'http':
        return http_test(server_info)

【Comments】:

【Answer 10】:

If the server is down, on Python 2.7 x86 Windows urllib has no timeout and the program hangs. So use urllib2:

import urllib2
import socket

def check_url( url, timeout=5 ):
    try:
        return urllib2.urlopen(url,timeout=timeout).getcode() == 200
    except urllib2.URLError as e:
        return False
    except socket.timeout as e:
        return False


print check_url("http://google.fr")  #True 
print check_url("http://notexist.kc") #False
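
For readers on Python 3, where urllib2 was merged into urllib.request, a possible counterpart of check_url is sketched below. It keeps the two exception clauses because a stalled server can raise socket.timeout directly rather than wrapped in URLError. This is an adaptation for illustration, not the answer's own code.

```python
import socket
import urllib.error
import urllib.request


def check_url(url, timeout=5):
    """Return True only if the URL answers with status 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.getcode() == 200
    except urllib.error.URLError:
        # Covers HTTPError (4xx/5xx) and connection-level failures.
        return False
    except socket.timeout:
        # Raised when the server accepts the connection but never answers.
        return False
```

The timeout applies to each blocking socket operation, so the call returns False shortly after the timeout elapses instead of hanging.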

【Comments】:

【Answer 11】:

You can use the requests library to find out whether a website is up, i.e. whether the status code is 200:

import requests
url = "https://www.google.com"
page = requests.get(url)
print (page.status_code) 

>> 200

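【Comments】:

【Answer 12】:

Requests and httplib2 are good options:

# Using requests.
import requests

def url_ok_requests(value):  # wrapper name added so that the returns are valid
    request = requests.get(value)
    if request.status_code == 200:
        return True
    return False

# Using httplib2.
import httplib2

def url_ok_httplib2(value):  # wrapper name added so that the returns are valid
    try:
        http = httplib2.Http()
        # http.request returns a (response, content) tuple
        response = http.request(value, 'HEAD')

        if int(response[0]['status']) == 200:
            return True
    except Exception:
        pass
    return False

If you are using Ansible, you can use the fetch_url function:

from ansible.module_utils.basic import AnsibleModule
from ansible.module_utils.urls import fetch_url

module = AnsibleModule(
    dict(),
    supports_check_mode=True)

def url_ok_ansible(module, url):  # wrapper name added so that the returns are valid
    try:
        response, info = fetch_url(module, url)
        if info['status'] == 200:
            return True

    except Exception:
        pass

    return False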

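【Comments】:

【Answer 13】:

In my opinion, caisah's answer misses an important part of your question, namely dealing with the server being offline.

Still, using requests is my favorite option, albeit like this: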

import requests

try:
    requests.get(url)
except requests.exceptions.ConnectionError:
    print(f"URL {url} not reachable")

【Comments】:

【Answer 14】:

My 2 cents

import urllib.request

def getResponseCode(url):
    conn = urllib.request.urlopen(url)
    return conn.getcode()

if getResponseCode(url) != 200:
    print('Wrong URL')
else:
    print('Good URL')

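【Comments】:

【Answer 15】:

I use requests for this, so it is simple and clean. Instead of the print function you can define and call a new function (notify by email, etc.). The try-except block is essential, because if the host is unreachable it raises a lot of exceptions, so you need to catch them all.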

import requests

URL = "https://api.github.com"

try:
    response = requests.head(URL)
except Exception as e:
    print(f"NOT OK: {str(e)}")
else:
    if response.status_code == 200:
        print("OK")
    else:
        print(f"NOT OK: HTTP response code {response.status_code}")
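
The idea of swapping print for a notification function can be sketched as a callback parameter. To keep the example self-contained, the sketch below uses the standard library's urllib.request in place of requests; the names check_and_notify and notify are hypothetical, and a real notifier (email, chat, and so on) would be passed in by the caller.

```python
import urllib.error
import urllib.request


def check_and_notify(url, notify=print, timeout=5):
    """HEAD-check url; call notify(message) unless it answers 200.

    Returns True when the site answered 200, otherwise False.
    check_and_notify and notify are illustrative names for this sketch.
    """
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        status = err.code
    except OSError as err:
        # URLError, timeouts, and refused connections all derive from OSError.
        notify(f"NOT OK: {url}: {err}")
        return False
    if status == 200:
        return True
    notify(f"NOT OK: {url}: HTTP response code {status}")
    return False
```

Any callable that takes a single message string can serve as notify, for instance a list's append method in tests or a function that sends mail in production.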

【Comments】:

【Answer 16】:

You can also check the website status this way:

import requests
def monitor():
    r = requests.get("https://www.google.com/", timeout=5)
    print(r.status_code)

【Comments】: