【Question title】: Max retries exceeded with url in Requests Python
【Posted】: 2021-04-03 12:44:04
【Question】:

I am trying to send a request to this URL: Site URL

and I use the following code to simply read the page:

import requests
url_1 = 'http://www.dsit.org.ir/?cmd=page&Cid=92&title=Kontakt&lang=fa'
print(requests.get(url_1).text)

But I get this error:

requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.srgfesrsergserg.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000008EC69AAA90>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))

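The site is very basic and has no special security measures. Also, I only sent the request once.

Everything else works, and I can easily open this page with requests-html, but I don't know what the problem is here!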
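
The `[Errno 11001] getaddrinfo failed` part of the traceback is raised at the name-resolution step, before any HTTP request is sent. Below is a minimal sketch, using only the standard library, for checking whether a hostname resolves on the local machine. The helper names `host_of` and `can_resolve` are hypothetical and are not part of Requests.

```python
import socket
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the hostname part of a URL (empty string if there is none)."""
    return urlsplit(url).hostname or ""


def can_resolve(host: str) -> bool:
    """Return True if the local resolver can map `host` to an address."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        # Same failure class as "getaddrinfo failed" in the traceback.
        return False
```

Calling `can_resolve(host_of(url_1))` before `requests.get(url_1)` would show whether the failure is a DNS problem on the machine running the script or something that happens later in the request.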

【Question comments】:

    Tags: python web-scraping python-requests request httpurlconnection


    【Solution 1】:

    Add headers.

    Make the request look like it comes from a browser.

    import requests
    
    headers = {
        'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36'}
    
    url_1 = 'http://www.dsit.org.ir/?cmd=page&Cid=92&title=Kontakt&lang=fa'
    
    print(requests.get(url=url_1, headers=headers).text)
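
If the page is fetched more than once, the same header can be attached to a `requests.Session` so it is sent on every call, with a timeout and explicit error handling around the request. This is only a sketch under those assumptions: the names `BROWSER_HEADERS`, `build_session`, and `fetch_text` and the 10-second timeout are illustrative choices, not part of the answer.

```python
import requests

# The User-Agent string is copied from the answer above.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36"
    )
}


def build_session() -> requests.Session:
    """Create a Session that sends the browser-like header on every request."""
    session = requests.Session()
    session.headers.update(BROWSER_HEADERS)
    return session


def fetch_text(session: requests.Session, url: str, timeout: float = 10.0):
    """Return the response body, or None if the request could not complete."""
    try:
        response = session.get(url, timeout=timeout)
    except (requests.exceptions.ConnectionError, requests.exceptions.Timeout) as exc:
        # "Max retries exceeded" surfaces here as a ConnectionError.
        print("Request failed:", exc)
        return None
    return response.text
```

With this in place, `fetch_text(build_session(), url_1)` returns the page text, or prints the failure and returns `None` instead of raising.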
    

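    【Comments】:

    • Wow, that's great, it works fine. Thanks, bro. Can you tell me which browser and system this header is for?
    • Press F12, open the Network tab, press F5, click any request, and look under Request Headers; the user-agent is listed there.
    • No problem. Glad to help.
    • Yes, I found it, but I'd like to know which browser this header belongs to.
    • It is different for everyone; I copied mine directly from my browser.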
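
On the question of which browser the header belongs to: the string in the answer describes itself as Chrome 86 on 64-bit Linux (X11). A User-Agent is self-reported, so this only shows what the string claims. The small sketch below pulls those two fields out with regular expressions; `describe_user_agent` is a hypothetical helper and only handles Chrome-style strings.

```python
import re

USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36"
)


def describe_user_agent(ua: str) -> dict:
    """Pull the platform and Chrome version out of a Chrome-style User-Agent."""
    platform = re.search(r"\(([^)]*)\)", ua)  # first parenthesised group
    chrome = re.search(r"Chrome/([\d.]+)", ua)
    return {
        "platform": platform.group(1) if platform else None,
        "chrome_version": chrome.group(1) if chrome else None,
    }


print(describe_user_agent(USER_AGENT))
# {'platform': 'X11; Linux x86_64', 'chrome_version': '86.0.4240.111'}
```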
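    【Solution 2】:

    This happens when you send too many requests to the public IP address of https://www.dsit.org.ir. As you can see, access to the public IP address mapped to https://www.dsit.org.ir is, for some reason, not allowed or is blocked. A better solution is the following Python script, which looks up the public IP address of any domain and writes that mapping to the /etc/hosts file.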

    import re
    import socket
    import subprocess
    from typing import Tuple
    
    # Replace with the domain you need, e.g. 'https://anydomainname.example.com/'
    ENDPOINT = 'https://www.dsit.org.ir/'
    
    def get_public_ip() -> Tuple[str, str, str]:
        """
        Command to get public_ip address of host machine and endpoint domain
        Returns
        -------
        my_public_ip : str
            Ip address string of host machine.
        end_point_ip_address : str
            Ip address of endpoint domain host.
        end_point_domain : str
            domain name of endpoint.
    
        """
        # bash_command = """host myip.opendns.com resolver1.opendns.com | \
        #     grep "myip.opendns.com has" | awk '{print $4}'"""
        # bash_command = """curl ifconfig.co"""
        # bash_command = """curl ifconfig.me"""
        bash_command = """ curl icanhazip.com"""
        my_public_ip = subprocess.getoutput(bash_command)
        my_public_ip = re.compile("[0-9.]{4,}").findall(my_public_ip)[0]
        end_point_domain = (
            ENDPOINT.replace("https://", "")
            .replace("http://", "")
            .replace("/", "")
        )
        end_point_ip_address = socket.gethostbyname(end_point_domain)
        return my_public_ip, end_point_ip_address, end_point_domain
    
    
    def set_etc_host(ip_address: str, domain: str) -> str:
        """
        A function to write mapping of ip_address and domain name in /etc/hosts.
        Ref: https://stackoverflow.com/questions/38302867/how-to-update-etc-hosts-file-in-docker-image-during-docker-build
    
        Parameters
        ----------
        ip_address : str
            IP address of the domain.
        domain : str
            domain name of endpoint.
    
        Returns
        -------
        str
            Message to identify success or failure of the operation.
    
        """
        bash_command = """echo "{}    {}" >> /etc/hosts""".format(ip_address, domain)
        output = subprocess.getoutput(bash_command)
        return output
    
    
    if __name__ == "__main__":
        my_public_ip, end_point_ip_address, end_point_domain = get_public_ip()
        output = set_etc_host(ip_address=end_point_ip_address, domain=end_point_domain)
        print("My public IP address:", my_public_ip)
        print("ENDPOINT public IP address:", end_point_ip_address)
        print("ENDPOINT Domain Name:", end_point_domain)
        print("Command output:", output)
    

    You can run the script above before calling the function you need :)
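
Note that `set_etc_host` in the script appends a new line every time it runs, and writing to /etc/hosts normally requires root privileges. A hedged variant in pure Python that skips the write when the domain is already mapped could look like this; `add_hosts_entry` and its `hosts_path` parameter are hypothetical names introduced for illustration.

```python
from pathlib import Path


def add_hosts_entry(ip_address: str, domain: str,
                    hosts_path: str = "/etc/hosts") -> bool:
    """Append "ip domain" to the hosts file unless the domain is already mapped.

    Returns True if a line was written, False if the domain was already present.
    """
    path = Path(hosts_path)
    text = path.read_text() if path.exists() else ""
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if domain in fields[1:]:
            return False
    # Start on a fresh line if the file does not end with a newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a") as hosts_file:
        hosts_file.write(f"{prefix}{ip_address}    {domain}\n")
    return True
```

Passing a scratch file as `hosts_path` makes it possible to try the function without touching the real /etc/hosts.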

    【Comments】:
