【Question title】: Python Requests module - Access multiple URLs at once
【Posted】: 2020-11-13 09:12:10
【Question】:

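I need to validate services and their dependencies using 500+ URLs, and I already have Python code that does this. The problem is that some URLs take a minute to respond [because some known dependencies are down]. Since each URL is hosted on a different server, is there a way to access multiple URLs at once using the requests module?

Below is the entire code I use in PyCharm: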

import requests
import json
import pandas
import datetime
from requests.auth import HTTPBasicAuth
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)


def validate():

    line = "---------------------------------------------------------------------------------------------------"
    print("Validation started for:", datetime.datetime.now().strftime("%d-%B-%Y at %H:%M:%S"), "\n" + line)
    username = 'username'
    password = 'password'
    mydata = pandas.read_excel(r'C:\mydata.xlsx', sheet_name='Sheet1')

    for i in mydata.index:
        srno = str(mydata['Sr No'][i])
        service = mydata['Service Name'][i]
        machine = mydata['Machine Name'][i]
        url = mydata['Node'][i]

        alwaysdownservice = ['service1', 'service2']
        paydown = ['dependency1', 'dependency2', 'dependency3']
        otherdown = ['dependency3']

        def get():
            response = requests.get(url, verify=False, auth=HTTPBasicAuth(username, password))
            data = json.loads(response.text)
            status = data['Success']
            if not status:
                response = requests.get(url, verify=False, auth=HTTPBasicAuth(username, password))
                data = json.loads(response.text)
                status = data['Success']
                if not status:
                    for j in list(data['Dependencies']):
                        dependency = j['DependencyName']
                        d_status = j['Success']
                        if not d_status:
                            if service in alwaysdownservice:
                                if dependency not in paydown:
                                    print(dependency, "down on", machine, "for", service)
                            else:
                                if dependency not in otherdown:
                                    print(dependency, "down on", machine, "for", service)
                                    
        try:
            get()

        except Exception as e:
            print(line, "\n", e, "\n", srno, "| Below URL is not accessible: \n", url, "\n" + line)



validate()

【Discussion】:

    Tags: python python-3.x python-requests python-multiprocessing python-multithreading


    【Solution 1】:

    You can use threads (via Python's threading library) to call multiple URLs at once. To do this, you can use the following code:

    import requests
    import json
    import pandas
    import datetime
    from requests.auth import HTTPBasicAuth
    from requests.packages.urllib3.exceptions import InsecureRequestWarning
    import threading
    requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
    
    threads = []
    
    def validate():
    
        line = "---------------------------------------------------------------------------------------------------"
        print("Validation started for:", datetime.datetime.now().strftime("%d-%B-%Y at %H:%M:%S"), "\n" + line)
        username = 'username'
        password = 'password'
        mydata = pandas.read_excel(r'C:\mydata.xlsx', sheet_name='Sheet1')
    
        for i in mydata.index:
            srno = str(mydata['Sr No'][i])
            service = mydata['Service Name'][i]
            machine = mydata['Machine Name'][i]
            url = mydata['Node'][i]
    
            alwaysdownservice = ['service1', 'service2']
            paydown = ['dependency1', 'dependency2', 'dependency3']
            otherdown = ['dependency3']
    
            def get(srno, service, machine, url):
                # Row values are passed in as arguments so each thread keeps its own copy
                try:
                    response = requests.get(url, verify=False, auth=HTTPBasicAuth(username, password))
                    data = json.loads(response.text)
                    status = data['Success']
                    if not status:
                        # Retry once before reporting the failed dependencies
                        response = requests.get(url, verify=False, auth=HTTPBasicAuth(username, password))
                        data = json.loads(response.text)
                        status = data['Success']
                        if not status:
                            for j in list(data['Dependencies']):
                                dependency = j['DependencyName']
                                d_status = j['Success']
                                if not d_status:
                                    if service in alwaysdownservice:
                                        if dependency not in paydown:
                                            print(dependency, "down on", machine, "for", service)
                                    else:
                                        if dependency not in otherdown:
                                            print(dependency, "down on", machine, "for", service)
                except Exception as e:
                    # An exception raised inside a thread never reaches the caller, so handle it here
                    print(line, "\n", e, "\n", srno, "| Below URL is not accessible: \n", url, "\n" + line)

            t = threading.Thread(target=get, args=(srno, service, machine, url))  # Using threading over here
            t.start()
            threads.append(t)
    
    
    
    
    validate()
    
    for thread in threads:
        thread.join()
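
For comparison, the same fan-out can be sketched with `concurrent.futures.ThreadPoolExecutor` from the standard library, which caps the number of worker threads and hands back each result or exception per URL. This is a minimal sketch, not the answer's code: `check_all`, `fake_check` and `max_workers=20` are hypothetical, and `fake_check` only sleeps to stand in for the `requests.get` call so the example runs without a network.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def check_all(urls, check, max_workers=20):
    """Run check(url) for every URL on a bounded pool of threads.

    Returns a dict mapping each URL to ("ok", value) or ("error", exception).
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # submit() returns immediately, so all checks are queued up front
        futures = {pool.submit(check, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = ("ok", future.result())
            except Exception as exc:
                # An exception raised inside a worker surfaces here, per URL
                results[url] = ("error", exc)
    return results


def fake_check(url):
    # Hypothetical stand-in for the real HTTP check; sleeps instead of calling requests.get
    time.sleep(0.2)
    if "bad" in url:
        raise ConnectionError("unreachable: " + url)
    return True


if __name__ == "__main__":
    demo_urls = ["https://host%d.example/health" % i for i in range(10)]
    demo_urls.append("https://bad.example/health")
    started = time.perf_counter()
    outcome = check_all(demo_urls, fake_check)
    elapsed = time.perf_counter() - started
    failed = [u for u, (state, _) in outcome.items() if state == "error"]
    print("checked", len(outcome), "URLs in", round(elapsed, 2), "s; failed:", failed)
```

In a real run, `check` would wrap the request and the dependency checks for one row of the spreadsheet. A suitable `max_workers` value depends on the machine and the servers being probed.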
    

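    【Discussion】:

    • Do I need to specify the number of threads somewhere? I ran the code above and it still accesses one URL at a time.
    • Yes, I'm sure. I just ran it and it still accesses one URL at a time. To be clear, the only changes you made were importing threading and changing the "try" part of the code.
    • @messi.torres28 I realized my mistake in the code. I have fixed it.
    • It still doesn't work. I tried accessing only URLs that each take more than 30 seconds, and the time taken per URL is still the same. See the output below - ``` Validation started for: 13-November-2020 at 15:42:41 [21] Service1 Machine validated at: 15:43:25 | [22] Service1 Machine validated at: 15:44:10 ```
    • Probably because the URLs you are requesting respond slowly.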
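
One way to tell whether the threads really overlap is to time the same batch both ways against a call of known duration. The sketch below is an illustration only: `slow_call`, `run_sequential` and `run_threaded` are hypothetical names, and `time.sleep` stands in for a slow URL. If the threaded time comes out close to the sum of the delays rather than to the longest one, each thread is being waited on before the next one starts, for example because `join()` is called inside the loop.

```python
import threading
import time


def slow_call(delay, finished):
    # Stand-in for a slow request: blocks for `delay` seconds, then records completion
    time.sleep(delay)
    finished.append(delay)


def run_sequential(delays):
    """Call slow_call once per delay, one after another; return (elapsed, finished)."""
    finished = []
    started = time.perf_counter()
    for delay in delays:
        slow_call(delay, finished)
    return time.perf_counter() - started, finished


def run_threaded(delays):
    """Call slow_call once per delay, each in its own thread; return (elapsed, finished)."""
    finished = []
    started = time.perf_counter()
    threads = [threading.Thread(target=slow_call, args=(d, finished)) for d in delays]
    for t in threads:
        t.start()  # start every thread first ...
    for t in threads:
        t.join()   # ... and only then wait for them
    return time.perf_counter() - started, finished


if __name__ == "__main__":
    delays = [0.3, 0.3, 0.3]
    seq_time, _ = run_sequential(delays)
    thr_time, _ = run_threaded(delays)
    print("sequential: %.2fs, threaded: %.2fs" % (seq_time, thr_time))
```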
    【Solution 2】:

    For anyone who needs a solution: I found this from @Yurii Kramarenko. It works well, and my script now finishes in 30 seconds instead of 10-11 minutes.

    My script -

    import asyncio
    import datetime
    import json

    import aiohttp
    import pandas


    def validate():
        alwaysdownservice = ['service1', 'service2']
        paydown = ['dependency1', 'dependency2', 'dependency3']
        otherdown = ['dependency3']
        username = 'username'
        password = 'password'
        mydata = pandas.read_excel(r'C:\mydata.xlsx', sheet_name='Sheet1')
        urls = mydata['urls']
        # Assumption: 'Service Name' is the column from the question; fetch() needs it per URL
        services = mydata['Service Name']
        line = "---------------------------------------------------------------------------------------------------"
        print("Validation started for:", datetime.datetime.now().strftime("%d-%B-%Y at %H:%M:%S"), "\n" + line)

        async def fetch(session, url, service):
            async with session.get(url, auth=aiohttp.BasicAuth(username, password), ssl=False) as response:

                data = await response.text()
                data = json.loads(data)
                status = data['Success']

                if not status:
                    for j in list(data['Dependencies']):
                        dependency = j['DependencyName']
                        d_status = j['Success']
                        if not d_status:
                            if service in alwaysdownservice:
                                if dependency not in paydown:
                                    print("Dependency -",
                                          "'" + dependency + "'", "down on", "\nURL -", url, "\n" + line)
                            else:
                                if dependency not in otherdown:
                                    print("Dependency -",
                                          "'" + dependency + "'", "down on", "\nURL -", url, "\n" + line)

                print(url, "validated at:", datetime.datetime.now().strftime("%H:%M:%S"))

        async def fetch_all():
            async with aiohttp.ClientSession() as session:
                # return_exceptions=True keeps one failing URL from aborting the others
                results = await asyncio.gather(
                    *[fetch(session, url, service) for url, service in zip(urls, services)],
                    return_exceptions=True)
            # gather() returned the exceptions instead of raising them, so report them here
            for url, result in zip(urls, results):
                if isinstance(result, Exception):
                    print(line, "\n", result, "\n| Below URL is not accessible: \n", url, "\n" + line)

        asyncio.run(fetch_all())

        print("Validation completed for:",
              datetime.datetime.now().strftime("%d-%B-%Y at %H:%M:%S"), "\n" + line, "\n" + line,)
    
    
    validate()
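
With 500+ URLs, `asyncio.gather` starts every request at once. If that turns out to be too much for the client or the network, the number of requests in flight can be capped with `asyncio.Semaphore`. The following is a sketch under assumptions, not part of the script above: `bounded_gather`, `fake_fetch` and the limits used are hypothetical, and `fake_fetch` uses `asyncio.sleep` in place of `session.get` so that it runs without aiohttp or a network.

```python
import asyncio


async def bounded_gather(coro_factories, limit=50):
    """Await the coroutines produced by coro_factories, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(factory):
        async with semaphore:  # waits here while `limit` calls are already running
            return await factory()

    # Results come back in input order; exceptions are returned, not raised
    return await asyncio.gather(*(run_one(f) for f in coro_factories), return_exceptions=True)


async def fake_fetch(url, state):
    # Hypothetical stand-in for the aiohttp request; tracks how many calls overlap
    state["running"] += 1
    state["peak"] = max(state["peak"], state["running"])
    try:
        await asyncio.sleep(0.05)
        if "bad" in url:
            raise ConnectionError("unreachable: " + url)
        return url
    finally:
        state["running"] -= 1


async def main():
    state = {"running": 0, "peak": 0}
    urls = ["https://host%d.example/health" % i for i in range(20)]
    urls.append("https://bad.example/health")
    # Each factory creates its coroutine only when a semaphore slot is free
    factories = [lambda u=url: fake_fetch(u, state) for url in urls]
    results = await bounded_gather(factories, limit=5)
    failures = [r for r in results if isinstance(r, Exception)]
    print("peak concurrency:", state["peak"], "failures:", len(failures))
    return state, results


if __name__ == "__main__":
    asyncio.run(main())
```

Passing factories rather than ready-made coroutines means no request object exists until its slot is available. Whether a cap is needed at all, and how large it should be, depends on the environment.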
    

    【Discussion】:
