【Title】: How to retry the task when aiohttp.ClientSession fails in async
【Posted】: 2020-11-27 21:55:35
【Question】:

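I'm new to async functions in Python, so I'm having a hard time understanding this behavior.

I'm trying to build a simple download tool, and I have this function: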

async def download_all_pages(sites):
    print('Running download all pages')
    try:
        async with aiohttp.ClientSession() as session:
            tasks = [asyncio.ensure_future(safe_download_page(session,url)) for url in sites]
            await asyncio.gather(*tasks, return_exceptions = True)
            try:
                await asyncio.sleep(0.25)
            except asyncio.CancelledError:
                print("Got CancelledError")
    except (aiohttp.ServerDisconnectedError, aiohttp.ClientResponseError,aiohttp.ClientConnectorError) as s:
        print("Oops, the server connection was dropped before we finished.")
        print(s)

I invoke this function as follows:

try:
    loop.run_until_complete(download_all_pages([url+'/'+str(i) for i in range(1, nb_pages+1)]))
    loop.run_until_complete(download_all_sites([result['href'] for result in results]))
finally:
    loop.run_until_complete(loop.shutdown_asyncgens())
    loop.close()
print('Finished at '+str(datetime.timestamp(datetime.now())))

Whenever I get an error (in this example mostly aiohttp.ServerDisconnectedError), the output shows:

Oops, the server connection was dropped before we finished.
Server disconnected
Finished at 1606440807.007339
Task was destroyed but it is pending!
Task was destroyed but it is pending!
Task was destroyed but it is pending!
Task was destroyed but it is pending!
Task was destroyed but it is pending!
Task was destroyed but it is pending!

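...and about a million more Task was destroyed but it is pending! lines.

So when this error occurs, I don't want the function to finish, because there are still many tasks left to complete; hence the error Task was destroyed but it is pending!.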

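As you can see, it calls print('Finished at') before calling loop.run_until_complete(download_all_sites(...)); it seems to exit the whole script entirely. (Edit: I think I found out why this happens. Because of the try: above, when it fails it goes straight to the finally: clause, which destroys the pending tasks. The question of how to avoid the whole disconnection problem remains.)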

Do you know how to safely retry the tasks that fail with aiohttp.ServerDisconnectedError?

Does this have anything to do with not using if __name__ == "__main__":?

【Comments】:

    Tags: python asynchronous async-await python-asyncio aiohttp


    【Solution 1】:

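    Does this have anything to do with not using if __name__ == "__main__":?

    No, this has nothing to do with not using if __name__ == "__main__". It has to do with not handling the exception in the right place. asyncio.gather() starts the given tasks and returns a list of their results. By default, if any of those tasks raises an exception, gather() immediately raises the same exception, without waiting for the remaining tasks to finish (they are not cancelled and keep running in the background).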
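To make the two gather() modes concrete, here is a minimal sketch that needs no network access. The helpers ok() and boom() are hypothetical stand-ins for downloads, and ConnectionError stands in for aiohttp.ServerDisconnectedError; the delays are arbitrary.

```python
import asyncio


async def ok(delay, value):
    # Hypothetical stand-in for a download that succeeds.
    await asyncio.sleep(delay)
    return value


async def boom(delay):
    # Hypothetical stand-in for a download that fails.
    await asyncio.sleep(delay)
    raise ConnectionError("server disconnected")


async def demo():
    # Default mode: the first exception propagates out of gather() at once.
    slow = asyncio.ensure_future(ok(0.2, "slow"))
    try:
        await asyncio.gather(slow, boom(0.01))
    except ConnectionError as exc:
        first_error = str(exc)
    # gather() did not cancel the other task; it is still running.
    still_pending = not slow.done()
    await slow  # let it finish so nothing is left pending at shutdown

    # return_exceptions=True: gather() waits for every awaitable and
    # returns exceptions as ordinary items in the result list.
    results = await asyncio.gather(
        ok(0.01, "a"), boom(0.01), return_exceptions=True
    )
    return first_error, still_pending, results


first_error, still_pending, results = asyncio.run(demo())
print(first_error, still_pending, results)
```

If the event loop is closed while a task like slow is still unfinished, asyncio reports it with the "Task was destroyed but it is pending!" message seen in the question.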

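    You should handle the exception in the function you haven't shown, safe_download_page. Use a try there, catch the aiohttp-related exceptions you can recover from, and retry in case of error (in a loop if necessary, sleeping between iterations). Something like this (untested):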

    import asyncio
    import aiohttp

    # sem (presumably an asyncio.Semaphore) and download_page() come from
    # the asker's code, which is not shown here.

    async def download_all_pages(sites):
        print('Running download all pages')
        async with aiohttp.ClientSession() as session:
            tasks = [asyncio.ensure_future(safe_download_page(session, url)) for url in sites]
            await asyncio.gather(*tasks)
            try:
                await asyncio.sleep(0.25)
            except asyncio.CancelledError:
                print("Got CancelledError")

    async def safe_download_page(session, url):
        # Retry until the download succeeds; note there is no attempt limit.
        while True:
            try:
                async with sem:
                    await download_page(session, url)
                    break  # success: leave the retry loop
            except (aiohttp.ServerDisconnectedError, aiohttp.ClientResponseError, aiohttp.ClientConnectorError) as s:
                print("Oops, the server connection was dropped on ", url, ": ", s)
                await asyncio.sleep(1)  # don't hammer the server
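The while True loop above retries forever if a URL never succeeds. One possible bounded variant is sketched below; it is an illustration, not part of the original answer. The helper retry(), its parameters (make_attempt, retry_on, max_attempts, base_delay) and the stand-in flaky_download() are all hypothetical names, and ConnectionError is used instead of the aiohttp exceptions so the sketch runs without a network.

```python
import asyncio


async def retry(make_attempt, retry_on, max_attempts=5, base_delay=1.0):
    """Await make_attempt() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            # Returning here ends the loop, so a finished task is not redone.
            return await make_attempt()
        except retry_on as exc:
            if attempt == max_attempts:
                raise  # give up and let the caller see the last error
            delay = base_delay * 2 ** (attempt - 1)  # 1x, 2x, 4x, ...
            print(f"attempt {attempt} failed ({exc!r}); retrying in {delay}s")
            await asyncio.sleep(delay)


# Hypothetical stand-in for download_page: fails twice, then succeeds.
calls = {"n": 0}


async def flaky_download():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("server disconnected")
    return "page body"


result = asyncio.run(retry(flaky_download, (ConnectionError,), base_delay=0.01))
print(result, calls["n"])
```

With aiohttp, the same helper could presumably be called as retry(lambda: download_page(session, url), (aiohttp.ServerDisconnectedError, aiohttp.ClientResponseError, aiohttp.ClientConnectorError)) inside the async with sem: block. Whether to retry on every ClientResponseError is a design choice, since some HTTP errors will never succeed on a later attempt.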
    

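    【Comments】:

    • Hey! Thanks for your answer. The function is: async def safe_download_page(session, url): async with sem: download_page(session, url). download_page does a bunch of parsing-related things and starts with async with session.get(url) as response:. So you're telling me I should remove the gather from download_all_sites, and put it where? Or not use gather at all? Like moving all the download_page logic into safe_download_page after async with sem: and catching the session.get(url) errors?
    • gather is fine; I'm not suggesting that you remove it, I was just explaining what it does. I'm suggesting that the try that catches the exception should be in safe_download_page, not in download_all_pages.
    • @Raphaelmsr Also, I now see that you are using return_exceptions=True, which actually changes the behavior of gather. But my suggestion still applies; I've now amended the answer to include an example of what I mean.
    • Sweet! Saw your edit. Awesome, I'll try it and let you know if I run into any problems. Edit: won't while True cause an infinite loop, redoing tasks that have already been processed? I ran into that behavior with while True. Thanks!
    • Edit 2: I also see that you removed the return await before the call to download_page, and that you break out of the loop; is there a reason for that?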