【问题标题】:Random Timeout Errors using heroku + gunicorn + aiohttp使用 heroku + gunicorn + aiohttp 的随机超时错误
【发布时间】:2017-02-10 17:29:23
【问题描述】:

我开发了一个服务,作为网关将请求重定向到不同的微服务。为此,我使用 aiohttp 处理重定向请求,使用 gunicorn (w/aiohttp.worker.GunicornWebWorker) 提供服务,并使用 Heroku 作为主机。

在本地工作一切正常,100% 的请求返回响应,客户端总是收到所需的信息,但是当我部署到 Heroku 并重定向一些请求(每分钟 5k)时,我看到 3 到 7 个 HTTP 状态为 503 的请求超时错误。没什么好担心的,因为好的解决请求的比例非常高(99.9994),但我想知道发生了什么。在超时之前引发的异常是这样的:

[2017-02-10 17:03:48 +0000] [683] [INFO] Worker exiting (pid: 683) 
ERROR:asyncio:Task was destroyed but it is pending! 
[2017-02-10 17:03:48 +0000] [683] [INFO] Stopping server: 683, connections: 1 
Exception ignored in: <generator object GunicornWebWorker._run at 0x7f18b1d2f518> 
Traceback (most recent call last): 
  yield from self.close() 
  yield from self.wsgi.shutdown() 
File "/app/.heroku/python/lib/python3.5/site-packages/aiohttp/web.py", line 199, in shutdown 
  yield from self.on_shutdown.send(self) 
File "/app/.heroku/python/lib/python3.5/site-packages/aiohttp/signals.py", line 48, in send 
  yield from self._send(*args, **kwargs) 
File "/app/.heroku/python/lib/python3.5/site-packages/aiohttp/signals.py", line 16, in _send 
  yield from res 
File "/app/app/app.py", line 14, in close_redis 
  app.redis_pool.close() 
File "/app/.heroku/python/lib/python3.5/site-packages/aioredis/pool.py", line 135, in close 
  self._close_state.set() 
File "/app/.heroku/python/lib/python3.5/asyncio/locks.py", line 242, in set 
  fut.set_result(True) 
File "/app/.heroku/python/lib/python3.5/asyncio/futures.py", line 332, in set_result 
  self._schedule_callbacks() 
File "/app/.heroku/python/lib/python3.5/asyncio/futures.py", line 242, in _schedule_callbacks 
  self._loop.call_soon(callback, self) 
File "/app/.heroku/python/lib/python3.5/asyncio/base_events.py", line 497, in call_soon 
  handle = self._call_soon(callback, args) 
File "/app/.heroku/python/lib/python3.5/asyncio/base_events.py", line 506, in _call_soon 
  self._check_closed() 
File "/app/.heroku/python/lib/python3.5/asyncio/base_events.py", line 334, in _check_closed 
  raise RuntimeError('Event loop is closed') 
RuntimeError: Event loop is closed 
ERROR:asyncio:Task was destroyed but it is pending! 
task: <Task pending coro=<ServerHttpProtocol.start() running at /app/.heroku/python/lib/python3.5/site-packages/aiohttp/server.py:261>>
[2017-02-10 17:03:48 +0000] [4] [CRITICAL] WORKER TIMEOUT (pid:683) 

然后heroku/router显示如下错误:

at=error code=H12 desc="Request timeout" method=GET path="/users/21324/posts/" host=superapp.herokuapp.com request_id=209bd839-baac-4e72-a04e-657d85348f45 fwd="84.78.56.97" dyno=web.2 connect=0ms service=30000ms status=503 bytes=0

我正在运行应用程序:

gunicorn --pythonpath app  app.app:aio_app --worker-class aiohttp.worker.GunicornWebWorker --workers 3

主要代码是:

def init(asyncio_loop):
    app = web.Application(loop=asyncio_loop, middlewares=[middlewares.auth_middleware,
                                                          middlewares.logging_middleware])

    # INIT DBs
    app.redis_pool = asyncio_loop.run_until_complete(
        create_pool((settings.REDIS['host'], settings.REDIS['port']),
                    password=settings.REDIS['password']))

    # Clean connections on stop
    app.on_shutdown.append(close_redis)

    # Add rollbar
    rollbar.init(settings.ROLLBAR_TOKEN, 'production')  # access_token, environment

    # Bind routes
    for r in routes:
        app.router.add_route(r[0], r[1], r[2])

    return app


# Create app
loop = asyncio.get_event_loop()
aio_app = init(loop)

还有一个重定向示例:

async with aiohttp.ClientSession() as s:
    try:
        async with s.request(method=method,
                             url=new_url,
                             headers=new_headers,
                             data=body,
                             allow_redirects=False,
                             timeout=25) as response:
            # Clean response
            resp_headers = MSRepository.filter_response_headers(response.headers)
            resp_body = (await response.read())

            return ResponseDataEntity(resp_headers, response.status, resp_body)
    except asyncio.TimeoutError:
        raise MSConnectionError("Request timeout")
    except Exception as e:
        rollbar.report_message(str(e), extra_data={
            "url": new_url,
            "data": body,
            "method": method,
            "headers": new_headers
        })
        raise MSConnectionError(str(e))

如您所见,发出请求时有 25 秒的超时,而异常正在引发 30 秒的超时。

有人知道发生了什么吗?

(注意:当我写重定向时,我的意思不是说 HTTP 302,我的意思是处理请求、编辑标头、检查身份验证、向适当的 MS 发出异步请求、处理响应并返回此响应)

【问题讨论】:

    标签: python heroku gunicorn gateway aiohttp


    【解决方案1】:

    最后,问题出在其中一个处理程序中。我真的不知道发生了什么,因为所有端点的超时都是完全随机的,但是在每分钟超过 10k 个请求的情况下完美运行 6 小时后,我确信问题就是这个。以下是修复前后的代码:

    async def bad_handler(request):
        # Read body in ALL cases to not to block requests
        if '/event-log/action/' == request.path:
            if request.query_string != '':
                action_type = request.query_string
            else:
                try:
                    request_body = await request.json()
                    action_type = request_body.get('type', '')
                except:
                    action_type = ''
    
            print("Action_type={}".format(action_type))
    
        # Response to client
        return aiohttp.web.json_response(data={}, status=200)
    
    async def good_handler(request):
        # Read body in ALL cases to not to block requests
        try:
            request_body = await request.json()
        except:
            request_body = None
    
        if '/event-log/action/' == request.path:
            if request.query_string != '':
                action_type = request.query_string
            else:
                if request_body is not None:
                    action_type = request_body.get('type', '')
                else:
                    action_type = ''
    
            print("Action_type={}".format(action_type))
    
        # Response to client
        return aiohttp.web.json_response(data={}, status=200)
    

    如您所见,唯一的区别是在一种情况下我们总是在等待身体,而在另一种情况下则不是。

    我会留下这个问题,只是希望有人回答我为什么它现在有效。 :)

    【讨论】:

      猜你喜欢
      • 1970-01-01
      • 2012-06-06
      • 1970-01-01
      • 2021-09-16
      • 1970-01-01
      • 2013-10-21
      • 1970-01-01
      • 2020-05-12
      相关资源
      最近更新 更多