Deploying Django to App Engine Flexible Environment - Timeout Error Response: [4]

【Question title】: Deploying Django to App Engine Flexible Environment - Timeout Error Response: [4]
【Posted】: 2020-10-09 18:03:20
【Question】:

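I am trying to deploy my application to the flexible environment. The Docker image builds fine, but the process fails at what I think is the point where it tries to bring the service live. My build timeout is set to 1200, for what it's worth.

How can I interrogate this error further? I am struggling to find where in the logs or the GCP console I can see exactly which process is getting stuck. It seems to be a completely opaque error, with no indication of what is actually going wrong. Is it that there is some error in the application (which runs fine locally)? If so, I would expect it to still deploy and simply show an error when I visit the site.

Any help is much appreciated.

Error: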

OperationError: Error Response: [4] Your deployment has failed to become healthy in the allotted time and therefore was rolled back. If you believe this was an error, try adjusting the 'app_start_timeout_sec' setting in the 'readiness_check' section.
ERROR: (gcloud.app.deploy) Error Response: [4] Your deployment has failed to become healthy in the allotted time and therefore was rolled back. If you believe this was an error, try adjusting the 'app_start_timeout_sec' setting in the 'readiness_check' section.

Here is my Dockerfile:

FROM gcr.io/google-appengine/python

RUN apt-get update && apt-get install software-properties-common -y
RUN add-apt-repository ppa:ubuntugis/ppa

RUN apt-get install -y gdal-bin


# Create a virtualenv for dependencies. This isolates these packages from
# system-level packages.
# Use -p python3 or -p python3.7 to select python version. Default is version 2.
RUN virtualenv /env -p python3.7



# Setting these environment variables are the same as running
# source /env/bin/activate.
ENV VIRTUAL_ENV /env
ENV PATH /env/bin:$PATH

# Copy the application's requirements.txt and run pip to install all
# dependencies into the virtualenv
COPY requirements.txt /tmp
WORKDIR /tmp
RUN pip install -r requirements.txt

# Add the application source code.
ADD . /

EXPOSE 8080
# Run a WSGI server to serve the application. gunicorn must be declared as
# a dependency in requirements.txt.
#CMD gunicorn -b :$PORT main:app

Here is my app.yaml:

runtime: custom
env: flex

runtime_config:
  # You can also specify 2 for Python 2.7
  python_version: 3.7

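【Comments】:

Have you tried increasing the value of app_start_timeout_sec? (The maximum is 1800.) That would give your application a wider time window to start up and become ready. See the documentation for details.
Yes. Unfortunately it still fails. Thanks though.
Is it intentional that entrypoint and CMD are commented out?
Good spot, but unfortunately it made no difference :(
Try specifying health checks in app.yaml, and verify that calls to the health check endpoint return the correct response (usually 200): cloud.google.com/appengine/docs/flexible/python/reference/…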
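
The two suggestions in the comments above (a longer app_start_timeout_sec and explicit health checks) could be combined in app.yaml roughly as follows. This is only a sketch: the numeric values are illustrative rather than taken from the question, and it assumes the application itself answers the two check paths with HTTP 200.

```yaml
runtime: custom
env: flex

# Readiness check: decides whether the new version may start receiving traffic.
readiness_check:
  path: "/readiness_check"      # assumption: the app serves this URL with a 200
  check_interval_sec: 5
  timeout_sec: 4
  failure_threshold: 2
  success_threshold: 2
  app_start_timeout_sec: 1800   # the maximum mentioned in the comment above

# Liveness check: decides whether a running instance should be restarted.
liveness_check:
  path: "/liveness_check"       # assumption: the app serves this URL with a 200
  check_interval_sec: 30
  timeout_sec: 4
  failure_threshold: 2
  success_threshold: 2
```

If the endpoints return anything other than 200, or nothing is listening on port 8080 at all, the deployment would be expected to fail with the same "failed to become healthy" message.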

Tags: docker google-app-engine google-cloud-platform geodjango

【Solution 1】:

I am almost certain this is caused by a gunicorn timeout.

To disable gunicorn's timeout behaviour, change the last command in your Dockerfile to:

CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 main:app --timeout 0

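Here, --workers 1 --threads 8 means one worker process with eight threads. (If you don't specify resources manually, the default is 1 CPU core.) If you decide to use more cores, adjust the workers and threads accordingly, but that is a bit beyond the scope of this question.

The important part is --timeout 0, which disables gunicorn's worker timeout entirely.

If you still see the error, there is a small addition that will most likely fix it: also start gunicorn with the --preload flag, so that the last command in the Dockerfile becomes: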

CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 main:app --timeout 0 --preload

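This basically ensures that all the required imports and preprocessing are done when the instance hosting the Docker container is created, rather than when the first request arrives. That is very useful for applications that need a lot of time for one-time preprocessing: by the time a request comes in, everything is already loaded and ready to serve it.

To get the most out of --preload, you should also move all imports and similar setup to the very top of your main application, and avoid importing inside route handlers.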
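
As a rough illustration of that advice, here is a minimal sketch of a module whose imports and one-time setup all run at import time. The file name main.py, the helper _load_lookup_table and the LOOKUP table are hypothetical; a real Django project would expose its own WSGI application object instead of this hand-written callable.

```python
# main.py (hypothetical): everything slow happens at import time, so with
# --preload it runs once before gunicorn forks its worker processes.
import json


def _load_lookup_table():
    """Stand-in for expensive one-time preprocessing (hypothetical)."""
    return {"status": "ok"}


# Evaluated when the module is imported, not inside the request handler.
LOOKUP = _load_lookup_table()


def app(environ, start_response):
    """Minimal WSGI callable; it does no imports or setup per request."""
    body = json.dumps(LOOKUP).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

The point of the layout is that the handler only reads LOOKUP; nothing is imported or computed lazily on the first request.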

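Also, there is no point in having an entrypoint command in both app.yaml and the Dockerfile. In my opinion it is best to keep it in the Dockerfile.

Additionally:

I would move EXPOSE 8080 to right after the FROM line, because it ensures that your container has the right connection to the outside world.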
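
Putting the suggestions above together, the Dockerfile from the question might end up looking something like the sketch below. It assumes that gunicorn is listed in requirements.txt and that a module named main exposes a WSGI callable named app, as in the command above; for a Django project the target would more likely be the project's own wsgi module, which the question does not show.

```dockerfile
FROM gcr.io/google-appengine/python

# Moved up, as suggested above.
EXPOSE 8080

# ... package installation, virtualenv and pip steps unchanged from the question ...

# Add the application source code.
ADD . /

# Assumption: "main:app" must be importable from the working directory.
# The question's Dockerfile leaves WORKDIR at /tmp while the source is added
# to /, so the working directory may also need adjusting.
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 main:app --timeout 0 --preload
```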

【Comments】: