【Question title】: Airflow scheduler fails to start with kubernetes executor
【Posted】: 2020-05-24 15:01:07
【Question】:

I am using the https://github.com/helm/charts/tree/master/stable/airflow Helm chart. I built a v1.10.8 puckel/docker-airflow image with kubernetes installed on it and use that image in the Helm chart, but I keep getting

  File "/usr/local/bin/airflow", line 37, in <module>
    args.func(args)
  File "/usr/local/lib/python3.7/site-packages/airflow/bin/cli.py", line 1140, in initdb
    db.initdb(settings.RBAC)
  File "/usr/local/lib/python3.7/site-packages/airflow/utils/db.py", line 332, in initdb
    dagbag = models.DagBag()
  File "/usr/local/lib/python3.7/site-packages/airflow/models/dagbag.py", line 95, in __init__
    executor = get_default_executor()
  File "/usr/local/lib/python3.7/site-packages/airflow/executors/__init__.py", line 48, in get_default_executor
    DEFAULT_EXECUTOR = _get_executor(executor_name)
  File "/usr/local/lib/python3.7/site-packages/airflow/executors/__init__.py", line 87, in _get_executor
    return KubernetesExecutor()
  File "/usr/local/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 702, in __init__
    self.kube_config = KubeConfig()
  File "/usr/local/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 283, in __init__
    self.kube_client_request_args = json.loads(kube_client_request_args)
  File "/usr/local/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/local/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

in my scheduler. As various sources suggest, I also tried setting:

AIRFLOW__KUBERNETES__KUBE_CLIENT_REQUEST_ARGS: {"_request_timeout" : [60,60] }

in my Helm values. That did not work either. Does anyone have any idea what I am missing?

Here is my values.yaml:


airflow:
  image:
    repository: airflow-docker-local
    tag: 1.10.8
  executor: Kubernetes
  service:
    type: LoadBalancer
  config:
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_REPOSITORY: airflow-docker-local
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_TAG: 1.10.8
    AIRFLOW__KUBERNETES__WORKER_CONTAINER_IMAGE_PULL_POLICY: Never

    AIRFLOW__KUBERNETES__WORKER_SERVICE_ACCOUNT_NAME: airflow
    AIRFLOW__KUBERNETES__DAGS_VOLUME_CLAIM: airflow
    AIRFLOW__KUBERNETES__NAMESPACE: airflow
    AIRFLOW__KUBERNETES__KUBE_CLIENT_REQUEST_ARGS: {"_request_timeout" : [60,60] }

    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://postgres:airflow@airflow-postgresql:5432/airflow

persistence:
  enabled: true
  existingClaim: ''

workers:
  enabled: false

postgresql:
  enabled: true

redis:
  enabled: false

Edit:

After various attempts at setting the environment variable in the Helm values.yaml failed, I added (note the double and single quotes)

ENV AIRFLOW__KUBERNETES__KUBE_CLIENT_REQUEST_ARGS='{"_request_timeout" : [60,60] }'

to the Dockerfile here: https://github.com/puckel/docker-airflow/blob/1.10.9/Dockerfile#L19. After that my airflow-scheduler pod starts, but I then keep getting the following error on the scheduler pod.

Process KubernetesJobWatcher-9:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/urllib3/contrib/pyopenssl.py", line 313, in recv_into
    return self.connection.recv_into(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/OpenSSL/SSL.py", line 1840, in recv_into
    self._raise_ssl_error(self._ssl, result)
  File "/usr/local/lib/python3.7/site-packages/OpenSSL/SSL.py", line 1646, in _raise_ssl_error
    raise WantReadError()
OpenSSL.SSL.WantReadError

【Comments】:

  • Same problem here. I checked out docker-airflow:1.10.8, changed L931 of config/airflow.cfg to kube_client_request_args = and used this image. It seems to work.
  • Same problem here :-(
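
The traceback in the question does not show the string that actually reached json.loads. The following is a minimal sketch, not Airflow's code, of two inputs that produce the same message at the same position: a value starting with doubled braces and a value using single quotes. Both inputs are assumptions chosen for illustration, and parse_request_args is a hypothetical helper modelled on the comment above, which reports that an empty kube_client_request_args seems to work.

```python
import json


def parse_request_args(raw: str) -> dict:
    # Hypothetical helper: an empty value means "no extra request args";
    # anything else has to be valid JSON.
    if not raw.strip():
        return {}
    return json.loads(raw)


def error_for(raw: str) -> str:
    # Return the decoder's message, or an empty string if parsing succeeds.
    try:
        parse_request_args(raw)
        return ""
    except json.JSONDecodeError as exc:
        return str(exc)


# Assumed inputs; the real value is not shown in the question.
doubled_braces = '{{"_request_timeout" : [60,60] }}'
single_quoted = "{'_request_timeout': [60, 60]}"
valid = '{"_request_timeout" : [60,60] }'

for candidate in (doubled_braces, single_quoted, valid, ""):
    print(repr(candidate), "->", error_for(candidate) or "parsed OK")
```

In both failing cases the decoder stops at the second character because it is not a double quote, which matches "line 1 column 2 (char 1)" in the traceback.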

Tags: kubernetes airflow kubernetes-helm airflow-scheduler


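【Solution 1】:

For Helm values, the template uses a loop that wraps each value of the airflow.config map in double quotes ("). This means any " inside a value has to be escaped so that the rendered YAML is valid.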

airflow:
  config:
    AIRFLOW__KUBERNETES__KUBE_CLIENT_REQUEST_ARGS: '{\"_request_timeout\":60}'

This deploys and runs (but I have not completed an end-to-end test).
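
The escaping rule can be sketched in Python. This is a rough illustration under stated assumptions, not the chart's actual template: render_quoted is a hypothetical helper that wraps a value in double quotes the way the answer describes, and json.loads stands in for the YAML parser, since a YAML double-quoted scalar handles \" the same way for a value like this one.

```python
import json


def render_quoted(raw: str) -> str:
    # Hypothetical stand-in for a template line that wraps the raw value in
    # double quotes; json.loads decodes the resulting quoted scalar.
    return json.loads('"' + raw + '"')


# Escaped quotes survive the wrapping and come out as valid JSON.
escaped = r'{\"_request_timeout\":60}'
env_value = render_quoted(escaped)
request_args = json.loads(env_value)

# Unescaped quotes terminate the wrapped scalar early, so decoding fails.
unescaped = '{"_request_timeout":60}'
try:
    render_quoted(unescaped)
    unescaped_ok = True
except json.JSONDecodeError:
    unescaped_ok = False

print(env_value, request_args, unescaped_ok)
```

With the escaped form, the string that ends up in the environment variable is plain JSON again, which is what the executor's json.loads call in the traceback expects.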

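According to this github issue, the SSL timeout in the Python scheduler may not be a problem, since the watcher starts again after the 60-second connection timeout.

【Discussion】:

  • No problem! Were you able to verify that the scheduler works and that it just reconnects every X seconds?
  • Yes, the scheduler works. I ran into a lot of other problems, though :-(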