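【Posted】: 2026-02-04 06:10:02
【Question】:
I am trying to figure out how to add a security layer to a Dask cluster deployed with Helm on GKE (GCP), so that users are forced to pass a certificate and key file to a Security object, as described in this documentation [1]. Unfortunately, the scheduler pod crashes with a timeout error. The logs show the following error: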
Traceback (most recent call last):
  File "/opt/conda/bin/dask-scheduler", line 10, in <module>
    sys.exit(go())
  File "/opt/conda/lib/python3.7/site-packages/distributed/cli/dask_scheduler.py", line 226, in go
    main()
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/distributed/cli/dask_scheduler.py", line 206, in main
    **kwargs
  File "/opt/conda/lib/python3.7/site-packages/distributed/scheduler.py", line 1143, in __init__
    self.connection_args = self.security.get_connection_args("scheduler")
  File "/opt/conda/lib/python3.7/site-packages/distributed/security.py", line 224, in get_connection_args
    "ssl_context": self._get_tls_context(tls, ssl.Purpose.SERVER_AUTH),
  File "/opt/conda/lib/python3.7/site-packages/distributed/security.py", line 187, in _get_tls_context
    ctx = ssl.create_default_context(purpose=purpose, cafile=ca)
  File "/opt/conda/lib/python3.7/ssl.py", line 584, in create_default_context
    context.load_verify_locations(cafile, capath, cadata)
FileNotFoundError: [Errno 2] No such file or directory
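The last frames of the traceback show `ssl.create_default_context` failing while loading the CA file. As a minimal sketch, the same standard-library call can be run in isolation to see whether a given path is readable from the process's working directory. The helper name `try_load_ca` is hypothetical and is not part of Dask.

```python
import ssl


def try_load_ca(cafile):
    """Return None if cafile loads as a CA bundle, else the exception raised."""
    try:
        # Same standard-library call as the last distributed frame in the traceback.
        ssl.create_default_context(purpose=ssl.Purpose.SERVER_AUTH, cafile=cafile)
    except OSError as exc:  # FileNotFoundError and ssl.SSLError are both OSError subclasses
        return exc
    return None


if __name__ == "__main__":
    # A relative name such as "myca.pem" is resolved against the current
    # working directory of whichever process makes the call.
    print(repr(try_load_ca("myca.pem")))
```

Running this inside the scheduler container (for example from the directory where `dask-scheduler` starts) would indicate whether the relative path resolves there, assuming the container has a Python interpreter available.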
The Helm config YAML file is as follows:
scheduler:
  allowed-failures: 5
  env:
    - name: DASK_DISTRIBUTED__COMM__DEFAULT_SCHEME
      value: "tls"
    - name: DASK_DISTRIBUTED__COMM__REQUIRE_ENCRYPTION
      value: "true"
    - name: DASK_DISTRIBUTED__COMM__TLS__CA_FILE
      value: "myca.pem"
    - name: DASK_DISTRIBUTED__COMM__TLS__SCHEDULER__KEY
      value: "mykey.pem"
    - name: DASK_DISTRIBUTED__COMM__TLS__SCHEDULER__CERT
      value: "myca.pem"
    - name: DASK_DISTRIBUTED__COMM__TLS__WORKER__KEY
      value: "mykey.pem"
    - name: DASK_DISTRIBUTED__COMM__TLS__WORKER__CERT
      value: "myca.pem"
    - name: DASK_DISTRIBUTED__COMM__TLS__CLIENT__KEY
      value: "mykey.pem"
    - name: DASK_DISTRIBUTED__COMM__TLS__CLIENT__CERT
      value: "myca.pem"
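All of the TLS values in this config are bare file names, so they only resolve if the files exist relative to the working directory inside the pod. The following sketch, which uses only the standard library, lists which of these environment variables point at files that are absent. The function name `missing_tls_files` and the suffix filter are assumptions made for illustration, not part of Dask or the Helm chart.

```python
import os

TLS_PREFIX = "DASK_DISTRIBUTED__COMM__TLS__"
# Only the variables used in the config above; these are expected to hold file paths.
PATH_SUFFIXES = ("__CA_FILE", "__KEY", "__CERT")


def missing_tls_files(env=None, base_dir=None):
    """Return {variable name: resolved path} for TLS variables whose file is absent."""
    env = os.environ if env is None else env
    base_dir = os.getcwd() if base_dir is None else base_dir
    missing = {}
    for name, value in env.items():
        if not (name.startswith(TLS_PREFIX) and name.endswith(PATH_SUFFIXES)):
            continue
        # os.path.join leaves absolute values unchanged and anchors
        # relative ones at base_dir.
        path = os.path.join(base_dir, value)
        if not os.path.isfile(path):
            missing[name] = path
    return missing
```

Calling `missing_tls_files()` with no arguments inspects the current process environment and working directory. An empty result would suggest the files are visible to that process, while any entries show the full path that was looked up.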
I created the key and certificate files as follows:
openssl req -newkey rsa:4096 -nodes -sha256 -x509 -days 3650 -nodes -out myca.pem -keyout mykey.pem
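The command above writes a self-signed certificate to `myca.pem` and the private key to `mykey.pem`, which is why the same certificate file is used as both the CA file and the cert in the config. As a rough sanity check that each file holds what is expected, the sketch below lists the PEM block labels found in a file's text. The helper `pem_block_labels` is hypothetical, and it only inspects the header lines without validating the contents.

```python
import re

# Matches PEM header lines such as "-----BEGIN CERTIFICATE-----".
PEM_BEGIN = re.compile(r"^-----BEGIN ([A-Z0-9 ]+)-----[ \t\r]*$", re.MULTILINE)


def pem_block_labels(text):
    """Return the labels of all PEM blocks found in text, in file order."""
    return PEM_BEGIN.findall(text)


if __name__ == "__main__":
    for name in ("myca.pem", "mykey.pem"):
        try:
            with open(name) as handle:
                print(name, pem_block_labels(handle.read()))
        except FileNotFoundError:
            print(name, "not found in the current directory")
```

One would expect a certificate label for `myca.pem` and some form of private key label for `mykey.pem`; the exact key label can differ between OpenSSL versions.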
Here is a minimal, complete, verifiable example:
import dask.dataframe as dd
from dask.distributed import Client
from distributed.security import Security
sec = Security(tls_ca_file='myca.pem',
               tls_client_cert='myca.pem',
               tls_client_key='mykey.pem',
               require_encryption=True)
with Client("tls://<scheduler_ip>:8786", security=sec) as dask_client:
    ddf = dd.read_csv('gs://<bucket_name>/my_file.csv',
                      engine='python',
                      error_bad_lines=False,
                      encoding="utf-8",
                      assume_missing=True
                      )
    print(ddf.shape[0].compute())
【Discussion】:
- I have opened an issue on the GitHub issue tracker: github.com/dask/helm-chart/issues/82
Tags: python ssl cluster-computing dask dask-distributed