【问题标题】:dataproc create cluster gcloud equivalent command in pythondataproc 在 python 中创建集群 gcloud 等效命令
【发布时间】:2021-02-21 18:53:09
【问题描述】:

如何在 python 中复制以下 gcloud 命令?

gcloud beta dataproc clusters create spark-nlp-cluster \
     --region global \
     --metadata 'PIP_PACKAGES=google-cloud-storage spark-nlp==2.5.3' \
     --worker-machine-type n1-standard-1 \
     --num-workers 2 \
     --image-version 1.4-debian10 \
     --initialization-actions gs://dataproc-initialization-actions/python/pip-install.sh \
     --optional-components=JUPYTER,ANACONDA \
     --enable-component-gateway 

这是我目前在 python 中所拥有的:


    cluster_data = {
        "project_id": project,
        "cluster_name": cluster_name,
        "config": {
            "gce_cluster_config": {"zone_uri": zone_uri},
            "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-1"},
            "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-1"},
            "software_config":{"image_version":"1.4-debian10","optional_components":{"JUPYTER","ANACONDA"}}
            
        },
    }

    cluster = dataproc.create_cluster(
        request={"project_id": project, "region": region, "cluster": cluster_data}
    )

不确定如何将这些 gcloud 命令转换为 python:

     --metadata 'PIP_PACKAGES=google-cloud-storage spark-nlp==2.5.3' \  
     --initialization-actions gs://dataproc-initialization-actions/python/pip-install.sh \
     --enable-component-gateway 

【问题讨论】:

    标签: python-3.x google-cloud-platform dataproc


    【解决方案1】:

    你可以这样试试:

    cluster_data = {
        "project_id": project,
        "cluster_name": cluster_name,
        "config": {
            "gce_cluster_config": {"zone_uri": zone_uri},
            "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-1"},
            "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-1"},
            "software_config":{"image_version":"1.4-debian10","optional_components":{"JUPYTER","ANACONDA"}},
            "initialization_actions":{"executable_file" : "gs://dataproc-initialization-actions/python/pip-install.sh"},
            "gce_cluster_config": {"metadata": "PIP_PACKAGES=google-cloud-storage,spark-nlp==2.5.3"},
            "endpoint_config": {"enable_http_port_access":True},
            
        },
    }
    

    您可以访问更多:GCP Cluster Configs

    【讨论】:

      猜你喜欢
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 2022-01-05
      • 1970-01-01
      • 2020-09-18
      • 2022-01-16
      相关资源
      最近更新 更多