【Title】: Dataproc ignoring Spark configuration
【Posted】: 2020-12-10 06:58:58
【Question】:

I am running the spark-submit command below on a Dataproc cluster, but I notice that several of the Spark configurations are being ignored. Could someone explain why they are ignored?

gcloud dataproc jobs submit spark --cluster=<Cluster> --class=<class_name> --jars=<list_of_jars> --region=<region> --files=<list_of_files> --properties=spark.driver.extraJavaOptions="-Dconfig.file=application_dev.json -Dlog4j.configuration=log4j.properties",spark.executor.extraJavaOptions="-Dconfig.file=application_dev.json -Dlog4j.configuration=log4j.properties, spark.executor.instances=36, spark.executor.cores=4, spark.executor.memory=4G, spark.driver.memory=8G, spark.shuffle.service.enabled=true, spark.yarn.maxAppAttempts=1, spark.sql.shuffle.partitions=200, spark.executor.memoryOverhead=7680, spark.driver.maxResultSize=0, spark.port.maxRetries=250, spark.dynamicAllocation.initialExecutors=20, spark.dynamicAllocation.minExecutors=10"


Warning: Ignoring non-Spark config property:  spark.driver.maxResultSize
Warning: Ignoring non-Spark config property:  spark.driver.memory
Warning: Ignoring non-Spark config property:  spark.dynamicAllocation.minExecutors
Warning: Ignoring non-Spark config property:  spark.executor.cores
Warning: Ignoring non-Spark config property:  spark.port.maxRetries
Warning: Ignoring non-Spark config property:  spark.yarn.maxAppAttempts
Warning: Ignoring non-Spark config property:  spark.dynamicAllocation.initialExecutors
Warning: Ignoring non-Spark config property:  spark.executor.memory
Warning: Ignoring non-Spark config property:  spark.executor.memoryOverhead
Warning: Ignoring non-Spark config property:  spark.sql.shuffle.partitions
Warning: Ignoring non-Spark config property:  spark.executor.instances

【Question Comments】:

    Tags: apache-spark google-cloud-platform google-cloud-dataproc


    【Solution 1】:

    Use the command below instead. Those settings are not part of extraJavaOptions; they are --properties entries in their own right. In your command the spaces after the commas leave each key with a leading blank (note the two spaces after "property:" in the warnings), so Spark no longer recognizes them. Close the quote right after the first log4j.properties and drop the spaces:

    gcloud dataproc jobs submit spark --cluster=<Cluster> --class=<class_name> --jars=<list_of_jars> --region=<region> --files=<list_of_files> --properties=spark.driver.extraJavaOptions="-Dconfig.file=application_dev.json -Dlog4j.configuration=log4j.properties",spark.executor.extraJavaOptions="-Dconfig.file=application_dev.json -Dlog4j.configuration=log4j.properties",spark.executor.instances=36,spark.executor.cores=4,spark.executor.memory=4G,spark.driver.memory=8G,spark.shuffle.service.enabled=true,spark.yarn.maxAppAttempts=1,spark.sql.shuffle.partitions=200,spark.executor.memoryOverhead=7680,spark.driver.maxResultSize=0,spark.port.maxRetries=250,spark.dynamicAllocation.initialExecutors=20,spark.dynamicAllocation.minExecutors=10
    

    In a more readable form (line breaks added for display only; the actual --properties value must remain one continuous string):

    gcloud dataproc jobs submit spark --cluster=<Cluster> --class=<class_name> --jars=<list_of_jars> --region=<region> --files=<list_of_files> 
    --properties=spark.driver.extraJavaOptions="
        -Dconfig.file=application_dev.json
        -Dlog4j.configuration=log4j.properties
    ",spark.executor.extraJavaOptions="
        -Dconfig.file=application_dev.json
        -Dlog4j.configuration=log4j.properties
    ",
    spark.executor.instances=36,
    spark.executor.cores=4,
    spark.executor.memory=4G,
    spark.driver.memory=8G,
    spark.shuffle.service.enabled=true,
    spark.yarn.maxAppAttempts=1,
    spark.sql.shuffle.partitions=200,
    spark.executor.memoryOverhead=7680,
    spark.driver.maxResultSize=0,
    spark.port.maxRetries=250,
    spark.dynamicAllocation.initialExecutors=20,
    spark.dynamicAllocation.minExecutors=10
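
    As a side note, a sketch of a less error-prone way to build the same string, assuming bash: collect each property as one array element and join on commas, so a misplaced quote or stray space is much harder to write (cluster, class, jar, and file names remain placeholders):

    # Each array element is one key=value pair; values may contain spaces,
    # but keys must not.
    PROPS=(
      'spark.driver.extraJavaOptions=-Dconfig.file=application_dev.json -Dlog4j.configuration=log4j.properties'
      'spark.executor.extraJavaOptions=-Dconfig.file=application_dev.json -Dlog4j.configuration=log4j.properties'
      'spark.executor.instances=36'
      'spark.executor.cores=4'
      'spark.executor.memory=4G'
      'spark.driver.memory=8G'
      'spark.shuffle.service.enabled=true'
      'spark.yarn.maxAppAttempts=1'
      'spark.sql.shuffle.partitions=200'
      'spark.executor.memoryOverhead=7680'
      'spark.driver.maxResultSize=0'
      'spark.port.maxRetries=250'
      'spark.dynamicAllocation.initialExecutors=20'
      'spark.dynamicAllocation.minExecutors=10'
    )
    # Join the array on ',' with no surrounding whitespace.
    JOINED=$(IFS=,; printf '%s' "${PROPS[*]}")
    gcloud dataproc jobs submit spark \
      --cluster=<Cluster> --class=<class_name> --jars=<list_of_jars> \
      --region=<region> --files=<list_of_files> \
      --properties="$JOINED"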
    

    【Discussion】:

    • It now gives me the error below: ERROR: (gcloud.dataproc.jobs.submit.spark) unrecognized arguments: spark.executor.cores=4, spark.executor.memory=4G, spark.driver.memory=8G, spark.shuffle.service.enabled=true, spark.yarn.maxAppAttempts=1, spark.sql.shuffle.partitions=200, spark.executor.memoryOverhead=7680, spark.driver.maxResultSize=0, spark.port.maxRetries=250, spark.dynamicAllocation.initialExecutors=20, spark.dynamicAllocation.minExecutors=10
    • @arunkindra Try not to put spaces after the commas: everything after --properties has to form one continuous string, otherwise the shell splits it into separate arguments (see the sketch below).
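
    To make the space issue concrete, here is a minimal bash illustration (plain bash, not gcloud itself, which applies the same comma-splitting to the --properties value): a space after a comma leaves a leading blank on the next key, and a key like " spark.driver.memory" no longer starts with "spark.", which is presumably why spark-submit reports it as a non-Spark config property:

    # Split a comma-separated list written with a space after the comma,
    # then print each piece in brackets to expose the leading blank.
    s='spark.executor.memory=4G, spark.driver.memory=8G'
    IFS=, read -ra kv <<< "$s"
    printf '[%s]\n' "${kv[@]}"
    # [spark.executor.memory=4G]
    # [ spark.driver.memory=8G]   <- leading space: the key is not "spark.driver.memory"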
    【Solution 2】:

    Can you try this? The leading ^#^ tells gcloud to split the --properties list on # instead of , (see `gcloud topic escaping`), so commas and spaces inside the quoted values cannot be mistaken for list separators:

    gcloud dataproc jobs submit spark \
      --cluster=<Cluster> \
      --class=<class_name> \
      --jars=<list_of_jars> \
      --region=<region> \
      --files=<list_of_files> \
      --properties=^#^spark.driver.extraJavaOptions="-Dconfig.file=application_dev.json -Dlog4j.configuration=log4j.properties"#spark.executor.extraJavaOptions="-Dconfig.file=application_dev.json -Dlog4j.configuration=log4j.properties"#spark.executor.instances=36#spark.executor.cores=4#spark.executor.memory=4G#spark.driver.memory=8G#spark.shuffle.service.enabled=true#spark.yarn.maxAppAttempts=1#spark.sql.shuffle.partitions=200#spark.executor.memoryOverhead=7680#spark.driver.maxResultSize=0#spark.port.maxRetries=250#spark.dynamicAllocation.initialExecutors=20#spark.dynamicAllocation.minExecutors=10
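
    For reference, a minimal sketch of the same delimiter trick in isolation; the -Dfoo/-Dbar flags and their values are made up for illustration:

    # ^#^ switches gcloud's list delimiter from ',' to '#' (see `gcloud topic
    # escaping`), so a value may itself contain commas and spaces. Single
    # quotes keep the whole flag as one shell word.
    gcloud dataproc jobs submit spark \
      --cluster=<Cluster> --region=<region> --class=<class_name> --jars=<list_of_jars> \
      '--properties=^#^spark.driver.extraJavaOptions=-Dfoo=a,b -Dbar=c#spark.executor.memory=4G'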
    

    【Discussion】:
