【Question title】: Passing a filename argument into Dataproc Hadoop job
【Posted】: 2020-01-18 04:59:09
【Question】:

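I am trying to run a Hadoop job on a GCP Dataproc cluster. The job takes one argument, a file name, which it uses to configure some aspects of the job. I cannot find a way to do this successfully with the gcloud CLI (although I can run the job successfully from Airflow). I have tried a local file, a file in Google Storage, and a file on the cluster itself.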

Here is the code in my job that reads the file:

File inFile = null;
Properties appProps = new Properties();

inFile = new File(args[0]);

try (FileInputStream inFileStream = new FileInputStream(inFile)) {
    appProps.load(inFileStream);
} catch (IOException e) {
    log.error("Unable to read file");
}

Here are my attempts at running the command (where props_test is the file):

# local file
$ gcloud dataproc jobs submit hadoop --project my-project --region global --cluster my-cluster --jar gs://path/to/hadoop_job.jar -- props_test
Job [00653af3afaf40ea8227d5a56e3de458] submitted.
Waiting for job output...
20/01/17 19:29:55 INFO job.JobDriver: props_test
20/01/17 19:29:55 ERROR job.JobDriver: Unable to read file
Exception in thread "main" java.lang.NullPointerException
    at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:187)
    at com.google.common.base.Splitter.split(Splitter.java:371)
    at job.JobDriver.parseHadoopConfig(JobDriver.java:52)
    at job.JobDriver.run(JobDriver.java:101)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at job.JobDriver.main(JobDriver.java:45)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:244)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:158)
    at com.google.cloud.hadoop.services.agent.job.shim.HadoopRunJarShim.main(HadoopRunJarShim.java:12)
ERROR: (gcloud.dataproc.jobs.submit.hadoop) Job [00653af3afaf40ea8227d5a56e3de458] failed with error:
Job failed with message [Exception in thread "main" java.lang.NullPointerException]. Additional details can be found at '. . .'.

# Google Storage file
$ gcloud dataproc jobs submit hadoop --project my-project --region global --cluster my-cluster --jar gs://path/to/hadoop_job.jar -- gs://path/to/props_test
Job [8ec4a775a0fb4e31b77967d99280fd6c] submitted.
Waiting for job output...
20/01/17 19:30:15 INFO job.JobDriver: gs://path/to/props_test
20/01/17 19:30:15 ERROR job.JobDriver: Unable to read file
Exception in thread "main" java.lang.NullPointerException
    at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:187)
    at com.google.common.base.Splitter.split(Splitter.java:371)
    at job.JobDriver.parseHadoopConfig(JobDriver.java:52)
    at job.JobDriver.run(JobDriver.java:101)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at job.JobDriver.main(JobDriver.java:45)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:244)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:158)
    at com.google.cloud.hadoop.services.agent.job.shim.HadoopRunJarShim.main(HadoopRunJarShim.java:12)
ERROR: (gcloud.dataproc.jobs.submit.hadoop) Job [8ec4a775a0fb4e31b77967d99280fd6c] failed with error:
Job failed with message [Exception in thread "main" java.lang.NullPointerException]. Additional details can be found at '. . .'.

# File on Dataproc cluster
$ gcloud dataproc jobs submit hadoop --project my-project --region global --cluster my-cluster --jar gs://path/to/hadoop_job.jar -- file:///path/to/props_test
Job [fda035c5159642c294a47385d5ebb85f] submitted.
Waiting for job output...
20/01/17 19:33:23 INFO job.JobDriver: props_test
20/01/17 19:33:23 ERROR job.JobDriver: Unable to read file
Exception in thread "main" java.lang.NullPointerException
    at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:187)
    at com.google.common.base.Splitter.split(Splitter.java:371)
    at job.JobDriver.parseHadoopConfig(JobDriver.java:52)
    at job.JobDriver.run(JobDriver.java:101)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at job.JobDriver.main(JobDriver.java:45)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:244)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:158)
    at com.google.cloud.hadoop.services.agent.job.shim.HadoopRunJarShim.main(HadoopRunJarShim.java:12)
ERROR: (gcloud.dataproc.jobs.submit.hadoop) Job [fda035c5159642c294a47385d5ebb85f] failed with error:
Job failed with message [Exception in thread "main" java.lang.NullPointerException]. Additional details can be found at '. . .'.

【Comments】:

    Tags: google-cloud-platform google-cloud-dataproc


    【Answer 1】:

    Try staging the file in GCS first, pass its fully qualified name via the --files argument, and then refer to it by its bare file name in the job arguments:

    gcloud dataproc jobs submit hadoop ... \
        --files gs://path/to/file.ini \
        -- arg1 file.ini arg3
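
On the job side, files listed in --files are copied into the working directory of the Hadoop driver, so a relative name such as file.ini should resolve there. The following is a minimal sketch of how the driver might load it, not the asker's actual code: the class name PropsLoader and the method load are hypothetical. It also throws when the file is missing instead of logging and continuing, since the logs in the question suggest the later NullPointerException comes from carrying on with empty properties.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper; the class and method names are illustrative only.
public class PropsLoader {

    // Loads a properties file from the local filesystem. A relative name is
    // resolved against the current working directory of the JVM.
    public static Properties load(String fileName) throws IOException {
        Path path = Paths.get(fileName).toAbsolutePath();
        if (!Files.isRegularFile(path)) {
            // Fail fast and report the resolved path, rather than logging
            // an error and continuing with an empty Properties object.
            throw new NoSuchFileException(path.toString());
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(path)) {
            props.load(in);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] is the bare file name, e.g. "file.ini".
        Properties props = load(args[0]);
        System.out.println("Loaded " + props.size() + " properties from " + args[0]);
    }
}
```

Note that java.io.File and java.nio.file only see the local filesystem of the machine running the driver, so a gs:// URI or a path on the submitting workstation cannot be opened this way.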
    
