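[Posted]: 2022-01-25 05:46:12
[Question]:
I am trying to package a PySpark job with PEX to run on Google Cloud Dataproc, but I get a Permission denied error.
I have packaged my third-party and local dependencies into env.pex, and the entry point that uses those dependencies is in main.py. I then gsutil cp both files to gs://<PATH> and run the script below.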
from google.cloud import dataproc_v1 as dataproc
from google.cloud import storage


def submit_job(project_id: str, region: str, cluster_name: str):
    # Regional endpoint for the Dataproc job controller.
    job_client = dataproc.JobControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )
    operation = job_client.submit_job_as_operation(
        request={
            "project_id": project_id,
            "region": region,
            "job": {
                "placement": {"cluster_name": cluster_name},
                "pyspark_job": {
                    "main_python_file_uri": "gs://<PATH>/main.py",
                    # env.pex is shipped alongside the job as a plain file.
                    "file_uris": ["gs://<PATH>/env.pex"],
                    "properties": {
                        # Use the PEX file itself as the Python interpreter.
                        "spark.pyspark.python": "./env.pex",
                        "spark.executorEnv.PEX_ROOT": "./.pex",
                    },
                },
            },
        }
    )
The error I get is:
Exception in thread "main" java.io.IOException: Cannot run program "./env.pex": error=13, Permission denied
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:97)
at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: error=13, Permission denied
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 14 more
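Should I expect to be able to package my environment like this? I don't see a way to change the permissions of files included as file_uris in the PySpark job config, and I haven't found any documentation on Google Cloud about packaging with PEX, although the official PySpark docs include this guide.
Any help is appreciated. Thanks!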
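For context, error=13 in the trace is the POSIX EACCES code, which the operating system returns when a process tries to execute a file it is not allowed to execute. One plausible reading, which nothing in the question confirms, is that env.pex arrives in the job's working directory without the execute bit set. The sketch below only reproduces that symptom locally and does not touch Dataproc. It assumes a POSIX system with /bin/sh; fake_env.pex, try_run and demo are hypothetical names, and the small shell script stands in for the real PEX file.

```python
import os
import stat
import subprocess
import tempfile


def try_run(path: str) -> str:
    """Execute the file at `path`; return its stdout, or "EACCES" if the OS denies execution."""
    try:
        result = subprocess.run([path], capture_output=True, text=True, check=True)
        return result.stdout.strip()
    except PermissionError:
        # Raised when exec fails with errno 13, the same code shown in the Java trace.
        return "EACCES"


def demo() -> tuple:
    """Run a stand-in executable before and after setting its execute bit."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for env.pex: a tiny shell script.
        path = os.path.join(tmp, "fake_env.pex")
        with open(path, "w") as f:
            f.write("#!/bin/sh\necho ok\n")

        # Readable and writable, but not executable.
        os.chmod(path, 0o644)
        before = try_run(path)

        # Add the owner execute bit, the equivalent of `chmod u+x`.
        os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
        after = try_run(path)
        return before, after


if __name__ == "__main__":
    print(demo())
```

If the same cause applies on the cluster, the job would need some way to run env.pex without relying on its execute bit, or to set that bit before Spark launches it. Whether Dataproc offers either is a separate question that this sketch does not answer.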
[Comments]:
Tags: google-cloud-platform pyspark google-cloud-dataproc dataproc python-pex