[Posted]: 2021-07-19 13:14:15
[Problem description]:
I am running this locally in PyCharm, hit the following error, and have tried every option I could find:
Caused by: java.io.IOException: Cannot run program "/usr/local/Cellar/apache-spark/3.0.1/libexec/bin": error=13, Permission denied
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:209)
at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:132)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:105)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:119)
at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:131)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
My .bash_profile contains:
export SPARK_HOME=/usr/local/opt/apache-spark/libexec/
export PYTHONPATH=/usr/local/opt/apache-spark/libexec/python/lib/py4j-0.10.9-src.zip:/usr/local/opt/apache-spark/libexec/python/:/usr/local/lib/python3.9:$PYTHONPATH
export PATH=$SPARK_HOME/bin/:$SPARK_HOME/python/lib/py4j-0.10.9-src.zip:$PATH
#export PATH=$SPARK_HOME/python:$PATH
ls -lrt /usr/local/opt/apache-spark:
/usr/local/opt/apache-spark -> ../Cellar/apache-spark/3.0.1
Python interpreter in PyCharm: /usr/local/bin/python3
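(A quick, generic check I find useful with this kind of setup, not something from the original post: print which interpreter PyCharm actually runs versus what `python3` resolves to on PATH, since a mismatch between the two is a common source of PySpark worker-launch problems in IDEs.)

```python
import sys
import shutil

# Interpreter that is actually executing this script (what PyCharm runs).
print(sys.executable)

# What "python3" resolves to on PATH (e.g. /usr/local/bin/python3);
# if these two differ, Spark may launch workers with the wrong Python.
print(shutil.which("python3"))
```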
In my code:
from pyspark.sql import SparkSession

if __name__ == '__main__':
    #import os
    #import sys
    #os.environ['SPARK_HOME'] = "/usr/local/opt/apache-spark/libexec/"
    #sys.path.append("/usr/local/opt/apache-spark/libexec/python")
    #sys.path.append("/usr/local/opt/apache-spark/libexec/python/lib/py4j-0.10.9-src.zip")
    #findspark.init()
    #conf = SparkConf()
    #conf.set("fs.defaultFS", "file:///")
    spark = SparkSession.builder.master("local").appName("SyslogMaskUtility").getOrCreate()
    sc = spark.sparkContext
    #sc.setLogLevel("WARN")
    rdd_raw = sc.textFile('/Users/abcd/PycharmProjects/SyslogToJson/SyslogParser/syslog_event.txt')
    print(rdd_raw.count())
    spark.stop()
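One workaround commonly suggested for this class of error (my assumption, not confirmed in this thread): Spark launches its Python workers via the `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables, and if the resolved path ends up being a directory such as `.../libexec/bin` rather than an interpreter binary, the exec fails with error=13 (Permission denied). A minimal, Spark-free sketch of pointing both variables at the current interpreter before building the SparkSession:

```python
import os
import sys

# Point Spark's worker/driver Python at the interpreter running this script.
# Must be set before SparkSession.builder...getOrCreate() is called.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

# Sanity check: the configured path must be an executable *file*,
# not a directory like .../libexec/bin (a directory gives error=13).
worker = os.environ["PYSPARK_PYTHON"]
print(os.path.isfile(worker) and os.access(worker, os.X_OK))
```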
I followed: https://medium.com/beeranddiapers/installing-apache-spark-on-mac-os-ce416007d79f
All directories and files under /usr/local/opt/apache-spark/libexec/ have full permissions:
drwxrwxrwx 13 abcd admin 416 Oct 29 17:34 bin
Any help is appreciated, as I have been struggling with this. The same code works when I run it from the pyspark command line.
Thanks.
[Discussion]:
-
Try installing Spark without using Homebrew
-
OK. I'll follow this: medium.com/luckspark/…
Tags: macos apache-spark pyspark