【Title】: PySpark run locally on Mac: Caused by: java.io.IOException: Cannot run program "/usr/local/Cellar/apache-spark/3.0.1/libexec/bin"
【Posted】: 2021-07-19 13:14:15
【Question】:

I am getting this error when running locally from PyCharm, and I have tried every option:

Caused by: java.io.IOException: Cannot run program "/usr/local/Cellar/apache-spark/3.0.1/libexec/bin": error=13, Permission denied
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:209)
    at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:132)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:105)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:119)
    at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:131)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:127)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

My .bash_profile:

export SPARK_HOME=/usr/local/opt/apache-spark/libexec/
export PYTHONPATH=/usr/local/opt/apache-spark/libexec/python/lib/py4j-0.10.9-src.zip:/usr/local/opt/apache-spark/libexec/python/:/usr/local/lib/python3.9:$PYTHONP$
export PATH=$SPARK_HOME/bin/:$SPARK_HOME/python/lib/py4j-0.10.9-src.zip:$PATH
#export PATH=$SPARK_HOME/python:$PATH

ls -lrt /usr/local/opt/apache-spark:

/usr/local/opt/apache-spark -> ../Cellar/apache-spark/3.0.1

Python interpreter in PyCharm: /usr/local/bin/python3

In my code:

from pyspark.sql import SparkSession

if __name__ == '__main__':
    #import os
    #import sys
    #os.environ['SPARK_HOME'] = "/usr/local/opt/apache-spark/libexec/"
    #sys.path.append("/usr/local/opt/apache-spark/libexec/python")
    #sys.path.append("/usr/local/opt/apache-spark/libexec/python/lib/py4j-0.10.9-src.zip")
    #findspark.init()
    #conf = SparkConf()
    #conf.set("fs.defaultFS", "file:///")
    spark = SparkSession.builder.master("local").appName("SyslogMaskUtility").getOrCreate()
    sc = spark.sparkContext
    #sc.setLogLevel("WARN")
    rdd_raw = sc.textFile('/Users/abcd/PycharmProjects/SyslogToJson/SyslogParser/syslog_event.txt')
    print(rdd_raw.count())
    spark.stop()
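[Editor's note] When the same script works from the `pyspark` shell but fails from an IDE, the two environments are usually resolving different interpreters. A small diagnostic sketch (hypothetical, not from the original post) that prints which interpreter the driver is using and what the workers will try to exec:

```python
import os
import sys

# Which interpreter is driving, and which one will Spark workers try to exec?
# If PYSPARK_PYTHON is unset, workers fall back to the driver's interpreter.
print("driver python:", sys.executable)
print("PYSPARK_PYTHON:", os.environ.get("PYSPARK_PYTHON", "<unset>"))
```

If `PYSPARK_PYTHON` points at a directory (as in this question), the worker exec fails with `error=13, Permission denied`.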

I followed: https://medium.com/beeranddiapers/installing-apache-spark-on-mac-os-ce416007d79f

To add: the Spark installation seems OK, but when running the program I'm having issues with the environment variables. Is this .bash_profile correct?

All directories and files under /usr/local/opt/apache-spark/libexec/ have full permissions:

drwxrwxrwx   13 abcd  admin   416 Oct 29 17:34 bin
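[Editor's note] The permissions above are fine; `error=13` can also occur when the path being exec'd is a directory rather than an executable file. A quick shell check (assuming `python3` is on `PATH`) of what a valid interpreter path should look like:

```shell
# A valid PYSPARK_PYTHON must be an executable *file*; a directory such as
# .../libexec/bin fails with error=13 even when its mode is rwxrwxrwx.
PY="$(command -v python3)"
if [ -x "$PY" ] && [ ! -d "$PY" ]; then
  echo "ok: $PY is an executable file"
else
  echo "bad: $PY"
fi
```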

Any help is appreciated, as I've been struggling with this. The same code works when I run it from the pyspark command line.

Thanks.

【Question comments】:

Tags: macos apache-spark pyspark


【Solution 1】:

On my Mac, I install Spark and Hadoop separately:

# install PySpark
pip3 install pyspark

# download and extract Hadoop 3.2.2
wget https://downloads.apache.org/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz
tar -xzf hadoop-3.2.2.tar.gz

# setup environment variables
export JAVA_HOME='/Library/Java/JavaVirtualMachines/zulu-11.jdk/Contents/Home/'
export SPARK_DIST_CLASSPATH="hadoop-3.2.2/share/hadoop/tools/lib/*"

# run Python
python3
from pyspark.sql import SparkSession
from pyspark.sql import types as T

spark = (SparkSession
    .builder
    .master('local[*]')
    .appName('SO')
    .getOrCreate()
)
# <pyspark.sql.session.SparkSession object at 0x10f9e7220>

【Comments】:

  • I went through the steps above. From the terminal I can run it, but from PyCharm I get the same error - Caused by: java.io.IOException: Cannot run program "/usr/local/Cellar/apache-spark/3.0.1/libexec/bin": error=13, Permission denied
  • export JAVA_HOME=/Library/Java/JavaVirtualMachi export SPARK_HOME=/usr/local/opt/apache-spark/libexec/ export SPARK_DIST_CLASSPATH="/Users/rm185431/hadoop-3.2.2/share/hadoop/tools/lib/*" export PYTHONPATH=/usr/local/opt/apache-spark/libexec/python/lib/py4j-0.10.9-src.zip:/usr/local/opt/apache-spark/libexec/python/:/usr/local/lib/python3.9:$PYTHONP$ export SPARK_HOME=/usr/local/opt/apache-spark/libexec/ export PATH=$SPARK_HOME/bin/:$SPARK_HOME/python/lib/py4j-0.10.9-src.zip:$PATH export PATH=$SPARK_HOME/python:$PATH
  • Those are the paths I added to my .bash_profile
  • The environment variables in my run configuration are: PYTHONUNBUFFERED=1;PYSPARK_PYTHON=/usr/local/Cellar/apache-spark/3.0.1/libexec/bin;PYSPARK_DRIVER_PYTHON=/usr/local/Cellar/apache-spark/3.0.1/libexec/bin;PYTHONPATH=/usr/local/Cellar/apache-spark/3.0.1/libexec/python/lib/py4j-0.10.9-src.zip:/usr/local/Cellar/apache-spark/3.0.1/libexec/python/
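[Editor's note] The run configuration above sets PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON to Spark's bin *directory*, which is exactly what produces `Cannot run program ".../libexec/bin": error=13` - the JVM tries to exec a directory. A minimal sketch of the likely fix, pointing both variables at the interpreter mentioned in the question (adjust the path to your own installation):

```python
import os

# Spark workers exec the program named by PYSPARK_PYTHON; it must be a
# Python executable, not a directory such as .../libexec/bin.
os.environ["PYSPARK_PYTHON"] = "/usr/local/bin/python3"        # interpreter from the question
os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/local/bin/python3"
```

The same two values can be set in PyCharm's run-configuration environment variables instead of in code; either way they must name a Python executable.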