【Question title】: Jupyter ImportError: No module named py4j.protocol despite py4j being installed
【Posted】: 2018-10-12 17:48:10
【Question】:

I have read several posts about the error I now see when importing pyspark. Some of them suggested installing py4j, which I have done, but I still see the error.

I am using a conda environment. Here are the steps:
1. Create a yml file listing the needed packages (including py4j).
2. Create an env from the yml file.
3. Create a kernel pointing to the env.
4. Start the kernel in Jupyter.
5. Running `import pyspark` throws an error: ImportError: No module named py4j.protocol
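A quick way to narrow this down is to check, from inside the notebook, which interpreter the kernel is actually running and whether that interpreter can locate py4j. The following is a minimal sketch, assuming a Python 3 kernel; the helper names `module_available` and `kernel_report` are made up for illustration and are not part of Jupyter or pyspark.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the current interpreter can locate module `name`."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # For a dotted name, find_spec imports the parent package first,
        # so a missing parent surfaces as ImportError rather than None.
        return False


def kernel_report(modules=("py4j", "py4j.protocol", "pyspark")):
    """Collect the interpreter path and the availability of each module."""
    return {
        "executable": sys.executable,
        "modules": {m: module_available(m) for m in modules},
    }


if __name__ == "__main__":
    # Run in a notebook cell to see what the kernel itself can import.
    print(kernel_report())
```

If `executable` is not the Python of the conda env, the kernel is not pointing where step 3 intended. If it is, but py4j is reported as unavailable, the package is not on that interpreter's search path.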

【Comments】:

  • Did you add SPARK_HOME?
  • Yes, thanks.

Tags: pyspark jupyter conda


【Solution 1】:

The issue was resolved by adding an "env" section to kernel.json and explicitly setting the following variables:

 "env": {
  "HADOOP_CONF_DIR": "/etc/spark2/conf/yarn-conf",
  "PYSPARK_PYTHON": "/opt/cloudera/parcels/Anaconda/bin/python",
  "SPARK_HOME": "/opt/cloudera/parcels/SPARK2",
  "PYTHONPATH": "/opt/cloudera/parcels/SPARK2/lib/spark2/python/lib/py4j-0.10.7-src.zip:/opt/cloudera/parcels/SPARK2/lib/spark2/python/",
  "PYTHONSTARTUP": "/opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/shell.py",
  "PYSPARK_SUBMIT_ARGS": " --master yarn --deploy-mode client pyspark-shell"
 }
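
For anyone who prefers to script this change rather than edit the file by hand, the sketch below merges extra variables into the "env" section of an existing kernel.json. It is only an illustration: `merge_kernel_env` is a hypothetical helper, the kernel.json path in the usage comment is a placeholder, and the values must match your own installation.

```python
import json


def merge_kernel_env(kernel_json_path, extra_env):
    """Add or overwrite entries in the "env" section of a kernel.json file."""
    with open(kernel_json_path) as f:
        spec = json.load(f)
    # Create "env" if the kernel spec does not have one yet, then merge.
    spec.setdefault("env", {}).update(extra_env)
    with open(kernel_json_path, "w") as f:
        json.dump(spec, f, indent=1)
    return spec


# Usage (the path is a placeholder; `jupyter kernelspec list` shows where
# each kernel's directory lives):
# merge_kernel_env(
#     "/path/to/kernels/my-env/kernel.json",
#     {"SPARK_HOME": "/opt/cloudera/parcels/SPARK2"},
# )
```

The kernel has to be restarted afterwards, since the "env" values are applied when the kernel process is launched.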
