【Question Title】: py4j.protocol.Py4JJavaError: An error occurred while calling o63.save. : java.lang.NoClassDefFoundError: org/apache/spark/Logging
【Posted】: 2022-05-04 19:57:40
【Question Description】:

I am new to Spark and to the big-data component HBase. I am trying to write Python code in PySpark that connects to HBase and reads data from it. I am using the following versions:

  • Spark version: spark-3.1.2-bin-hadoop2.7
  • Python version: 3.8.5
  • HBase version: hbase-2.3.5

I have standalone HBase and Spark installed locally on Ubuntu 20.04.

Code:

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext.getOrCreate()
sqlc = SQLContext(sc)

data_source_format = 'org.apache.spark.sql.execution.datasources.hbase'

df = sc.parallelize([("1","Abby","Smith","K","3456main","Orlando","FL","45235"),
    ("2","Amaya","Williams","L","123Orange","Newark","NJ","27656"),
    ("3","Alchemy","Davis","P","Warners","Sanjose","CA","34789")]) \
    .toDF(schema=['key','firstName','lastName','middleName','addressLine','city','state','zipCode'])

df.show()

catalog=''.join('''{
    "table":{"namespace":"emp_data","name":"emp_info"},
    "rowkey":"key",
    "columns":{
    "key":{"cf":"rowkey","col":"key","type":"string"},
    "fName":{"cf":"person","col":"firstName","type":"string"},
    "lName":{"cf":"person","col":"lastName","type":"string"},
    "mName":{"cf":"person","col":"middleName","type":"string"},
    "addressLine":{"cf":"address","col":"addressLine","type":"string"},
    "city":{"cf":"address","col":"city","type":"string"},
    "state":{"cf":"address","col":"state","type":"string"},
    "zipCode":{"cf":"address","col":"zipCode","type":"string"}
            }
    }'''.split())
#Writing
print("Writing into HBase")
df.write\
  .options(catalog=catalog)\
  .format(data_source_format)\
  .save()
#Reading
print("Reading from HBase")
df = sqlc.read\
         .options(catalog=catalog)\
         .format(data_source_format)\
         .load()

print("Program Ends")
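One detail worth noting in the code above: the `''.join('''...'''.split())` trick strips all whitespace from the catalog string, which would also corrupt any value that happened to contain a space. A safer sketch (same catalog layout, illustrative subset of columns) builds the JSON with the standard library instead:

```python
import json

# Build the SHC-style catalog as a plain dict and serialize it; json.dumps
# emits valid compact JSON without the whitespace-stripping hack.
catalog = json.dumps({
    "table": {"namespace": "emp_data", "name": "emp_info"},
    "rowkey": "key",
    "columns": {
        "key":   {"cf": "rowkey",  "col": "key",       "type": "string"},
        "fName": {"cf": "person",  "col": "firstName", "type": "string"},
        "city":  {"cf": "address", "col": "city",      "type": "string"},
    },
})
print(catalog)
```

The resulting string can be passed to `.options(catalog=catalog)` exactly as in the original code.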

Error message:

Writing into HBase
Traceback (most recent call last):
  File "/mnt/c/Codefiles/pyspark_test.py", line 36, in <module>
    df.write\
  File "/home/aditya/spark-3.1.2-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 1107, in save
  File "/home/aditya/spark-3.1.2-bin-hadoop2.7/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in __call__
  File "/home/aditya/spark-3.1.2-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
  File "/home/aditya/spark-3.1.2-bin-hadoop2.7/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o63.save.
: java.lang.NoClassDefFoundError: org/apache/spark/Logging
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
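The missing class is the key clue here: `org.apache.spark.Logging` was a public class in Spark 1.x and was removed from the public API in later releases, so this error almost always means the HBase connector jar on the classpath was compiled against Spark 1.x, not against spark-3.1.2. The fix is to supply a connector built for your Spark version rather than changing the Python code. A minimal sketch, where the Maven coordinates are an assumption to verify against what is actually published for your Spark/HBase/Scala combination:

```python
# Assumed coordinates for the Apache hbase-connectors Spark module; check
# the available versions for your stack before relying on them.
packages = "org.apache.hbase.connectors.spark:hbase-spark:1.0.0"

# Passing the connector at submit time replaces the Spark 1.x-era jar that
# still references the removed org.apache.spark.Logging class.
cmd = ["spark-submit", "--packages", packages, "/mnt/c/Codefiles/pyspark_test.py"]
print(" ".join(cmd))
```

If a Spark 3-compatible connector also changes the data source name, `data_source_format` in the script must be updated to match it.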

【Question Discussion】:

Tags: python apache-spark pyspark bigdata hbase


【Solution 1】:

Check that Java is installed and that JAVA_HOME is set in the environment variables.
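    The check suggested above can be scripted. A small sketch, using only the standard library, that reports whether a JDK is visible to the Python process that launches Spark:

```python
import os
import shutil

# Inspect the Java-related environment that Spark's launcher depends on.
java_home = os.environ.get("JAVA_HOME")   # None means the variable is unset
java_binary = shutil.which("java")        # None means java is not on PATH

print("JAVA_HOME   :", java_home or "<not set>")
print("java on PATH:", java_binary or "<not found>")
```

    If either line reports a missing value, Spark's JVM side cannot start, although note that the error in the question occurred after the JVM was already running, which points at a classpath problem rather than a missing JDK.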

【Discussion】:
