[Posted]: 2021-10-07 23:16:58
[Problem description]:
I am writing a pyspark DataFrame to an HBase table on CDP7, following this example. The components I am using are:
- Spark version 3.1.1
- Scala version 2.12.10
- shc-core-1.1.1-2.1-s_2.11.jar
The command I use:
spark3-submit --packages com.hortonworks:shc-core:1.1.1-2.1-s_2.11 --repositories http://repo.hortonworks.com/content/groups/public/ --files /etc/hbase/conf/hbase-site.xml test-hbase3.py
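The script test-hbase3.py itself is not shown, but from the traceback it follows the usual SHC write pattern: a JSON catalog mapping DataFrame columns to an HBase rowkey and column family, then a `save()` through the SHC data source. The sketch below is a hypothetical reconstruction; the table name `test_table`, column family `cf`, and field names are assumptions, not taken from the original script:

```python
import json

# Hypothetical SHC catalog: maps DataFrame columns to the HBase rowkey
# and to columns in a column family. Names here are assumptions.
writeCatalog = json.dumps({
    "table": {"namespace": "default", "name": "test_table"},
    "rowkey": "key",
    "columns": {
        "key":  {"cf": "rowkey", "col": "key",  "type": "string"},
        "col1": {"cf": "cf",     "col": "col1", "type": "string"},
    },
})

# The SHC data source class, as used in the Hortonworks examples.
dataSourceFormat = "org.apache.spark.sql.execution.datasources.hbase"

# With a SparkSession and a DataFrame `writeDF` available, the failing
# call from the traceback would look like:
# writeDF.write.options(catalog=writeCatalog, newtable="5") \
#     .format(dataSourceFormat).save()
```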
However, I get the error below; the full log is too long to paste here, so I put it on hastebin.com: spark-log
Error snippet:
Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/test-hbase3.py", line 45, in <module>
    main()
  File "/opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/test-hbase3.py", line 24, in main
    writeDF.write.options(catalog=writeCatalog, newtable=5).format(dataSourceFormat).save()
  File "/opt/cloudera/parcels/SPARK3-3.1.1.3.1.7270.0-253-1.p0.11638568/lib/spark3/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 1107, in save
  File "/opt/cloudera/parcels/SPARK3-3.1.1.3.1.7270.0-253-1.p0.11638568/lib/spark3/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/opt/cloudera/parcels/SPARK3-3.1.1.3.1.7270.0-253-1.p0.11638568/lib/spark3/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
  File "/opt/cloudera/parcels/SPARK3-3.1.1.3.1.7270.0-253-1.p0.11638568/lib/spark3/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o36.save.
: java.lang.NoClassDefFoundError: scala/Product$class
    at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.<init>(HBaseRelation.scala:73)
    at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:59)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
What should I do to fix this error? I tried to find other connectors, but SHC is the only one I found. I am not using any Maven repository here, so I am not sure whether a dependency is missing or something else is wrong.
[Discussion]:
Tags: apache-spark hbase connector