[Posted]: 2017-01-21 20:53:53
[Problem Description]:
Goal
Understand the cause of this problem and how to fix it. The error occurs only when running via spark-submit. Any help is appreciated.
spark-submit --class "AuctionDataFrame" --master spark://<hostname>:7077 auction-project_2.11-1.0.jar
Running the same code line by line in spark-shell raises no error:
...
scala> val auctionsDF = auctionsRDD.toDF()
auctionsDF: org.apache.spark.sql.DataFrame = [aucid: string, bid: float, bidtime: float, bidder: string, bidrate: int, openbid: float, price: float, itemtype: string, dtl: int]
scala> auctionsDF.printSchema()
root
|-- aucid: string (nullable = true)
|-- bid: float (nullable = false)
|-- bidtime: float (nullable = false)
|-- bidder: string (nullable = true)
|-- bidrate: integer (nullable = false)
|-- openbid: float (nullable = false)
|-- price: float (nullable = false)
|-- itemtype: string (nullable = true)
|-- dtl: integer (nullable = false)
Problem
Calling the toDF method to convert the RDD to a DataFrame throws the following error:
Exception in thread "main" java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
at AuctionDataFrame$.main(AuctionDataFrame.scala:52)
at AuctionDataFrame.main(AuctionDataFrame.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Code
import org.apache.spark.{SparkConf, SparkContext}

case class Auctions(
aucid: String,
bid: Float,
bidtime: Float,
bidder: String,
bidrate: Int,
openbid: Float,
price: Float,
itemtype: String,
dtl: Int)
object AuctionDataFrame {
val AUCID = 0
val BID = 1
val BIDTIME = 2
val BIDDER = 3
val BIDRATE = 4
val OPENBID = 5
val PRICE = 6
val ITEMTYPE = 7
val DTL = 8
def main(args: Array[String]) {
val conf = new SparkConf().setAppName("AuctionDataFrame")
val sc = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
val inputRDD = sc.textFile("/user/wynadmin/auctiondata.csv").map(_.split(","))
val auctionsRDD = inputRDD.map(a =>
Auctions(
a(AUCID),
a(BID).toFloat,
a(BIDTIME).toFloat,
a(BIDDER),
a(BIDRATE).toInt,
a(OPENBID).toFloat,
a(PRICE).toFloat,
a(ITEMTYPE),
a(DTL).toInt))
val auctionsDF = auctionsRDD.toDF() // <--- line 52 causing the error.
    sc.stop()
  }
}
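For reference (this is not the original author's code, and it does not address the underlying version mismatch): toDF() on an RDD of case classes uses Scala runtime reflection, which is exactly the API (runtimeMirror) named in the stack trace. A sketch that builds the same DataFrame through an explicit schema with createDataFrame avoids that reflection path; names such as auctionsDF2 are hypothetical, and inputRDD and the column-index constants are the ones from the code above.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Explicit schema mirroring the Auctions case class (sketch, not the
// author's method). Nullability matches the printSchema output above.
val schema = StructType(Seq(
  StructField("aucid",    StringType,  nullable = true),
  StructField("bid",      FloatType,   nullable = false),
  StructField("bidtime",  FloatType,   nullable = false),
  StructField("bidder",   StringType,  nullable = true),
  StructField("bidrate",  IntegerType, nullable = false),
  StructField("openbid",  FloatType,   nullable = false),
  StructField("price",    FloatType,   nullable = false),
  StructField("itemtype", StringType,  nullable = true),
  StructField("dtl",      IntegerType, nullable = false)))

// Map each split line to a generic Row instead of a case class.
val rowRDD = inputRDD.map(a => Row(
  a(AUCID), a(BID).toFloat, a(BIDTIME).toFloat, a(BIDDER),
  a(BIDRATE).toInt, a(OPENBID).toFloat, a(PRICE).toFloat,
  a(ITEMTYPE), a(DTL).toInt))

// createDataFrame with an explicit schema does not need runtimeMirror.
val auctionsDF2 = sqlContext.createDataFrame(rowRDD, schema)
```

This only sidesteps the reflection call site; if the jar and the cluster really run different Scala binary versions, other failures can still surface elsewhere.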
build.sbt
name := "Auction Project"
version := "1.0"
scalaVersion := "2.11.8"
//scalaVersion := "2.10.6"
/*
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.6.2",
"org.apache.spark" %% "spark-sql" % "1.6.2",
"org.apache.spark" %% "spark-mllib" % "1.6.2"
)
*/
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.6.2" % "provided",
"org.apache.spark" %% "spark-sql" % "1.6.2" % "provided",
"org.apache.spark" %% "spark-mllib" % "1.6.2" % "provided"
)
Environment
Spark on Ubuntu 14.04:
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.6.2
/_/
Using Scala version 2.11.7 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_92)
sbt on Windows:
D:\>sbt sbtVersion
[info] Set current project to root (in build file:/D:/)
[info] 0.13.12
Research
I looked into similar issues, which suggest an incompatibility between the Scala version Spark was compiled with and the one used to build the jar.
So I changed the Scala version in build.sbt to 2.10, which produced a 2.10 jar, but the error persists. Using % "provided" or not does not change the error.
scalaVersion := "2.10.6"
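A minimal sketch (not from the original post) that can be run via spark-submit to print the Scala version the executing JVM actually uses, for comparison against scalaVersion in build.sbt and the spark-shell banner; the object name ScalaVersionCheck is an assumption:

```scala
// Print the Scala runtime version seen by the submitted jar, to detect a
// 2.10-vs-2.11 binary mismatch like the one suggested by the stack trace.
object ScalaVersionCheck {
  def main(args: Array[String]): Unit = {
    // Prints e.g. "version 2.11.7"
    println(scala.util.Properties.versionString)
  }
}
```

If this prints a different 2.x line than the spark-shell banner ("Using Scala version 2.11.7" above), the jar and the cluster are built against incompatible Scala binaries.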
[Discussion]:
-
It still looks like a version issue. Double-check which versions are in use everywhere...
-
@TzachZohar, thanks for the comment, but I changed the Scala version to "2.10.6", ran "sbt clean" and "sbt package" again, and it did not solve the problem. Could you be more specific about how the linked article resolves this?
Tags: scala apache-spark apache-spark-sql