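[Posted]: 2019-06-26 17:31:23
[Question]:
I think some dependencies are not defined in my build.sbt file.
I added the library dependencies to build.sbt, but I still get the error mentioned in the title of this question. I searched Google for a solution but could not find one.
My Spark Scala source code (filterEventId100.scala):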
package com.projects.setTopBoxDataAnalysis

import java.lang.System._
import java.text.SimpleDateFormat
import java.util.Date
import org.apache.spark.sql.SparkSession

object filterEventId100 extends App {
  if (args.length < 2) {
    println("Usage: JavaWordCount <Input-File> <Output-file>")
    exit(1)
  }
  val spark = SparkSession
    .builder
    .appName("FilterEvent100")
    .getOrCreate()
  val data = spark.read.textFile(args(0)).rdd
  val result = data.flatMap{line: String => line.split("\n")}
    .map{serverData =>
      val serverDataArray = serverData.replace("^", "::")split("::")
      val evenId = serverDataArray(2)
      if (evenId.equals("100")) {
        val serverId = serverDataArray(0)
        val timestempTo = serverDataArray(3)
        val timestempFrom = serverDataArray(6)
        val server = new Servers(serverId, timestempFrom, timestempTo)
        val res = (serverId, server.dateDiff(server.timestampFrom, server.timestampTo))
        res
      }
    }.reduceByKey{
      case(x: Long, y: Long) => if ((x, y) != null) {
        if (x > y) x else y
      }
    }
  result.saveAsTextFile(args(1))
  spark.stop
}

class Servers(val serverId: String, val timestampFrom: String, val timestampTo: String) {
  val DATE_FORMAT = "yyyy-MM-dd hh:mm:ss.SSS"
  private def convertStringToDate(s: String): Date = {
    val dateFormat = new SimpleDateFormat(DATE_FORMAT)
    dateFormat.parse(s)
  }
  private def convertDateStringToLong(dateAsString: String): Long = {
    convertStringToDate(dateAsString).getTime
  }
  def dateDiff(tFrom: String, tTo: String): Long = {
    val dDiff = convertDateStringToLong(tTo) - tFrom.toLong
    dDiff
  }
}
My build.sbt file:
name := "SetTopProject"
version := "0.1"
scalaVersion := "2.12.8"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.3" exclude ("org.apache.hadoop","hadoop-yarn-server-web-proxy"),
  "org.apache.spark" %% "spark-sql_2.12" % "2.4.3" exclude ("org.apache.hadoop","hadoop-yarn-server-web-proxy"),
  "org.apache.hadoop" %% "hadoop-common" % "3.2.0" exclude ("org.apache.hadoop","hadoop-yarn-server-web-proxy"),
  "org.apache.spark" %% "spark-sql_2.12" % "2.4.3" exclude ("org.apache.hadoop","hadoop-yarn-server-web-proxy"),
  "org.apache.spark" %% "spark-hive_2.12" % "2.4.3" exclude ("org.apache.hadoop","hadoop-yarn-server-web-proxy"),
  "org.apache.spark" %% "spark-yarn_2.12" % "2.4.3" exclude ("org.apache.hadoop","hadoop-yarn-server-web-proxy")
)
I expected everything to work, because
val spark = SparkSession
  .builder
  .appName("FilterEvent100")
  .getOrCreate()
is well defined (no compiler errors), and I use the spark value to define the data value:
val data = spark.read.textFile(args(0)).rdd
and then to call the reduceByKey and saveAsTextFile functions:
val result = data.flatMap{line: String => line.split("\n")}...
}.reduceByKey {case(x: Long, y: Long) => if ((x, y) != null) {
if (x > y) x else y
}
result.saveAsTextFile(args(1))
How can I get rid of the compiler errors on the saveAsTextFile and reduceByKey calls?
[Comments]:
-
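Just check the type of result. Because your map body is an if expression with no else branch, it produces something like RDD[Any], so reduceByKey is not applicable. You should use flatMap/collect instead of map, or add a filter.
-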
What is the compiler error?
-
Compile errors: 1) Cannot resolve symbol saveAsTextFile 2) Cannot resolve symbol reduceByKey
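The suggestion in the first comment can be sketched roughly as follows. This is a minimal illustration, not the asker's actual pipeline: the object name ReduceByKeySketch, the Record shape (serverId, eventId, durationMs), the sample values and the local[*] master are all assumptions made for the example. It contrasts map with an else-less if, whose element type widens to Any, against collect with a partial function, which keeps the (String, Long) pair type that reduceByKey needs.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object ReduceByKeySketch {
  // Hypothetical record shape: (serverId, eventId, durationMs).
  type Record = (String, String, Long)

  // collect with a partial function filters and maps in one step, so the
  // result is RDD[(String, Long)] and reduceByKey resolves on it.
  def maxDurationPerServer(records: RDD[Record]): RDD[(String, Long)] =
    records
      .collect { case (id, "100", ms) => (id, ms) }
      .reduceByKey((x, y) => if (x > y) x else y)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("ReduceByKeySketch")
      .master("local[*]") // assumption: local mode, for illustration only
      .getOrCreate()

    // Made-up sample data.
    val records: RDD[Record] = spark.sparkContext.parallelize(Seq(
      ("s1", "100", 5L),
      ("s1", "100", 9L),
      ("s2", "200", 7L),
      ("s2", "100", 3L)
    ))

    // An if without else returns Unit for non-matching records, so the
    // element type is Any and the pair-RDD methods are not available.
    val untyped: RDD[Any] = records.map { case (id, event, ms) =>
      if (event == "100") (id, ms)
    }
    // untyped.reduceByKey(...) // would not compile

    maxDurationPerServer(records).collect().foreach(println)
    spark.stop()
  }
}
```

Once the element type is a pair again, saveAsTextFile can be called on the reduced RDD in the same way as in the question.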
Tags: scala apache-spark intellij-idea
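A separate note on the build.sbt in the question: in sbt, %% appends the Scala binary version to the artifact name, so writing "spark-sql_2.12" together with %% asks for an artifact named spark-sql_2.12_2.12. Plain Java libraries such as hadoop-common carry no Scala suffix and are normally declared with a single %. This may or may not be related to the reported errors. A trimmed sketch, assuming only spark-core and spark-sql are needed and leaving out the exclude rules for brevity:

```scala
// build.sbt (sketch; the dependency list is an assumption, trimmed to what the posted code uses)
name := "SetTopProject"
version := "0.1"
scalaVersion := "2.12.8"

libraryDependencies ++= Seq(
  // %% adds the _2.12 suffix, so the artifact name itself has none.
  "org.apache.spark" %% "spark-core" % "2.4.3",
  "org.apache.spark" %% "spark-sql"  % "2.4.3"
)
```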