【Title】: Spark Scala: "cannot resolve symbol saveAsTextFile (reduceByKey)" - IntelliJ IDEA
【Posted】: 2019-06-26 17:31:23
【Description】:

I suspect some dependencies are not defined in the build.sbt file.

I added the library dependencies to the build.sbt file, but I still get the error mentioned in the title. I searched Google for a solution but could not find one.

My Spark Scala source code (filterEventId100.scala):

package com.projects.setTopBoxDataAnalysis

import java.lang.System._
import java.text.SimpleDateFormat
import java.util.Date
import org.apache.spark.sql.SparkSession

object filterEventId100 extends App {


  if (args.length < 2) {
    println("Usage: JavaWordCount <Input-File> <Output-file>")
    exit(1)
  }

  val spark = SparkSession
    .builder
    .appName("FilterEvent100")
    .getOrCreate()

  val data = spark.read.textFile(args(0)).rdd


  val result = data.flatMap { line: String => line.split("\n") }
    .map { serverData =>
      val serverDataArray = serverData.replace("^", "::").split("::")
      val evenId = serverDataArray(2)
      if (evenId.equals("100")) {
        val serverId = serverDataArray(0)
        val timestempTo = serverDataArray(3)
        val timestempFrom = serverDataArray(6)
        val server = new Servers(serverId, timestempFrom, timestempTo)
        val res = (serverId, server.dateDiff(server.timestampFrom, server.timestampTo))
        res
      }
    }.reduceByKey {
      case (x: Long, y: Long) => if ((x, y) != null) {
        if (x > y) x else y
      }
    }

  result.saveAsTextFile(args(1))

  spark.stop


}

class Servers(val serverId: String, val timestampFrom: String, val timestampTo: String) {

  val DATE_FORMAT = "yyyy-MM-dd hh:mm:ss.SSS"

  private def convertStringToDate(s: String): Date = {
    val dateFormat = new SimpleDateFormat(DATE_FORMAT)
    dateFormat.parse(s)
  }

  private def convertDateStringToLong(dateAsString: String): Long = {
    convertStringToDate(dateAsString).getTime
  }

  def dateDiff(tFrom: String, tTo: String): Long = {
    val dDiff = convertDateStringToLong(tTo) - tFrom.toLong
    dDiff
  }

}
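[Editor's note] Separately from the compile errors, `dateDiff` mixes units: `tTo` is parsed as a formatted date while `tFrom` is read with `toLong` as if it were already epoch milliseconds. A consistent sketch, assuming both fields carry formatted timestamps (sample values are hypothetical; note also that `hh` is the 12-hour field, `HH` the 24-hour one):

```scala
import java.text.SimpleDateFormat

val DATE_FORMAT = "yyyy-MM-dd hh:mm:ss.SSS"

// Parse a formatted timestamp into epoch milliseconds.
def toMillis(s: String): Long = new SimpleDateFormat(DATE_FORMAT).parse(s).getTime

// Convert BOTH endpoints the same way, so the result is a plain millisecond delta.
def dateDiff(tFrom: String, tTo: String): Long = toMillis(tTo) - toMillis(tFrom)

// 1.5 hours apart -> 5,400,000 ms (hypothetical sample timestamps).
val diff = dateDiff("2019-01-01 01:00:00.000", "2019-01-01 02:30:00.000")
```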

My build.sbt file:

name := "SetTopProject"
version := "0.1"
scalaVersion := "2.12.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.3" exclude ("org.apache.hadoop","hadoop-yarn-server-web-proxy"),
  "org.apache.spark" %% "spark-sql_2.12" % "2.4.3" exclude ("org.apache.hadoop","hadoop-yarn-server-web-proxy"),
  "org.apache.hadoop" %% "hadoop-common" % "3.2.0" exclude ("org.apache.hadoop","hadoop-yarn-server-web-proxy"),
  "org.apache.spark" %% "spark-sql_2.12" % "2.4.3" exclude ("org.apache.hadoop","hadoop-yarn-server-web-proxy"),
  "org.apache.spark" %% "spark-hive_2.12" % "2.4.3" exclude ("org.apache.hadoop","hadoop-yarn-server-web-proxy"),
  "org.apache.spark" %% "spark-yarn_2.12" % "2.4.3" exclude ("org.apache.hadoop","hadoop-yarn-server-web-proxy")
)

I expected everything to work, since

val spark = SparkSession
.builder
.appName("FilterEvent100")
.getOrCreate()

is well defined (without any compiler errors), and I use the spark value to define the data value:

val data = spark.read.textFile(args(0)).rdd

and to call the saveAsTextFile and reduceByKey functions:

val result = data.flatMap{line: String => line.split("\n")}...
    }.reduceByKey {case(x: Long, y: Long) => if ((x, y) != null) {
         if (x > y) x else y
    }
result.saveAsTextFile(args(1))

How can I eliminate the compiler errors on the saveAsTextFile and reduceByKey calls?

【Comments】:

  • Just check the type of result. Because you only have an if expression, it will be something like RDD[Any], so reduceByKey is not applicable. You should use flatMap / collect instead of map, or add a filter.
  • What is the compiler error?
  • Compile errors: 1) cannot resolve symbol saveAsTextFile 2) cannot resolve symbol reduceByKey
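[Editor's note] The first commenter's point can be reproduced without Spark: an `if` without an `else` yields the unit value `()` for non-matching records, so the element type widens to `Any` and pair operations such as `reduceByKey` no longer apply. A minimal sketch on plain Scala collections, with hypothetical sample records in the `serverId::eventId::value` shape used by the question:

```scala
// Hypothetical sample records: serverId::eventId::value
val lines = List("s1::100::10", "s1::200::99", "s1::100::25", "s2::100::7")

// if-without-else: the non-matching branch yields (), so this is List[Any],
// and no pair operations are available on it.
val untyped = lines.map { l =>
  val a = l.split("::")
  if (a(1) == "100") (a(0), a(2).toLong)
}

// collect keeps only matching records AND preserves the pair type: List[(String, Long)].
val typed = lines.collect {
  case l if l.split("::")(1) == "100" =>
    val a = l.split("::")
    (a(0), a(2).toLong)
}

// The intended reduceByKey logic (keep the larger value per key), phrased on collections.
val maxPerKey = typed.groupBy(_._1).map { case (k, vs) => (k, vs.map(_._2).max) }
```

Spark's RDD offers the same collect-with-partial-function pattern (or a filter followed by map), which keeps the RDD typed as RDD[(String, Long)] so reduceByKey resolves.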

Tags: scala apache-spark intellij-idea


【Solution 1】:

Replace

 val spark = SparkSession
    .builder
    .appName("FilterEvent100")
    .getOrCreate()

  val data = spark.read.textFile(args(0)).rdd

with

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("FilterEvent100")
val sc = new SparkContext(conf)
val spark = SparkSession.builder.config(sc.getConf).getOrCreate()

val data = sc.textFile(args(0))
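[Editor's note] The unresolved-symbol errors may also come from the build.sbt itself: `%%` already appends the Scala binary suffix, so combining it with an explicit `_2.12` in the module name asks sbt for artifacts like `spark-sql_2.12_2.12`, which do not exist, and `hadoop-common` is a Java artifact that takes a single `%`. A corrected sketch (keeping the versions from the question; the exclude clauses are dropped here only for brevity):

name := "SetTopProject"
version := "0.1"
scalaVersion := "2.12.8"

// %% appends the Scala suffix automatically -- module names must not carry _2.12.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"   % "2.4.3",
  "org.apache.spark" %% "spark-sql"    % "2.4.3",
  "org.apache.spark" %% "spark-hive"   % "2.4.3",
  "org.apache.spark" %% "spark-yarn"   % "2.4.3",
  "org.apache.hadoop" % "hadoop-common" % "3.2.0"  // Java artifact: single %
)

With the dependencies resolving, IntelliJ can index PairRDDFunctions and the reduceByKey/saveAsTextFile symbols become visible (the RDD[Any] typing issue raised in the comments still has to be fixed separately).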

【Discussion】:
