【Question Title】: Scala reading file with Spark
【Posted】: 2019-11-06 13:25:41
【Question Description】:

I'm trying to read a file that looks like this:

you 0.0432052044116
i 0.0391075831328
the 0.0328010698268
to 0.0237549924919
a 0.0209682886489
it 0.0198104294359

I want to store it in an RDD of (key, value) pairs, e.g. (you, 0.0432). So far I've only got this:

import java.io.{FileNotFoundException, IOException}
import scala.io.Source

val filename = "freq2.txt"
try {
  for (line <- Source.fromFile(filename).getLines()) {
    val tuple = line.split(" ")
    val key = tuple(0)
    val words = tuple(1)
    println(s"${key}")
    println(s"${words}")
  }
} catch {
  case ex: FileNotFoundException => println("Couldn't find that file.")
  case ex: IOException => println("Had an IOException trying to read that file")
}

But I don't know how to store the data...

【Discussion】:

    Tags: scala apache-spark


    【Solution 1】:

    You can read the data directly into an RDD:

    val FIELD_SEP = " " // or whatever separator you have
    val dataset = sparkContext.textFile(sourceFile).map { line =>
        val word :: score :: _ = line.split(FIELD_SEP).toList
        (word, score)
    }
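
    Since the follow-up question is how to store the values, it may help to see the per-line transformation on its own. This is a minimal local sketch (plain Scala, no SparkContext required; the sample lines are taken from the question's data) that also parses the score into a Double:

    ```scala
    object ParseFreq {
      // Each input line looks like "<word> <score>", e.g. "you 0.0432052044116"
      def parseLine(line: String): (String, Double) = {
        val parts = line.trim.split(" ")
        (parts(0), parts(1).toDouble)
      }

      def main(args: Array[String]): Unit = {
        val sample = Seq("you 0.0432052044116", "i 0.0391075831328")
        // Inside Spark, the same function would be passed to rdd.map(parseLine)
        sample.map(parseLine).foreach(println)
      }
    }
    ```

    Keeping the parsing in a named function makes it trivially testable before handing it to `map` on the RDD.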
    

    【Discussion】:

      【Solution 2】:
      val filename = "freq2.txt"
      // textFile already splits the input into lines, so no manual split on "\r\n" is needed;
      // use foreach for the side effect instead of map
      sc.textFile(filename).map { x =>
          val data = x.trim().split(" ")
          (data(0), data(1))
      }.foreach(println)
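
      To actually keep the pairs rather than just print them, one common pattern is to collect them back to the driver as a map (in Spark, `rdd.collectAsMap()` on a pair RDD). This local sketch shows the resulting lookup structure, with the sample pairs taken from the question's data:

      ```scala
      object FreqLookup {
        def main(args: Array[String]): Unit = {
          // Pairs as they would come back from pairRDD.collect()
          val pairs = Seq(("you", 0.0432052044116), ("i", 0.0391075831328))
          // Local equivalent of pairRDD.collectAsMap()
          val freq: Map[String, Double] = pairs.toMap
          println(freq("you"))
        }
      }
      ```

      Note that collecting is only appropriate when the data fits in driver memory; for large files, keep working on the RDD itself.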
      

      【Discussion】:
