【Question Title】:How to read from a csv file to create a scala Map object?
【Posted】:2019-11-13 14:34:41
【Question】:

I have a path to a csv file I want to read from. The csv has three columns: "Topic, Key, Value". I'm using Spark to read this file as csv. The file looks like this (lookupFile.csv):

Topic,Key,Value
fruit,aaa,apple
fruit,bbb,orange
animal,ccc,cat
animal,ddd,dog

// I'm reading the file as follows
val lookup = spark.read.option("delimiter", ",").option("header", "true").csv(lookupFile)

I want to take what I've just read and return a map with the following properties:

  • The map uses Topic as its key
  • The value of this map is a map of the "Key" and "Value" columns

I'm hoping to get a map that looks like this:

val result = Map("fruit" -> Map("aaa" -> "apple", "bbb" -> "orange"),
                 "animal" -> Map("ccc" -> "cat", "ddd" -> "dog"))

Any ideas on how to do this?

【Discussion】:

    Tags: scala apache-spark rdd


    【Solution 1】:
    scala> val in = spark.read.option("header", true).option("inferSchema", true).csv("""Topic,Key,Value
         | fruit,aaa,apple
         | fruit,bbb,orange
         | animal,ccc,cat
         | animal,ddd,dog""".split("\n").toSeq.toDS)
    in: org.apache.spark.sql.DataFrame = [Topic: string, Key: string ... 1 more field]
    
    scala> val res = in.groupBy('Topic).agg(map_from_entries(collect_list(struct('Key, 'Value))).as("subMap"))
    res: org.apache.spark.sql.DataFrame = [Topic: string, subMap: map<string,string>]
    
    scala> val scalaMap = res.collect.map{
         | case org.apache.spark.sql.Row(k : String, v : Map[String, String]) => (k, v) 
         | }.toMap
    <console>:26: warning: non-variable type argument String in type pattern scala.collection.immutable.Map[String,String] (the underlying of Map[String,String]) is unchecked since it is eliminated by erasure
           case org.apache.spark.sql.Row(k : String, v : Map[String, String]) => (k, v)
                                                         ^
    scalaMap: scala.collection.immutable.Map[String,Map[String,String]] = Map(animal -> Map(ccc -> cat, ddd -> dog), fruit -> Map(aaa -> apple, bbb -> orange))
    
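    The `unchecked` warning in the transcript above is benign here, but worth understanding: type arguments in patterns are erased at runtime, so the `Map[String, String]` pattern only verifies that the value is a `Map` at all, not what it contains. A minimal sketch of the erasure behavior (plain Scala, no Spark needed):

    ```scala
    // Type arguments in patterns are erased: a Map[Int, Int] still matches
    // a Map[String, String] pattern at runtime, which is what the compiler warns about.
    val any: Any = Map(1 -> 2)
    val matched = any match {
      case _: Map[String, String] => true // matches despite the wrong type arguments
      case _                      => false
    }
    // matched == true
    ```

    In the Spark case the DataFrame schema already guarantees the map's element types, so the pattern match is safe in practice; using `row.getAs[Map[String, String]]("subMap")` instead of a typed pattern expresses the same cast without triggering the warning.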

    【Discussion】:

      【Solution 2】:

      Read in your data:

      val df1 = spark.read.format("csv").option("inferSchema", "true").option("header", "true").load(path)
      

      First put "Key, Value" into an array and group by Topic, so each row of the result holds a topic together with its list of key/value pairs:

      val df2 = df1.groupBy("Topic").agg(collect_list(array($"Key", $"Value")).as("arr"))
      

      Now convert to a Dataset:

      val ds = df2.as[(String, Seq[Seq[String]])]
      

      Apply logic on the fields to build each inner map, then collect:

      val ds1 = ds.map(x => (x._1, x._2.map(y => (y(0), y(1))).toMap)).collect
      

      Your data is now laid out with Topic as the key and the "Key" to "Value" map as the value, so apply toMap to get the result:

      ds1.toMap
      
      Map(animal -> Map(ccc -> cat, ddd -> dog), fruit -> Map(aaa -> apple, bbb -> orange))
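      If the lookup file is small enough to collect to the driver anyway, the same nested map can also be built with plain Scala collections once the rows are in hand. A sketch, assuming the rows have been collected as (Topic, Key, Value) tuples (the data below is the sample from the question):

      ```scala
      // Rows as collected (Topic, Key, Value) tuples, using the question's sample data.
      val rows = Seq(
        ("fruit", "aaa", "apple"),
        ("fruit", "bbb", "orange"),
        ("animal", "ccc", "cat"),
        ("animal", "ddd", "dog")
      )

      // Group by Topic, then turn each group's (Key, Value) pairs into an inner map.
      val result: Map[String, Map[String, String]] =
        rows.groupBy(_._1).map { case (topic, group) =>
          topic -> group.map { case (_, k, v) => k -> v }.toMap
        }
      // result == Map(fruit -> Map(aaa -> apple, bbb -> orange),
      //               animal -> Map(ccc -> cat, ddd -> dog))
      ```

      This mirrors what `groupBy` plus `collect_list` does distributedly, but keeps everything on the driver, which is reasonable for a small lookup table.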
      

      【Discussion】:
