【Question title】: Transposing DataFrame columns in Spark Scala [duplicate]
【Posted】: 2019-02-23 03:48:59
【Question description】:

I am finding it hard to transpose columns in a DataFrame. The base DataFrame and the expected output are given below.

Student    Class         Subject      Grade    
Sam        6th Grade     Maths        A
Sam        6th Grade     Science      A
Sam        7th Grade     Maths        A-
Sam        7th Grade     Science      A
Rob        6th Grade     Maths        A
Rob        6th Grade     Science      A-
Rob        7th Grade     Maths        A-
Rob        7th Grade     Science      B
Rob        7th Grade     AP           A

Expected output:

Student Class        Math_Grade  Science_Grade  AP_Grade
Sam     6th Grade    A           A  
Sam     7th Grade    A-          A  
Rob     6th Grade    A           A- 
Rob     7th Grade    A-          B               A

Please suggest the best way to solve this.

【Comments】:

    Tags: scala apache-spark dataframe apache-spark-sql transpose


    【Solution 1】:

    You can group the DataFrame by Student and Class, then pivot on Subject, as shown below:

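    import org.apache.spark.sql.functions._
    // toDF on a Seq needs the session's implicits; `spark` is assumed to be
    // the SparkSession that spark-shell provides.
    import spark.implicits._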
    
    val df = Seq(
      ("Sam", "6th Grade", "Maths", "A"),
      ("Sam", "6th Grade", "Science", "A"),
      ("Sam", "7th Grade", "Maths", "A-"),
      ("Sam", "7th Grade", "Science", "A"),
      ("Rob", "6th Grade", "Maths", "A"),
      ("Rob", "6th Grade", "Science", "A-"),
      ("Rob", "7th Grade", "Maths", "A-"),
      ("Rob", "7th Grade", "Science", "B"),
      ("Rob", "7th Grade", "AP", "A")
    ).toDF("Student", "Class", "Subject", "Grade")
    
    df.
      groupBy("Student", "Class").pivot("Subject").agg(first("Grade")).
      orderBy("Student", "Class").
      show
    // +-------+---------+----+-----+-------+
    // |Student|    Class|  AP|Maths|Science|
    // +-------+---------+----+-----+-------+
    // |    Rob|6th Grade|null|    A|     A-|
    // |    Rob|7th Grade|   A|   A-|      B|
    // |    Sam|6th Grade|null|    A|      A|
    // |    Sam|7th Grade|null|   A-|      A|
    // +-------+---------+----+-----+-------+
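The pivoted columns above are named after the raw Subject values (AP, Maths, Science), while the question asks for names ending in _Grade. A minimal sketch of one way to add that suffix follows. The object `PivotWithSuffix` and the helper `pivotWithSuffix` are hypothetical names introduced for illustration, and the local SparkSession is an assumption (in spark-shell, `spark` already exists).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.first

object PivotWithSuffix {
  // Hypothetical helper (not part of the answer above): pivot, then append
  // `suffix` to every column that is not one of the grouping keys.
  // Assumes `keys` is non-empty.
  def pivotWithSuffix(df: DataFrame, keys: Seq[String], pivotCol: String,
                      valueCol: String, suffix: String): DataFrame = {
    val pivoted = df
      .groupBy(keys.head, keys.tail: _*)
      .pivot(pivotCol)
      .agg(first(valueCol))
    pivoted.columns
      .filterNot(c => keys.contains(c))
      .foldLeft(pivoted)((acc, c) => acc.withColumnRenamed(c, s"$c$suffix"))
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("pivot-with-suffix")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("Sam", "6th Grade", "Maths", "A"),
      ("Sam", "6th Grade", "Science", "A"),
      ("Rob", "7th Grade", "Maths", "A-"),
      ("Rob", "7th Grade", "Science", "B"),
      ("Rob", "7th Grade", "AP", "A")
    ).toDF("Student", "Class", "Subject", "Grade")

    pivotWithSuffix(df, Seq("Student", "Class"), "Subject", "Grade", "_Grade")
      .orderBy("Student", "Class")
      .show()

    spark.stop()
  }
}
```

With this data the resulting columns should be Student, Class, AP_Grade, Maths_Grade and Science_Grade. Note that the subject in the data is "Maths", so the column comes out as Maths_Grade rather than the Math_Grade shown in the question.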
    

    【Discussion】:

      【Solution 2】:

      You only need to group by the key columns and pivot on Subject:

      import org.apache.spark.sql.functions._
      // `spark` is assumed to be an existing SparkSession (as in spark-shell)
      import spark.implicits._

      case class StudentRecord(Student: String, `Class`: String, Subject: String, Grade: String)

      val rows = Seq(
        StudentRecord("Sam", "6th Grade", "Maths", "A"),
        StudentRecord("Sam", "6th Grade", "Science", "A"),
        StudentRecord("Sam", "7th Grade", "Maths", "A-"),
        StudentRecord("Sam", "7th Grade", "Science", "A"),
        StudentRecord("Rob", "6th Grade", "Maths", "A"),
        StudentRecord("Rob", "6th Grade", "Science", "A-"),
        StudentRecord("Rob", "7th Grade", "Maths", "A-"),
        StudentRecord("Rob", "7th Grade", "Science", "B"),
        StudentRecord("Rob", "7th Grade", "AP", "A")
      ).toDF()

      // One output row per (Student, Class); one column per distinct Subject
      rows
        .groupBy("Student", "Class")
        .pivot("Subject")
        .agg(first("Grade"))
        .orderBy(desc("Student"), asc("Class"))
        .show()

      /**
       * +-------+---------+----+-----+-------+
       * |Student|    Class|  AP|Maths|Science|
       * +-------+---------+----+-----+-------+
       * |    Sam|6th Grade|null|    A|      A|
       * |    Sam|7th Grade|null|   A-|      A|
       * |    Rob|6th Grade|null|    A|     A-|
       * |    Rob|7th Grade|   A|   A-|      B|
       * +-------+---------+----+-----+-------+
       */
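Calling pivot("Subject") with no value list makes Spark compute the distinct subjects first, and the resulting columns come out sorted (AP, Maths, Science). If the subjects are known in advance, the pivot overload that takes a list of values avoids that extra pass and fixes the column order, for example Maths, Science, AP as in the question's expected output. The sketch below assumes that fixed list; `PivotExplicitValues` and `pivotSubjects` are hypothetical names, and the local SparkSession is for illustration only.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.first

object PivotExplicitValues {
  // Hypothetical helper: pivot on Subject using a caller-supplied list.
  // Subjects missing from `subjects` are dropped from the result; listed
  // subjects with no rows in a group yield null.
  def pivotSubjects(df: DataFrame, subjects: Seq[String]): DataFrame =
    df.groupBy("Student", "Class")
      .pivot("Subject", subjects)
      .agg(first("Grade"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("pivot-explicit-values")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("Sam", "6th Grade", "Maths", "A"),
      ("Sam", "6th Grade", "Science", "A"),
      ("Rob", "7th Grade", "Maths", "A-"),
      ("Rob", "7th Grade", "Science", "B"),
      ("Rob", "7th Grade", "AP", "A")
    ).toDF("Student", "Class", "Subject", "Grade")

    // Column order follows the list: Student, Class, Maths, Science, AP
    pivotSubjects(df, Seq("Maths", "Science", "AP"))
      .orderBy("Student", "Class")
      .show()

    spark.stop()
  }
}
```

Using first("Grade") as the aggregate is reasonable here because each (Student, Class, Subject) combination has a single grade in the sample data; with duplicates, which grade first returns is not guaranteed.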
      

      【Discussion】:
