【Title】: Pivoting a single-row Spark dataframe with pivot
【Posted】: 2020-08-08 15:25:57
【Question】:

I'm new to Spark and I want to pivot a single row of a dataframe using Scala, like this:

+--------------+-------+-------+-------+-------+-------+-------+-------+
|       Country| 3/7/20| 3/8/20| 3/9/20|3/10/20|3/11/20|3/12/20|3/13/20|
+--------------+-------+-------+-------+-------+-------+-------+-------+
|         Japan|      0|      4|     10|     18|     27|     31|     35|
+--------------+-------+-------+-------+-------+-------+-------+-------+

My pivoted dataframe should look like this:

+--------------+-------+
|       Country| Japan |
+--------------+-------+
|        3/7/20|      0|
|        3/8/20|      4|
|        3/9/20|     10|
|       3/10/20|     18|
|           ...|    ...|
+--------------+-------+

I have tried the following, but I'm not sure I'm getting the aggregate expression right:

val pivoted = df.groupBy("Country").pivot("Country", Seq("Japan")).agg(col("Country"))

【Comments】:

    Tags: scala apache-spark apache-spark-sql


    【Solution 1】:

    Try this -

    Use stack:

     df2.show(false)
        df2.printSchema()
        /**
          * +-------+------+------+------+-------+-------+-------+-------+
          * |Country|3/7/20|3/8/20|3/9/20|3/10/20|3/11/20|3/12/20|3/13/20|
          * +-------+------+------+------+-------+-------+-------+-------+
          * |Japan  |0     |4     |10    |18     |27     |31     |35     |
          * +-------+------+------+------+-------+-------+-------+-------+
          *
          * root
          * |-- Country: string (nullable = true)
          * |-- 3/7/20: integer (nullable = true)
          * |-- 3/8/20: integer (nullable = true)
          * |-- 3/9/20: integer (nullable = true)
          * |-- 3/10/20: integer (nullable = true)
          * |-- 3/11/20: integer (nullable = true)
          * |-- 3/12/20: integer (nullable = true)
          * |-- 3/13/20: integer (nullable = true)
          */
        // Build the stack() arguments: one ('column name', value) pair per column
        val stringCol = df2.columns.map(c => s"'$c', cast(`$c` as string)").mkString(", ")
        // Unpivot every column (including Country) into two columns
        val processedDF = df2.selectExpr(s"stack(${df2.columns.length}, $stringCol) as (col_1, col_2)")
        processedDF.show(false)
        /**
          * +-------+-----+
          * |col_1  |col_2|
          * +-------+-----+
          * |Country|Japan|
          * |3/7/20 |0    |
          * |3/8/20 |4    |
          * |3/9/20 |10   |
          * |3/10/20|18   |
          * |3/11/20|27   |
          * |3/12/20|31   |
          * |3/13/20|35   |
          * +-------+-----+
          */
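
    Note that because stack() runs over every column, the Country header itself becomes the first output row (Country | Japan above). A minimal sketch of one way to avoid that, building the expression from the date columns only (column names mirror the question's dataframe; df2 is assumed to be the same single-row dataframe as above):

    ```scala
    // Sketch, assuming df2 is the single-row dataframe from the question.
    // Skip the Country column so its header does not appear as a data row;
    // shortened to three dates here — in practice use
    // df2.columns.filterNot(_ == "Country").
    val dateCols = Seq("3/7/20", "3/8/20", "3/9/20")
    val stackArgs = dateCols.map(c => s"'$c', cast(`$c` as string)").mkString(", ")
    val stackExpr = s"stack(${dateCols.length}, $stackArgs) as (Country, Japan)"
    // df2.selectExpr(stackExpr) then yields rows like ("3/7/20", "0") directly.
    println(stackExpr)
    ```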
    

    【Comments】:
