【Question Title】: How to perform a pivot on a Spark DataFrame using spark-scala?
【Posted】: 2021-08-05 05:55:35
【Description】:

Below is the input DataFrame.

+----------+------------+
|sensortype|sensorstatus|
+----------+------------+
|Sensor1   |offline     |
|Sensor1   |offline     |
|Sensor1   |online      |
|Sensor2   |offline     |
|Sensor2   |offline     |
|Sensor3   |online      |
+----------+------------+

I want to transform the DataFrame above into the following:

+----------+-------+------+-----+
|sensortype|offline|online|total|
+----------+-------+------+-----+
|Sensor2   |2      |0     |2    |
|Sensor3   |0      |1     |1    |
|Sensor1   |2      |1     |3    |
+----------+-------+------+-----+

【Comments】:

    Tags: scala dataframe apache-spark apache-spark-sql


    【Solution 1】:

    We can get the desired result using Spark's pivot and aggregate functions.

      import org.apache.spark.sql.SparkSession
      import org.apache.spark.sql.functions.{col, count}

      val spark = SparkSession.builder().master("local[*]").getOrCreate()
      import spark.implicits._
      spark.sparkContext.setLogLevel("ERROR")
      // Sample dataframe
      val df = Seq(
        ("Sensor1", "offline"),
        ("Sensor1", "offline"),
        ("Sensor1", "online"),
        ("Sensor2", "offline"),
        ("Sensor2", "offline"),
        ("Sensor3", "online")
      ).toDF("sensortype", "sensorstatus")
    
      df.groupBy("sensortype")
        .pivot("sensorstatus").agg(count("sensorstatus"))
        // Replace null with 0 in the offline and online columns
        .na.fill(0, Seq("offline", "online"))
        // col() avoids the symbol syntax ('offline), which only compiles
        // when spark.implicits._ is in scope
        .withColumn("total", col("offline") + col("online"))
        .show(false)
    
    +----------+-------+------+-----+
    |sensortype|offline|online|total|
    +----------+-------+------+-----+
    |Sensor2   |2      |0     |2    |
    |Sensor3   |0      |1     |1    |
    |Sensor1   |2      |1     |3    |
    +----------+-------+------+-----+
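    As a side note, listing the expected pivot values explicitly avoids an extra pass over the data to discover the distinct statuses, and guarantees both columns exist even if one status is absent from the input. A sketch, assuming the same `df` and column names as above:

    ```scala
    import org.apache.spark.sql.functions.{col, count}

    // Passing Seq("offline", "online") skips the distinct-value scan
    // and always produces both columns, so na.fill still applies cleanly.
    df.groupBy("sensortype")
      .pivot("sensorstatus", Seq("offline", "online"))
      .agg(count("sensorstatus"))
      .na.fill(0, Seq("offline", "online"))
      .withColumn("total", col("offline") + col("online"))
      .show(false)
    ```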
    

    【Discussion】:

    • This line throws an error -> ".withColumn("total", 'offline + 'online)"
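    That error usually means the symbol-to-Column implicit conversion is not in scope; the `'offline + 'online` syntax requires `import spark.implicits._`. Using `col` from `org.apache.spark.sql.functions` sidesteps the implicit entirely. A sketch, where `pivoted` stands for the DataFrame produced by the pivot step above:

    ```scala
    import org.apache.spark.sql.functions.col

    // col() builds a Column directly, with no dependency on spark.implicits._
    val withTotal = pivoted.withColumn("total", col("offline") + col("online"))
    ```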