【Question title】: How to explode structs array?
【Posted】: 2018-12-20 14:40:38
【Question】:

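I am working with JSON objects and want to convert object.hours into a relational table using Spark SQL DataFrames/Datasets.

I tried to use "explode", but it does not really support a "structs array".

The JSON object looks like this: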

{
  "business_id": "abc",
  "full_address": "random_address",
  "hours": {
    "Monday": {
      "close": "02:00",
      "open": "11:00"
    },
    "Tuesday": {
      "close": "02:00",
      "open": "11:00"
    },
    "Friday": {
      "close": "02:00",
      "open": "11:00"
    },
    "Wednesday": {
      "close": "02:00",
      "open": "11:00"
    },
    "Thursday": {
      "close": "02:00",
      "open": "11:00"
    },
    "Sunday": {
      "close": "00:00",
      "open": "11:00"
    },
    "Saturday": {
      "close": "02:00",
      "open": "11:00"
    }
  }
}

I want to turn it into a relational table like the following:

CREATE TABLE "business_hours" (
     "id" integer NOT NULL PRIMARY KEY,
     "business_id" integer NOT NULL FOREIGN KEY REFERENCES "businesses",
     "day" integer NOT NULL,
     "open_time" time,
     "close_time" time
)

【Comments】:

    Tags: apache-spark apache-spark-sql


    【Solution 1】:

    You can do it with this trick:

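    import org.apache.spark.sql.functions.{array, col, explode, lit, struct}
    import org.apache.spark.sql.types.StructType
    // In spark-shell, spark.implicits._ (needed for $"...") is already imported.

    // Field names of the "hours" struct, i.e. the day names present in the data
    val days = df.schema
      .fields
      .filter(_.name == "hours")
      .head
      .dataType
      .asInstanceOf[StructType]
      .fieldNames

    // Build one struct per day, collect them in an array, and explode it into rows
    val solution = df
      .select(
        $"business_id",
        $"full_address",
        explode(
          array(
            days.map(d => struct(
              lit(d).as("day"),
              col(s"hours.$d.open").as("open_time"),
              col(s"hours.$d.close").as("close_time")
            )): _*
          )
        )
      )
      // explode names its output column "col" by default; flatten that struct
      .select($"business_id", $"full_address", $"col.*")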
    
    scala> solution.show
    +-----------+--------------+---------+---------+----------+
    |business_id|  full_address|      day|open_time|close_time|
    +-----------+--------------+---------+---------+----------+
    |        abc|random_address|   Friday|    11:00|     02:00|
    |        abc|random_address|   Monday|    11:00|     02:00|
    |        abc|random_address| Saturday|    11:00|     02:00|
    |        abc|random_address|   Sunday|    11:00|     00:00|
    |        abc|random_address| Thursday|    11:00|     02:00|
    |        abc|random_address|  Tuesday|    11:00|     02:00|
    |        abc|random_address|Wednesday|    11:00|     02:00|
    +-----------+--------------+---------+---------+----------+
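
The result above still has the day as a name, while the target table in the question declares "day" as an integer. A minimal sketch of one way to close that gap follows, assuming a Monday = 1 through Sunday = 7 numbering (the question does not say which numbering it wants) and a shortened two-day sample record. The names `dayNumber` and `hours` are illustrative, and the open and close times are left as strings.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, explode, lit, struct, typedLit}
import org.apache.spark.sql.types.StructType

// Local session for the sketch; in spark-shell getOrCreate() reuses the existing one
val spark = SparkSession.builder().master("local[*]").appName("business-hours-sketch").getOrCreate()
import spark.implicits._

// Shortened sample record (two days only), written on one line for spark.read.json
val json = """{"business_id":"abc","full_address":"random_address","hours":{"Monday":{"close":"02:00","open":"11:00"},"Sunday":{"close":"00:00","open":"11:00"}}}"""
val df = spark.read.json(Seq(json).toDS())

// Day names are the field names of the inferred "hours" struct
val days = df.schema("hours").dataType.asInstanceOf[StructType].fieldNames

// ASSUMPTION: Monday = 1 ... Sunday = 7; adjust to whatever the target table expects
val dayNumber = typedLit(Map(
  "Monday" -> 1, "Tuesday" -> 2, "Wednesday" -> 3, "Thursday" -> 4,
  "Friday" -> 5, "Saturday" -> 6, "Sunday" -> 7))

val hours = df
  .select(
    $"business_id",
    // Same trick as the answer: one struct per day, wrapped in an array, then exploded
    explode(array(days.map(d => struct(
      lit(d).as("day_name"),
      col(s"hours.$d.open").as("open_time"),
      col(s"hours.$d.close").as("close_time")
    )): _*)).as("h"))
  // Look up the integer for each day name in the map literal
  .select($"business_id", dayNumber($"h.day_name").as("day"), $"h.open_time", $"h.close_time")

hours.show()
```

The lookup is a plain map literal rather than a join, which keeps the sketch short. A day name missing from the map would come back as null in "day".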
    

    【Discussion】:
