【Posted】:2018-09-12 11:07:24
【Question】:
I have a large amount of nested JSON with more than 200 keys that I need to transform and store in a structured table.
|-- ip_address: string (nullable = true)
|-- xs_latitude: double (nullable = true)
|-- Applications: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- b_als_o_isehp: string (nullable = true)
| | |-- b_als_p_isehp: string (nullable = true)
| | |-- b_als_s_isehp: string (nullable = true)
| | |-- l_als_o_eventid: string (nullable = true)
....
After reading the JSON, each ip_address has one array of application data:
{"ip_address": 1512199720,"Applications": [{"s_pd": -1,"s_path": "NA", "p_pd": "temp0"}, {"s_pd": -1,"s_path": "root/hdfs", "p_pd": "temp1"},{"s_pd": -1,"s_path": "root/hdfs", "p_pd": "temp2"}]}
val data = spark.read.json("file:///root/users/data/s_json.json")
var appDf = data.withColumn("data",explode($"Applications")).select($"Applications.s_pd", $"Applications.s_path", $"Applications.p_pd", $"ip_address")
appDf.printSchema
/// gives
root
|-- s_pd: array (nullable = true)
| |-- element: string (containsNull = true)
|-- s_path: array (nullable = true)
| |-- element: string (containsNull = true)
|-- p_pd: array (nullable = true)
| |-- element: string (containsNull = true)
|-- ip_address: string (nullable = true)
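For reference, the schema above still shows arrays because the select reads from the original Applications column rather than from the exploded data column. Below is a minimal sketch of one way to get one row per application, assuming Spark 2.2 or later. The names FlattenApplications, flatten and app are made up for illustration, and the sample record is the one from the question, written without a trailing comma.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object FlattenApplications {
  // Explode the array into one row per element, then expand the
  // resulting struct's fields into top-level columns with "app.*".
  def flatten(df: DataFrame): DataFrame =
    df.withColumn("app", explode(col("Applications")))
      .select(col("ip_address"), col("app.*"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")            // assumption: local run for illustration
      .appName("flatten-sketch")
      .getOrCreate()
    import spark.implicits._

    // Sample record from the question (hypothetical in-memory input
    // instead of the file path used in the post).
    val sample = Seq(
      """{"ip_address": 1512199720, "Applications": [{"s_pd": -1, "s_path": "NA", "p_pd": "temp0"}, {"s_pd": -1, "s_path": "root/hdfs", "p_pd": "temp1"}, {"s_pd": -1, "s_path": "root/hdfs", "p_pd": "temp2"}]}"""
    ).toDS()

    val data = spark.read.json(sample)
    val flat = flatten(data)
    flat.printSchema()
    flat.show(false)

    spark.stop()
  }
}
```

Note that explode drops rows whose array is null or empty; explode_outer (also in org.apache.spark.sql.functions) keeps them, if that matters for your data.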
【Discussion】:
-
Off the top of my head, you could try
appDf.select("ip_address", "xs_latitude", "Applications.*") to flatten a structure like this. Or is the nesting arbitrarily deep?
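If the nesting does turn out to be deeper, one possible approach is to walk the DataFrame schema and select every leaf field by its dotted path. This is only a rough sketch: FlattenStructs, leafColumns and flattenStructs are hypothetical names, it assumes field names contain no dots, and it only descends into struct columns. Array columns such as Applications are left as they are, since star or dotted expansion does not turn array elements into rows; they would need an explode first.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StructType

object FlattenStructs {
  // Return one Column per leaf field. Nested struct fields are addressed
  // as "parent.child" and aliased to "parent_child" to avoid name clashes.
  def leafColumns(schema: StructType, prefix: Option[String] = None): Seq[Column] =
    schema.fields.toSeq.flatMap { field =>
      val path = prefix.map(p => s"$p.${field.name}").getOrElse(field.name)
      field.dataType match {
        case nested: StructType => leafColumns(nested, Some(path))
        // Arrays, maps and primitives are kept as single columns.
        case _                  => Seq(col(path).alias(path.replace(".", "_")))
      }
    }

  // Select all leaf columns, which flattens any depth of struct nesting.
  def flattenStructs(df: DataFrame): DataFrame =
    df.select(leafColumns(df.schema): _*)
}
```

With 200+ keys this avoids listing columns by hand; calling FlattenStructs.flattenStructs on the exploded DataFrame would expand whatever struct columns remain.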
Tags: json scala apache-spark dataframe