【Posted】: 2022-04-21 22:13:27
【Question】:
Spark: 3.0.0, Scala: 2.12.8
My DataFrame has a column that contains a JSON string, and I want to create a new column from it using a StructType.
+----------------+
|temp_json_string|
+----------------+
|{"name":"test","id":"12","category":[{"products":["A","B"],"displayName":"test_1","displayLabel":"test1"},{"products":["C"],"displayName":"test_2","displayLabel":"test2"}],"createdAt":"","createdBy":""}|
+----------------+
root
 |-- temp_json_string: string (nullable = true)
The JSON string looks like this:
{
  "name":"test",
  "id":"12",
  "category":[
    {
      "products":[
        "A",
        "B"
      ],
      "displayName":"test_1",
      "displayLabel":"test1"
    },
    {
      "products":[
        "C"
      ],
      "displayName":"test_2",
      "displayLabel":"test2"
    }
  ],
  "createdAt":"",
  "createdBy":""
}
I want to create a new column of struct type, so I tried:
dataFrame
  .withColumn("temp_json_struct", struct(col("temp_json_string")))
  .select("temp_json_struct")
With this, the schema I get is:
root
 |-- temp_json_struct: struct (nullable = false)
 |    |-- temp_json_string: string (nullable = true)
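This result is expected: `struct` only wraps existing columns in a struct and never parses their contents, whereas `from_json` parses the text against a schema. A minimal sketch of the difference follows. It assumes a local SparkSession, and the object name `FromJsonSketch`, the helper `parse`, and the two-field `partialSchema` are hypothetical names introduced only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object FromJsonSketch {
  // Hand-written schema for illustration only (an assumption, not the full
  // schema from the question); JSON fields not listed here are dropped.
  val partialSchema: StructType = StructType(Seq(
    StructField("name", StringType),
    StructField("id", StringType)
  ))

  // Parses the JSON text of temp_json_string instead of wrapping the column.
  def parse(df: DataFrame): DataFrame =
    df.withColumn("temp_json_struct", from_json(col("temp_json_string"), partialSchema))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("from-json-sketch")
      .getOrCreate()
    import spark.implicits._

    // One-row DataFrame shaped like the one in the question (shortened JSON).
    val dataFrame = Seq("""{"name":"test","id":"12","createdBy":""}""").toDF("temp_json_string")

    // The struct now has name and id as fields rather than the raw string.
    parse(dataFrame).select("temp_json_struct").printSchema()
    spark.stop()
  }
}
```

This only works when the schema is known up front, which is not the case in this question.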
What I am looking for is:
root
 |-- temp_json_struct: struct (nullable = false)
 |    |-- name: string (nullable = true)
 |    |-- category: array (nullable = true)
 |    |    |-- products: array (nullable = true)
 |    |    |-- displayName: string (nullable = true)
 |    |    |-- displayLabel: string (nullable = true)
 |    |-- createdAt: timestamp (nullable = true)
 |    |-- updatedAt: timestamp (nullable = true)
Also, I do not know in advance what schema the JSON string will have.
I have looked for other options but could not work out a solution.
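One possible approach when the schema is unknown, sketched here under assumptions rather than offered as a definitive answer, is to let Spark infer the schema from the JSON strings with `spark.read.json` on a `Dataset[String]` and then pass that schema to `from_json`. The object name `InferJsonSchemaSketch` and the helper `withParsedJson` are hypothetical, and inference costs an extra pass over the data.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}

object InferJsonSchemaSketch {
  // Infers a schema from the JSON strings themselves, then parses the column with it.
  def withParsedJson(spark: SparkSession, df: DataFrame, jsonCol: String, outCol: String): DataFrame = {
    import spark.implicits._
    // Skip null rows so that only real JSON text reaches the reader.
    val jsonStrings = df.select(col(jsonCol)).filter(col(jsonCol).isNotNull).as[String]
    // spark.read.json(Dataset[String]) scans the strings and infers a StructType.
    val inferredSchema = spark.read.json(jsonStrings).schema
    df.withColumn(outCol, from_json(col(jsonCol), inferredSchema))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("infer-json-schema-sketch")
      .getOrCreate()
    import spark.implicits._

    // The sample JSON string from the question.
    val json = """{"name":"test","id":"12","category":[{"products":["A","B"],"displayName":"test_1","displayLabel":"test1"},{"products":["C"],"displayName":"test_2","displayLabel":"test2"}],"createdAt":"","createdBy":""}"""
    val dataFrame = Seq(json).toDF("temp_json_string")

    val result = withParsedJson(spark, dataFrame, "temp_json_string", "temp_json_struct")
    result.select("temp_json_struct").printSchema()
    spark.stop()
  }
}
```

With the sample data, an empty `createdAt` value gives inference nothing to recognize as a timestamp, so that field would come out as a string. A later cast would be needed if a timestamp column is required.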
【Discussion】:
Tags: scala apache-spark apache-spark-sql