【Posted】: 2020-12-10 14:22:35
【Question】:
Input_dataframe
id name collection
111 aaaaa {"1":{"city":"city_1","state":"state_1","country":"country_1"},
"2":{"city":"city_2","state":"state_2","country":"country_2"},
"3":{"city":"city_3","state":"state_3","country":"country_3"}
}
222 bbbbb {"1":{"city":"city_1","state":"state_1","country":"country_1"},
"2":{"city":"city_2","state":"state_2","country":"country_2"},
"3":{"city":"city_3","state":"state_3","country":"country_3"}
}
where:
id ==> string
name ==> string
collection ==> string (string representation of JSON_data)
I want something like this:
Output_dataframe
id name key value
111 aaaaa "1" {"city":"city_1","state":"state_1","country":"country_1"},
111 aaaaa "2" {"city":"city_2","state":"state_2","country":"country_2"},
111 aaaaa "3" {"city":"city_3","state":"state_3","country":"country_3"}
222 bbbbb "1" {"city":"city_1","state":"state_1","country":"country_1"},
222 bbbbb "2" {"city":"city_2","state":"state_2","country":"country_2"},
222 bbbbb "3" {"city":"city_3","state":"state_3","country":"country_3"}
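If my collection column were of type map or array, the explode function would do the job. But my collection column is a string (JSON data).
How can I get Output_dataframe?
Please let me know.
Note: the collection column may have a nested and unpredictable schema, for example: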
{
"1":{"city":"city_1","state":"state_1","country":"country_1"},
"2":{"city":"city_2","state":"state_2","country":"country_2","a":
{"aa":"111"}},
"3":{"city":"city_3","state":"state_3"}
}
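
To pin down the transformation being asked for, here is a minimal plain-Python sketch of the per-row logic, not a Spark solution. It assumes the top level of collection is always a JSON object, and it keeps every value as a JSON string so that nested or varying schemas never need to be declared. The function name explode_collection and the sample row are hypothetical, introduced only for illustration.

```python
import json


def explode_collection(row_id, name, collection):
    """Yield one (id, name, key, value) tuple per top-level key of the JSON string."""
    parsed = json.loads(collection)  # assumption: the top level is a JSON object
    for key, value in parsed.items():
        # Re-serialize each value so nested, unpredictable content stays an opaque string.
        yield (row_id, name, key, json.dumps(value, separators=(",", ":")))


# Hypothetical sample row built from the nested example above.
rows = [
    (
        "111",
        "aaaaa",
        '{"1":{"city":"city_1","state":"state_1","country":"country_1"},'
        '"2":{"city":"city_2","state":"state_2","country":"country_2","a":{"aa":"111"}},'
        '"3":{"city":"city_3","state":"state_3"}}',
    ),
]

exploded = [out for row in rows for out in explode_collection(*row)]
for out in exploded:
    print(out)
```

In PySpark, the same shape is often obtained by parsing the string column with from_json using a MapType(StringType(), StringType()) schema and then calling explode on the resulting map. Whether nested values come back as JSON strings in a given Spark version is an assumption worth checking on a small sample first.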
【Discussion】:
Tags: python-3.x dataframe apache-spark pyspark apache-spark-sql