【Posted】: 2021-06-22 11:23:28
【Question】:
I have the following two JSON messages:
[{
"identifier": {
"domain": "OFFICE ADDRESS",
"id": "987654321",
"version": 1
},
"payload": {
"contactMethods": [
{
"faxDiallingNumber": "0000/11111",
"objId": 8,
"type": "Fax",
"use": "Business Address"
},
{
"objId": 9,
"telephoneDiallingNumber": "0999/99999",
"telephoneType": "Fixed",
"type": "Telephone",
"use": "Business Address"
},
{
"addressLine1": "house no",
"addressLine3": "street name",
"addressLine4": "area name",
"cityCode": "city name",
"countryCode": "US",
"objId": 10,
"postalCode": "12345",
"preferredContactMethodFlag": true,
"type": "International Address",
"use": "Registered"
}
]
}
},
{
"identifier": {
"domain": "HOME ADDRESS",
"id": "123456789",
"version": 1
},
"payload": {
"contactMethods": [
{
"faxDiallingNumber": "0000/22222",
"objId": 11,
"type": "Fax",
"use": "home Address"
},
{
"addressLine1": "house no",
"addressLine3": "street name",
"addressLine4": "area name",
"cityCode": null,
"countryCode": "US",
"objId": 12,
"postalCode": "45678",
"preferredContactMethodFlag": true,
"type": "International Address",
"use": "Registered"
},
{
"objId": 13,
"telephoneDiallingNumber": "0999/88888",
"telephoneType": "Fixed",
"type": "Telephone",
"use": "home Address"
}
]
}
}
]
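Using Spark SQL from PySpark, I am trying something like the query below to find the ids that have a contact method of type 'International Address' whose cityCode is null.
My output should be
Could you please tell me the correct and efficient syntax?
I have tried the explode, array_contains, and flatten functions, but could not get it right.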
select
    identifier.id,
    payload.contactMethods.type,
    payload.contactMethods.cityCode
from sample_json_df -- (a DataFrame that will be created from the JSON file using PySpark)
where
    payload.contactMethods.type = 'International Address'
    and payload.contactMethods.cityCode is null
【Discussion】:
Tags: json apache-spark multidimensional-array apache-spark-sql