【Question Title】: Spark SQL on Nested JSON Array
【Posted】: 2021-06-22 11:23:28
【Question Description】:

I have the following two JSON messages:

[{
  "identifier": {
    "domain": "OFFICE ADDRESS",
    "id": "987654321",
    "version": 1
  },
  "payload": {
    
    "contactMethods": [
      {
        "faxDiallingNumber": "0000/11111",
        "objId": 8,
        "type": "Fax",
        "use": "Business Address"
      },
      {
        "objId": 9,
        "telephoneDiallingNumber": "0999/99999",
        "telephoneType": "Fixed",
        "type": "Telephone",
        "use": "Business Address"
      },
      {
        "addressLine1": "house no",
        "addressLine3": "street name",
        "addressLine4": "area name",
        "cityCode": "city name",
        "countryCode": "US",
        "objId": 10,
        "postalCode": "12345",
        "preferredContactMethodFlag": true,
        "type": "International Address",
        "use": "Registered"
      }
    ]
  }
},
{
  "identifier": {
    "domain": "HOME ADDRESS",
    "id": "123456789",
    "version": 1
  },
  "payload": {
    
    "contactMethods": [
      {
        "faxDiallingNumber": "0000/22222",
        "objId": 11,
        "type": "Fax",
        "use": "home Address"
      },
      {
        "addressLine1": "house no",
        "addressLine3": "street name",
        "addressLine4": "area name",
        "cityCode": null,
        "countryCode": "US",
        "objId": 12,
        "postalCode": "45678",
        "preferredContactMethodFlag": true,
        "type": "International Address",
        "use": "Registered"
      },
      {
        "objId": 13,
        "telephoneDiallingNumber": "0999/88888",
        "telephoneType": "Fixed",
        "type": "Telephone",
        "use": "home Address"
      }
    ]
  }
}
]

Using Spark SQL with pyspark, I am trying something like the query below to find the ids whose contactMethods contain an entry of type "International Address" with a null cityCode.

My output should be:

id = 123456789, type = International Address, cityCode = null

Could you please tell me the correct and efficient syntax?

I have tried the explode, array_contains, and flatten functions, but couldn't get it right.

select
identifier.id,
payload.contactMethods.type,
payload.contactMethods.cityCode
from sample_json_df -- (a dataframe will be created on the JSON file using pyspark)
where
payload.contactMethods.type = 'International Address'
and payload.contactMethods.cityCode is null

【Question Discussion】:

    Tags: json apache-spark multidimensional-array apache-spark-sql


    【Solution 1】:

    explode should work like this:

    SELECT
      id,
      contactMethods.type as type,
      contactMethods.cityCode
    FROM
      (
        SELECT
          identifier.id,
          explode(payload.contactMethods) as contactMethods
        FROM
          sample_json_df
      )
    WHERE
      contactMethods.type = 'International Address'
      AND contactMethods.cityCode is null
    
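    To sanity-check the expected result without a Spark cluster, the explode-then-filter logic maps directly to a plain-Python nested loop over the sample messages (a sketch, with the messages inlined in trimmed form; not the Spark execution itself):

    ```python
    # Plain-Python equivalent of explode + filter: one row per
    # (message, contactMethod) pair, then keep International Address
    # entries whose cityCode is null.
    messages = [
        {"identifier": {"id": "987654321"},
         "payload": {"contactMethods": [
             {"type": "Fax", "use": "Business Address"},
             {"type": "Telephone", "use": "Business Address"},
             {"type": "International Address", "cityCode": "city name"},
         ]}},
        {"identifier": {"id": "123456789"},
         "payload": {"contactMethods": [
             {"type": "Fax", "use": "home Address"},
             {"type": "International Address", "cityCode": None},
             {"type": "Telephone", "use": "home Address"},
         ]}},
    ]

    result = [
        (m["identifier"]["id"], cm["type"], cm.get("cityCode"))
        for m in messages                                  # outer rows
        for cm in m["payload"]["contactMethods"]           # the "explode"
        if cm["type"] == "International Address" and cm.get("cityCode") is None
    ]
    print(result)  # [('123456789', 'International Address', None)]
    ```

    This also shows why the original query fails: without explode, `payload.contactMethods.type` is an *array* of types per row, so comparing it to a single string never matches.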

    【Discussion】:
