【问题标题】:U-SQL - Extract data from json-arrayU-SQL - 从 json-array 中提取数据
【发布时间】:2016-07-01 14:27:03
【问题描述】:

已经尝试过建议的 JSONPath 选项,但 JSONExtractor 似乎只能识别根级别。在我的情况下,我必须处理嵌套的 json 结构,以及一个数组(参见下面的示例)。在没有多个中间文件的情况下提取此文件的任何选项?

"relation": {
"relationid": "123456",
"name": "relation1",
"addresses": {
    "address": [{
        "addressid": "1",
        "street": "Street 1",
        "postcode": "1234 AB",
        "city": "City 1"
        },
    {
        "addressid": "2",
        "street": "Street 2",
        "postcode": "5678 CD",
        "city": "City 2"
    }]
}}

选择相关 ID、地址 ID、街道、邮政编码、城市?

【问题讨论】:

  • 请发布您的代码尝试以修复。

标签: json azure-data-lake u-sql


【解决方案1】:

将您的 JSON 片段修复为:

{
    "relation": {
        "relationid": "123456",
        "name": "relation1",
        "addresses": {
            "address": [{
                "addressid": "1",
                "street": "Street 1",
                "postcode": "1234 AB",
                "city": "City 1"
            }, {
                "addressid": "2",
                "street": "Street 2",
                "postcode": "5678 CD",
                "city": "City 2"
            }]
        }
    }
}

并将其放入文件中,以下脚本将为您提供所需的内容。请注意,您需要向下导航结构以携带更高级别的项目,一旦遇到数组,如果您只需要具有数组的那些,则 CROSS APPLY EXPLODE 它,或者如果您想要缺少数组的行,则使用 OUTER APPLY EXPLODE 它.

DECLARE @input string = @"/temp/stackoverflow.json";

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

@json = 
EXTRACT relationid int, 
        name string, 
        addresses string
  FROM @input 
USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor("relation");

@relation =
SELECT relationid,
       name,
       Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(addresses)["address"] AS address_array
FROM @json;

@addresses = 
SELECT relationid, 
       name, 
       Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(address) AS address
FROM @relation
     CROSS APPLY
        EXPLODE (Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(address_array).Values) AS A(address);

@result =
SELECT relationid,
       name,
       address["addressid"]AS addressid,
       address["street"]AS street,
       address["postcode"]AS postcode,
       address["city"]AS city
FROM @addresses;

OUTPUT @result
TO "/users/temp/st_out.csv"
USING Outputters.Csv();

【讨论】:

    猜你喜欢
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 2014-08-31
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 2021-08-06
    相关资源
    最近更新 更多