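【Question title】: U-SQL unable to extract data from JSON file
【Posted】: 2016-03-10 10:22:59
【Question】:

I am trying to extract data from a JSON file using U-SQL. The query either runs successfully without producing any output data, or fails with a "Vertex failed fast" error.

The JSON file looks like this: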

{
  "results": [
    {
      "name": "Sales/Account",
      "id": "7367e3f2-e1a5-11e5-80e8-0933ecd4cd8c",
      "deviceName": "HP",
      "deviceModel": "g6-pavilion",
      "clientip": "0.41.4.1"
    },
    {
      "name": "Sales/Account",
      "id": "c01efba0-e0d5-11e5-ae20-af6dc1f2c036",
      "deviceName": "acer",
      "deviceModel": "veriton",
      "clientip": "10.10.14.36"
    }
  ]
}

And my U-SQL script is:

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

DECLARE @in string="adl://xyz.azuredatalakestore.net/todelete.json";

DECLARE @out string="adl://xyz.azuredatalakestore.net/todelete.tsv";

@trail2=EXTRACT results string FROM @in USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();

@jsonify=SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(results,"name","id","deviceName","deviceModel","clientip") AS rec FROM @trail2;

@logSchema=SELECT rec["name"] AS sysName,
              rec["id"] AS sysId,
              rec["deviceName"] AS domainDeviceName,
              rec["deviceModel"] AS domainDeviceModel,
              rec["clientip"] AS domainClientIp 
       FROM @jsonify;

OUTPUT @logSchema TO @out USING Outputters.Tsv();

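【Comments】:

    Tags: azure-data-factory azure-data-lake u-sql


    【Answer 1】:

    Actually, the JsonExtractor supports a rowpath parameter, expressed in JSONPath, that lets you identify the JSON objects or JSON array items to map to rows. So you can extract the data from the JSON document with a single statement: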

    // @input is assumed to hold the path to the JSON file (@in in the question's script).
    @logSchema = 
        EXTRACT name string, id string, deviceName string, deviceModel string, clientip string
        FROM @input
        USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor("results[*]");
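
For context, here is a minimal sketch of how this single-statement extraction might slot into the full script from the question. The paths, assembly references, and output column names are copied from the question; the rowset name `@renamed` is a hypothetical name introduced only for this illustration, and the sketch assumes both assemblies are already registered in the database the job runs against.

```usql
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

// Input and output paths, taken unchanged from the question.
DECLARE @in string = "adl://xyz.azuredatalakestore.net/todelete.json";
DECLARE @out string = "adl://xyz.azuredatalakestore.net/todelete.tsv";

// The row path "results[*]" maps each item of the "results" array to one row,
// so the item's properties can be extracted directly as columns.
@logSchema =
    EXTRACT name string,
            id string,
            deviceName string,
            deviceModel string,
            clientip string
    FROM @in
    USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor("results[*]");

// Rename the columns to the names used in the question's script.
// @renamed is a hypothetical rowset name used only in this sketch.
@renamed =
    SELECT name AS sysName,
           id AS sysId,
           deviceName AS domainDeviceName,
           deviceModel AS domainDeviceModel,
           clientip AS domainClientIp
    FROM @logSchema;

OUTPUT @renamed TO @out USING Outputters.Tsv();
```

Because the extractor produces one row per array item, this version needs neither the JsonTuple call nor an intermediate file.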
    

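    【Comments】:

      【Answer 2】:

      Sarath,

      As far as I can tell, the problem is that your @trail2 output is a JSON array "[{...},{...}]", which JsonFunctions cannot parse. So I wrote it out to a file and read it back in with the extractor, which can parse the array.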

      REFERENCE ASSEMBLY [Newtonsoft.Json];
      REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
      
      DECLARE @in string="adl://xyz.azuredatalakestore.net/todelete.json";
      DECLARE @out string="adl://xyz.azuredatalakestore.net/todelete.tsv";
      DECLARE @mid string="adl://xyz.azuredatalakestore.net/intermediate.txt";
      
      
      @trail2=EXTRACT results string FROM @in USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();
      
      OUTPUT @trail2 TO @mid USING Outputters.Text(quoting:false);
      
      @jsonify=EXTRACT name string,
                      id string, 
                      deviceName string ,
                      deviceModel string,
                      clientip string
      FROM @mid USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();
      
      @logSchema=SELECT name AS sysName,
                    id AS sysId,
                    deviceName AS domainDeviceName,
                    deviceModel AS domainDeviceModel,
                    clientip AS domainClientIp 
             FROM @jsonify;
      
      OUTPUT @logSchema TO @out USING Outputters.Tsv();
      

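      【Comments】:

      • You can do this more efficiently without the intermediate file (which actually requires you to submit two jobs, because a script cannot read data that it creates). See my alternative answer.
      • "A script cannot read data that it creates": then how does he handle an OUTPUT followed by an EXTRACT on the same resource @mid?!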