【Question Title】: Parsing JSON block by block
【Posted】: 2015-07-18 20:38:43
【Question】:

I have a JSON file containing a list of customers and their data.

The file looks like this:

{
"Customers": [
{
  "Customer": "Customer Name Here",
  "Company": "Super Coffee",
  "First Name": "First Name Here",
  "Main Phone": "777-777-7777",
  "Fax": "777-777-7777",
  "Bill to 1": "Billing Address One",
  "Bill to 2": "Billing Address Two",
  "Bill to 3": "Billing Address Three",
  "Ship to 1": "Shipping Address One",
  "Ship to 2": "Shipping Address Two",
  "Ship to 3": "Shipping Address Three",
  "Customer Type": "Dealer/Retail"
},
{
  "Customer": "Customer Name Here",
  "Company": "Turtle Mountain Welding",
  "First Name": "First Name Here",
  "Main Phone": "777-777-7777",
  "Fax": "777-777-7777",
  "Bill to 1": "Billing Address One",
  "Bill to 2": "Billing Address Two",
  "Bill to 3": "Billing Address Three",
  "Ship to 1": "Shipping Address One",
  "Ship to 2": "Shipping Address Two",
  "Ship to 3": "Shipping Address Three",
  "Customer Type": "Dealer/Retail"
},
{
  "Customer": "Customer Name Here",
  "Company": "Mountain Equipment Coop",
  "First Name": "First Name Here",
  "Main Phone": "777-777-7777",
  "Fax": "777-777-7777",
  "Bill to 1": "Billing Address One",
  "Bill to 2": "Billing Address Two",
  "Bill to 3": "Billing Address Three",
  "Ship to 1": "Shipping Address One",
  "Ship to 2": "Shipping Address Two",
  "Ship to 3": "Shipping Address Three",
  "Customer Type": "Dealer/Retail"
},
{
  "Customer": "Customer Name Here",
  "Company": "Best Soup Inc.",
  "First Name": "First Name Here",
  "Main Phone": "777-777-7777",
  "Fax": "777-777-7777",
  "Bill to 1": "Billing Address One",
  "Bill to 2": "Billing Address Two",
  "Bill to 3": "Billing Address Three",
  "Ship to 1": "Shipping Address One",
  "Ship to 2": "Shipping Address Two",
  "Ship to 3": "Shipping Address Three",
  "Customer Type": "Dealer/Retail"
}
]
}

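I need to be able to extract data from the file block by block rather than line by line.

I'm used to parsing files line by line to get at the data, but with JSON I need to read it block by block somehow (or, more accurately, object by object?). I need to read what is inside the braces for each customer. That way I can write a script that extracts the data I need and builds a CSV file from it.

For example: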

i="1"
for file in *.json; do
     customername=$(jsonblock$i:customername);
     customerAddress=$(jsonblock$i:customeraddress);
     etc...
     i=$[i+1]
done

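I understand how this is done when reading a file line by line, but how can I read each JSON block the way I would read a line?

【Comments】:

  • Are you asking for any language, as long as it parses JSON?
  • I'd suggest using tokens... is your JSON file a link, by any chance?
  • This is easy to do in a shell script with jq, except that jq doesn't seem to handle key names containing spaces. I would use Python rather than a shell script for processing JSON documents.
  • MAC: I'm not sure what you mean by "is your JSON file a link". Any language is fine. The right tool for the job is the right tool for the job.

Tags: json perl parsing data-structures perl-data-structures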


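【Solution 1】:

For the JSON above (modified because the data as originally posted was invalid), the following script parses the file and prints the "Company" field of each block: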

#!/usr/bin/env perl

use JSON;   
use IO::All;     
use v5.16;

my $data < io 'Our_Customers.json';
my $customers_list = decode_json($data)->{"Customers"};                

for my $customer (@$customers_list) {
   say $customer->{"Company"} ;
}

Output

Super Coffee
Turtle Mountain Welding
Mountain Equipment Coop
Best Soup Inc.

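The script uses IO::All and JSON to read and parse (decode_json) the file.

In this example the JSON data is simply mapped onto a Perl data structure (an array of hashes) that corresponds exactly to the JSON data. We can then visit each array element (i.e. each hash in the array) and access the data in the hash by key name. Perl's data handling and access facilities are very flexible, which makes working with JSON data quite pleasant.

The keys of each block come from the corresponding part of the JSON file. If we shift one element off the array, it is a hash, and we can look at its keys and values like this:

say for keys %{ shift @$customers_list };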

Customer Type
First Name
Bill to 2
Main Phone
...

The value for each key is accessed with the $element->{"key"} syntax you see in the for loop.
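As a small illustration of that access pattern, here is a sketch that pulls only a few fields out of each customer with a hash slice. It assumes the core JSON::PP module as a stand-in for JSON, and the two-record sample and the @wanted list are made up for the example; they are not part of the original answer.

```perl
use strict;
use warnings;
use feature 'say';
use JSON::PP;   # core module, used here in place of JSON (assumption)

# Hypothetical, abridged sample in the same shape as the question's file
my $text = <<'END_JSON';
{
  "Customers": [
    { "Company": "Super Coffee",   "Main Phone": "777-777-7777", "Fax": "777-777-7777" },
    { "Company": "Best Soup Inc.", "Main Phone": "777-777-7777", "Fax": "777-777-7777" }
  ]
}
END_JSON

# decode_json returns a hash reference; "Customers" holds a reference to an array of hashes
my $customers = decode_json($text)->{Customers};

# Key names containing spaces are ordinary hash keys in Perl
my @wanted = ('Company', 'Main Phone');

my @lines;
for my $customer (@$customers) {
    # Hash slice: fetch several values from one hash, in the order given by @wanted
    push @lines, join ' | ', @{$customer}{@wanted};
}

say for @lines;
```

Each customer is just one hash, so "reading a block" amounts to one pass through the loop body.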


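It's best to validate JSON data before posting it to SO; JSON Lint and similar services can help.

【Comments】:

  • I installed the Perl module dependencies to run your code. I get the following error: Not an ARRAY reference at ./parse-json.pl line 10.
  • The reason it wasn't being parsed is that, even though the JSON is valid, the array of hashes has to start with a square bracket. I removed the opening curly brace and the heading, and the script works. Thanks for the answer. This is my first real-world use of Perl, and I can't wait to learn more!
  • Maybe for my $customer (@{$customers_list->{Customers}}) ... ?
  • ... or my $c2 = decode_json($data)->{Customers}; for my $c (@$c2) { ... }
  • @DavidProvost - The error you saw is because I was running the script against a modified JSON file. I had edited your post to force the JSON to be "valid". By adding the braces {} back, you changed the resulting Perl data structure. As @JJoao suggests, you just need to change the code to fit. I'll edit my answer to match the current JSON in your post. Cheers.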
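The "Not an ARRAY reference" error discussed above comes from dereferencing the top-level hash as if it were an array. One way to make a script tolerate both layouts is to inspect the decoded value with ref before dereferencing. This is only a sketch: the helper name customers_from is hypothetical, and JSON::PP (a core module) is assumed in place of JSON.

```perl
use strict;
use warnings;
use feature 'say';
use JSON::PP;   # core module, used here in place of JSON (assumption)

# Hypothetical helper: return the customer records for either top-level layout.
sub customers_from {
    my ($decoded) = @_;

    # Layout 1: the document is a bare array of customer objects
    return @$decoded if ref $decoded eq 'ARRAY';

    # Layout 2: the array sits under a "Customers" key, as in the question
    return @{ $decoded->{Customers} }
        if ref $decoded eq 'HASH' && ref $decoded->{Customers} eq 'ARRAY';

    die "Unexpected JSON layout\n";
}

my @from_array = customers_from(decode_json('[{"Company":"Super Coffee"}]'));
my @from_hash  = customers_from(decode_json('{"Customers":[{"Company":"Best Soup Inc."}]}'));

say $_->{Company} for @from_array, @from_hash;
```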
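【Solution 2】:

If you just want to print the JSON data in CSV format, then you are asking the wrong question. You should parse the whole JSON document and process the Customers array item by item.

Using Perl's JSON and Text::CSV modules, it looks like this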

use strict;
use warnings;

use JSON 'from_json';
use Text::CSV ();

my @columns = (
  'Bill to 1',  'Bill to 2',     'Bill to 3', 'Company',
  'Customer',   'Customer Type', 'Fax',       'First Name',
  'Main Phone', 'Ship to 1',     'Ship to 2', 'Ship to 3',
);

my $out_fh = \*STDOUT;
my $json_file = 'customers.json';

my $data = do {
  open my $fh, '<', $json_file or die qq{Unable to open "$json_file" for input: $!};
  local $/;
  from_json(<$fh>);
};
my $customers = $data->{Customers};

my $csv = Text::CSV->new({ eol => $/ });
$csv->print($out_fh, \@columns);

for my $customer ( @$customers ) {
  $csv->print($out_fh, [ @{$customer}{@columns} ]);
}

Output

"Bill to 1","Bill to 2","Bill to 3",Company,Customer,"Customer Type",Fax,"First Name","Main Phone","Ship to 1","Ship to 2","Ship to 3"
"Billing Address One","Billing Address Two","Billing Address Three","Super Coffee","Customer Name Here",Dealer/Retail,777-777-7777,"First Name Here",777-777-7777,"Shipping Address One","Shipping Address Two","Shipping Address Three"
"Billing Address One","Billing Address Two","Billing Address Three","Turtle Mountain Welding","Customer Name Here",Dealer/Retail,777-777-7777,"First Name Here",777-777-7777,"Shipping Address One","Shipping Address Two","Shipping Address Three"
"Billing Address One","Billing Address Two","Billing Address Three","Mountain Equipment Coop","Customer Name Here",Dealer/Retail,777-777-7777,"First Name Here",777-777-7777,"Shipping Address One","Shipping Address Two","Shipping Address Three"
"Billing Address One","Billing Address Two","Billing Address Three","Best Soup Inc.","Customer Name Here",Dealer/Retail,777-777-7777,"First Name Here",777-777-7777,"Shipping Address One","Shipping Address Two","Shipping Address Three"

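【Comments】:

  • I do need to convert this to CSV, but I don't need every item in every block. I only need some of the items, but the same items from all blocks.
  • @DavidProvost: OK, but I don't know how to include that in my answer, because I don't know how you want to specify which items you need. It may be as simple as modifying the @columns array.
  • Modifying the @columns array does filter the output down to specific data.
  • How do I change $out_fh from STDOUT to a text file?
  • You can change the assignment to open my $out_fh, '>', 'customers.csv' or die $!;, or you can redirect the output on the command line with json_to_csv.pl > customers.csv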
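Putting the two suggestions from these comments together, a minimal sketch might look like the following. It trims @columns to three fields and writes to customers.csv instead of STDOUT. The choice of columns and the inline sample data are assumptions made for illustration, and JSON::PP (core) is used in place of JSON; Text::CSV still has to be installed from CPAN.

```perl
use strict;
use warnings;
use JSON::PP;        # core module, used here in place of JSON (assumption)
use Text::CSV ();

# Only the fields we want, in the order they should appear in the CSV
my @columns = ('Company', 'Main Phone', 'Customer Type');

# Hypothetical, abridged sample in the same shape as the question's file
my $customers = decode_json(<<'END_JSON')->{Customers};
{
  "Customers": [
    { "Company": "Super Coffee",   "Main Phone": "777-777-7777", "Fax": "777-777-7777", "Customer Type": "Dealer/Retail" },
    { "Company": "Best Soup Inc.", "Main Phone": "777-777-7777", "Fax": "777-777-7777", "Customer Type": "Dealer/Retail" }
  ]
}
END_JSON

my $out_file = 'customers.csv';
open my $out_fh, '>', $out_file
    or die qq{Unable to open "$out_file" for output: $!};

my $csv = Text::CSV->new({ binary => 1, eol => "\n" })
    or die "Unable to create Text::CSV object";

# Header row, then one row per customer built with a hash slice
$csv->print($out_fh, \@columns);
$csv->print($out_fh, [ @{$_}{@columns} ]) for @$customers;

close $out_fh or die qq{Unable to close "$out_file": $!};
```

Fields that are not listed in @columns, such as Fax here, are simply never written.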
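【Solution 3】:

With Perl and the JSON library you can parse each item in a JSON list incrementally, but you need to adjust the JSON so that it is no longer actually JSON, but rather a list of JSON objects that are not separated by commas.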

#!/usr/bin/perl
use strict;
use warnings;
use feature qw(say);
use JSON;
my $json = JSON->new;
while (<>) {
    my $obj_or_undef = eval { $json->incr_parse( $_ ); };
    # Wait until it has found a whole object
    if (ref $obj_or_undef) {
        say join ",", map {$obj_or_undef->{$_}} sort keys %$obj_or_undef;
    }
}

For customers.json (no longer quite JSON):

{ 
    "some key" : "some value"
} {
    "other key" : "other value"
}

Run:

$ perl demo.pl < customers.json
some value
other value
$ perl demo.pl < customers.json > customer.csv
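The loop above extracts at most one object per input line, so a line that finishes two objects would leave the second one waiting in the parser's buffer. A hedged variation is to feed each chunk in void context (which only buffers it) and then drain every complete object in an inner loop. The sketch below assumes JSON::PP, which provides the same incr_parse interface, and uses made-up chunks in place of lines read from a file.

```perl
use strict;
use warnings;
use feature 'say';
use JSON::PP;   # core module, used here in place of JSON (assumption)

my $json = JSON::PP->new;

# Hypothetical input chunks: two objects on one line, then one object split across two lines
my @chunks = (
    qq({"Company":"Super Coffee"} {"Company":"Best Soup Inc."}\n),
    qq({"Company":\n),
    qq("Turtle Mountain Welding"}\n),
);

my @objects;
for my $chunk (@chunks) {
    # Void context: the text is only appended to the internal buffer
    $json->incr_parse($chunk);

    # Scalar context: extract one complete object per call, or undef if none is ready yet
    while (my $obj = $json->incr_parse) {
        push @objects, $obj;
    }
}

say $_->{Company} for @objects;
```

Unlike the script above, this version does not wrap incr_parse in eval, so malformed input makes it die instead of being skipped silently.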

【Comments】:
