【Question Title】: How to parse a JSON object into smaller objects using Python?
【Posted】: 2018-09-18 14:01:04
【Question Description】:

I have a very large JSON object that I need to split into smaller objects, writing each of those smaller objects to its own file.

Sample data

raw = '[{"id":"1","num":"2182","count":-17}{"id":"111","num":"3182","count":-202}{"id":"222","num":"4182","count":12},{"id":"33333","num":"5182","count":12}]'

Desired output (in this example, splitting the data in half):

output_file1.json = [{"id":"1","num":"2182","count":-17},{"id":"111","num":"3182","count":-202}]

output_file2.json = [{"id":"222","num":"4182","count":12},{"id":"33333","num":"5182","count":12}]

Current code

import pandas as pd
import itertools
import json
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)

raw = '[{"id":"1","num":"2182","count":-17}{"id":"111","num":"3182","count":-202}{"id":"222","num":"4182","count":12},{"id":"33333","num":"5182","count":12}]'

# Split the data into manageable chunks + write to files

for i, group in enumerate(grouper(raw, 4)):
    with open('outputbatch_{}.json'.format(i), 'w') as outputfile:
        json.dump(list(group), outputfile)

Current output of the first file, "outputbatch_0.json":

["[", "{", "\"", "s"]

I feel like I'm making this much harder than it needs to be.
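The single characters in that output come from grouper(raw, 4) iterating over the string raw itself, one character at a time. A minimal sketch, assuming the missing commas are added so the data parses, that decodes the JSON first and then passes the resulting list to the same grouper recipe; the None padding that zip_longest adds to an incomplete last group is filtered out before writing. The group size of 2 is chosen here only to reproduce the "split in half" example.

```python
import json
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    # Collect items into fixed-length groups (itertools recipe)
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


# Commas added between the objects so this is valid JSON
raw = '[{"id":"1","num":"2182","count":-17},{"id":"111","num":"3182","count":-202},{"id":"222","num":"4182","count":12},{"id":"33333","num":"5182","count":12}]'

# Parse first, so grouper sees a list of dicts instead of a string of characters
records = json.loads(raw)

for i, group in enumerate(grouper(records, 2)):
    # Drop the None padding zip_longest adds to an incomplete last group
    batch = [item for item in group if item is not None]
    with open('outputbatch_{}.json'.format(i), 'w') as outputfile:
        json.dump(batch, outputfile)
```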

【Question Discussion】:

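  • Your raw string is not valid JSON (the commas between the objects are missing). Is that the case in your real data, or is it just a typo in the question?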
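If the real data really does lack those commas, json.loads will reject it. One possible workaround, sketched under the assumption that the only defect is missing commas between top-level objects inside a single pair of outer brackets, is to decode the objects one after another with json.JSONDecoder().raw_decode, which parses one JSON value from the start of a string and returns it together with the index where it ended. The helper name parse_concatenated is hypothetical.

```python
import json


def parse_concatenated(text):
    """Hypothetical helper: parse '[{...}{...},{...}]' where some of the
    commas between top-level objects are missing."""
    decoder = json.JSONDecoder()
    rest = text.strip()
    # Assumption: everything is wrapped in one pair of outer brackets
    if rest.startswith('[') and rest.endswith(']'):
        rest = rest[1:-1]
    objects = []
    while True:
        # Skip whitespace and any commas that *are* present between objects
        rest = rest.lstrip(' \t\r\n,')
        if not rest:
            break
        # raw_decode returns (parsed value, index just past that value)
        obj, end = decoder.raw_decode(rest)
        objects.append(obj)
        rest = rest[end:]
    return objects


raw = '[{"id":"1","num":"2182","count":-17}{"id":"111","num":"3182","count":-202}{"id":"222","num":"4182","count":12},{"id":"33333","num":"5182","count":12}]'
records = parse_concatenated(raw)
print(len(records))  # 4
```

Any other kind of malformation (for example a stray character between objects) still raises json.JSONDecodeError, which is probably what you want.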

Tags: python json python-3.x


【Solution 1】:

Assuming raw is meant to be a valid JSON string (I added the missing commas), here is a simple but effective solution.

import json

raw = '[{"id":"1","num":"2182","count":-17},{"id":"111","num":"3182","count":-202},{"id":"222","num":"4182","count":12},{"id":"33333","num":"5182","count":12}]'
json_data = json.loads(raw)  # parse the string into a list of dicts

def split_in_files(json_data, amount):
    # Records per file; the last file also takes any remainder
    step = len(json_data) // amount
    pos = 0
    # Write the first amount - 1 files with exactly `step` records each
    for i in range(amount - 1):
        with open('output_file{}.json'.format(i+1), 'w') as file:
            json.dump(json_data[pos:pos+step], file)
            pos += step
    # The last file gets everything that is left
    with open('output_file{}.json'.format(amount), 'w') as file:
        json.dump(json_data[pos:], file)

split_in_files(json_data, 2)
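split_in_files fixes the number of files. With a very large input it may be more convenient to fix the number of records per file instead. A minimal sketch of that variant; the name split_by_size and the output_chunk{}.json naming are hypothetical and not part of the answer above. Stepping through the list in slices of size keeps any remainder in the last file.

```python
import json


def split_by_size(records, size, pattern='output_chunk{}.json'):
    """Hypothetical helper: write records to files of at most `size` items each."""
    if size < 1:
        raise ValueError('size must be at least 1')
    paths = []
    for n, start in enumerate(range(0, len(records), size), start=1):
        path = pattern.format(n)
        with open(path, 'w') as f:
            # The last slice is shorter when len(records) is not a multiple of size
            json.dump(records[start:start + size], f)
        paths.append(path)
    return paths


raw = '[{"id":"1","num":"2182","count":-17},{"id":"111","num":"3182","count":-202},{"id":"222","num":"4182","count":12},{"id":"33333","num":"5182","count":12}]'
json_data = json.loads(raw)
print(split_by_size(json_data, 3))  # ['output_chunk1.json', 'output_chunk2.json']
```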

【Discussion】:

    【Solution 2】:

    This assumes raw is valid JSON. The file-saving part is only a rough sketch.

    import json
    
    raw = '[{"id":"1","num":"2182","count":-17},{"id":"111","num":"3182","count":-202},{"id":"222","num":"4182","count":12},{"id":"33333","num":"5182","count":12}]'
    
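    # json.loads is safer than eval() and also understands true/false/null
    raw_list = json.loads(raw)
    # Pair consecutive records: (0, 1), (2, 3), ...
    # Note: zip() silently drops the last record if the list length is odd
    raw__zipped = list(zip(raw_list[0::2], raw_list[1::2]))
    
    # One file per pair; writing every pair to 'a.json' would overwrite it each time
    for i, item in enumerate(raw__zipped):
        with open('a_{}.json'.format(i), 'w') as f:
            json.dump(item, f)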
    

    【Discussion】:

      【Solution 3】:

      If you need exactly half of the data, you can use slicing:

      import json
      
      raw = '[{"id":"1","num":"2182","count":-17},{"id":"111","num":"3182","count":-202},{"id":"222","num":"4182","count":12},{"id":"33333","num":"5182","count":12}]'
      json_data = json.loads(raw)
      
      # Integer division: slice indices must be ints in Python 3
      size_of_half = len(json_data) // 2
      
      print(json_data[:size_of_half])
      print(json_data[size_of_half:])
      

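      The code above does not handle edge cases such as an odd number of records; beyond that, you can do anything with json_data that you can do with an ordinary list.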
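With an odd number of records, len(json_data) // 2 puts the extra record in the second half. If you would rather have it in the first half, round the midpoint up with (len + 1) // 2. A short sketch, using a made-up five-record list purely for illustration, that writes the two halves to the file names from the question:

```python
import json

# Made-up five-record list to show the odd-length case
json_data = [{"id": str(i)} for i in range(5)]

# Round the midpoint up so the first half gets the extra record: 5 -> 3 + 2
mid = (len(json_data) + 1) // 2
halves = [json_data[:mid], json_data[mid:]]

for i, half in enumerate(halves, start=1):
    with open('output_file{}.json'.format(i), 'w') as f:
        json.dump(half, f)

print([len(h) for h in halves])  # [3, 2]
```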

      【Discussion】:
