【Question Title】: Writing Nested JSON Dictionary List To CSV
【Posted】: 2021-08-15 17:11:05
【Question】:

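Problem

I am trying to write the following list of nested dictionaries, each of which contains further lists of dictionaries, to a CSV file. I have tried several approaches, but I cannot get the output right:

JSON data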

[
    {
        "Basic_Information_Source": [
            {
                "Image": "image1.png",
                "Image_Format": "PNG",
                "Image_Mode": "RGB",
                "Image_Width": 574,
                "Image_Height": 262,
                "Image_Size": 277274
            }
        ],
        "Basic_Information_Destination": [
            {
                "Image": "image1_dst.png",
                "Image_Format": "PNG",
                "Image_Mode": "RGB",
                "Image_Width": 574,
                "Image_Height": 262,
                "Image_Size": 277539
            }
        ],
        "Values": [
            {
                "Value1": 75.05045463635267,
                "Value2": 0.006097560975609756,
                "Value3": 0.045083481733371615,
                "Value4": 0.008639858263904898
            }
        ]
    },
    {
        "Basic_Information_Source": [
            {
                "Image": "image2.png",
                "Image_Format": "PNG",
                "Image_Mode": "RGB",
                "Image_Width": 1600,
                "Image_Height": 1066,
                "Image_Size": 1786254
            }
        ],
        "Basic_Information_Destination": [
            {
                "Image": "image2_dst.png",
                "Image_Format": "PNG",
                "Image_Mode": "RGB",
                "Image_Width": 1600,
                "Image_Height": 1066,
                "Image_Size": 1782197
            }
        ],
        "Values": [
            {
                "Value1": 85.52662890580055,
                "Value2": 0.0005464352720450282,
                "Value3": 0.013496113910369758,
                "Value4": 0.003800236380811839
            }
        ]
    }
]

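Working code

I tried the following code. It runs, but it only writes the top-level keys as headers and then dumps each nested list into the CSV file as text: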

import json
import csv

def Convert_CSV():

    ar_enc_file = open('analysis_results_enc.json','r')
    json_data = json.load(ar_enc_file)

    keys = json_data[0].keys()
    
    with open('test.csv', 'w', encoding='utf8', newline='')  as output_file:
        dict_writer = csv.DictWriter(output_file, keys)
        dict_writer.writeheader()
        dict_writer.writerows(json_data)

    ar_enc_file.close()

Convert_CSV()

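Working output / problem

The output contains the following headers:

  • Basic_Information_Source
  • Basic_Information_Destination
  • Values

It then dumps all the remaining data under each header as a list, like this: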

[{'Image': 'image1.png', 'Image_Format': 'PNG', 'Image_Mode': 'RGB', 'Image_Width': 574, 'Image_Height': 262, 'Image_Size': 277274}]

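Expected output / sample

I am trying to produce the kind of output shown above for every dictionary in the array.

How do I write it correctly?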

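【Comments】:

  • What is your preferred output?
  • Do you want to flatten the output completely? If so, how do you want to represent the nested lists? We really need to see your desired output in order to help.
  • @Axe319 Thanks, I have added the expected output format. I am trying to produce a similar format for every dict in the array.
  • @JonSG Thanks, I have added the expected output format. I am trying to produce a similar format for every dict in the array.

Tags: json python-3.x csv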


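【Solution 1】:

I'm sure someone will come up with a more elegant solution. That being said:

You have a few problems.

  • Your entries have inconsistent numbers of fields, and these need to be aligned.
  • Even if you pad the data, you have intermediate lists that need to be flattened.
  • Then you still have separate pieces of data that need to be merged together.
  • AFAIK, DictWriter expects its data in the format [{'column': 'entry'}, {'column': 'entry'}], so even after all of the previous steps your data is still not in the right format.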
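
The last point can be checked in isolation. The following is a minimal sketch, not part of the original answer: the helper name dict_rows_to_text and the one-record sample are made up for illustration, and the CSV is written to an in-memory buffer rather than a file. It shows that csv.DictWriter simply stringifies a nested value, while a flat dict produces one cell per key.

```python
import csv
import io

# Hypothetical one-record samples that mirror the shape of the question's data.
nested_row = {"Values": [{"Value1": 1.5, "Value2": 2.5}]}
flat_row = {"Value1": 1.5, "Value2": 2.5}


def dict_rows_to_text(rows, fieldnames):
    """Write rows with csv.DictWriter into memory and return the CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# The nested list ends up as a single quoted text cell.
nested_csv = dict_rows_to_text([nested_row], ["Values"])
# The flat dict ends up as one column per key.
flat_csv = dict_rows_to_text([flat_row], ["Value1", "Value2"])

print(nested_csv)
print(flat_csv)
```

This is the behaviour described in the question: the writer does not descend into lists or dicts, so the data has to be reshaped before it is written.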

Let's get started.

The first two parts can be handled together.

def pad_list(lst, size, padding=None):
    # we wouldn't have to make a copy but I prefer to
    # avoid the possibility of getting bitten by mutability
    _lst = lst[:]
    for _ in range(len(lst), size):
        _lst.append(padding)
    return _lst


# this expects already parsed json data
def flatten(json_data):
    lst = []
    for dct in json_data:
        # here we're just setting a max size of all dict entries
        # this is in case the shorter entry is in the first iteration
        max_size = 0
        # we initialize a dict for each of the list entries
        # this is in case you have inconsistent lengths between lists
        flattened = dict()
        for k, v in dct.items():
            entries = list(next(iter(v), dict()).values())
            flattened[k] = entries
            max_size = max(len(entries), max_size)
        # here we append a dict whose value lists are all padded to max_size
        lst.append({k: pad_list(v, max_size) for k, v in flattened.items()})
    return lst

So now we have a flattened list of dicts whose values are lists of consistent length. Essentially:

[
    {
        "Basic_Information_Source": [
            "image1.png",
            "PNG",
            "RGB",
            574,
            262,
            277274
        ],
        "Basic_Information_Destination": [
            "image1_dst.png",
            "PNG",
            "RGB",
            574,
            262,
            277539
        ],
        "Values": [
            75.05045463635267,
            0.006097560975609756,
            0.045083481733371615,
            0.008639858263904898,
            None,
            None
        ]
    }
]
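
To try the flattening step on its own, here is a small sketch. The two functions are repeated from the answer so the snippet runs by itself; the sample record with the keys "Source" and "Values" is a made-up, shortened stand-in for the real data.

```python
def pad_list(lst, size, padding=None):
    # Copy the list, then append padding until it reaches the requested size.
    _lst = lst[:]
    for _ in range(len(lst), size):
        _lst.append(padding)
    return _lst


def flatten(json_data):
    lst = []
    for dct in json_data:
        max_size = 0
        flattened = dict()
        for k, v in dct.items():
            # Take the values of the first dict in each inner list
            # (or nothing at all if the inner list is empty).
            entries = list(next(iter(v), dict()).values())
            flattened[k] = entries
            max_size = max(len(entries), max_size)
        lst.append({k: pad_list(v, max_size) for k, v in flattened.items()})
    return lst


# Hypothetical record: "Source" has three fields, "Values" has only one.
sample = [
    {
        "Source": [{"Image": "a.png", "Image_Width": 10, "Image_Height": 20}],
        "Values": [{"Value1": 0.5}],
    }
]

flattened_sample = flatten(sample)
print(flattened_sample)
```

Note that flatten only looks at the first dict in each inner list, which is enough for the data in the question because every inner list holds exactly one dict.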

But this list contains multiple dicts, not just one, and they need to be combined.

So we need to merge.

# this should be self explanatory
def merge(flattened):
    merged = dict()
    for dct in flattened:
        for k, v in dct.items():
            if k not in merged:
                merged[k] = []
            merged[k].extend(v)
    return merged

This gives us something close to this:

{
    "Basic_Information_Source": [
        "image1.png",
        "PNG",
        "RGB",
        574,
        262,
        277274,
        "image2.png",
        "PNG",
        "RGB",
        1600,
        1066,
        1786254
    ],
    "Basic_Information_Destination": [
        "image1_dst.png",
        "PNG",
        "RGB",
        574,
        262,
        277539,
        "image2_dst.png",
        "PNG",
        "RGB",
        1600,
        1066,
        1782197
    ],
    "Values": [
        75.05045463635267,
        0.006097560975609756,
        0.045083481733371615,
        0.008639858263904898,
        None,
        None,
        85.52662890580055,
        0.0005464352720450282,
        0.013496113910369758,
        0.003800236380811839,
        None,
        None
    ]
}
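
The merge step can be checked the same way. In this sketch the merge function is repeated from the answer, and the input is a made-up, already flattened pair of records with shortened keys.

```python
def merge(flattened):
    merged = dict()
    for dct in flattened:
        for k, v in dct.items():
            # Start an empty list the first time a key is seen,
            # then concatenate the values from every record.
            if k not in merged:
                merged[k] = []
            merged[k].extend(v)
    return merged


# Hypothetical flattened input: two records, already padded to equal length.
flattened_sample = [
    {"Source": ["a.png", 10], "Values": [0.5, None]},
    {"Source": ["b.png", 20], "Values": [0.7, None]},
]

merged_sample = merge(flattened_sample)
print(merged_sample)
```

Because every list was padded to the same length beforehand, the columns stay aligned after concatenation.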

But wait, we still need to format it for the writer.

Our data needs to be in the format [{'column_1': 'entry', 'column_2': 'entry'}, {'column_1': 'entry', 'column_2': 'entry'}].

So we format:

def format_for_writer(merged):
    formatted = []
    for k, v in merged.items():
        for i, item in enumerate(v):
            # on the first pass this will append an empty dict
            # on subsequent passes it will be ignored
            # and add keys into the existing dict
            if i >= len(formatted):
                formatted.append(dict())
            formatted[i][k] = item
    return formatted
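
In effect, this step transposes a dict of columns into a list of rows. The sketch below repeats format_for_writer from the answer and compares it with a shorter variant built on itertools.zip_longest. That variant, named format_with_zip here, is an assumption of this example and not part of the original answer; the two agree when all columns have the same length, which the padding step is meant to ensure.

```python
from itertools import zip_longest


def format_for_writer(merged):
    formatted = []
    for k, v in merged.items():
        for i, item in enumerate(v):
            # Create row i the first time it is needed, then fill in column k.
            if i >= len(formatted):
                formatted.append(dict())
            formatted[i][k] = item
    return formatted


def format_with_zip(merged):
    # Hypothetical alternative: zip the columns together row by row.
    keys = list(merged)
    return [dict(zip(keys, row)) for row in zip_longest(*merged.values())]


# Made-up merged data with two equal-length columns.
merged_sample = {"Source": ["a.png", 10], "Values": [0.5, None]}

rows = format_for_writer(merged_sample)
rows_via_zip = format_with_zip(merged_sample)
print(rows)
```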

So finally, we have a cleanly formatted data structure that we can hand to our writer function.

def convert_csv(formatted):
    keys = formatted[0].keys()
    with open('test.csv', 'w', encoding='utf8', newline='')  as output_file:
        dict_writer = csv.DictWriter(output_file, keys)
        dict_writer.writeheader()
        dict_writer.writerows(formatted)
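
To see what the padded None values look like in the file, the same DictWriter calls can be pointed at an in-memory buffer. This is only a sketch: rows_to_csv_text is a made-up helper name and the rows are sample data, not the real output.

```python
import csv
import io


def rows_to_csv_text(rows):
    """Hypothetical helper: same DictWriter calls as convert_csv, but in memory."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up rows in the shape produced by format_for_writer.
padded_rows = [
    {"Source": "a.png", "Values": 0.5},
    {"Source": 10, "Values": None},
]

padded_csv = rows_to_csv_text(padded_rows)
print(padded_csv)
```

The csv module writes None as an empty string, so the padding shows up as empty cells rather than as the text "None".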

Complete code with the JSON as a string:

import json
import csv

json_raw = """\
[
    {
        "Basic_Information_Source": [
            {
                "Image": "image1.png",
                "Image_Format": "PNG",
                "Image_Mode": "RGB",
                "Image_Width": 574,
                "Image_Height": 262,
                "Image_Size": 277274
            }
        ],
        "Basic_Information_Destination": [
            {
                "Image": "image1_dst.png",
                "Image_Format": "PNG",
                "Image_Mode": "RGB",
                "Image_Width": 574,
                "Image_Height": 262,
                "Image_Size": 277539
            }
        ],
        "Values": [
            {
                "Value1": 75.05045463635267,
                "Value2": 0.006097560975609756,
                "Value3": 0.045083481733371615,
                "Value4": 0.008639858263904898
            }
        ]
    },
    {
        "Basic_Information_Source": [
            {
                "Image": "image2.png",
                "Image_Format": "PNG",
                "Image_Mode": "RGB",
                "Image_Width": 1600,
                "Image_Height": 1066,
                "Image_Size": 1786254
            }
        ],
        "Basic_Information_Destination": [
            {
                "Image": "image2_dst.png",
                "Image_Format": "PNG",
                "Image_Mode": "RGB",
                "Image_Width": 1600,
                "Image_Height": 1066,
                "Image_Size": 1782197
            }
        ],
        "Values": [
            {
                "Value1": 85.52662890580055,
                "Value2": 0.0005464352720450282,
                "Value3": 0.013496113910369758,
                "Value4": 0.003800236380811839
            }
        ]
    }
]
"""


def pad_list(lst, size, padding=None):
    _lst = lst[:]
    for _ in range(len(lst), size):
        _lst.append(padding)
    return _lst


def flatten(json_data):
    lst = []
    for dct in json_data:
        max_size = 0
        flattened = dict()
        for k, v in dct.items():
            entries = list(next(iter(v), dict()).values())
            flattened[k] = entries
            max_size = max(len(entries), max_size)
        lst.append({k: pad_list(v, max_size) for k, v in flattened.items()})
    return lst


def merge(flattened):
    merged = dict()
    for dct in flattened:
        for k, v in dct.items():
            if k not in merged:
                merged[k] = []
            merged[k].extend(v)
    return merged


def format_for_writer(merged):
    formatted = []
    for k, v in merged.items():
        for i, item in enumerate(v):
            if i >= len(formatted):
                formatted.append(dict())
            formatted[i][k] = item
    return formatted


def convert_csv(formatted):
    keys = formatted[0].keys()
    with open('test.csv', 'w', encoding='utf8', newline='')  as output_file:
        dict_writer = csv.DictWriter(output_file, keys)
        dict_writer.writeheader()
        dict_writer.writerows(formatted)


def main():
    json_data = json.loads(json_raw)
    flattened = flatten(json_data)
    merged = merge(flattened)
    formatted = format_for_writer(merged)
    convert_csv(formatted)


if __name__ == '__main__':
    main()
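
The code above writes one CSV column per top-level key. If the goal is instead one CSV row per record, a different layout may be easier to work with. The following is a sketch under that assumption only: the dotted column names such as "Values.Value1", the helper names flatten_record and records_to_csv_text, and the shortened sample data are all choices made for this example, not something the question or the answer specifies.

```python
import csv
import io
import json


def flatten_record(record, sep="."):
    """Turn {"Group": [{"Key": value}]} into {"Group.Key": value}."""
    row = {}
    for group, items in record.items():
        for index, item in enumerate(items):
            # Only add an index to the prefix when a group holds several dicts.
            prefix = group if len(items) == 1 else f"{group}{sep}{index}"
            for key, value in item.items():
                row[f"{prefix}{sep}{key}"] = value
    return row


def records_to_csv_text(records):
    rows = [flatten_record(record) for record in records]
    # Collect column names in first-seen order across all rows.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    buffer = io.StringIO()
    # restval fills cells for columns that a particular row does not have.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Shortened, made-up version of the question's data.
sample = json.loads("""[
  {"Basic_Information_Source": [{"Image": "image1.png", "Image_Width": 574}],
   "Values": [{"Value1": 0.5}]},
  {"Basic_Information_Source": [{"Image": "image2.png", "Image_Width": 1600}],
   "Values": [{"Value1": 0.25}]}
]""")

csv_text = records_to_csv_text(sample)
print(csv_text)
```

With this layout each image pair occupies a single line, and no padding is needed because every value gets its own named column.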

【Discussion】:
