【Question title】: Create nested json object out of a list of dicts
【Posted】: 2019-04-02 18:32:31
【Question】:

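I want to translate a list of dicts into a nested .json file object. I have a list of dicts, and one field in each dict indicates whether a particular field should be nested in the .json file and, if so, where.

I can nest things into the appropriate table, but getting them to nest further down inside a field is throwing me for a loop.

My data is in the following format: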

table_list = [
    {"Table": "table1", "Field": "field1", "Description": "description1", "Type": "STR"}, 
    {"Table": "table1", "Field": "field2", "Description": "description2", "Type": "STR"}, 
    {"Table": "table1", "Field": "field3", "Description": "description3", "Type": "STR"},
    {"Table": "table1", "Field": "field4", "Description": "description4", "Type": "STR"},
    {"Table": "table1", "Field": "field5", "Description": "description5", "Type": "RECORD"},
    {"Table": "table1", "Field": "field5.nest1", "Description": "description6", "Type": "STR"},
    {"Table": "table1", "Field": "field5.nest2", "Description": "description7", "Type": "STR"},
    {"Table": "table1", "Field": "field5.nest3", "Description": "description8", "Type": "STR"},
    {"Table": "table1", "Field": "field5.nest4", "Description": "description9", "Type": "RECORD"},
    {"Table": "table1", "Field": "field5.nest4.nest1", "Description": "description10", "Type": "STR"},
    {"Table": "table1", "Field": "field5.nest4.nest2", "Description": "description11", "Type": "STR"},
    {"Table": "table2", "Field": "field1", "Description": "description1", "Type": "STR"}
]

I would like it to be output in this format (apologies for any typos):

{
    "table1": [
    {
        "Field": "field1",
        "Description": "description1",
        "Mode": "NULLABLE",
        "Type": "STR"
    },
    {
        "Field": "field2",
        "Description": "description2",
        "Mode": "NULLABLE",
        "Type": "STR"
    },
    {
        "Field": "field3",
        "Description": "description3",
        "Mode": "NULLABLE",
        "Type": "STR"
    },
    {
        "Field": "field4",
        "Description": "description4",
        "Mode": "NULLABLE",
        "Type": "STR"
    },
    {
        "Field": "field5",
        "Description": "description5",
        "Mode": "REPEATED",
        "Type": "RECORD",
        "Fields": [
            {
                "Field": "nest1",
                "Description": "description6",
                "Mode": "NULLABLE",
                "Type": "STR"
            },
            {
                "Field": "nest2",
                "Description": "description7",
                "Mode": "NULLABLE",
                "Type": "STR"
            },
            {
                "Field": "nest3",
                "Description": "description8",
                "Mode": "NULLABLE",
                "Type": "STR"
            },
            {
                "Field": "nest4",
                "Description": "description9",
                "Mode": "REPEATED",
                "Type": "RECORD",
                "Fields": [
                    {
                        "Field": "nest1",
                        "Description": "description10",
                        "Mode": "NULLABLE",
                        "Type": "STR"
                    },
                    {
                        "Field": "nest2",
                        "Description": "description11",
                        "Mode": "NULLABLE",
                        "Type": "STR"
                    }
                ]
            }
        ]
    }
    ],
    "table2": [
    {
        "Field": "field1",
        "Description": "description1",
        "Mode": "NULLABLE",
        "Type": "STR"
    }
    ]
}

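I can't get nest1 and nest2 to create a new field inside the existing dict that holds an open list which can be appended to at varying depths. The nesting in this example is only 3 levels deep, but I may need to go as deep as 15.

My code handles the first level using "Table", but reaching into a field to append to its list is the challenging part, and I haven't found a question with exactly the same problem.

I see many people trying to do the reverse by flattening a nested structure, but I am trying to create the nesting.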

import json


def create_schema(file_to_read):
    all_tables = {}
    for row in file_to_read:
        if row['Table'] in all_tables.keys():
            all_tables[row['Table']].append({"Mode": "NULLABLE",
                                             "Field": row['Field'],
                                             "Type": row['Type'],
                                             "Description": row['Description']})
        else:
            all_tables[row['Table']] = []
            all_tables[row['Table']].append({"Mode": "NULLABLE",
                                             "Field": row['Field'],
                                             "Type": row['Type'],
                                             "Description": row['Description']})
    return json.dumps(all_tables, indent=4, sort_keys=True)

What I actually get from this function is:

{
    "table1": [
    {
        "Field": "field1",
        "Description": "description1",
        "Mode": "NULLABLE",
        "Type": "STR"
    },
    {
        "Field": "field2",
        "Description": "description2",
        "Mode": "NULLABLE",
        "Type": "STR"
    },
    {
        "Field": "field3",
        "Description": "description3",
        "Mode": "NULLABLE",
        "Type": "STR"
    },
    {
        "Field": "field4",
        "Description": "description4",
        "Mode": "NULLABLE",
        "Type": "STR"
    },
    {
        "Field": "field5",
        "Description": "description5",
        "Mode": "NULLABLE",
        "Type": "RECORD"
    },
    {
        "Field": "nest1",
        "Description": "description6",
        "Mode": "NULLABLE",
        "Type": "STR"
    },
    {
        "Field": "nest2",
        "Description": "description7",
        "Mode": "NULLABLE",
        "Type": "STR"
    },
    {
        "Field": "nest3",
        "Description": "description8",
        "Mode": "NULLABLE",
        "Type": "STR"
    },
    {
        "Field": "nest4",
        "Description": "description9",
        "Mode": "NULLABLE",
        "Type": "RECORD"
    },
    {
        "Field": "nest1",
        "Description": "description10",
        "Mode": "NULLABLE",
        "Type": "STR"
    },
    {
        "Field": "nest2",
        "Description": "description11",
        "Mode": "NULLABLE",
        "Type": "STR"
    }
    ],
    "table2": [
    {
        "Field": "field1",
        "Description": "description1",
        "Mode": "NULLABLE",
        "Type": "STR"
    }
    ]
}

(For context, this is intended to end up as a BigQuery JSON schema.)

【Question comments】:

    Tags: python json google-bigquery


    【Solution 1】:

    This should achieve what you are after:

    import json
    from collections import defaultdict
    
    d = defaultdict(list)  # table name -> list of top-level fields
    for t in table_list:
        field_list = d[t['Table']]
        field = t['Field'].split('.')  # 'field5.nest4.nest1' -> ['field5', 'nest4', 'nest1']
        # Walk down through each parent; el['Fields'] is created on first access
        for f in field[:-1]:
            field_list = next(el['Fields'] for el in field_list if el['Field'] == f)
        new_d = {'Field': field[-1], 'Description': t['Description'], 'Mode': 'NULLABLE' if t['Type'] == 'STR' else 'REPEATED', 'Type': t['Type']}
        field_list.append(defaultdict(list, new_d))
    
    print(json.dumps(d, indent=4))
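
The snippet above relies on one property of `defaultdict(list, new_d)`: looking up a missing key such as `'Fields'` creates and stores an empty list instead of raising `KeyError`. A minimal sketch of that behaviour follows; the sample dictionaries are illustrative and are not taken from the question's data.

```python
from collections import defaultdict

# A plain dict has no 'Fields' key until one is assigned explicitly.
plain = {"Field": "field5", "Type": "RECORD"}
try:
    plain["Fields"]
    plain_has_fields = True
except KeyError:
    plain_has_fields = False

# A defaultdict(list) creates and stores an empty list on first lookup.
auto = defaultdict(list, {"Field": "field5", "Type": "RECORD"})
children = auto["Fields"]
children.append({"Field": "nest1", "Type": "STR"})

# 'Fields' is never looked up on a leaf, so the key never appears there.
leaf = defaultdict(list, {"Field": "field1", "Type": "STR"})
```

Because the list returned by the lookup is the same object that is stored in the dict, appending to it is enough to attach the child to its parent.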
    

    Or, if you would rather not use defaultdict:

    d = {}
    for t in table_list:
        # Create the table's field list the first time the table is seen
        if t['Table'] not in d:
            d[t['Table']] = []
        field_list = d[t['Table']]
        field = t['Field'].split('.')
        # Walk down through each parent, adding its 'Fields' list on demand
        for f in field[:-1]:
            inner = next(el for el in field_list if el['Field'] == f)
            if 'Fields' not in inner:
                inner['Fields'] = []
            field_list = inner['Fields']
        new_d = {'Field': field[-1], 'Description': t['Description'], 'Mode': 'NULLABLE' if t['Type'] == 'STR' else 'REPEATED', 'Type': t['Type']}
        field_list.append(new_d)
    

    Output:

    {
        "table1": [
            {
                "Field": "field1",
                "Description": "description1",
                "Mode": "NULLABLE",
                "Type": "STR"
            },
            {
                "Field": "field2",
                "Description": "description2",
                "Mode": "NULLABLE",
                "Type": "STR"
            },
            {
                "Field": "field3",
                "Description": "description3",
                "Mode": "NULLABLE",
                "Type": "STR"
            },
            {
                "Field": "field4",
                "Description": "description4",
                "Mode": "NULLABLE",
                "Type": "STR"
            },
            {
                "Field": "field5",
                "Description": "description5",
                "Mode": "REPEATED",
                "Type": "RECORD",
                "Fields": [
                    {
                        "Field": "nest1",
                        "Description": "description6",
                        "Mode": "NULLABLE",
                        "Type": "STR"
                    },
                    {
                        "Field": "nest2",
                        "Description": "description7",
                        "Mode": "NULLABLE",
                        "Type": "STR"
                    },
                    {
                        "Field": "nest3",
                        "Description": "description8",
                        "Mode": "NULLABLE",
                        "Type": "STR"
                    },
                    {
                        "Field": "nest4",
                        "Description": "description9",
                        "Mode": "REPEATED",
                        "Type": "RECORD",
                        "Fields": [
                            {
                                "Field": "nest1",
                                "Description": "description10",
                                "Mode": "NULLABLE",
                                "Type": "STR"
                            },
                            {
                                "Field": "nest2",
                                "Description": "description11",
                                "Mode": "NULLABLE",
                                "Type": "STR"
                            }
                        ]
                    }
                ]
            }
        ],
        "table2": [
            {
                "Field": "field1",
                "Description": "description1",
                "Mode": "NULLABLE",
                "Type": "STR"
            }
        ]
    }
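
Both snippets assume every parent row appears before its children; if it does not, `next()` finds no match and stops with a bare `StopIteration`. The following is one possible variant wrapped in a function that reports which row is orphaned. The name `build_schema` is hypothetical, and treating only `RECORD` rows as `REPEATED` is an assumption that matches the sample data rather than a rule stated in the question.

```python
import json


def build_schema(rows):
    """Group flat rows by table and nest dotted field names under their parents."""
    tables = {}
    for row in rows:
        siblings = tables.setdefault(row["Table"], [])
        # 'field5.nest4.nest1' -> parents ['field5', 'nest4'], leaf 'nest1'
        *parents, leaf = row["Field"].split(".")
        for name in parents:
            parent = next((el for el in siblings if el["Field"] == name), None)
            if parent is None:
                raise ValueError(
                    f"{row['Table']}.{row['Field']}: parent {name!r} is not defined yet"
                )
            siblings = parent.setdefault("Fields", [])
        siblings.append({
            "Field": leaf,
            "Description": row["Description"],
            # Assumption: only RECORD rows are REPEATED, everything else NULLABLE.
            "Mode": "REPEATED" if row["Type"] == "RECORD" else "NULLABLE",
            "Type": row["Type"],
        })
    return tables


rows = [
    {"Table": "table1", "Field": "field5", "Description": "description5", "Type": "RECORD"},
    {"Table": "table1", "Field": "field5.nest4", "Description": "description9", "Type": "RECORD"},
    {"Table": "table1", "Field": "field5.nest4.nest1", "Description": "description10", "Type": "STR"},
]
schema = build_schema(rows)
print(json.dumps(schema, indent=4))
```

Nothing in the loop depends on the depth, so 15 levels are handled the same way as 3. Sorting the rows by table and field name beforehand is one way to make sure parents come first, though it also changes the order of the fields in the output.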
    

    【Comments】:

    • Awesome! I had never used defaultdict before, but I already love it!