【Question title】: How to flatten a nested JSON recursively, with flatten_json
【Posted】: 2020-02-14 23:18:15
【Question description】:

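This question is specific to using flatten_json from the GitHub repo flatten.

  • The package is on PyPI as flatten-json 0.1.7 and can be installed with pip install flatten-json
  • This question is specific to the following component of the package: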
def flatten_json(nested_json: dict, exclude: list=[''], sep: str='_') -> dict:
    """
    Flatten a list of nested dicts.
    """
    out = dict()
    def flatten(x: (list, dict, str), name: str='', exclude=exclude):
        if type(x) is dict:
            for a in x:
                if a not in exclude:
                    flatten(x[a], f'{name}{a}{sep}')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, f'{name}{i}{sep}')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(nested_json)
    return out

The function uses recursion to flatten nested dicts.
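
To make the naming scheme concrete, here is a small run of the function on a made-up record (the `record` value is an assumption for illustration, not data from the question). The function body is repeated only so the block runs on its own. Each key path is joined with `sep`, list positions become indices, keys listed in `exclude` are skipped at every level, and empty dicts or lists produce no column at all.

```python
def flatten_json(nested_json, exclude=[''], sep='_'):
    """Same logic as the function quoted above, repeated so this block is standalone."""
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for key in x:
                if key not in exclude:
                    flatten(x[key], f'{name}{key}{sep}')
        elif type(x) is list:
            for i, item in enumerate(x):
                flatten(item, f'{name}{i}{sep}')
        else:
            # Leaf value: drop the one trailing separator character from the name
            out[name[:-1]] = x

    flatten(nested_json)
    return out


# Hypothetical record, for illustration only
record = {'id': 1, 'tags': ['a', 'b'], 'meta': {'owner': {'name': 'x'}, 'empty': {}}}

flat = flatten_json(record)
# {'id': 1, 'tags_0': 'a', 'tags_1': 'b', 'meta_owner_name': 'x'}

dotted = flatten_json(record, exclude=['tags'], sep='.')
# {'id': 1, 'meta.owner.name': 'x'}
```

Note that `name[:-1]` strips exactly one character, so a multi-character `sep` would leave part of the separator at the end of every key.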

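How deeply can data be nested?:

  • flatten_json has been used to unpack files that ended up with more than 100000 columns

Can a flattened JSON be unflattened?:

  • Yes, but this question does not cover that. If you install the flatten package, it has an unflatten method, though I have not tested it.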
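
As a rough illustration of what unflattening involves, the sketch below rebuilds nested dicts by splitting each key on the separator. `unflatten_dict` is a hypothetical helper written for this example; it is not the package's unflatten method, which remains untested here.

```python
def unflatten_dict(flat, sep='_'):
    """Hypothetical helper: rebuild nested dicts from separator-joined keys."""
    nested = {}
    for key, value in flat.items():
        parts = key.split(sep)
        node = nested
        # Walk (or create) one dict level per key part except the last
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


flat = {'id': 1, 'metadata_m1_value': 'm1_1', 'metadata_m1_timestamp': 'd1'}
nested = unflatten_dict(flat)
# {'id': 1, 'metadata': {'m1': {'value': 'm1_1', 'timestamp': 'd1'}}}
```

The round trip is lossy under these assumptions: an original key that already contains the separator gets split, list indices come back as string dict keys rather than list positions, and empty containers dropped during flattening cannot be recovered.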

【Comments】:

    Tags: python json pandas recursion


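    【Solution 1】:

    How to flatten a JSON or dict is a common question with many answers.

    • This answer focuses on using flatten_json to recursively flatten a nested dict or JSON

    Assumptions:

    • This answer assumes you have already loaded the JSON or dict into a variable (e.g. from a file, an API, etc.)
      • In this case, we will use data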
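
For completeness, one common way to arrive at `data` is the standard `json` module. This is a minimal sketch; the `raw` string is a made-up stand-in for a file or API response body, and the filename in the comment is hypothetical.

```python
import json

# Stand-in for text read from a file or returned by an API
raw = '{"id": 1, "metadata": {"m1": {"value": "m1_1", "timestamp": "d1"}}}'

# json.loads parses a string; json.load does the same for an open file object,
# e.g. `with open('data.json') as f: data = json.load(f)` (hypothetical filename)
data = json.loads(raw)
```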

    How data is passed to flatten_json:

    • It accepts a dict, as shown by the function's type hint.

    Most common forms of data:

    • Just a dict: {}
      • flatten_json(data)
    • A list of dicts: [{}, {}, {}]
      • [flatten_json(x) for x in data]
    • JSON with top-level keys whose values repeat the same structure: {1: {}, 2: {}, 3: {}}
      • [flatten_json(data[key]) for key in data.keys()]
    • Other
      • {'key': [{}, {}, {}]}: [flatten_json(x) for x in data['key']]
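
The reason for flattening record by record can be seen in a short sketch. The payloads below are made up for illustration, and the function is a compact copy of the one from the question so the block runs standalone. Passing a whole list at once yields a single wide row with index prefixes, while the list comprehension yields one row per record. Keeping the top-level key as its own column is an optional extra, not something the forms above do.

```python
def flatten_json(nested_json, exclude=[''], sep='_'):
    # Compact copy of the function from the question
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                if a not in exclude:
                    flatten(x[a], f'{name}{a}{sep}')
        elif type(x) is list:
            for i, a in enumerate(x):
                flatten(a, f'{name}{i}{sep}')
        else:
            out[name[:-1]] = x

    flatten(nested_json)
    return out


# Hypothetical payload: records sit in a list under one top-level key
data = {'results': [{'id': 1, 'pos': {'x': 1, 'y': 2}},
                    {'id': 2, 'pos': {'x': 3, 'y': 4}}]}

# One flat dict per record
rows = [flatten_json(x) for x in data['results']]

# Flattening the list itself gives one dict with '0_...' and '1_...' keys
wide = flatten_json(data['results'])

# Hypothetical payload keyed by record id; the key is kept as a 'key' column
by_id = {'1': {'venue': {'name': 'A'}}, '2': {'venue': {'name': 'B'}}}
keyed_rows = [{'key': k, **flatten_json(v)} for k, v in by_id.items()]
```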

    Practical examples:

    • I usually flatten data into a pandas.DataFrame for further analysis.
      • Load pandas with import pandas as pd
    • flatten_json returns a dict, which can be saved directly with the csv package.
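
A minimal sketch of the csv route, assuming the rows have already been flattened (the `rows` values are made up). Flattened records do not always share the same keys, so the header is built from the union of keys and missing cells are left empty. An in-memory buffer stands in for a real output file.

```python
import csv
import io

# Hypothetical flattened records; the second has a column the first lacks
rows = [
    {'id': 1, 'pos_x': 1},
    {'id': 2, 'pos_x': 3, 'pos_y': 4},
]

# Union of all keys, in first-seen order, becomes the CSV header
fieldnames = list(dict.fromkeys(k for row in rows for k in row))

# Stand-in for open('out.csv', 'w', newline='') with a hypothetical filename
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval='')
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
```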

    Data 1:

    {
        "id": 1,
        "class": "c1",
        "owner": "myself",
        "metadata": {
            "m1": {
                "value": "m1_1",
                "timestamp": "d1"
            },
            "m2": {
                "value": "m1_2",
                "timestamp": "d2"
            },
            "m3": {
                "value": "m1_3",
                "timestamp": "d3"
            },
            "m4": {
                "value": "m1_4",
                "timestamp": "d4"
            }
        },
        "a1": {
            "a11": [
    
            ]
        },
        "m1": {},
        "comm1": "COMM1",
        "comm2": "COMM21529089656387",
        "share": "xxx",
        "share1": "yyy",
        "hub1": "h1",
        "hub2": "h2",
        "context": [
    
        ]
    }
    

    Flatten 1:

    df = pd.DataFrame([flatten_json(data)])
    
     id class   owner metadata_m1_value metadata_m1_timestamp metadata_m2_value metadata_m2_timestamp metadata_m3_value metadata_m3_timestamp metadata_m4_value metadata_m4_timestamp  comm1               comm2 share share1 hub1 hub2
      1    c1  myself              m1_1                    d1              m1_2                    d2              m1_3                    d3              m1_4                    d4  COMM1  COMM21529089656387   xxx    yyy   h1   h2
    

    Data 2:

    [{
            'accuracy': 17,
            'activity': [{
                    'activity': [{
                            'confidence': 100,
                            'type': 'STILL'
                        }
                    ],
                    'timestampMs': '1542652'
                }
            ],
            'altitude': -10,
            'latitudeE7': 3777321,
            'longitudeE7': -122423125,
            'timestampMs': '1542654',
            'verticalAccuracy': 2
        }, {
            'accuracy': 17,
            'activity': [{
                    'activity': [{
                            'confidence': 100,
                            'type': 'STILL'
                        }
                    ],
                    'timestampMs': '1542652'
                }
            ],
            'altitude': -10,
            'latitudeE7': 3777321,
            'longitudeE7': -122423125,
            'timestampMs': '1542654',
            'verticalAccuracy': 2
        }, {
            'accuracy': 17,
            'activity': [{
                    'activity': [{
                            'confidence': 100,
                            'type': 'STILL'
                        }
                    ],
                    'timestampMs': '1542652'
                }
            ],
            'altitude': -10,
            'latitudeE7': 3777321,
            'longitudeE7': -122423125,
            'timestampMs': '1542654',
            'verticalAccuracy': 2
        }
    ]
    

    Flatten 2:

    df = pd.DataFrame([flatten_json(x) for x in data])
    
     accuracy  activity_0_activity_0_confidence activity_0_activity_0_type activity_0_timestampMs  altitude  latitudeE7  longitudeE7 timestampMs  verticalAccuracy
           17                               100                      STILL                1542652       -10     3777321   -122423125     1542654                 2
           17                               100                      STILL                1542652       -10     3777321   -122423125     1542654                 2
           17                               100                      STILL                1542652       -10     3777321   -122423125     1542654                 2
    

    Data 3:

    {
        "1": {
            "VENUE": "JOEBURG",
            "COUNTRY": "HAE",
            "ITW": "XAD",
            "RACES": {
                "1": {
                    "NO": 1,
                    "TIME": "12:35"
                },
                "2": {
                    "NO": 2,
                    "TIME": "13:10"
                },
                "3": {
                    "NO": 3,
                    "TIME": "13:40"
                },
                "4": {
                    "NO": 4,
                    "TIME": "14:10"
                },
                "5": {
                    "NO": 5,
                    "TIME": "14:55"
                },
                "6": {
                    "NO": 6,
                    "TIME": "15:30"
                },
                "7": {
                    "NO": 7,
                    "TIME": "16:05"
                },
                "8": {
                    "NO": 8,
                    "TIME": "16:40"
                }
            }
        },
        "2": {
            "VENUE": "FOOBURG",
            "COUNTRY": "ABA",
            "ITW": "XAD",
            "RACES": {
                "1": {
                    "NO": 1,
                    "TIME": "12:35"
                },
                "2": {
                    "NO": 2,
                    "TIME": "13:10"
                },
                "3": {
                    "NO": 3,
                    "TIME": "13:40"
                },
                "4": {
                    "NO": 4,
                    "TIME": "14:10"
                },
                "5": {
                    "NO": 5,
                    "TIME": "14:55"
                },
                "6": {
                    "NO": 6,
                    "TIME": "15:30"
                },
                "7": {
                    "NO": 7,
                    "TIME": "16:05"
                },
                "8": {
                    "NO": 8,
                    "TIME": "16:40"
                }
            }
        }
    }
    

    Flatten 3:

    df = pd.DataFrame([flatten_json(data[key]) for key in data.keys()])
    
       VENUE COUNTRY  ITW  RACES_1_NO RACES_1_TIME  RACES_2_NO RACES_2_TIME  RACES_3_NO RACES_3_TIME  RACES_4_NO RACES_4_TIME  RACES_5_NO RACES_5_TIME  RACES_6_NO RACES_6_TIME  RACES_7_NO RACES_7_TIME  RACES_8_NO RACES_8_TIME
     JOEBURG     HAE  XAD           1        12:35           2        13:10           3        13:40           4        14:10           5        14:55           6        15:30           7        16:05           8        16:40
     FOOBURG     ABA  XAD           1        12:35           2        13:10           3        13:40           4        14:10           5        14:55           6        15:30           7        16:05           8        16:40
    

    Other examples:

    1. Python Pandas - Flatten Nested JSON
    2. handling nested json in pandas
    3. How to flatten a nested JSON from the NASA Weather Insight API in Python

    【Discussion】:
