【Question Title】: Flatten multi-nested JSON and export to CSV in Pandas
【Posted】: 2019-07-29 14:53:44
【Question】:

The original JSON file looks like this:

data = [
    {
        "masterName": "AAAAAAAAAAA",
        "mainNames": [
            {
                "numbers": [
                    {
                        "date": "2019-05-16T00:00:00Z",
                        "NumberOne": 402.0,
                        "NumberTwo": 7830.0
                    }
                ],
                "name": "randomca"
            },
            {
                "numbers": [
                    {
                        "date": "2019-05-16T00:00:00Z",
                        "NumberOne": 222.0,
                        "NumberTwo": 4015.31
                    },
                    {
                        "date": "2019-05-31T00:00:00Z",
                        "NumberOne": 192.0,
                        "NumberTwo": 3685.64
                    }
                ],
                "name": "randomka"
            },
            {
                "numbers": [],
                "name": "randomop"
            }
        ]
    },
    {
        "masterName": "BBBBB",
        "mainNames": [
            {
                "numbers": [],
                "name": "randomha"
            },
            {
                "numbers": [
                    {
                        "date": "2019-05-17T00:00:00Z",
                        "NumberOne": 31.0,
                        "NumberTwo": 1500.0
                    },
                    {
                        "date": "2019-05-31T00:00:00Z",
                        "NumberOne": 236.0,
                        "NumberTwo": 31819.96
                    }
                ],
                "name": "randomba"
            }
        ]
    }
]

With my code, the result is:

My code is as follows:

import pandas as pd

# Wrap the parsed JSON list under a single "main" key
test_data = {"main": list(data)}

# One block per master record: each mainNames entry becomes a column
df = pd.concat(
    [
        pd.concat([pd.Series(m) for m in t['mainNames']], axis=1) for t in test_data['main']
    ], keys=[t['masterName'] for t in test_data['main']]
)

# Name the outer index level; assigning to df.index.levels[0].name
# can raise in newer pandas versions, so use set_names instead
df.index = df.index.set_names('masterName', level=0)
df.columns.name = 'member'

df2 = df.T.stack(0).swaplevel(0, 1).sort_index().reset_index()

df2.to_csv('stack.csv', sep=',', encoding='utf-8', index=False)

The expected result is:

PS: I have left the member column out of the CSV output screenshot because I no longer need it.

【Comments on the question】:

    Tags: python pandas export-to-csv


    【Solution 1】:

    A short way to do it with a Python comprehension:

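    import pandas as pd

    # One flat record per (master, name, number); `or [{}]` keeps names with no numbers
    records = ({'teamname': d['masterName'], 'name': name['name'], **num_dct} 
               for d in data
               for name in d['mainNames'] for num_dct in name['numbers'] or [{}])
    
    df = pd.DataFrame(records)
    # Put the identifying columns first, followed by the remaining ones
    cols = ['teamname', 'name', 'date']
    print(df[cols + df.columns[~df.columns.isin(cols)].tolist()])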
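
To see why the `or [{}]` fallback keeps names whose `numbers` list is empty, here is a minimal pure-Python sketch of the same comprehension written as explicit loops. The trimmed `sample` list and the `flatten` helper name are illustrative assumptions, not part of the original answer.

```python
# Trimmed, hypothetical sample with the same shape as the question's data
sample = [
    {
        "masterName": "AAAAAAAAAAA",
        "mainNames": [
            {"numbers": [{"date": "2019-05-16T00:00:00Z", "NumberOne": 402.0}],
             "name": "randomca"},
            {"numbers": [], "name": "randomop"},
        ],
    }
]


def flatten(data):
    """Return one flat dict per (master, name, number) combination."""
    rows = []
    for d in data:
        for name in d["mainNames"]:
            # An empty list is falsy, so `or [{}]` substitutes a single empty
            # dict: the name still yields one row, just without number fields.
            for num_dct in name["numbers"] or [{}]:
                rows.append(
                    {"teamname": d["masterName"], "name": name["name"], **num_dct}
                )
    return rows


rows = flatten(sample)
for row in rows:
    print(row)
```

When such rows are passed to `pd.DataFrame`, the keys missing from a row are filled with NaN, which accounts for the NaN rows in the output.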
    

    Output:

          teamname      name                  date  NumberOne  NumberTwo
    0  AAAAAAAAAAA  randomca  2019-05-16T00:00:00Z      402.0    7830.00
    1  AAAAAAAAAAA  randomka  2019-05-16T00:00:00Z      222.0    4015.31
    2  AAAAAAAAAAA  randomka  2019-05-31T00:00:00Z      192.0    3685.64
    3  AAAAAAAAAAA  randomop                   NaN        NaN        NaN
    4        BBBBB  randomha                   NaN        NaN        NaN
    5        BBBBB  randomba  2019-05-17T00:00:00Z       31.0    1500.00
    6        BBBBB  randomba  2019-05-31T00:00:00Z      236.0   31819.96
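
The question also asks for a CSV file, which the snippet above stops short of. A minimal sketch of that last step, assuming the same record-building pattern; the one-entry `data` list is a trimmed stand-in for the full input, and the in-memory buffer is only there so the result can be inspected. Passing a path such as the question's `'stack.csv'` to `to_csv` would write a file instead.

```python
import io

import pandas as pd

# Trimmed, hypothetical stand-in for the question's data
data = [
    {
        "masterName": "BBBBB",
        "mainNames": [
            {"numbers": [], "name": "randomha"},
            {"numbers": [{"date": "2019-05-17T00:00:00Z",
                          "NumberOne": 31.0, "NumberTwo": 1500.0}],
             "name": "randomba"},
        ],
    }
]

# Same flattening idea as the answer: one record per number entry,
# with an empty dict standing in for names that have no numbers
records = [
    {"teamname": d["masterName"], "name": n["name"], **num}
    for d in data
    for n in d["mainNames"]
    for num in n["numbers"] or [{}]
]

# Fix the column order explicitly so the CSV header is predictable
cols = ["teamname", "name", "date", "NumberOne", "NumberTwo"]
df = pd.DataFrame(records).reindex(columns=cols)

# index=False leaves the row index out of the file
buf = io.StringIO()
df.to_csv(buf, index=False)
csv_text = buf.getvalue()
print(csv_text)
```

Missing values are written as empty fields by default; the `na_rep` argument of `to_csv` can supply a different placeholder.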
    

    【Comments】:
