【Question title】:Conversion from nested json to csv with pandas
【Posted】:2019-11-14 23:05:47
【Question】:

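I am trying to convert a nested JSON file to CSV, but I am struggling with the logic my file's structure requires: the JSON contains two objects, and I only want to convert one of them to CSV, namely a list with nested values.

I found very useful information on "flattening" JSON in this blog post. I have essentially adapted it to my problem, but it still does not work for me.

My JSON file looks like this: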

{
  "tickets":[
    {
      "Name": "Liam",
      "Location": {
        "City": "Los Angeles",
        "State": "CA"
      },
      "hobbies": [
        "Piano",
        "Sports"
      ],
      "year" : 1985,
      "teamId" : "ATL",
      "playerId" : "barkele01",
      "salary" : 870000
    },
    {
      "Name": "John",
      "Location": {
        "City": "Los Angeles",
        "State": "CA"
      },
      "hobbies": [
        "Music",
        "Running"
      ],
      "year" : 1985,
      "teamId" : "ATL",
      "playerId" : "bedrost01",
      "salary" : 550000
    }
  ],
  "count": 2
}

So far, my code looks like this:

import json
from pandas.io.json import json_normalize
import argparse


def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x
    flatten(y)
    return out


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Converting json files into csv for Tableau processing')
    parser.add_argument(
        "-j", "--json", dest="json_file", help="PATH/TO/json file to convert", metavar="FILE", required=True)

    args = parser.parse_args()

    with open(args.json_file, "r") as inputFile:  # open json file
        json_data = json.loads(inputFile.read())  # load json content
    flat_json = flatten_json(json_data)
    # normalizing flat json
    final_data = json_normalize(flat_json)

    with open(args.json_file.replace(".json", ".csv"), "w") as outputFile:  # open csv file

        # saving DataFrame to csv
        final_data.to_csv(outputFile, encoding='utf8', index=False)

What I would like to get is one row per ticket in the CSV, with the headers:

Name,Location_City,Location_State,Hobbies_0,Hobbies_1,Year,TeamId,PlayerId,Salary.

I would really appreciate anything that makes this click! Thanks!
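
For reference, a minimal sketch of why the code in the question ends up with a single wide row: `flatten_json` is applied to the whole document, so the list index of each ticket becomes part of the key. The `sample` dictionary is a trimmed-down stand-in for the question's file (an assumption made to keep the output short), and the helper is reproduced with `isinstance` checks so the block runs on its own.

```python
def flatten_json(y):
    """Flattening helper from the question, reproduced so this sketch is self-contained."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                flatten(x[key], name + key + '_')
        elif isinstance(x, list):
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            # Strip the trailing underscore added by the caller
            out[name[:-1]] = x

    flatten(y)
    return out


# Hypothetical, shortened version of the question's JSON
sample = {
    "tickets": [
        {"Name": "Liam", "Location": {"City": "Los Angeles", "State": "CA"}},
        {"Name": "John", "Location": {"City": "Los Angeles", "State": "CA"}},
    ],
    "count": 2,
}

# Flattening the whole document: every ticket lands in the same dict
whole = flatten_json(sample)
print(list(whole))
# ['tickets_0_Name', 'tickets_0_Location_City', 'tickets_0_Location_State',
#  'tickets_1_Name', 'tickets_1_Location_City', 'tickets_1_Location_State', 'count']

# Flattening ticket by ticket: one dict, and therefore one row, per ticket
per_ticket = [flatten_json(ticket) for ticket in sample["tickets"]]
print(list(per_ticket[0]))
# ['Name', 'Location_City', 'Location_State']
```

Passing `whole` to a DataFrame constructor gives one row with seven columns, whereas `per_ticket` gives two rows with three columns each.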

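【Comments】:

    Tags: python json pandas csv nested


    【Solution 1】:

    I actually wrote a package called cherrypicker recently to handle exactly this sort of thing, since I had to do it so often!

    I think the following code will give you what you are after: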

    from cherrypicker import CherryPicker
    import json
    import pandas as pd
    
    with open('file.json') as file:
        data = json.load(file)
    
    picker = CherryPicker(data)
    flat = picker['tickets'].flatten().get()
    df = pd.DataFrame(flat)
    print(df)
    

    This gives me the output:

      Location_City Location_State  Name hobbies_0 hobbies_1   playerId  salary teamId  year
    0   Los Angeles             CA  Liam     Piano    Sports  barkele01  870000    ATL  1985
    1   Los Angeles             CA  John     Music   Running  bedrost01  550000    ATL  1985
    

    You can install the package with:

    pip install cherrypicker
    

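    ...and there is more documentation and guidance at https://cherrypicker.readthedocs.io.

    【Comments】:

      【Solution 2】:

      If you already have a function that flattens a JSON object, you only need to flatten the tickets: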

      ...
      with open(args.json_file, "r") as inputFile:  # open json file
          json_data = json.loads(inputFile.read())  # load json content
      final_data = pd.DataFrame([flatten_json(elt) for elt in json_data['tickets']])
      ...
      

      With your sample data, final_data is as expected:

        Location_City Location_State  Name hobbies_0 hobbies_1   playerId  salary teamId  year
      0   Los Angeles             CA  Liam     Piano    Sports  barkele01  870000    ATL  1985
      1   Los Angeles             CA  John     Music   Running  bedrost01  550000    ATL  1985
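
Putting the per-ticket flattening together with the CSV step, an end-to-end sketch might look like the following. The sample data is inlined instead of read from a file, the `columns` list and the capitalisation step are assumptions based on the header the question asks for, and the CSV is rendered to a string (`to_csv` with no path) so nothing is written to disk.

```python
import pandas as pd


def flatten_json(y):
    """Flattening helper from the question, reproduced so this sketch is self-contained."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                flatten(x[key], name + key + '_')
        elif isinstance(x, list):
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            out[name[:-1]] = x

    flatten(y)
    return out


# Sample data from the question, inlined
data = {
    "tickets": [
        {"Name": "Liam", "Location": {"City": "Los Angeles", "State": "CA"},
         "hobbies": ["Piano", "Sports"], "year": 1985, "teamId": "ATL",
         "playerId": "barkele01", "salary": 870000},
        {"Name": "John", "Location": {"City": "Los Angeles", "State": "CA"},
         "hobbies": ["Music", "Running"], "year": 1985, "teamId": "ATL",
         "playerId": "bedrost01", "salary": 550000},
    ],
    "count": 2,
}

# One flattened dict per ticket gives one DataFrame row per ticket
df = pd.DataFrame([flatten_json(ticket) for ticket in data["tickets"]])

# Column order requested in the question (assumed; names as they appear in the JSON)
columns = ["Name", "Location_City", "Location_State", "hobbies_0", "hobbies_1",
           "year", "teamId", "playerId", "salary"]
df = df[columns]

# Upper-case the first letter of each header to match the question's
# "Hobbies_0, Year, TeamId, ..." spelling (assumed to be intentional)
df.columns = [name[0].upper() + name[1:] for name in df.columns]

# With no path argument, to_csv returns the CSV as a string
csv_text = df.to_csv(index=False)
print(csv_text)
# Name,Location_City,Location_State,Hobbies_0,Hobbies_1,Year,TeamId,PlayerId,Salary
# Liam,Los Angeles,CA,Piano,Sports,1985,ATL,barkele01,870000
# John,Los Angeles,CA,Music,Running,1985,ATL,bedrost01,550000
```

To write a file instead, pass a path as the first argument, for example `df.to_csv("tickets.csv", index=False)`, where the file name is only a placeholder.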
      

      【Comments】:

        【Solution 3】:

        There may be a simpler solution, but this should work!

        import json
        import pandas as pd
        
        with open('file.json') as file:
            data = json.load(file)
        
        df = pd.DataFrame(data['tickets'])
        
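        # Copy each row's nested Location fields into their own columns
        df['location_city'] = df['Location'].apply(lambda loc: loc['City'])
        df['location_state'] = df['Location'].apply(lambda loc: loc['State'])
        
        # One column per hobby position (assumes every ticket has as many hobbies as the first)
        for i in range(len(df['hobbies'].iloc[0])):
            df['hobbies_{}'.format(i)] = df['hobbies'].apply(lambda h: h[i])
        
        df = df.drop(['Location', 'hobbies'], axis=1)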
        
        print(df)
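
A pandas-only variant of the same idea, sketched under the assumption that pandas 1.0 or later is installed (where `json_normalize` is available as `pd.json_normalize`): nested dictionaries are flattened by `json_normalize`, and the `hobbies` lists, which it leaves untouched, are expanded separately. The `tickets` list is the question's sample data inlined; the resulting column order is not asserted here because it can differ between pandas versions.

```python
import pandas as pd

# The "tickets" list from the question's sample data, inlined
tickets = [
    {"Name": "Liam", "Location": {"City": "Los Angeles", "State": "CA"},
     "hobbies": ["Piano", "Sports"], "year": 1985, "teamId": "ATL",
     "playerId": "barkele01", "salary": 870000},
    {"Name": "John", "Location": {"City": "Los Angeles", "State": "CA"},
     "hobbies": ["Music", "Running"], "year": 1985, "teamId": "ATL",
     "playerId": "bedrost01", "salary": 550000},
]

# Nested dicts become Location_City / Location_State; sep replaces the default "."
df = pd.json_normalize(tickets, sep="_")

# Lists stay as single cells, so spread them into hobbies_0, hobbies_1, ...
# Tickets with fewer hobbies than the longest list get missing values in the extra columns
hobbies = pd.DataFrame(df["hobbies"].tolist(), index=df.index).add_prefix("hobbies_")
df = df.drop(columns="hobbies").join(hobbies)

print(sorted(df.columns))
print(df[["Name", "Location_City", "hobbies_0", "hobbies_1"]])
```

Unlike a hand-written loop, this does not need to know the nested key names in advance, which may matter if the tickets gain extra fields later.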
        

        【Comments】:

        • Thanks - I needed exactly this tool to pull nested documents out of a mongo data store. Perfect.