【Question title】: CSV to JSON using a custom format
【Posted】: 2021-10-18 10:06:18
【Question description】:

I am trying to create a JSON file from a CSV using Pandas.

CSV file (this is only an excerpt; sorry for the long table, but I wanted to show things more clearly):

Month Type Subtype ItemName
December ObjectTypeA SubType A1 Item 1
December ObjectTypeA SubType A1 Item 2
December ObjectTypeA SubType A2 Item 3
December ObjectTypeA SubType A2 Item 4
December ObjectTypeA SubType A2 Item 5
December ObjectTypeA SubType A3 Item 6
December ObjectTypeA SubType A3 Item 7
December ObjectTypeA SubType A4 Item 8
December ObjectTypeA SubType A4 Item 9
December ObjectTypeA SubType A4 Item 10
December ObjectTypeA SubType A4 Item 11
December ObjectTypeA SubType A4 Item 12
December ObjectTypeA SubType A5 Item 13
December ObjectTypeA SubType A5 Item 14
December ObjectTypeA SubType A5 Item 15
December ObjectTypeB SubType B1 Item 16
December ObjectTypeB SubType B1 Item 17
December ObjectTypeB SubType B2 Item 18
December ObjectTypeB SubType B2 Item 19
December ObjectTypeB SubType B2 Item 20
December ObjectTypeB SubType B3 Item 21
December ObjectTypeB SubType B3 Item 22
March ObjectTypeA SubType A1 Item 23
March ObjectTypeA SubType A1 Item 24
March ObjectTypeA SubType A2 Item 25
March ObjectTypeA SubType A2 Item 26
March ObjectTypeA SubType A2 Item 27
March ObjectTypeA SubType A3 Item 28
March ObjectTypeA SubType A3 Item 29
March ObjectTypeA SubType A4 Item 30
March ObjectTypeA SubType A4 Item 31
March ObjectTypeA SubType A4 Item 32
March ObjectTypeA SubType A4 Item 33
March ObjectTypeA SubType A4 Item 34
March ObjectTypeC SubType C1 Item 35
March ObjectTypeC SubType C1 Item 36
March ObjectTypeC SubType C2 Item 37
March ObjectTypeC SubType C2 Item 38
March ObjectTypeC SubType C3 Item 39

Desired output

allobjects: {
"December": {
    "Object Type A": {
        "Subtype A1": ["Item1","Item2"],
        "Subtype A2": ["Item3","Item4","Item5"],
        "Subtype A3": ["Item6","Item7"],
        "Subtype A4": ["Item8","Item9"],
        "Subtype A5": ["Item10","Item11","Item12"]
        },
                
    "Object Type B": {
        "Subtype B1": ["Item13","Item14"],
        "Subtype B2": ["Item16","Item15","Item17","Item18"],
        "Subtype B3": ["Item19","Item20"],
        "Subtype B4": ["Item21","Item22"],
        "Subtype B5": ["Item23","Item24","Item25"]
        },
    "Object Type C": {
        "Subtype C1": ["Item26", "Item27"],
        "Subtype C2": ["Item28", "Item29"],
        "Subtype C3": ["Item30", "Item31"]
        }},
"March": {
    "Object Type A": {
        "Subtype A1": ["Item32","Item33"],
        "Subtype A2": ["Item34","Item35"],
        "Subtype A3": ["Item36","Item37"],
        "Subtype A4": ["Item38","Item39","Item40"],
        "Subtype A5": ["Item41","Item42","Item44"]
        },
                
    "Object Type C": {
        "Subtype C1": ["Item45", "Item46"],
        "Subtype C2": ["Item47", "Item48"],
        "Subtype C3": ["Item49", "Item50"]
        }},
    },

Current code

df = pd.read_csv("Book4.csv", dtype={
            "Month" : str,
            "Type" : str,
            "Subtype" : str,
            "ItemName": str,
        })


compiled = []

for (month, type, subtype), bag in df.groupby(["Month", "Type", "Subtype"]):
    contents = bag.drop(["Month", "Type","Subtype"], axis=1)
    allitems = [list(row) for i,row in contents.items()]
    compiled.append(dict([(month, {}),
                        (type, {}),
                        (subtype, allitems),
                         ]))
with open("Book4_pandas.json", 'w') as outfile:
    outfile.write(json.dumps(compiled, sort_keys=False, indent=2, separators=(',', ': ') ))

Output of the current code

[
  {
    "December": {},
    "ObjectTypeA": {},
    "Subtype A1": [
       [ "Item1",
             "Item2"
           ]
    ]
  },
  {
    "December": {},
    "ObjectTypeA": {},
    "Subtype A2": [
       [ "Item3",
             "Item4",
         "Item5"
           ]
    ]
  },

... (this continues for the rest of December, and then)

  {
    "March": {},
    "ObjectTypeA": {},
    "Subtype A1": [
       [ "Item23",
             "Item24"
           ]
    ]
  },
  {
    "March": {},
    "ObjectTypeA": {},
    "Subtype A2": [
       [ "Item25",
             "Item26",
         "Item27"
           ]
    ]
  }
]

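I appreciate that the JSON format is non-standard; however, I thought that writing a dict would be an "easy" way to do it. Is there a mistake in the way the for loop is structured?

Many thanks in advance!

【Comments】:

    Tags: json pandas csv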


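    【Solution 1】:

    You can first aggregate the items into a Series whose values are lists, then build the expected output with a nested dictionary comprehension: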

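    # Column name is "Subtype", matching the CSV header and the dtype mapping in the question
    s = df.groupby(["Month", "Type", "Subtype"], sort=False)['ItemName'].agg(list)
    
    # Outer loop: one entry per month; inner loop: one entry per (month, type) pair
    compiled = {i: {j[1]: h[j].to_dict() 
                    for j, h in g.groupby(level=[0,1], sort=False)}
                    for i, g in s.groupby(level=0, sort=False)}
    

    print(compiled)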
    
    {
        'December': {
            'ObjectTypeA': {
                'SubType A1': ['Item 1', 'Item 2'],
                'SubType A2': ['Item 3', 'Item 4', 'Item 5'],
                'SubType A3': ['Item 6', 'Item 7'],
                'SubType A4': ['Item 8', 'Item 9', 'Item 10', 'Item 11', 'Item 12'],
                'SubType A5': ['Item 13', 'Item 14', 'Item 15']
            },
            'ObjectTypeB': {
                'SubType B1': ['Item 16', 'Item 17'],
                'SubType B2': ['Item 18', 'Item 19', 'Item 20'],
                'SubType B3': ['Item 21', 'Item 22']
            }
        },
        'March': {
            'ObjectTypeA': {
                'SubType A1': ['Item 23', 'Item 24'],
                'SubType A2': ['Item 25', 'Item 26', 'Item 27'],
                'SubType A3': ['Item 28', 'Item 29'],
                'SubType A4': ['Item 30', 'Item 31', 'Item 32', 'Item 33', 'Item 34']
            },
            'ObjectTypeC': {
                'SubType C1': ['Item 35', 'Item 36'],
                'SubType C2': ['Item 37', 'Item 38'],
                'SubType C3': ['Item 39']
            }
        }
    }
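To make the nested comprehension easier to follow, here is a minimal sketch of the intermediate Series and of the partial-tuple selection that `h[j]` relies on. The inline `SAMPLE_CSV` is an assumption standing in for `Book4.csv` (a few comma-separated rows with the same column names), and `.loc` is used here only to make the selection explicit.

```python
import io

import pandas as pd

# Assumption: a tiny comma-separated stand-in for Book4.csv.
SAMPLE_CSV = """Month,Type,Subtype,ItemName
December,ObjectTypeA,SubType A1,Item 1
December,ObjectTypeA,SubType A1,Item 2
December,ObjectTypeB,SubType B1,Item 16
March,ObjectTypeA,SubType A1,Item 23
March,ObjectTypeC,SubType C3,Item 39
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV), dtype=str)

# One list of items per (Month, Type, Subtype) combination.
s = df.groupby(["Month", "Type", "Subtype"], sort=False)["ItemName"].agg(list)

# The result carries a three-level MultiIndex.
print(list(s.index.names))

# Selecting with a (Month, Type) tuple drops those two levels and leaves a
# Series keyed by Subtype; to_dict() turns it into the innermost mapping.
inner = s.loc[("December", "ObjectTypeA")]
print(inner.to_dict())
```

Each `h[j].to_dict()` call in the comprehension produces one such innermost mapping, and the two surrounding loops attach it to its type and month.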
        
    

    with open("Book4_pandas.json", 'w') as outfile:
        outfile.write(json.dumps(compiled, sort_keys=False,
                                 indent=2, separators=(',', ': ')))
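If you would rather keep an explicit loop, as in the question, the same nesting can probably be built with `dict.setdefault`. This is only a sketch under assumptions: `build_nested` is a hypothetical helper name, and the inline `SAMPLE_CSV` stands in for `Book4.csv`.

```python
import io
import json

import pandas as pd

# Assumption: a tiny comma-separated stand-in for Book4.csv.
SAMPLE_CSV = """Month,Type,Subtype,ItemName
December,ObjectTypeA,SubType A1,Item 1
December,ObjectTypeA,SubType A1,Item 2
December,ObjectTypeB,SubType B1,Item 16
March,ObjectTypeA,SubType A1,Item 23
March,ObjectTypeC,SubType C3,Item 39
"""


def build_nested(df):
    """Hypothetical helper: nest ItemName lists under Month -> Type -> Subtype."""
    nested = {}
    grouped = df.groupby(["Month", "Type", "Subtype"], sort=False)["ItemName"]
    for (month, type_, subtype), items in grouped:
        # setdefault creates each inner dict the first time a key is seen and
        # reuses it afterwards, so all groups land in one shared structure
        # instead of one separate dict per group.
        nested.setdefault(month, {}).setdefault(type_, {})[subtype] = items.tolist()
    return nested


df = pd.read_csv(io.StringIO(SAMPLE_CSV), dtype=str)
compiled = build_nested(df)
print(json.dumps(compiled, indent=2))
```

The loop in the question appends a new flat dict for every group, which is why month, type and subtype end up as sibling keys; writing into one shared dict keyed by month avoids that.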
    

    【Comments】:

      【Solution 2】:

      Thanks for the question. You can change your code as follows:

      import pandas as pd
      import json
      
      df = pd.read_csv("Book4.csv", dtype={
          "Month": str,
          "Type": str,
          "Subtype": str,
          "ItemName": str,
      })
      
      
      s = df.groupby(["Month", "Type", "Subtype"])['ItemName'].agg(list)
      
      compiled = {level: {le: s.xs((level, le), level=[0, 1]).to_dict()
                          for le in s.index.levels[1]}
                  for level in s.index.levels[0]}
      
      with open("Book4_pandas.json", 'w') as outfile:
          outfile.write(json.dumps(compiled, sort_keys=False,
                                   indent=2, separators=(',', ': ')))
      

      【Comments】:

      • I removed this solution from my answer (check) because its output is wrong. You can see that if you also copy the JSON output.
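The wrong output mentioned in this comment most likely comes from the inner loop over `s.index.levels[1]`: that level lists every type found anywhere in the file, so each month also gets entries for types it does not contain. A possible fix, sketched here on an assumed inline sample instead of `Book4.csv`, is to loop only over the types that occur in the current month. The names `all_types`, `month` and `type_` are illustrative.

```python
import io

import pandas as pd

# Assumption: a tiny comma-separated stand-in for Book4.csv.
SAMPLE_CSV = """Month,Type,Subtype,ItemName
December,ObjectTypeA,SubType A1,Item 1
December,ObjectTypeA,SubType A1,Item 2
December,ObjectTypeB,SubType B1,Item 16
March,ObjectTypeA,SubType A1,Item 23
March,ObjectTypeC,SubType C3,Item 39
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV), dtype=str)
s = df.groupby(["Month", "Type", "Subtype"])["ItemName"].agg(list)

# levels[1] holds every Type seen in the whole file, regardless of month.
all_types = list(s.index.levels[1])

compiled = {
    month: {
        type_: s.xs((month, type_), level=[0, 1]).to_dict()
        # Restrict the inner loop to the types present in this month.
        for type_ in s.xs(month, level=0).index.get_level_values(0).unique()
    }
    for month in s.index.levels[0]
}

print(all_types)
print(compiled)
```

With this change December should no longer receive an `ObjectTypeC` key, and March should no longer receive an `ObjectTypeB` key.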