【Question title】: Converting nested JSON to a clean pandas DataFrame
【Posted】: 2021-12-28 14:19:18
【Question】:

I have a JSON file that I want to convert into a usable pd.DataFrame for further modeling. The JSON file looks like this:

json_file = {
  "x1": [
    {
      "a": "XZ12ABC1834",
      "b": "J. Doe",
      "c": [
        {
          "Amount": -50,
          "Date": "2021-08-15T10:00:00.000Z",
          "CategoryId": "abc123",
          "CounterParty": "The Farm",
          "Description": "some description",
          "Counter": "XYZ456AZ",
          "Type": "bc"
        },{
          "Amount": -1,
          "Date": "2020-08-15T10:00:00.000Z",
          "CategoryId": "cde123",
          "CounterParty": "The pool",
          "Description": "some other description",
          "Counter": "WYZ12",
          "Type": "X"
        }
      ]
    },
    {
      "a": "XX34XX872",
      "b": "J. Doe",
      "c": [
        {
          "Amount": -1.50,
          "Date": "2019-05-15T10:00:00.000Z",
          "CategoryId": "QWR627",
          "CounterParty": "The City",
          "Description": "last other description",
          "Counter": "QWE123",
          "Type": "S"
        }
      ]
    }
  ]
}

I would like to convert this JSON file into a DataFrame that looks like this:

var1  a            b       amount  date                      CategoryID  Counterparty  Description             Counter   Type
x1    XZ12ABC1834  J. Doe  -50     2021-08-15T10:00:00.000Z  abc123      The Farm      some description        XYZ456AZ  bc
x1    XZ12ABC1834  J. Doe  -1      2020-08-15T10:00:00.000Z  cde123      The pool      some other description  WYZ12     X
x1    XX34XX872    J. Doe  -1.50   2019-05-15T10:00:00.000Z  QWR627      The City      last other description  QWE123    S

I hope this information is enough to help solve the problem.

【Comments】:

  • Have you tried pd.json_normalize?

Tags: python json list dataframe nested


【Solution 1】:

I think something like this should work:

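import pandas as pd

result = []

# Each top-level key (e.g. "x1") maps to a list of records.
for key in json_file:
  # Expand every item of "c" into its own row; repeat "a" and "b" on each row.
  df_nested_list = pd.json_normalize(
    json_file[key],
    record_path=['c'],
    meta=['a', 'b']
  )
  df_nested_list['var1'] = key  # remember which top-level key the rows came from
  result.append(df_nested_list)

# Combine the per-key frames into a single DataFrame.
df = pd.concat(result, ignore_index=True)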

For more information, see: https://towardsdatascience.com/how-to-convert-json-into-a-pandas-dataframe-100b2ae1e0d8
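
To make the idea easier to try out, here is a minimal end-to-end sketch. It assumes the two records under "x1" are separate dictionaries in the list (as the desired output suggests), and it uses a shortened copy of the sample data with only the Amount, Date and Type fields. The column reordering and the date parsing are additions of mine, not part of the answer itself.

```python
import pandas as pd

# Shortened sample shaped like the question's JSON (assumed structure:
# "x1" is a list of records, each with "a", "b" and a nested list "c").
json_file = {
    "x1": [
        {
            "a": "XZ12ABC1834",
            "b": "J. Doe",
            "c": [
                {"Amount": -50, "Date": "2021-08-15T10:00:00.000Z", "Type": "bc"},
                {"Amount": -1, "Date": "2020-08-15T10:00:00.000Z", "Type": "X"},
            ],
        },
        {
            "a": "XX34XX872",
            "b": "J. Doe",
            "c": [
                {"Amount": -1.50, "Date": "2019-05-15T10:00:00.000Z", "Type": "S"},
            ],
        },
    ]
}

frames = []
for key, records in json_file.items():
    # One row per entry of "c"; "a" and "b" are repeated on each row.
    frame = pd.json_normalize(records, record_path=["c"], meta=["a", "b"])
    frame.insert(0, "var1", key)
    frames.append(frame)

df = pd.concat(frames, ignore_index=True)

# Move the identifying columns to the front, as in the desired layout.
front = ["var1", "a", "b"]
df = df[front + [col for col in df.columns if col not in front]]

# Parse the ISO 8601 strings into timezone-aware timestamps.
df["Date"] = pd.to_datetime(df["Date"], utc=True)
```

With the full records from the question, the remaining fields (CategoryId, CounterParty, Description, Counter) should simply show up as additional columns after "b".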

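【Comments】:

  • Why take only the first element of json_file['x1']? Based on the OP's example, all of them should be included. You might also put this in a loop, since the OP may have other x-lists in their dictionary.
  • @DataDude Do you have multiple categories for var1?
  • @Tranbi I only sketched the basic idea; feel free to edit.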
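
On the point about several top-level lists: a possible alternative that does not use pd.json_normalize at all is to flatten the structure with plain loops and hand the rows to pd.DataFrame. This is only a sketch; the helper name flatten, the second key "x2" and the sample values are made up for illustration and do not come from the question.

```python
import pandas as pd


def flatten(data):
    """Yield one flat dict per item of "c", tagged with its top-level key."""
    for key, records in data.items():
        for record in records:
            # A record without "c" contributes no rows.
            for item in record.get("c", []):
                yield {"var1": key, "a": record["a"], "b": record["b"], **item}


# Hypothetical input with two top-level keys.
sample = {
    "x1": [{"a": "A1", "b": "J. Doe", "c": [{"Amount": -50}, {"Amount": -1}]}],
    "x2": [{"a": "B2", "b": "J. Doe", "c": [{"Amount": -1.5}]}],
}

df = pd.DataFrame(list(flatten(sample)))
```

Because each row is built explicitly, the column order (var1, a, b, then the nested fields) follows the order of the keys in the yielded dictionaries.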