Convert Nested JSON to Pandas Columns: Answers

【Question title】: Convert Nested JSON to Pandas Column
【Posted】: 2020-03-21 08:17:54
【Question】:

I have a JSON file like the following:

[{
'instrument_token': 12335618, 'last_price': 31584.6,
'ohlc': {'open': 31080.1, 'high': 31590.0, 'low': 31049.05, 'close': 31114.7}, 
'depth': {'buy': [{'quantity': 40, 'price': 31576.4, 'orders': 1}, {'quantity': 160, 'price': 31576.0, 'orders': 1}], 'sell': [{'quantity': 200, 'price': 31584.6, 'orders': 2}, {'quantity': 60, 'price': 31584.65, 'orders': 1}]}
}]

I tried the following:

df = json_normalize(ticks)
print(df)

This gives me the following result:

  instrument_token  last_price  ohlc.open  ohlc.high  ohlc.low  ohlc.close                                          depth.buy                                         depth.sell
0          12335618     31584.6    31080.1    31590.0  31049.05     31114.7  [{'quantity': 40, 'price': 31576.4, 'orders': ...  [{'quantity': 200, 'price': 31584.6, 'orders':...

I want to further normalize the data in the depth.buy and depth.sell columns into separate columns, named:

depth.buy.quantity1, depth.buy.price1, depth.buy.orders1, 
depth.buy.quantity2, depth.buy.price2, depth.buy.orders2,
depth.sell.quantity1, depth.sell.price1, depth.sell.orders1, 
depth.sell.quantity2, depth.sell.price2, depth.sell.orders2,

Is it possible to normalize this further?

【Comments】:

Tags: json python-3.x pandas python-2.7

【Solution 1】:

For this sample dataset, you can do it like this.

import pandas as pd

data = [{
'instrument_token': 12335618, 'last_price': 31584.6,
'ohlc': {'open': 31080.1, 'high': 31590.0, 'low': 31049.05, 'close': 31114.7},
'depth': {'buy': [{'quantity': 40, 'price': 31576.4, 'orders': 1},
                  {'quantity': 160, 'price': 31576.0, 'orders': 1}],
          'sell': [{'quantity': 200, 'price': 31584.6, 'orders': 2},
                   {'quantity': 60, 'price': 31584.65, 'orders': 1}]
          }
}]

# pd.json_normalize (pandas >= 1.0) replaces the deprecated pandas.io.json.json_normalize
df = pd.json_normalize(data)
cols = ["depth.buy", "depth.sell"]
for c in cols:
    postdf = pd.DataFrame()
    # Each cell holds a list of dicts; take the list from the first (and only) row
    tmp = df[c].values[0]
    for i, val in enumerate(tmp):
        vals = list(val.values())
        # Suffix each key with its 1-based position, e.g. depth.buy.quantity1
        keys = [f"{c}.{k}{i+1}" for k in val.keys()]
        tmpdf = pd.DataFrame([vals], columns=keys)
        postdf = pd.concat([postdf, tmpdf], axis=1)
    # Append the expanded columns and drop the original list column
    df = pd.concat([df, postdf], axis=1)
    df = df.drop(columns=[c])

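Note:

tmp must hold the list stored in the 0th row of the column, i.e. df[c].values[0]; df[c].values on its own is an array with one list per row. If the DataFrame has multiple rows, you must loop over them. I am assuming all the data is in a single list.
If you need to get the list of column names (["depth.buy","depth.sell"]) dynamically, you can do so by checking df's dtypes and picking the names of the columns with object dtype.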
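
Both notes can be folded into one small helper. What follows is only a sketch, assuming pandas 1.0 or later (for pd.json_normalize) and that every list cell contains flat dicts. The name expand_list_columns is made up for illustration, and the second tick in the data is invented to show what happens when a row has fewer depth levels. Because object dtype also covers plain strings, the sketch additionally checks that each cell really is a list.

```python
import pandas as pd


def expand_list_columns(df):
    """Expand each column of list-of-dict cells into numbered columns, row by row."""
    # object dtype only narrows the candidates, so also require list cells.
    list_cols = [
        c for c in df.columns
        if df[c].dtype == object
        and df[c].map(lambda v: isinstance(v, list)).all()
    ]
    parts = [df.drop(columns=list_cols)]
    for c in list_cols:
        # One flat dict per row: depth.buy.quantity1, depth.buy.price1, ...
        rows = [
            {f"{c}.{k}{i + 1}": v
             for i, item in enumerate(cell)
             for k, v in item.items()}
            for cell in df[c]
        ]
        parts.append(pd.DataFrame(rows, index=df.index))
    return pd.concat(parts, axis=1)


data = [
    {'instrument_token': 12335618, 'last_price': 31584.6,
     'ohlc': {'open': 31080.1, 'high': 31590.0, 'low': 31049.05, 'close': 31114.7},
     'depth': {'buy': [{'quantity': 40, 'price': 31576.4, 'orders': 1},
                       {'quantity': 160, 'price': 31576.0, 'orders': 1}],
               'sell': [{'quantity': 200, 'price': 31584.6, 'orders': 2},
                        {'quantity': 60, 'price': 31584.65, 'orders': 1}]}},
    # Hypothetical second tick with a single level on each side
    {'instrument_token': 999, 'last_price': 100.0,
     'ohlc': {'open': 99.0, 'high': 101.0, 'low': 98.5, 'close': 99.5},
     'depth': {'buy': [{'quantity': 10, 'price': 99.9, 'orders': 1}],
               'sell': [{'quantity': 20, 'price': 100.1, 'orders': 3}]}},
]

df = expand_list_columns(pd.json_normalize(data))
print(df.columns.tolist())
```

Rows with fewer levels than the deepest row get NaN in the missing columns, because the per-row dicts are aligned by key when the DataFrame is built.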

【Comments】: