【问题标题】:Pandas JSON normalize and merge back to the tablePandas JSON 规范化并合并回表格
【发布时间】:2021-01-04 20:46:23
【问题描述】:

我导入了 JSON 文件并将其规范化到这一点。

    Team    Goals
0   A   {'Time': '06:09', 'P': [{'Nr': 1}, {'Nr': 5}], 'Nr': 2, 'Scored': 'Game'}
0   A   {'Time': '11:07', 'P': [{'Nr': 2}, {'Nr': 3}, {'Nr': 1}], 'Nr': 7, 'Scored': 'Game'}
1   B   {'Time': '20:04', 'P': [{'Nr': 9}, {'Nr': 2}], 'Nr': 4, 'Scored': 'Game'}
1   B   {'Time': '35:38', 'P': [{'Nr': 3}, {'Nr': 8}, {'Nr': 4}], 'Nr': 4, 'Scored': 'Game'}
1   B   {'Time': '63:21', 'P': {'Nr': 4}, 'Nr': 9, 'Scored': 'Penalty'}

这就是我想要实现的目标:

    Team    Goals.Time  Goals.P1    Goals.P2    Goals.P3    Goals.Nr    Goals.Scored
0   A       '06:09'         1           5           NA          2           'Game'
0   A       '11:07'         2           3           1           7           'Game'
1   B       '20:04'         9           2           NA          4           'Game'
1   B       '35:38'         3           8           4           4           'Game'
1   B       '63:21'         4           NA          NA          9           'Penalty'

我尝试过 pandas json_normalize,但它忽略了 Team 列。

import pandas as pd
pd.json_normalize(df['Goals'])

   Time                                      P  Nr Scored  P.Nr
0  06:09             [{'Nr': 1}, {'Nr': 5}]  2  'Game'   NaN
1  11:07  [{'Nr': 2}, {'Nr': 3}, {'Nr': 1}]  7  'Game'   NaN
2  20:04             [{'Nr': 9}, {'Nr': 2}]  4  'Game'   NaN
3  35:38  [{'Nr': 3}, {'Nr': 8}, {'Nr': 4}]  4  'Game'   NaN
4  63:21                                NaN  9  'Game'  34.0

有没有办法规范化并保留 Team 列? 之后不可能合并它们,因为索引不同并且它们没有共同的列

df['Team'] 
0       A
0       A
1       B
1       B
1       B

【问题讨论】:

  • Team 列与索引对齐的事实似乎确实值得怀疑
  • pd.concat([df['Team'], pd.json_normalize(df.Goals)], axis=1)
  • 回答你的问题需要查看源JSON。

标签: python json pandas dataframe


【解决方案1】:
# save df as [dict, ... ,dict] format
data = df.to_dict(orient='record')
dfn = pd.json_normalize(data)

print(dfn)
#   Team Goals.Time                            Goals.P  Goals.Nr Goals.Scored  \
# 0    A      06:09             [{'Nr': 1}, {'Nr': 5}]         2         Game   
# 1    A      11:07  [{'Nr': 2}, {'Nr': 3}, {'Nr': 1}]         7         Game   
# 2    B      20:04             [{'Nr': 9}, {'Nr': 2}]         4         Game   
# 3    B      35:38  [{'Nr': 3}, {'Nr': 8}, {'Nr': 4}]         4         Game   
# 4    B      63:21                                NaN         9      Penalty   

#    Goals.P.Nr  
# 0         NaN  
# 1         NaN  
# 2         NaN  
# 3         NaN  
# 4         4.0  

# handle Goals.P is null
cond = dfn['Goals.P'].isnull()
dfn.loc[cond, 'Goals.P'] = dfn.loc[cond, 'Goals.P.Nr'].map(lambda x: [{'Nr':x}])
print(dfn)

#  Team Goals.Time                            Goals.P  Goals.Nr Goals.Scored  \
# 0    A      06:09             [{'Nr': 1}, {'Nr': 5}]         2         Game   
# 1    A      11:07  [{'Nr': 2}, {'Nr': 3}, {'Nr': 1}]         7         Game   
# 2    B      20:04             [{'Nr': 9}, {'Nr': 2}]         4         Game   
# 3    B      35:38  [{'Nr': 3}, {'Nr': 8}, {'Nr': 4}]         4         Game   
# 4    B      63:21                      [{'Nr': 4.0}]         9      Penalty   

#    Goals.P.Nr  
# 0         NaN  
# 1         NaN  
# 2         NaN  
# 3         NaN  
# 4         4.0


def expand_col(Ser, prefix=''):
    """pandas expand list column
    """
    obj = Ser.explode().reset_index()
    obj['cnt'] = 1
    obj['tag'] = obj.groupby('index')['cnt'].cumsum()
    df_obj = obj.set_index(['index', 'tag'])['Goals.P'].map(lambda x: x.get('Nr')).unstack()

    df_obj.columns.name = None
    df_obj.index.name = None
    df_obj.columns = prefix + df_obj.columns.astype(str)
    df_obj.index = Ser.index
    return df_obj

df_obj = expand_col(dfn['Goals.P'], prefix='Goals.P')

print(df_obj)
#   Goals.P1  Goals.P2  Goals.P3
# 0       1.0       5.0       NaN
# 1       2.0       3.0       1.0
# 2       9.0       2.0       NaN
# 3       3.0       8.0       4.0
# 4       4.0       NaN       NaN

df_result = pd.concat([dfn, df_obj], axis=1)

print(df_result)
#   Team Goals.Time                            Goals.P  Goals.Nr Goals.Scored  \
# 0    A      06:09             [{'Nr': 1}, {'Nr': 5}]         2         Game   
# 1    A      11:07  [{'Nr': 2}, {'Nr': 3}, {'Nr': 1}]         7         Game   
# 2    B      20:04             [{'Nr': 9}, {'Nr': 2}]         4         Game   
# 3    B      35:38  [{'Nr': 3}, {'Nr': 8}, {'Nr': 4}]         4         Game   
# 4    B      63:21                      [{'Nr': 4.0}]         9      Penalty   

#    Goals.P.Nr  Goals.P1  Goals.P2  Goals.P3  
# 0         NaN       1.0       5.0       NaN  
# 1         NaN       2.0       3.0       1.0  
# 2         NaN       9.0       2.0       NaN  
# 3         NaN       3.0       8.0       4.0  
# 4         4.0       4.0       NaN       NaN  

【讨论】:

    猜你喜欢
    • 2018-09-15
    • 2020-12-11
    • 2015-06-10
    • 2021-10-09
    • 1970-01-01
    • 2021-04-26
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    相关资源
    最近更新 更多