【Question title】: for loop and adding additional columns groupby pandas dataframe in Python
【Posted】: 2019-08-08 13:34:31
【Question】:

The code below is my original approach.

import pandas as pd
data = {'id':[1001,1001,1001,1001,1001,1001,1001,1001,1002,1002,1002,1002,1002,1002,1002,1002],
    'name':['Tom', 'Tom', 'Tom', 'Tom','Tom', 'Tom', 'Tom', 'Tom','Jack','Jack','Jack','Jack','Jack','Jack','Jack','Jack'],
    'team':['A','A', 'B', 'B', 'C','C', 'D', 'D','A','A', 'B', 'B', 'C','C', 'D', 'D',],
    'year':[2011,2011,2012,2012,2013,2013,2014,2014,2011,2011,2012,2012,2013,2013,2014,2014],
    'avg':[0.500,0.400,0.300,0.200,0.100,0.200,0.300,0.400,0.500,0.400,0.300,0.200,0.100,0.200,0.300,0.400]}

df = pd.DataFrame(data)

print (df)

team_names = [c for c in df['team'].value_counts().index]
team_names

for i in team_names:
    df[i+'_vs_avg_2011'] = df.loc[(df['team']==i)&(df['year']==2011)].groupby(['id','name'])['avg'].transform('mean')
    df[i+'_vs_avg_2012'] = df.loc[(df['team']==i)&(df['year']==2012)].groupby(['id','name'])['avg'].transform('mean')
    df[i+'_vs_avg_2013'] = df.loc[(df['team']==i)&(df['year']==2013)].groupby(['id','name'])['avg'].transform('mean')
    df[i+'_vs_avg_2014'] = df.loc[(df['team']==i)&(df['year']==2014)].groupby(['id','name'])['avg'].transform('mean')
    print(i)

For the loop part, I tried:

years_from_to = [str(i).zfill(2) for i in range(2011,2014)]
years_from_to

for i,j in team_names, years_from_to:
    df[i+'_vs_avg_'+j] = df.loc[(df['team']==i)&(df['year']==j)].groupby(['id','name'])['avg'].transform('mean')
    print(i)

ValueError: too many values to unpack (expected 2)

Is there a way to simplify or fix this code?
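The ValueError comes from the loop header: for i,j in team_names, years_from_to iterates over the two-element tuple (team_names, years_from_to), so the first pass tries to unpack the whole four-item team_names list into i and j. Two more problems would appear once that is fixed: range(2011,2014) stops at 2013, and the years are strings, so df['year']==j never matches the integer year column. A minimal sketch of a fixed loop, assuming every team/year pair is wanted (as in the four hand-written lines of the original), pairs them with itertools.product and keeps the years as integers:

```python
import itertools

import pandas as pd

# Same data as in the question, written more compactly
data = {
    'id': [1001] * 8 + [1002] * 8,
    'name': ['Tom'] * 8 + ['Jack'] * 8,
    'team': ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'] * 2,
    'year': [2011, 2011, 2012, 2012, 2013, 2013, 2014, 2014] * 2,
    'avg': [0.5, 0.4, 0.3, 0.2, 0.1, 0.2, 0.3, 0.4] * 2,
}
df = pd.DataFrame(data)

team_names = df['team'].unique().tolist()
# Integer years so they compare equal to df['year']; range's stop is exclusive, hence 2015
years = list(range(2011, 2015))

# product() yields every (team, year) pair, replacing the four hand-written lines per team
for team, year in itertools.product(team_names, years):
    mask = (df['team'] == team) & (df['year'] == year)
    # Mean per (id, name) within the matching rows; all other rows get NaN on assignment
    df[f'{team}_vs_avg_{year}'] = (
        df.loc[mask].groupby(['id', 'name'])['avg'].transform('mean')
    )

print(df)
```

This keeps the question's groupby/transform logic unchanged and only fixes how the loop variables are produced.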

【Discussion】:

  • What is years_from_to?
  • Sorry, I just added it.
  • What is your expected output?

Tags: python pandas loops dataframe for-loop


【Answer 1】:

I think you can use DataFrame.pivot_table instead of the loop, flatten the resulting MultiIndex columns, and then use DataFrame.join to add them to the original DataFrame:

df1 = df.pivot_table(index=['id','name'],columns=['team','year'],values='avg', aggfunc='mean')
df1.columns = [f'{a}_vs_avg_{b}' for a, b in df1.columns]
print (df1)
           A_vs_avg_2011  B_vs_avg_2012  C_vs_avg_2013  D_vs_avg_2014
id   name                                                            
1001 Tom            0.45           0.25           0.15           0.35
1002 Jack           0.45           0.25           0.15           0.35

df = df.join(df1, on=['id','name'])
print (df)
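One difference worth noting: the join above copies each (id, name) mean onto every row of that person, whereas the question's loop fills a column only in the rows whose team and year match and leaves NaN elsewhere. If that row-level layout is what is needed, a loop-free alternative (a sketch, not part of the answer above) computes the group mean with groupby(...).transform and spreads it with DataFrame.pivot on the existing row index. Unlike the loop, it only creates columns for team/year pairs that actually occur in the data, so all-NaN columns such as A_vs_avg_2012 are not produced.

```python
import pandas as pd

# Same data as in the question, written more compactly
data = {
    'id': [1001] * 8 + [1002] * 8,
    'name': ['Tom'] * 8 + ['Jack'] * 8,
    'team': ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'] * 2,
    'year': [2011, 2011, 2012, 2012, 2013, 2013, 2014, 2014] * 2,
    'avg': [0.5, 0.4, 0.3, 0.2, 0.1, 0.2, 0.3, 0.4] * 2,
}
df = pd.DataFrame(data)

# Mean of 'avg' per (id, name, team, year), aligned back to every row
group_mean = df.groupby(['id', 'name', 'team', 'year'])['avg'].transform('mean')

# Target column name for each row, e.g. 'A_vs_avg_2011'
col_name = df['team'] + '_vs_avg_' + df['year'].astype(str)

# pivot() without index= keeps the original row index, so each value lands only
# in its own row and column; every other cell is NaN, as with the loop
wide = pd.DataFrame({'col': col_name, 'val': group_mean}).pivot(columns='col', values='val')

result = df.join(wide)
print(result)
```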

【Comments】:

  • When joining the DataFrames, I get the error columns overlap but no suffix specified: Index(['A_vs_avg_2011', 'B_vs_avg_2012', 'D_vs_avg_2014', 'C_vs_avg_2013'], dtype='object')
  • @Yusufsn I also got that error at first, but now I get the correct result.
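The overlap error means the left DataFrame already contains columns with the same names as df1, most likely because the question's loop was run on df before the join. One way around it is to join onto a fresh copy of the original data; another is to drop the stale columns first, as in this sketch (the NaN column assigned below only simulates such a leftover). DataFrame.join also accepts lsuffix/rsuffix if both copies should be kept.

```python
import pandas as pd

# Same data as in the question, written more compactly
data = {
    'id': [1001] * 8 + [1002] * 8,
    'name': ['Tom'] * 8 + ['Jack'] * 8,
    'team': ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'] * 2,
    'year': [2011, 2011, 2012, 2012, 2013, 2013, 2014, 2014] * 2,
    'avg': [0.5, 0.4, 0.3, 0.2, 0.1, 0.2, 0.3, 0.4] * 2,
}
df = pd.DataFrame(data)

# Stand-in for a column left over from running the question's loop earlier
df['A_vs_avg_2011'] = float('nan')

df1 = df.pivot_table(index=['id', 'name'], columns=['team', 'year'],
                     values='avg', aggfunc='mean')
df1.columns = [f'{a}_vs_avg_{b}' for a, b in df1.columns]

# Drop any stale copies first so join() sees no overlapping column names;
# errors='ignore' skips names that are not present in df
df = df.drop(columns=df1.columns, errors='ignore').join(df1, on=['id', 'name'])
print(df)
```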