【Posted】: 2020-01-25 14:14:59
【Question】:
I'm new to Pandas. I have a dataset of horse racing results. Sample here:
RaceID RaceDate RaceMeet Position Horse Jockey Trainer RaceLength race win HorseWinPercentage
446252 01/01/2008 Southwell (AW) 1 clear reef tom mclaughlin jane chapple-hyam 3101 1 1 0
447019 14/01/2008 Southwell (AW) 5 clear reef tom mclaughlin jane chapple-hyam 2654 1 0 100
449057 21/02/2008 Southwell (AW) 2 clear reef tom mclaughlin jane chapple-hyam 3101 1 0 50
463805 26/08/2008 Chelmsford (AW) 6 clear reef tom mclaughlin jane chapple-hyam 3080 1 0 33.33333333
469220 27/11/2008 Chelmsford (AW) 3 clear reef tom mclaughlin jane chapple-hyam 3080 1 0 25
470195 11/12/2008 Chelmsford (AW) 5 clear reef tom mclaughlin jane chapple-hyam 3080 1 0 20
471052 26/12/2008 Wolhampton (AW) 1 clear reef andrea atzeni jane chapple-hyam 2690 1 1 16.66666667
471769 07/01/2009 Wolhampton (AW) 6 clear reef ian mongan jane chapple-hyam 2690 1 0 28.57142857
472137 13/01/2009 Chelmsford (AW) 2 clear reef jamie spencer jane chapple-hyam 3080 1 0 25
472213 20/01/2009 Southwell (AW) 5 clear reef jamie spencer jane chapple-hyam 2654 1 0 22.22222222
476595 25/03/2009 Kempton (AW) 4 clear reef pat cosgrave jane chapple-hyam 2639 1 0 20
477674 08/04/2009 Kempton (AW) 5 clear reef pat cosgrave jane chapple-hyam 2639 1 0 18.18181818
479098 21/04/2009 Kempton (AW) 3 clear reef andrea atzeni jane chapple-hyam 2639 1 0 16.66666667
492913 14/11/2009 Wolhampton (AW) 1 clear reef andrea atzeni jane chapple-hyam 3639 1 1 15.38461538
493720 25/11/2009 Kempton (AW) 3 clear reef andrea atzeni jane chapple-hyam 3518 1 0 21.42857143
495863 29/12/2009 Southwell (AW) 1 clear reef shane kelly jane chapple-hyam 3101 1 1 20
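I'd like to be able to groupby() on multiple axes to count wins and create combination win percentages, or results for a specific course and distance.
When I only need to group by a single axis, it works fine: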
df['horse_win_count'] = df.groupby(['Horse'])['win'].cumsum()
df['horse_race_count'] = df.groupby(['Horse'])['race'].cumsum()
df['HorseWinPercentage2'] = df['horse_win_count'] / df['horse_race_count'] * 100
df['HorseWinPercentage'] = df.groupby('Horse')['HorseWinPercentage2'].shift(+1)
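However, when I need to group by more than one axis, I get some really odd results.
For example, I want to create a win percentage for a particular jockey riding for a particular trainer - groupby(['Jockey','Trainer']). I then need to know the changing percentage for each individual row (race).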
df['jt_win_count'] = df.groupby(['Jockey','Trainer'])['win'].cumsum()
df['jt_race_count'] = df.groupby(['Jockey','Trainer'])['race'].cumsum()
df['JTWinPercentage2'] = df['jt_win_count'] / df['jt_race_count'] * 100
df['JTWinPercentage'] = df.groupby(['Jockey','Trainer'])['JTWinPercentage2'].shift(+1)
df['JTWinPercentage'].fillna(0, inplace=True)
Or I want to count the number of times a horse has won over that course and that distance. So I need groupby(['Horse', 'RaceMeet','RaceLength']):
df['CD'] = df.groupby(['RaceMeet','RaceLength','Horse'])['win'].cumsum()
df['CD'] = df.groupby(["RaceMeet","RaceLength","Horse"]).shift(+1)
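The results I get are in the tens of thousands.
How do I group by multiple axes, do a calculation, and shift the result back by one entry, all while grouping by multiple entries?
Better yet, can you explain why my code above doesn't work? As I said, I'm new to Pandas and keen to learn.
Cheers.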
【Comments】:
- For CD, it should be df['CD'] = df.groupby(...)['CD'].shift().
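The fix suggested in the comment can be sketched as follows. Calling shift() on a multi-key groupby without first selecting a column returns a DataFrame with every non-key column shifted, not the single shifted CD Series the assignment expects. Selecting ['CD'] before shifting avoids that. The small frame below is invented for illustration and assumes the rows are already in chronological order. The fillna(0) step is also an assumption, covering a horse's first race at a given course and distance.

```python
import pandas as pd

# Toy data made up for this sketch; rows are assumed to be sorted by race date.
df = pd.DataFrame({
    "Horse": ["clear reef"] * 5,
    "RaceMeet": ["Southwell (AW)", "Southwell (AW)", "Chelmsford (AW)",
                 "Southwell (AW)", "Southwell (AW)"],
    "RaceLength": [3101, 2654, 3080, 3101, 3101],
    "win": [1, 0, 0, 0, 1],
})

keys = ["Horse", "RaceMeet", "RaceLength"]

# Running win count within each (horse, course, distance) group.
df["CD"] = df.groupby(keys)["win"].cumsum()

# Select the single column before shifting, so the result is a Series.
# Shifting within each group means a row only sees earlier races;
# the first race in a group has no history, so it is filled with 0.
df["CD"] = df.groupby(keys)["CD"].shift(1).fillna(0).astype(int)

print(df)
```

The same pattern (select one column, then cumsum() or shift() within the group) applies to any list of grouping keys, such as ['Jockey', 'Trainer'].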