Python 2.7: DataFrame groupby and find the percentage distribution of values within a group

【Question title】: Python 2.7: DataFrame groupby and find the percentage distribution of values within a group
【Posted】: 2018-02-14 13:35:49
【Question】:

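I have a DataFrame, and I want to find the percentage distribution of the values in one column within each group.

An example of a group is df.groupby(['race', 'tyre', 'stint']).get_group(("Australian Grand Prix", "Super soft", 1))

For each row in a group, I want to know what share of the group's total its 'total diff' value represents.

Here is the DataFrame in dictionary format. There are many other groups, but the df below shows only the first one.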

{'driverRef': {0: 'vettel',
  1: 'raikkonen',
  2: 'rosberg',
  4: 'hamilton',
  6: 'ricciardo',
  7: 'alonso',
  14: 'haryanto'},
 'race': {0: 'Australian Grand Prix',
  1: 'Australian Grand Prix',
  2: 'Australian Grand Prix',
  4: 'Australian Grand Prix',
  6: 'Australian Grand Prix',
  7: 'Australian Grand Prix',
  14: 'Australian Grand Prix'},
 'stint': {0: 1.0, 1: 1.0, 2: 1.0, 4: 1.0, 6: 1.0, 7: 1.0, 14: 1.0},
 'total diff': {0: 125147.50728499777,
  1: 281292.0366694695,
  2: 166278.41312954266,
  4: 64044.234019635056,
  6: 648383.28046950256,
  7: 400675.77449897071,
  14: 2846411.2560531585},
 'tyre': {0: u'Super soft',
  1: u'Super soft',
  2: u'Super soft',
  4: u'Super soft',
  6: u'Super soft',
  7: u'Super soft',
  14: u'Super soft'}}
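
For readers who want to reproduce the setup, here is a minimal sketch of loading a dictionary of this shape and selecting the example group. It assumes pandas is installed, and the `data` literal is an abridged copy (three of the seven rows above) with the Python 2 `u''` prefixes dropped; the names `data` and `group` are only illustrative.

```python
import pandas as pd

# Abridged copy of the dictionary above: three of the seven rows.
data = {
    'driverRef': {0: 'vettel', 1: 'raikkonen', 4: 'hamilton'},
    'race': {0: 'Australian Grand Prix',
             1: 'Australian Grand Prix',
             4: 'Australian Grand Prix'},
    'stint': {0: 1.0, 1: 1.0, 4: 1.0},
    'total diff': {0: 125147.50728499777,
                   1: 281292.0366694695,
                   4: 64044.234019635056},
    'tyre': {0: 'Super soft', 1: 'Super soft', 4: 'Super soft'},
}

# A dict of dicts becomes one column per outer key, indexed by the inner keys.
df = pd.DataFrame(data)

# Select a single (race, tyre, stint) group, as in the question.
group = df.groupby(['race', 'tyre', 'stint']).get_group(
    ('Australian Grand Prix', 'Super soft', 1))
print(group)
```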

【Comments】:

What is the expected output? Do you need this?
@jezrael Yes, I saw that, but I'm having trouble applying it to my own problem. Let me try again...

Tags: python python-2.7

【Solution 1】:

If I understand your requirement correctly, this might help:

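# Total of 'total diff' for each (race, tyre, stint) group
sums = df.groupby(['race', 'tyre', 'stint'])['total diff'].sum()
# Align the group totals to the rows through the shared index, then restore the columns
df = df.set_index(['race', 'tyre', 'stint']).assign(pct=sums).reset_index()
# Each row's share of its group total
df['pct'] = df['total diff'] / df['pct']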

#                     race        tyre  stint  driverRef    total diff       pct
# 0  Australian Grand Prix  Super soft    1.0     vettel  1.251475e+05  0.027613
# 1  Australian Grand Prix  Super soft    1.0  raikkonen  2.812920e+05  0.062065
# 2  Australian Grand Prix  Super soft    1.0    rosberg  1.662784e+05  0.036688
# 3  Australian Grand Prix  Super soft    1.0   hamilton  6.404423e+04  0.014131
# 4  Australian Grand Prix  Super soft    1.0  ricciardo  6.483833e+05  0.143060
# 5  Australian Grand Prix  Super soft    1.0     alonso  4.006758e+05  0.088406
# 6  Australian Grand Prix  Super soft    1.0   haryanto  2.846411e+06  0.628037
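
The same shares can probably be computed without the set_index / reset_index round trip. The sketch below is an alternative, not the answer's own method: it uses `groupby(...).transform('sum')`, which returns the group total aligned to every original row. It rebuilds the seven rows from the question as plain lists; the names `keys` and `group_total` are only illustrative.

```python
import pandas as pd

# The seven rows from the question (all belong to a single group).
df = pd.DataFrame({
    'driverRef': ['vettel', 'raikkonen', 'rosberg', 'hamilton',
                  'ricciardo', 'alonso', 'haryanto'],
    'race': ['Australian Grand Prix'] * 7,
    'tyre': ['Super soft'] * 7,
    'stint': [1.0] * 7,
    'total diff': [125147.50728499777, 281292.0366694695,
                   166278.41312954266, 64044.234019635056,
                   648383.28046950256, 400675.77449897071,
                   2846411.2560531585],
})

keys = ['race', 'tyre', 'stint']

# transform('sum') broadcasts each group's total back onto its rows,
# so the result has the same index and length as df.
group_total = df.groupby(keys)['total diff'].transform('sum')

# Each row's share of its group total; shares within a group sum to 1.
df['pct'] = df['total diff'] / group_total
print(df[['driverRef', 'total diff', 'pct']])
```

Because the original index and column order are left untouched, this variant may be easier to drop into an existing pipeline; multiply `pct` by 100 if percentages rather than fractions are wanted.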
