Understanding groupby in pandas

【Question title】: Understanding groupby in pandas
【Posted】: 2014-10-02 05:55:00
【Question】:

I want to sum certain values in a DataFrame after grouping.

Some sample data:

Race          officeID   CandidateId  total_votes   precinct
Mayor         10         705            20           Bell
Mayor         10         805            30           Bell
Treasurer     12         505            10           Bell
Treasurer     12         506            40           Bell
Treasurer     12         507            30           Bell
Mayor         10         705            50           Park
Mayor         10         805            10           Park
Treasurer     12         505            5            Park
Treasurer     12         506            13           Park
Treasurer     12         507            16           Park

To get the total votes for each candidate, I can do this:

cand_votes = df.groupby('CandidateId')['total_votes'].sum()
print(cand_votes)

CandidateId
505    15
506    53
507    46
705    70
805    40

To get the total votes for each office:

total_votes = df.groupby('officeID')['total_votes'].sum()
print(total_votes)

officeID
10    110
12    114
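
The two aggregations above can be reproduced end to end with a sketch like the one below. It rebuilds the sample table by hand; the `pd.DataFrame` construction is an assumption, since the question does not show how `df` was loaded. It also selects `total_votes` before summing so that only that column is aggregated, and it uses the name `office_votes` (not in the original) to avoid clashing with the column name.

```python
import pandas as pd

# Sample data copied from the table in the question.
# How df was originally loaded is not shown; this constructor is an assumption.
df = pd.DataFrame({
    "Race": ["Mayor", "Mayor", "Treasurer", "Treasurer", "Treasurer"] * 2,
    "officeID": [10, 10, 12, 12, 12] * 2,
    "CandidateId": [705, 805, 505, 506, 507] * 2,
    "total_votes": [20, 30, 10, 40, 30, 50, 10, 5, 13, 16],
    "precinct": ["Bell"] * 5 + ["Park"] * 5,
})

# Pick the column first, then sum within each group.
cand_votes = df.groupby("CandidateId")["total_votes"].sum()
office_votes = df.groupby("officeID")["total_votes"].sum()

print(cand_votes)
print(office_votes)
```

Both results are Series indexed by the grouping key, which is why they cannot be divided by each other directly: one has five rows, the other two.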

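But what if I want the percentage of the vote each candidate received within their office? Do I have to apply some kind of function to each group? Ideally, I would like the final result to look like: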

officeID    CandidateId    total_votes    vote_pct
10          705            70             .6364
10          805            40             .3636

【Comments】:

Tags: python pandas

【Solution 1】:

First, build a frame holding the vote total for each candidate within each office.

gb = df.groupby(['officeID','CandidateId'], as_index=False)['total_votes'].sum()

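Then group that frame by office and use transform, which returns a result indexed like the input rather than one row per group, to divide each candidate's total by the total for their office.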

gb['vote_pct'] = gb['total_votes'] / gb.groupby('officeID')['total_votes'].transform('sum')


In [146]: gb
Out[146]: 
   officeID  CandidateId  total_votes  vote_pct
0        10          705           70  0.636364
1        10          805           40  0.363636
2        12          505           15  0.131579
3        12          506           53  0.464912
4        12          507           46  0.403509
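
For anyone who wants to run the answer from scratch, here is a self-contained sketch of the same two steps. The hand-built `df` and the intermediate name `office_totals` are assumptions added for illustration; the `groupby` and `transform` calls are the ones used in the answer.

```python
import pandas as pd

# Sample data copied from the question; the constructor itself is an assumption.
df = pd.DataFrame({
    "Race": ["Mayor", "Mayor", "Treasurer", "Treasurer", "Treasurer"] * 2,
    "officeID": [10, 10, 12, 12, 12] * 2,
    "CandidateId": [705, 805, 505, 506, 507] * 2,
    "total_votes": [20, 30, 10, 40, 30, 50, 10, 5, 13, 16],
    "precinct": ["Bell"] * 5 + ["Park"] * 5,
})

# Step 1: one row per (office, candidate) with the summed votes.
gb = df.groupby(["officeID", "CandidateId"], as_index=False)["total_votes"].sum()

# Step 2: transform('sum') repeats each office total on every row of that
# office, so the result lines up row for row with gb.
office_totals = gb.groupby("officeID")["total_votes"].transform("sum")
gb["vote_pct"] = gb["total_votes"] / office_totals

print(office_totals.tolist())
print(gb)
```

Because `transform` keeps the shape of `gb`, the division is a plain element-wise operation and no merge back onto the original frame is needed. As a sanity check, the `vote_pct` values within each office should add up to 1.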

【Discussion】: