Python dataset aggregation: answers

【Question title】: Python dataset aggregation
【Posted】: 2018-06-19 16:33:18
【Question】:

I have the following dataset, stored in a pd.DataFrame object:

df

id  topic  student level week
 1   sun      a       1     1
 1   sun      b       2     1
 1   moon     a       3     1
 2   tree     a       1     2
 2   tree     b       2     2
 2   tree     a       3     2
 2   tree     b       4     2
 3   cloud    c       1     2
 3   cloud    b       2     2
 3   cloud    c       3     2
 3   cloud    a       4     2
 3   house    b       5     2

For each id and topic, I want to aggregate the rows into a number of students and a number of messages:

id  topic  num_students num_messages
 1   sun      2            2
 1   moon     1            1
 2   tree     2            4
 3   cloud    3            4
 3   house    1            1

where num_students is the number of unique students for each id/topic pair in df, and num_messages is the number of rows (messages) for each id/topic pair.
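To make the question reproducible, here is a minimal sketch that rebuilds the sample df (assuming the unlabeled first column is id) and computes both counts with pandas named aggregation, which needs pandas 0.25 or later. Named aggregation is a swapped-in syntax, not something from the question, and the variable name result is illustrative.

```python
import pandas as pd

# Rebuild the sample data from the question; the unlabeled first
# column is assumed to be "id".
df = pd.DataFrame({
    "id":      [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3],
    "topic":   ["sun", "sun", "moon", "tree", "tree", "tree", "tree",
                "cloud", "cloud", "cloud", "cloud", "house"],
    "student": ["a", "b", "a", "a", "b", "a", "b", "c", "b", "c", "a", "b"],
    "level":   [1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5],
    "week":    [1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2],
})

# Named aggregation: output column = (input column, aggregation function).
result = (
    df.groupby(["id", "topic"], sort=False)          # keep first-appearance order
      .agg(num_students=("student", "nunique"),       # distinct students per pair
           num_messages=("student", "size"))          # rows per pair
      .reset_index()                                  # id/topic back to columns
)
print(result)
```

The printed frame should match the desired output table in the question, with groups listed in the order they first appear in df.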

Does anyone have an idea?

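【Discussion】:

What are you using to store your dataset? Pandas, NumPy? Also, what have you tried, and why didn't it work?
@DavidG Pandas!
Have you tried anything? What are num_students and num_messages? Please try to make your question clearer...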

Tags: python pandas aggregation

【Solution 1】:

I think you need to aggregate with agg, using the functions nunique (number of distinct students) and size (number of rows):

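# Map pandas' default aggregate column names to the requested names
d = {'nunique':'num_students','size':'num_messages'}
df1 = (df.groupby(['id','topic'], sort=False)['student']  # sort=False keeps original group order
         .agg(['nunique','size'])   # distinct students and row count per id/topic
         .rename(columns=d)
         .reset_index())            # turn id/topic back into regular columns
print(df1)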
   id  topic  num_students  num_messages
0   1    sun             2             2
1   1   moon             1             1
2   2   tree             2             4
3   3  cloud             3             4
4   3  house             1             1
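The choice of size for num_messages only makes a difference if student can be missing. The sketch below uses hypothetical toy data (one id/topic pair whose third message has no student) to contrast nunique, count and size on the same group; whether a message without a student should still be counted is an assumption you would need to decide for your data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: three messages, one of them with a missing student.
toy = pd.DataFrame({
    "id":      [1, 1, 1],
    "topic":   ["sun", "sun", "sun"],
    "student": ["a", "a", np.nan],
})

g = toy.groupby(["id", "topic"])["student"]
stats = g.agg(["nunique", "count", "size"])
print(stats)
# nunique: distinct non-missing students ("a")   -> 1
# count:   non-missing rows                      -> 2
# size:    all rows, including the missing one   -> 3
```

So size counts every message in the group, while count would silently drop rows where student is NaN.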

【Discussion】: