Pandas: group by equal range (answers)

【Question title】: Pandas: group by equal range
【Posted】: 2017-11-02 16:02:58
【Question】:

Here is an example of my DataFrame:

import pandas as pd

df_lst = [
    {"wordcount": 100, "Stats": 198765, "id": 34},
    {"wordcount": 99, "Stats": 98765, "id": 35},
    {"wordcount": 200, "Stats": 18765, "id": 36},
    {"wordcount": 250, "Stats": 788765, "id": 37},
    {"wordcount": 345, "Stats": 12765, "id": 38},
    {"wordcount": 456, "Stats": 238765, "id": 39},
    {"wordcount": 478, "Stats": 1934, "id": 40},
    {"wordcount": 890, "Stats": 19845, "id": 41},
    {"wordcount": 812, "Stats": 1987, "id": 42},
]
df = pd.DataFrame(df_lst)
df.set_index('id', inplace=True)
df.head()

df:

    Stats   wordcount
id      
34  198765  100
35  98765   99
36  18765   200
37  788765  250
38  12765   345

I want to compute the mean of Stats for each wordcount range, in steps of 100, so that the new DataFrame looks like this:

    Average wordcount
    194567  100
    23456   200
    2378    300
    ...

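where 100 means the range 0-100, and so on. I started writing multiple conditions, but I suspect there is a more efficient way to do this. Thanks for your help.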

【Comments】:

Tags: python pandas pandas-groupby

【Solution 1】:

Use pd.cut():

In [92]: bins = np.arange(0, df['wordcount'].max().round(-2) + 100, 100)

In [94]: df.groupby(pd.cut(df['wordcount'], bins=bins, labels=bins[1:]))['Stats'].mean()
Out[94]:
wordcount
100    148765.0
200     18765.0
300    788765.0
400     12765.0
500    120349.5
600         NaN
700         NaN
800         NaN
900     10916.0
Name: Stats, dtype: float64
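
One caveat with the bins above: `.round(-2)` rounds to the nearest hundred, so if the largest wordcount were, say, 812 instead of 890, the top edge would stop at 800 and that row would fall outside every bin. The following is a minimal sketch of the same pd.cut() idea that rounds the maximum up instead. The names `step`, `upper` and `binned` are introduced here for illustration only, and passing `observed=False` explicitly is an assumption about how you want empty bins handled (they are kept and show up as NaN, as in the output above).

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame(
    {
        "wordcount": [100, 99, 200, 250, 345, 456, 478, 890, 812],
        "Stats": [198765, 98765, 18765, 788765, 12765, 238765, 1934, 19845, 1987],
    },
    index=pd.Index(range(34, 43), name="id"),
)

step = 100  # hypothetical name for the bin width

# Round the maximum UP to a multiple of step so the largest value is always covered
upper = int(np.ceil(df["wordcount"].max() / step)) * step
bins = np.arange(0, upper + step, step)  # edges 0, 100, ..., upper

# Each bin is (left, right]; label every bin with its right edge
binned = pd.cut(df["wordcount"], bins=bins, labels=bins[1:])

# observed=False keeps empty bins in the result (their mean is NaN)
result = df.groupby(binned, observed=False)["Stats"].mean()
print(result)
```

Because the bins are open on the left, a wordcount of exactly 0 would not be assigned to any bin; pd.cut() accepts `include_lowest=True` if that case matters for your data.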

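【Comments】:

Oh my, exactly the same. You got it.
@Bharath, sorry about that :)
I'm usually quick with pd.cut, but not this time.
Exactly what I was looking for - thanks!
@aviss I added a new method :-) Hope it helps ~ :-)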

【Solution 2】:

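import math
def roundup(x):
    # Round x up to the next multiple of 100 (e.g. 99 -> 100, 250 -> 300)
    return int(math.ceil(x / 100.0)) * 100
# Label each row with the upper edge of its 100-wide range, then average Stats per label
df['roundup'] = df.wordcount.apply(roundup)
df.groupby('roundup').Stats.mean()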
Out[824]: 
roundup
100    148765.0
200     18765.0
300    788765.0
400     12765.0
500    120349.5
900     10916.0
Name: Stats, dtype: float64
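
The same rounding can also be done on the whole column at once instead of calling a Python function per row. This is only a sketch of that variant, not part of the original answer; `step` is a name introduced here for illustration. Unlike pd.cut(), this approach only produces groups that actually contain rows, so the empty ranges (600-800) do not appear.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame(
    {
        "wordcount": [100, 99, 200, 250, 345, 456, 478, 890, 812],
        "Stats": [198765, 98765, 18765, 788765, 12765, 238765, 1934, 19845, 1987],
    },
    index=pd.Index(range(34, 43), name="id"),
)

step = 100  # hypothetical name for the range width

# Vectorized round-up: 99 -> 100, 100 -> 100, 250 -> 300, ...
roundup = (np.ceil(df["wordcount"] / step) * step).astype(int).rename("roundup")

# Group Stats by the rounded-up label and take the mean of each group
result = df["Stats"].groupby(roundup).mean()
print(result)
```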

【Comments】: