How to bin the age column for each category

【Question title】: How to bin age column for each category
【Posted】: 2021-09-24 05:47:12
【Question】:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'age': np.random.choice([12, 15, 17, 95, 13], 20),
    'category': np.random.choice(['A', 'B', 'C', 'D'], 20),
})

Category Age
A        12
A        95
B        17
B        14
D        12
C        14
B        16

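I want to bin the age values separately for each category. For category A, say, I would take its minimum and maximum age and build the bins from that range. How can I compute the bins for each category? For the whole column I use the line bins = np.linspace(df[col_name].min(), df[col_name].max(), 11), and then group like this: grp = df.groupby(pd.cut(df[col_name], bins))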

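【Comments】:

Welcome to Stack Overflow. Please take the time to read this post on how to provide a great pandas example, as well as how to provide a minimal, complete, and verifiable example, and revise your question accordingly. These tips on how to ask a good question may also be useful.
@jezrael I have changed it.

Tags: python python-3.x pandas data-analysis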

【Solution 1】:

A first approach could be:

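def bin_age(sr):
    # Equal-width bin edges between this group's minimum and maximum age
    start = sr.min()
    stop = sr.max()
    num = 11
    # A group with a single distinct age has no range, so use that value as the only edge
    bins = list(np.linspace(start, stop, num)) if start < stop else [start]
    # Open-ended outer bins so every value falls inside some interval
    bins = [-np.inf] + bins + [np.inf]
    return pd.cut(sr, bins=bins, include_lowest=True)

# group_keys=False keeps the original row index so the result aligns with df
df['Bins'] = df.groupby('Category', group_keys=False)['Age'].apply(bin_age)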

Output:

>>> df
  Category  Age           Bins
0        A   12   (-inf, 12.0]
1        A   95   (86.7, 95.0]
2        B   17   (16.7, 17.0]
3        B   14   (-inf, 14.0]
4        D   12  (11.999, inf]
5        C   14  (13.999, inf]
6        B   16   (15.8, 16.1]
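
Because each category gets its own interval edges, the Bins column above mixes intervals from different ranges. If an integer bin position within each category is enough, a variant along the following lines may be easier to work with. This is only a sketch: the helper name bin_index, the BinIndex column, and the choice to put single-valued groups into bin 0 are assumptions for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'D', 'C', 'B'],
    'Age': [12, 95, 17, 14, 12, 14, 16],
})

def bin_index(sr, num=11):
    """Return the 0-based equal-width bin position of each value in one group."""
    lo, hi = sr.min(), sr.max()
    if lo == hi:
        # Assumption: a group with one distinct value goes entirely into bin 0
        return pd.Series(0, index=sr.index)
    edges = np.linspace(lo, hi, num)
    # labels=False returns integer codes instead of Interval objects;
    # include_lowest=True keeps the group minimum inside the first bin
    return pd.cut(sr, bins=edges, labels=False, include_lowest=True)

# transform returns a result aligned with the original rows
df['BinIndex'] = df.groupby('Category')['Age'].transform(bin_index)
print(df)
```

With num=11 edges there are 10 bins per category, so the codes run from 0 to 9.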

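【Comments】:

bin_age needs an input. So I added df['Age'] and got TypeError: 'Series' objects are mutable, thus they cannot be hashed
What do you mean? The input to bin_age is supplied automatically by apply. Do you need to pass the function any arguments besides the group?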
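
To illustrate the point in the last comment, here is a small sketch of how apply hands each group to the function. The helper age_range_of and the calls list are hypothetical names used only for this demonstration. The function object itself is passed to apply, without parentheses and without an argument, and pandas calls it once per category with that category's Age values.

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'D', 'C', 'B'],
    'Age': [12, 95, 17, 14, 12, 14, 16],
})

calls = []  # records what the function receives on each call

def age_range_of(sr):
    # sr is the 'Age' Series of a single category, supplied by apply
    calls.append(sr.tolist())
    return sr.max() - sr.min()

# Pass the function itself; do not write age_range_of(df['Age'])
age_range = df.groupby('Category')['Age'].apply(age_range_of)
print(age_range)
print(calls)
```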