【Question title】: Aggregated Dataframe with filters
【Posted】: 2020-09-25 22:53:14
【Question】:

Is it possible to apply a filter when using the "NamedAgg" method in pandas?

Here is my sample code:

import pandas as pd

df = pd.DataFrame({'Person': ['John', 'Paul', 'John', 'Paul', 'Taylor'],
                   'animal': ['cat', 'dog', 'cat', 'dog', 'dog'],
                   'from': ['breeder', 'adoption', 'adoption', 'breeder', 'wild'],
                   'height': [9.1, 6.0, 9.5, 34.0, 55.0],
                   'weight': [7.9, 7.5, 9.9, 198.0, 200.0]})

df.groupby(['Person']).agg(
    number_of_animal = pd.NamedAgg(column = 'animal', aggfunc = 'count'),
    number_of_from = pd.NamedAgg(column = 'from', aggfunc = 'count'),
    total_height = pd.NamedAgg(column = 'height', aggfunc = 'sum'),
    total_weight = pd.NamedAgg(column = 'weight', aggfunc = 'sum')
    )

# Expected output
result = pd.DataFrame({'Person': ['John','Paul','Taylor'],
                        'number_of_animal':[2,0,0],
                        'number_of_from': [1,1,0],
                        'total_height':[0,34,55],
                        'total_weight':[17.8,205.5,200]})

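For each individual column I want to apply its own filter. For example, "number_of_animal" should count only rows where df['animal'] == 'cat', "total_height" should sum only rows where df['height'] > 10, and "number_of_from" should count only rows where df['from'] == 'breeder'.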

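【Comments】:

  • See stackoverflow.com/questions/43632927/… for ways to filter the original dataframe; you can then apply the groupby on the filtered frame.
  • @itprorh66 That only filters the dataframe as a whole. I want to apply a separate filter to each series before it is aggregated, all within one dataframe.
  • Please post your expected output
  • @sammywemmy Hey, I have posted my expected output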
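
To make the distinction discussed in these comments concrete, here is a minimal sketch (not from the original thread) that uses the question's data and filters the whole frame on a single condition before grouping. The names cats_only and filtered_first are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Person': ['John', 'Paul', 'John', 'Paul', 'Taylor'],
                   'animal': ['cat', 'dog', 'cat', 'dog', 'dog'],
                   'from': ['breeder', 'adoption', 'adoption', 'breeder', 'wild'],
                   'height': [9.1, 6.0, 9.5, 34.0, 55.0],
                   'weight': [7.9, 7.5, 9.9, 198.0, 200.0]})

# Filter the whole frame first: only rows with animal == 'cat' survive.
cats_only = df[df['animal'] == 'cat']

# Every aggregate now sees only the cat rows, not just number_of_animal.
filtered_first = cats_only.groupby('Person').agg(
    number_of_animal=('animal', 'count'),
    total_weight=('weight', 'sum'),
)
print(filtered_first)
```

With this data, Paul and Taylor drop out of the result entirely because they own no cats, and the same row filter is forced onto every aggregated column. That is the limitation the asker is pointing at when asking for a per-column filter.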

Tags: python python-3.x pandas dataframe


【Solution 1】:

Use DataFrame.assign with Series.where to replace the non-matching values with NaN before aggregating:

df1 = (
    df.assign(
        # Non-matching values become NaN, which 'count' and 'sum' skip.
        animal=df['animal'].where(df['animal'] == 'cat'),
        height=df['height'].where(df['height'] > 10),
        # 'from' is a Python keyword and cannot be a keyword argument name,
        # so the masked copy is stored in a new column, from1.
        from1=df['from'].where(df['from'] == 'breeder'),
    )
    .groupby('Person')
    .agg(
        number_of_animal=pd.NamedAgg(column='animal', aggfunc='count'),
        number_of_from=pd.NamedAgg(column='from1', aggfunc='count'),
        total_height=pd.NamedAgg(column='height', aggfunc='sum'),
        total_weight=pd.NamedAgg(column='weight', aggfunc='sum'),
    )
)
print(df1)
        number_of_animal  number_of_from  total_height  total_weight
Person                                                              
John                   2               1           0.0          17.8
Paul                   0               1          34.0         205.5
Taylor                 0               0          55.0         200.0
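
The result above depends on two behaviours: Series.where keeps values where the condition holds and puts NaN elsewhere, and the grouped count and sum both skip NaN. The following is a small illustrative sketch of just that mechanism on the height column, not part of the original answer. The standalone Series names person, height and masked are chosen here for illustration.

```python
import pandas as pd

person = pd.Series(['John', 'Paul', 'John', 'Paul', 'Taylor'])
height = pd.Series([9.1, 6.0, 9.5, 34.0, 55.0])

# Keep heights above 10; every other position becomes NaN.
masked = height.where(height > 10)

# Grouped sum ignores NaN; a group that is entirely NaN sums to 0.0
# because min_count defaults to 0.
per_person_sum = masked.groupby(person).sum()

# Grouped count only counts the non-NaN entries.
per_person_count = masked.groupby(person).count()

print(per_person_sum)
print(per_person_count)
```

This is why John, whose two heights are both below 10, ends up with total_height of 0.0 rather than NaN or a missing row.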
