【Question title】: Pandas - Find rolling average of past 3 days
【Posted】: 2021-07-01 07:13:21
【Question】:

I am trying to compute the average of the past 3 days of data for each group of elements, as shown below. This is what my DataFrame looks like:

day, category, sub-category, count
2021-01-01, electronic, phone, 10
2021-01-02, electronic, phone, 12
2021-01-03, electronic, phone, 3
2021-01-04, electronic, phone, 20
2021-01-01, electronic, computer, 15
2021-01-02, electronic, computer, 12
2021-01-03, electronic, computer, 20

Expected output:

day, category, sub-category, count, rolling_average_past_3_days
2021-01-01, electronic, phone, 10,
2021-01-02, electronic, phone, 12,
2021-01-03, electronic, phone, 3, 8.3
2021-01-04, electronic, phone, 20, 11.7
2021-01-01, electronic, computer, 15,
2021-01-02, electronic, computer, 12,
2021-01-03, electronic, computer, 20, 15.7
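
For anyone who wants to reproduce this, the sample above can be built as a DataFrame along the following lines. This is only a sketch: the column names are copied from the table, and the variable name `df` is an assumption that matches the answers below. The comments work through the arithmetic behind the expected column.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "day": [
        "2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04",
        "2021-01-01", "2021-01-02", "2021-01-03",
    ],
    "category": ["electronic"] * 7,
    "sub-category": ["phone"] * 4 + ["computer"] * 3,
    "count": [10, 12, 3, 20, 15, 12, 20],
})

# Each expected value is the mean of the current row and the two rows
# before it within the same sub-category:
#   phone,    2021-01-03: (10 + 12 + 3)  / 3
#   phone,    2021-01-04: (12 + 3 + 20)  / 3
#   computer, 2021-01-03: (15 + 12 + 20) / 3
phone_day3 = (10 + 12 + 3) / 3
phone_day4 = (12 + 3 + 20) / 3
computer_day3 = (15 + 12 + 20) / 3
print(phone_day3, phone_day4, computer_day3)
```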

【Comments】:

    Tags: python pandas rolling-computation


    【Solution 1】:

    Try this:

    df['rolling_average_past_3_days'] = df.groupby('sub-category')['count'].rolling(3).mean().droplevel(0)
    

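    【Comments】:

    • Thanks for getting back to me. I tried the above, but it returned a new column containing only NaN.
    • Interesting, it works for me... When you run just df.groupby('sub-category')['count'].rolling(3).mean(), does it return what you expect (outside of your df)?
    • That's interesting... The only other thing I can think of is to check whether your count column is a numeric type.
    【Solution 2】: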
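
The dtype check suggested in the last comment could look roughly like the sketch below. The small frame with string counts is made up for illustration, and whether a non-numeric column is what actually caused the all-NaN result in this thread is not confirmed. `pd.api.types.is_numeric_dtype` and `pd.to_numeric` are standard pandas functions.

```python
import pandas as pd

# Hypothetical frame where "count" was read in as strings.
df = pd.DataFrame({
    "sub-category": ["phone"] * 4,
    "count": ["10", "12", "3", "20"],
})

# Check whether the column is numeric before rolling over it.
was_numeric = pd.api.types.is_numeric_dtype(df["count"])
print(was_numeric)

# Convert if needed; values that cannot be parsed become NaN.
if not was_numeric:
    df["count"] = pd.to_numeric(df["count"], errors="coerce")

# With a numeric column and the default integer index, Solution 1 lines up.
df["rolling_average_past_3_days"] = (
    df.groupby("sub-category")["count"].rolling(3).mean().droplevel(0)
)
print(df)
```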

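    To keep this general, in case you have more main categories than just "electronic", you should group by category as well. You should also use .transform() so that the result is broadcast back to the original structure and every row gets a value. Otherwise, you may get an error such as: TypeError: incompatible index of inserted column with frame index

    You can try this: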

    df['rolling_average_past_3_days'] = df.groupby(['category', 'sub-category'])['count'].transform(lambda x: x.rolling(3).mean())
    
    print(df)
    
              day    category sub-category  count  rolling_average_past_3_days
    0  2021-01-01  electronic        phone     10                          NaN
    1  2021-01-02  electronic        phone     12                          NaN
    2  2021-01-03  electronic        phone      3                     8.333333
    3  2021-01-04  electronic        phone     20                    11.666667
    4  2021-01-01  electronic     computer     15                          NaN
    5  2021-01-02  electronic     computer     12                          NaN
    6  2021-01-03  electronic     computer     20                    15.666667
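
To see why .transform() assigns cleanly here, it may help to compare the index of the two results. The sketch below rebuilds the question's sample (the names `keys`, `rolled` and `transformed` are introduced only for illustration): the plain groupby rolling result carries the group keys as extra index levels, while the transform result keeps the frame's own index.

```python
import pandas as pd

df = pd.DataFrame({
    "day": [
        "2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04",
        "2021-01-01", "2021-01-02", "2021-01-03",
    ],
    "category": ["electronic"] * 7,
    "sub-category": ["phone"] * 4 + ["computer"] * 3,
    "count": [10, 12, 3, 20, 15, 12, 20],
})
keys = ["category", "sub-category"]

# groupby + rolling: indexed by (category, sub-category, original row label).
rolled = df.groupby(keys)["count"].rolling(3).mean()
print(rolled.index.nlevels)

# groupby + transform: same index as df, so it can be assigned directly.
transformed = df.groupby(keys)["count"].transform(lambda x: x.rolling(3).mean())
print(transformed.index.equals(df.index))
```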
    

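    The NaN values shown on the first 2 entries of each sub-category are expected: the average rolls over 3 days, and the first 2 days of each group do not yet have a full window. If you want them blank instead of NaN, you can use this: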

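    import numpy as np

    # Replacing NaN with '' mixes floats and strings, so the column becomes object dtype.
    df['rolling_average_past_3_days'] = df.groupby(['category', 'sub-category'])['count'].transform(lambda x: x.rolling(3).mean()).replace(np.nan, '')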
    
    
    print(df)
    
              day    category sub-category  count rolling_average_past_3_days
    0  2021-01-01  electronic        phone     10                            
    1  2021-01-02  electronic        phone     12                            
    2  2021-01-03  electronic        phone      3                    8.333333
    3  2021-01-04  electronic        phone     20                   11.666667
    4  2021-01-01  electronic     computer     15                            
    5  2021-01-02  electronic     computer     12                            
    6  2021-01-03  electronic     computer     20                   15.666667
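
One caveat about the row-based window used in both solutions: rolling(3) counts the last 3 rows, not the last 3 calendar days, so the two only agree when every group has one row per day with no gaps. If days can be missing, a time-based window such as rolling("3D") on a datetime index may be closer to "past 3 days". The sketch below is one possible way to do that, not the answerers' method; the gap in the sample data, the column names `avg_last_3_rows` and `avg_last_3_days`, and the choice of `min_periods=1` are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample with a gap: 2021-01-04 is missing.
df = pd.DataFrame({
    "day": ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-05"],
    "category": ["electronic"] * 4,
    "sub-category": ["phone"] * 4,
    "count": [10, 12, 3, 20],
})
df["day"] = pd.to_datetime(df["day"])
df = df.sort_values(["category", "sub-category", "day"])
keys = ["category", "sub-category"]

# Row-based window: always the last 3 rows, whatever their dates are.
df["avg_last_3_rows"] = df.groupby(keys)["count"].transform(
    lambda x: x.rolling(3).mean()
)

# Time-based window: rows whose day lies in the 3 days ending at the current row.
parts = []
for _, group in df.groupby(keys):
    by_day = group.set_index("day")["count"]
    rolled = by_day.rolling("3D", min_periods=1).mean()
    rolled.index = group.index  # restore the original row labels
    parts.append(rolled)
df["avg_last_3_days"] = pd.concat(parts)

print(df)
```

On the 2021-01-05 row the row-based average still uses the counts 12, 3 and 20, while the 3-day window only sees 3 and 20. Setting min_periods=3 instead would leave incomplete windows as NaN, which is closer to the expected output in the question.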
    

