【Question title】: How to Reverse Rolling Sum?
【Posted】: 2022-03-03 01:24:09
【Question description】:

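I have a rolling sum calculated on a grouped data frame, but it adds up the wrong way: it is a sum of the future, when I need a sum of the past.

What am I doing wrong here?

I import the data and sort by Dimension and Date (I have also tried removing the date sort):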

df = pd.read_csv('Input.csv', parse_dates=True)
df.sort_values(['Dimension','Date'])
print(df)

Then I create a new column, which is a multi-index result produced by a grouped rolling window:

new_column = df.groupby('Dimension').Value1.apply(lambda x: 
x.rolling(window=3).sum())

Then I reset the index so that it matches the original:

df['Sum_Value1'] = new_column.reset_index(level=0, drop=True)
print(df)

I have also tried reversing the index before the calculation, but that failed too.

Input:

Dimension,Date,Value1,Value2
1,4/30/2002,10,20
1,1/31/2002,10,20
1,10/31/2001,10,20
1,7/31/2001,10,20
1,4/30/2001,10,20
1,1/31/2001,10,20
1,10/31/2000,10,20
2,4/30/2002,10,20
2,1/31/2002,10,20
2,10/31/2001,10,20
2,7/31/2001,10,20
2,4/30/2001,10,20
2,1/31/2001,10,20
2,10/31/2000,10,20
3,4/30/2002,10,20
3,1/31/2002,10,20
3,10/31/2001,10,20
3,7/31/2001,10,20
3,1/31/2001,10,20
3,10/31/2000,10,20

Output:

    Dimension        Date  Value1  Value2  Sum_Value1
0           1   4/30/2002      10      20         NaN
1           1   1/31/2002      10      20         NaN
2           1  10/31/2001      10      20        30.0
3           1   7/31/2001      10      20        30.0
4           1   4/30/2001      10      20        30.0
5           1   1/31/2001      10      20        30.0
6           1  10/31/2000      10      20        30.0
7           2   4/30/2002      10      20         NaN
8           2   1/31/2002      10      20         NaN
9           2  10/31/2001      10      20        30.0
10          2   7/31/2001      10      20        30.0
11          2   4/30/2001      10      20        30.0
12          2   1/31/2001      10      20        30.0
13          2  10/31/2000      10      20        30.0

Goal output:

    Dimension        Date  Value1  Value2  Sum_Value1
0           1   4/30/2002      10      20        30.0
1           1   1/31/2002      10      20        30.0
2           1  10/31/2001      10      20        30.0
3           1   7/31/2001      10      20        30.0
4           1   4/30/2001      10      20        30.0
5           1   1/31/2001      10      20         NaN
6           1  10/31/2000      10      20         NaN
7           2   4/30/2002      10      20        30.0
8           2   1/31/2002      10      20        30.0
9           2  10/31/2001      10      20        30.0
10          2   7/31/2001      10      20        30.0
11          2   4/30/2001      10      20        30.0
12          2   1/31/2001      10      20         NaN
13          2  10/31/2000      10      20         NaN

【Question comments】:

    Tags: python pandas


    【Solution 1】:

    You want a reversed sum, so reverse your series before taking the rolling sum:

    lambda x: x[::-1].rolling(window=3).sum()
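
To make the effect of the reversal easier to see than with the question's constant values, here is a minimal sketch on made-up data. The sample frame is hypothetical, and `transform` is used in place of the question's `apply` (an assumption on my part, chosen because it returns a result aligned to the original index); the result is also reversed back explicitly so the row order does not depend on index alignment.

```python
import pandas as pd

# Hypothetical sample data with non-constant values.
df = pd.DataFrame({
    "Dimension": [1, 1, 1, 1, 2, 2, 2, 2],
    "Value1": [1, 2, 3, 4, 10, 20, 30, 40],
})

# Reverse each group, take an ordinary trailing rolling sum,
# then reverse back so every value sits on its original row.
df["Sum_Value1"] = df.groupby("Dimension")["Value1"].transform(
    lambda x: x[::-1].rolling(window=3).sum()[::-1]
)
print(df)
```

Each row now holds the sum of itself and the two rows below it within its group, and the last two rows of each group are NaN, which matches the layout of the goal output.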
    

    【Comments】:

      【Solution 2】:

      You can shift the result by window-1 to get a left-aligned result:

      df["sum_value1"] = (df.groupby('Dimension').Value1
                            .apply(lambda x: x.rolling(window=3).sum().shift(-2)))
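
As a quick check of the window-1 shift, the sketch below compares the shifted rolling sum with the reverse-then-roll approach on hypothetical non-constant data. It assumes `transform` rather than `apply`, because in recent pandas versions `apply` may prepend the group keys to the result index, which can get in the way of assigning the result straight back to the frame.

```python
import pandas as pd

# Hypothetical sample data with non-constant values.
df = pd.DataFrame({
    "Dimension": [1, 1, 1, 1, 2, 2, 2, 2],
    "Value1": [1, 2, 3, 4, 10, 20, 30, 40],
})
window = 3
grouped = df.groupby("Dimension")["Value1"]

# Trailing rolling sum, moved up by window - 1 rows within each group.
shifted = grouped.transform(
    lambda x: x.rolling(window).sum().shift(-(window - 1))
)
# Reverse, roll, reverse back (the approach from Solution 1).
reversed_sum = grouped.transform(
    lambda x: x[::-1].rolling(window).sum()[::-1]
)

df["sum_value1"] = shifted
print(df)
print(shifted.equals(reversed_sum))
```

On this data the two results agree row for row, including the NaN positions at the end of each group.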
      

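      【Comments】:

      • I think you were misled by the constant values in the OP's example; a reversed sum is what is needed.
      • @Boud I did overlook that part. But I think this still coincidentally gives the correct result :). In fact, if the data is sorted by date, past or future here is just a matter of where you place the sum result. Or maybe the OP simply needed to sort the data frame by date in ascending order to begin with.
      • Sneaky @Boud cheating and reading the "words" in the OP's post /shakes_head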
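
The last suggestion in the comments above (sort by date ascending first) can be sketched as follows. Note that `sort_values` returns a new frame, so a call whose result is not assigned back, as in the question's snippet, leaves `df` unchanged. The data, the explicit date format, and the use of `transform` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data, newest date first, as in the question.
df = pd.DataFrame({
    "Dimension": [1, 1, 1, 1],
    "Date": ["4/30/2002", "1/31/2002", "10/31/2001", "7/31/2001"],
    "Value1": [4, 3, 2, 1],
})
# Parse the dates so that sorting is chronological, not lexical.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Assign the sorted frame back; sort_values does not sort in place.
df = df.sort_values(["Dimension", "Date"])

# With dates ascending, an ordinary trailing window covers the past.
df["Sum_Value1"] = df.groupby("Dimension")["Value1"].transform(
    lambda x: x.rolling(window=3).sum()
)

# Restore the original (newest first) row order for display.
print(df.sort_index())
```

With the rows in chronological order, the plain rolling sum already looks backwards in time, so the NaN values fall on the two oldest dates.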
      【Solution 3】:

      Rolling backwards is the same as rolling forwards and then shifting the result:

      x.rolling(window=3).sum().shift(-2)
      

      【Comments】:

      • You need to use apply as in Psidom's answer, otherwise it will shift across groups.
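
The cross-group problem raised in the comment above can be illustrated with a small sketch on hypothetical data. Here `transform` stands in for the grouped `apply` (an assumption, chosen so that the result lines up with the original index).

```python
import pandas as pd

# Hypothetical data: two groups of three rows each.
df = pd.DataFrame({
    "Dimension": [1, 1, 1, 2, 2, 2],
    "Value1": [1, 2, 3, 10, 20, 30],
})

# Ungrouped: windows that straddle the group boundary mix
# values from Dimension 1 and Dimension 2.
ungrouped = df["Value1"].rolling(window=3).sum().shift(-2)

# Grouped: each Dimension is rolled and shifted on its own.
grouped = df.groupby("Dimension")["Value1"].transform(
    lambda x: x.rolling(window=3).sum().shift(-2)
)

print(pd.DataFrame({"ungrouped": ungrouped, "grouped": grouped}))
```

In the ungrouped column, rows 1 and 2 receive sums that include values from the second group, while the grouped column leaves them as NaN.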
      【Solution 4】:
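      def reverse_rolling(series, window, func):
          # Reverse the series so an ordinary trailing window looks "forward".
          reversed_series = series.iloc[::-1]
          # min_periods=1 returns partial results near the end instead of NaN.
          rolled = reversed_series.rolling(window, min_periods=1).apply(func)
          # Reverse back; the original index labels travel with the values,
          # so no DataFrame round-trip or index reassignment is needed.
          return rolled.iloc[::-1]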
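
One detail of the helper above is that it rolls with a minimum of one observation, so the tail of the series gets partial sums instead of the NaN values shown in the question's goal output. A small sketch on a hypothetical series shows the difference:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])  # hypothetical values

# Full windows only: the last two positions have fewer than
# three following values and therefore stay NaN.
full = s.iloc[::-1].rolling(3).sum().iloc[::-1]

# min_periods=1: the tail is filled with partial sums instead.
partial = s.iloc[::-1].rolling(3, min_periods=1).sum().iloc[::-1]

print(pd.DataFrame({"full": full, "partial": partial}))
```

If the NaN tail from the goal output is wanted, the minimum number of observations would need to equal the window size.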
      

      【Comments】:

        【Solution 5】:

        You can use FixedForwardWindowIndexer:

        import pandas as pd
        
        from pandas.api.indexers import FixedForwardWindowIndexer
        
        df = pd.read_csv(r'C:\Users\xxxx\python\data.txt')
        
        indexer = FixedForwardWindowIndexer(window_size=3)
        
        df1 = df.join(df.groupby('Dimension')['Value1'].rolling(indexer, min_periods=3).sum().to_frame().reset_index(), rsuffix='_sum')
        
        del df1['Dimension_sum']
        del df1['level_1']
        
        df1
        

        Input:

            Dimension        Date  Value1  Value2
        0           1   4/30/2002      10      20
        1           1   1/31/2002      10      20
        2           1  10/31/2001      10      20
        3           1   7/31/2001      10      20
        4           1   4/30/2001      10      20
        5           1   1/31/2001      10      20
        6           1  10/31/2000      10      20
        7           2   4/30/2002      10      20
        8           2   1/31/2002      10      20
        9           2  10/31/2001      10      20
        10          2   7/31/2001      10      20
        11          2   4/30/2001      10      20
        12          2   1/31/2001      10      20
        13          2  10/31/2000      10      20
        14          3   4/30/2002      10      20
        15          3   1/31/2002      10      20
        16          3  10/31/2001      10      20
        17          3   7/31/2001      10      20
        18          3   1/31/2001      10      20
        19          3  10/31/2000      10      20
        

        Output:

            Dimension        Date  Value1  Value2  Value1_sum
        0           1   4/30/2002      10      20        30.0
        1           1   1/31/2002      10      20        30.0
        2           1  10/31/2001      10      20        30.0
        3           1   7/31/2001      10      20        30.0
        4           1   4/30/2001      10      20        30.0
        5           1   1/31/2001      10      20         NaN
        6           1  10/31/2000      10      20         NaN
        7           2   4/30/2002      10      20        30.0
        8           2   1/31/2002      10      20        30.0
        9           2  10/31/2001      10      20        30.0
        10          2   7/31/2001      10      20        30.0
        11          2   4/30/2001      10      20        30.0
        12          2   1/31/2001      10      20         NaN
        13          2  10/31/2000      10      20         NaN
        14          3   4/30/2002      10      20        30.0
        15          3   1/31/2002      10      20        30.0
        16          3  10/31/2001      10      20        30.0
        17          3   7/31/2001      10      20        30.0
        18          3   1/31/2001      10      20         NaN
        19          3  10/31/2000      10      20         NaN
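
For readers without the CSV file, the forward-window approach from this answer can be tried on inline data. The sketch below is a more compact variant, not the answer's exact code: it assumes hypothetical sample values and drops the group level from the result index instead of joining and deleting columns.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

# Hypothetical sample data with non-constant values.
df = pd.DataFrame({
    "Dimension": [1, 1, 1, 1, 2, 2, 2, 2],
    "Value1": [1, 2, 3, 4, 10, 20, 30, 40],
})

# Each window covers the current row and the next two rows.
indexer = FixedForwardWindowIndexer(window_size=3)

rolled = (
    df.groupby("Dimension")["Value1"]
      .rolling(indexer, min_periods=3)
      .sum()
      # Drop the Dimension level so the index matches df again.
      .reset_index(level=0, drop=True)
)
df["Value1_sum"] = rolled
print(df)
```

Because the assignment aligns on the index, the sums land on their original rows even though the grouped result comes back ordered by group.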
        

        【Comments】:
