【Question title】: Resample and aggregate data according to another column value
【Posted】: 2023-03-29 21:49:02
【Question】:

My time series looks like this:

TranID,Time,Price,Volume,SaleOrderVolume,BuyOrderVolume,Type,SaleOrderID,SaleOrderPrice,BuyOrderID,BuyOrderPrice
1,09:25:00,137.69,200,200,453,B,182023,137.69,241939,137.69
2,09:25:00,137.69,253,300,453,S,184857,137.69,241939,137.69
3,09:25:00,137.69,47,300,200,B,184857,137.69,241322,137.69
4,09:25:00,137.69,153,200,200,B,219208,137.69,241322,137.69

I want to resample the DataFrame and aggregate Volume, and the result should look something like this:

Time, Volume_B, Volume_S
09:25:00, 400, 253

Volume_B is the total Volume of the rows where Type is "B", and Volume_S is the total Volume of the rows where Type is "S".

My attempt looks like this, but it does not work.

data.resample('t').agg(Volume_B=(Volume=lambda x: np.where(x['Type']=='B', x['Volume'], 0)), Volume_A=(Volume=lambda x: np.where(x['Type']=='S', x['Volume'], 0)))

How can I implement this correctly?

【Comments】:

    Tags: python pandas aggregation


    【Solution 1】:

    One way is to first create the Volume_B (and Volume_S) columns with np.where, and then aggregate:

    import numpy as np

    res = (
        # keep Volume only on rows of the matching Type, 0 elsewhere
        df.assign(Volume_B=lambda x: np.where(x['Type']=='B', x['Volume'], 0),
                  Volume_S=lambda x: np.where(x['Type']=='S', x['Volume'], 0))
          .groupby(df['Time'])  # you can replace this with resample here
          [['Volume_B','Volume_S']].sum()
          .reset_index()
    )
    print(res)
           Time  Volume_B  Volume_S
    0  09:25:00       400       253
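
The comment in the code above mentions swapping groupby for resample. A minimal sketch of that variant follows, assuming the Time strings can be turned into a DatetimeIndex; the date 2023-03-29 is only a placeholder, since the sample data carries no date, and the 1-minute bin width is an assumption taken from the 't' alias in the question.

```python
import io

import numpy as np
import pandas as pd

# Sample rows copied from the question
csv = """TranID,Time,Price,Volume,SaleOrderVolume,BuyOrderVolume,Type,SaleOrderID,SaleOrderPrice,BuyOrderID,BuyOrderPrice
1,09:25:00,137.69,200,200,453,B,182023,137.69,241939,137.69
2,09:25:00,137.69,253,300,453,S,184857,137.69,241939,137.69
3,09:25:00,137.69,47,300,200,B,184857,137.69,241322,137.69
4,09:25:00,137.69,153,200,200,B,219208,137.69,241322,137.69
"""
df = pd.read_csv(io.StringIO(csv))

# Assumption: all trades happen on one day; the date is a placeholder
ts = pd.to_datetime("2023-03-29 " + df["Time"])

res = (
    df.assign(
        # keep Volume only on rows of the matching Type, 0 elsewhere
        Volume_B=np.where(df["Type"] == "B", df["Volume"], 0),
        Volume_S=np.where(df["Type"] == "S", df["Volume"], 0),
    )
    .set_index(ts)      # resample needs a datetime-like index
    .resample("1min")[["Volume_B", "Volume_S"]]
    .sum()
)
print(res)
```

With the four sample rows this yields a single 09:25 bin with Volume_B = 400 and Volume_S = 253.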
    

    Edit: with your input (aggregating on the Time column), you can also use something like pivot_table:

    (df.pivot_table(index='Time', columns='Type', 
                    values='Volume', aggfunc='sum')
       .add_prefix('Volume_')
       .reset_index()
       .rename_axis(columns=None)
    )
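
If time bins are needed rather than exact Time values, the same reshaping can probably be done by grouping on pd.Grouper together with Type and then unstacking. This is only a sketch: the placeholder date and the fifth row (a 09:26:10 sale) are made up for illustration, to show how fill_value=0 handles a bin in which one Type is missing.

```python
import pandas as pd

# First four rows follow the question; the last row is hypothetical
df = pd.DataFrame({
    "Time": ["09:25:00", "09:25:00", "09:25:00", "09:25:00", "09:26:10"],
    "Volume": [200, 253, 47, 153, 100],
    "Type": ["B", "S", "B", "B", "S"],
})

# Assumption: a single trading day; the date is a placeholder
df.index = pd.to_datetime("2023-03-29 " + df["Time"])

out = (
    df.groupby([pd.Grouper(freq="1min"), "Type"])["Volume"]
    .sum()
    .unstack(fill_value=0)        # one column per Type, 0 where a Type is absent
    .add_prefix("Volume_")
    .rename_axis(columns=None)
)
print(out)
```

Unlike the np.where approach, this does not hard-code the Type values, so any additional Type would get its own Volume_ column.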
    

    【Discussion】:
