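【Question title】: pandas - calculation of percent change for a sub-group within a group
【Posted】: 2022-01-26 22:44:07
【Question】:

I have a requirement where I need to calculate the percent change for groups of orders. What I have so far works well when the sub-groups within the main group have the same number of rows. I also need to take the quantity into account.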

time            txn_type symbol qty price
27/12/21 10:32  BUY      XYZ    1   4054.5
27/12/21 10:26  SELL     XYZ    2   4053.65
27/12/21 10:00  BUY      XYZ    1   4072.25
27/12/21 09:56  BUY      XYZ    1   4045.15
27/12/21 09:50  SELL     XYZ    1   4034.25
27/12/21 09:40  BUY      XYZ    1   4006
27/12/21 09:20  SELL     XYZ    1   3978.1
27/12/21 10:55  SELL     MNO    1   1714.95
27/12/21 10:25  BUY      PQR    1   768.7
27/12/21 10:05  SELL     PQR    1   765.05
27/12/21 09:57  SELL     PQR    1   764
27/12/21 09:40  BUY      PQR    1   769
27/12/21 09:28  SELL     PQR    1   765.8
27/12/21 09:20  BUY      PQR    1   768.95
27/12/21 09:20  BUY      MNO    1   1703.55


import numpy as np

# Sum the prices per (symbol, txn_type); note that qty is not used here.
symbol_orders_df = order_df.groupby(['symbol', 'txn_type']).agg({
                            'symbol': 'first',
                            'txn_type': 'first',
                            'price': np.sum
                        })
symbol_percent_df = symbol_orders_df.groupby(level=[0]).transform(
                            lambda g: round(((g.shift(-1) - g) / g) * 100, 2))
symbol_percent_df.reset_index(inplace=True)
symbol_percent_df = symbol_percent_df[symbol_percent_df['txn_type'] == "BUY"]
symbol_percent_df.sort_values(by=['price'], ascending=False, inplace=True)
symbol_pct_dict: dict = symbol_percent_df.set_index('symbol')['price'].to_dict()

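The code above works for MNO and PQR, but gives an incorrect result for XYZ because the 10:26 row has a different quantity.

What I need is the symbol-wise percent change in a dictionary.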

【Comments】:

  • Can you add the expected correct output?
  • One idea, just a guess: could you divide the price by the count before your solution? order_df['price'] = order_df['price'].div(order_df['qty'])
  • 0.36 or -0.36?
  • It should be -0.36. percent_change = ((total_sell_price - total_buy_price)/total_buy_price)*100
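To make the expected value concrete, the formula from the last comment can be checked by hand for XYZ. This is only a sketch in plain Python: the (qty, price) pairs are copied from the question's table, and the names `buys`, `sells` and `pct_change` are made up for illustration.

```python
# XYZ rows from the question as (qty, price) pairs.
buys = [(1, 4054.5), (1, 4072.25), (1, 4045.15), (1, 4006.0)]
sells = [(2, 4053.65), (1, 4034.25), (1, 3978.1)]


def pct_change(total_buy, total_sell):
    # percent_change = ((total_sell_price - total_buy_price) / total_buy_price) * 100
    return round((total_sell - total_buy) / total_buy * 100, 2)


# Summing prices only, as the question's code does, ignores the qty of 2.
unweighted = pct_change(sum(p for _, p in buys), sum(p for _, p in sells))

# Weighting each price by its qty gives the intended totals.
weighted = pct_change(sum(q * p for q, p in buys), sum(q * p for q, p in sells))

print(unweighted, weighted)  # -25.42 -0.36
```

The unweighted figure is far off because the SELL side is missing one unit of the 10:26 order, while the weighted figure matches the -0.36 asked for.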

Tags: python-3.x pandas pandas-groupby


【Solution 1】:

One solution is to repeat each row qty times, so the row with qty 2 (index 1) appears twice:

order_df = order_df.loc[order_df.index.repeat(order_df['qty'])]

print (order_df)
              time txn_type symbol  qty    price
0   27/12/21 10:32      BUY    XYZ    1  4054.50
1   27/12/21 10:26     SELL    XYZ    2  4053.65
1   27/12/21 10:26     SELL    XYZ    2  4053.65
2   27/12/21 10:00      BUY    XYZ    1  4072.25
3   27/12/21 09:56      BUY    XYZ    1  4045.15
4   27/12/21 09:50     SELL    XYZ    1  4034.25
5   27/12/21 09:40      BUY    XYZ    1  4006.00
6   27/12/21 09:20     SELL    XYZ    1  3978.10
7   27/12/21 10:55     SELL    MNO    1  1714.95
8   27/12/21 10:25      BUY    PQR    1   768.70
9   27/12/21 10:05     SELL    PQR    1   765.05
10  27/12/21 09:57     SELL    PQR    1   764.00
11  27/12/21 09:40      BUY    PQR    1   769.00
12  27/12/21 09:28     SELL    PQR    1   765.80
13  27/12/21 09:20      BUY    PQR    1   768.95
14  27/12/21 09:20      BUY    MNO    1  1703.55

symbol_orders_df = order_df.groupby(['symbol', 'txn_type']).agg({
                            'symbol': 'first',
                            'txn_type': 'first',
                            'price': np.sum
                        })

print (symbol_orders_df)
                symbol txn_type     price
symbol txn_type                          
MNO    BUY         MNO      BUY   1703.55
       SELL        MNO     SELL   1714.95
PQR    BUY         PQR      BUY   2306.65
       SELL        PQR     SELL   2294.85
XYZ    BUY         XYZ      BUY  16177.90
       SELL        XYZ     SELL  16119.65

symbol_percent_df = symbol_orders_df.groupby(level=[0]).transform(
                            lambda g: round(((g.shift(-1) - g) / g) * 100, 2))

print (symbol_percent_df)
                 price
symbol txn_type       
MNO    BUY        0.67
       SELL        NaN
PQR    BUY       -0.51
       SELL        NaN
XYZ    BUY       -0.36
       SELL        NaN
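The question asks for a dictionary keyed by symbol, which the output above does not show. One possible way to get there, as a sketch: the totals below are typed in from the printed `symbol_orders_df`, and the names `totals` and `pct` are assumptions for illustration. Since `shift(-1)` puts the SELL total on the BUY row, only the BUY rows carry a value, so they are selected with `xs`.

```python
import pandas as pd

# Per-symbol totals, copied from the printed symbol_orders_df.
totals = pd.Series(
    [1703.55, 1714.95, 2306.65, 2294.85, 16177.90, 16119.65],
    index=pd.MultiIndex.from_product(
        [["MNO", "PQR", "XYZ"], ["BUY", "SELL"]],
        names=["symbol", "txn_type"],
    ),
    name="price",
)

# (SELL - BUY) / BUY * 100, computed on the BUY row of each symbol.
pct = (
    totals.groupby(level="symbol").shift(-1)
    .sub(totals)
    .div(totals)
    .mul(100)
    .round(2)
)

# Keep only the BUY rows, sort descending, and convert to a dict.
symbol_pct_dict = pct.xs("BUY", level="txn_type").sort_values(ascending=False).to_dict()
print(symbol_pct_dict)
```

This relies on BUY sorting before SELL within each symbol and on both being present. Note also that repeating rows only makes sense when qty is a whole number.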

Another idea is to multiply the price column by qty:

order_df['price'] = order_df['price'].mul(order_df['qty'])

print (order_df)
              time txn_type symbol  qty    price
0   27/12/21 10:32      BUY    XYZ    1  4054.50
1   27/12/21 10:26     SELL    XYZ    2  8107.30
2   27/12/21 10:00      BUY    XYZ    1  4072.25
3   27/12/21 09:56      BUY    XYZ    1  4045.15
4   27/12/21 09:50     SELL    XYZ    1  4034.25
5   27/12/21 09:40      BUY    XYZ    1  4006.00
6   27/12/21 09:20     SELL    XYZ    1  3978.10
7   27/12/21 10:55     SELL    MNO    1  1714.95
8   27/12/21 10:25      BUY    PQR    1   768.70
9   27/12/21 10:05     SELL    PQR    1   765.05
10  27/12/21 09:57     SELL    PQR    1   764.00
11  27/12/21 09:40      BUY    PQR    1   769.00
12  27/12/21 09:28     SELL    PQR    1   765.80
13  27/12/21 09:20      BUY    PQR    1   768.95
14  27/12/21 09:20      BUY    MNO    1  1703.55

symbol_orders_df = order_df.groupby(['symbol', 'txn_type'])['price'].sum()

print (symbol_orders_df)
symbol  txn_type
MNO     BUY          1703.55
        SELL         1714.95
PQR     BUY          2306.65
        SELL         2294.85
XYZ     BUY         16177.90
        SELL        16119.65
Name: price, dtype: float64

symbol_percent_df = (symbol_orders_df.groupby(level=[0]).shift(-1)
                                 .sub(symbol_orders_df)
                                 .div(symbol_orders_df)
                                 .mul(100)
                                 .round(2)
                                 .to_frame())

print (symbol_percent_df)
                 price
symbol txn_type       
MNO    BUY        0.67
       SELL        NaN
PQR    BUY       -0.51
       SELL        NaN
XYZ    BUY       -0.36
       SELL        NaN
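As a variation that does not depend on row order within each symbol, the BUY and SELL totals can be pivoted into columns with `unstack` and subtracted by name. This is a minimal end-to-end sketch, not the answerer's code: the rows are retyped from the question (without the time column), and `value`, `totals` and `pct` are names chosen here for illustration. It leaves the original price column untouched.

```python
import pandas as pd

# Orders from the question: (txn_type, symbol, qty, price).
rows = [
    ("BUY", "XYZ", 1, 4054.5), ("SELL", "XYZ", 2, 4053.65),
    ("BUY", "XYZ", 1, 4072.25), ("BUY", "XYZ", 1, 4045.15),
    ("SELL", "XYZ", 1, 4034.25), ("BUY", "XYZ", 1, 4006.0),
    ("SELL", "XYZ", 1, 3978.1), ("SELL", "MNO", 1, 1714.95),
    ("BUY", "PQR", 1, 768.7), ("SELL", "PQR", 1, 765.05),
    ("SELL", "PQR", 1, 764.0), ("BUY", "PQR", 1, 769.0),
    ("SELL", "PQR", 1, 765.8), ("BUY", "PQR", 1, 768.95),
    ("BUY", "MNO", 1, 1703.55),
]
order_df = pd.DataFrame(rows, columns=["txn_type", "symbol", "qty", "price"])

# Total traded value per symbol, with BUY and SELL as separate columns.
totals = (
    order_df.assign(value=order_df["price"] * order_df["qty"])
    .groupby(["symbol", "txn_type"])["value"]
    .sum()
    .unstack("txn_type")
)

# percent_change = (total_sell - total_buy) / total_buy * 100
pct = ((totals["SELL"] - totals["BUY"]) / totals["BUY"] * 100).round(2)

symbol_pct_dict = pct.sort_values(ascending=False).to_dict()
print(symbol_pct_dict)
```

A symbol with only BUY or only SELL orders would come out as NaN here rather than being silently paired with the wrong row.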
