Pandas custom function answers

【Question Title】: Pandas custom function
【Posted】: 2021-07-12 12:26:58
【Question】:

I have a pandas DataFrame like this:

Cust_ID	PROD_ID	Quantity	Price	Quantity	Price	Quantity	Price
		31-12-2020	31-12-2020	01-01-2021	01-01-2021	02-01-2021	02-01-2021
123	abc	10	5	10	5.4	11	6
123	efg	50	53	50	53	100	53
456	abc	10	5	10	5.4	10	6
456	efg	10	53	10	53	11	53

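The columns have a two-level index, as the first two rows show (Quantity/Price on the first level, the date on the second). For each date after 31-12-2020, I want to create a new column computed as follows:

If the quantity differs from the previous date's quantity, the new column contains 0 for that row. Otherwise, the new column contains (Quantity on that date * Price on that date) - (Quantity on the previous date * Price on the previous date).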
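
The rule can be written as a plain function for one row and one pair of consecutive dates. This is a minimal sketch; the name `new_value` and its argument order are hypothetical. Calling it row by row for every date amounts to the slow loop, so it is mainly useful as a reference for checking a vectorized version:

```python
def new_value(prev_qty, prev_price, cur_qty, cur_price):
    """Hypothetical helper applying the question's rule to one row and one date pair.

    Returns 0 when the quantity changed since the previous date; otherwise
    returns (current quantity * current price) - (previous quantity * previous price).
    """
    if cur_qty != prev_qty:
        return 0
    return cur_qty * cur_price - prev_qty * prev_price


# Row 123/abc, 31-12-2020 -> 01-01-2021: quantity unchanged (10 -> 10)
print(round(new_value(10, 5, 10, 5.4), 10))  # 4.0
# Same row, 01-01-2021 -> 02-01-2021: quantity changed (10 -> 11)
print(new_value(10, 5.4, 11, 6))  # 0
```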

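I wrote a for loop that iterates over the DataFrame once per date, but it takes too long. How can I write a function like this that I can apply? P.S. The index is reliable, but the column order may vary.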
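
To avoid the Python-level loop and not depend on column order, each field can be selected by label with `DataFrame.xs`, and the rule applied to all rows and dates at once. A minimal sketch, assuming the columns are a (field, date) MultiIndex with the dates already parsed to datetimes; the sample frame rebuilds the question's table, and `value_change` is a hypothetical name:

```python
import pandas as pd

# Rebuild the question's data: column level 0 = field, level 1 = date
dates = pd.to_datetime(["31-12-2020", "01-01-2021", "02-01-2021"], format="%d-%m-%Y")
columns = pd.MultiIndex.from_product([dates, ["Quantity", "Price"]]).swaplevel()
index = pd.MultiIndex.from_tuples(
    [("123", "abc"), ("123", "efg"), ("456", "abc"), ("456", "efg")],
    names=["Cust_ID", "PROD_ID"],
)
df = pd.DataFrame(
    [[10, 5, 10, 5.4, 11, 6],
     [50, 53, 50, 53, 100, 53],
     [10, 5, 10, 5.4, 10, 6],
     [10, 53, 10, 53, 11, 53]],
    index=index, columns=columns, dtype=float,
)


def value_change(frame):
    """Vectorized rule; columns are picked by label, so their order does not matter."""
    # One sub-frame per field, dates as columns, sorted chronologically
    qty = frame.xs("Quantity", axis=1, level=0).sort_index(axis=1)
    price = frame.xs("Price", axis=1, level=0).sort_index(axis=1)
    value = qty * price
    # Difference with the previous date (NaN for the first date)
    value_diff = value.diff(axis=1)
    # True where the quantity changed (also True for the first date's NaN)
    qty_changed = qty.diff(axis=1).ne(0)
    # 0 where the quantity changed, otherwise the Quantity*Price difference
    result = value_diff.mask(qty_changed, 0)
    return result.iloc[:, 1:]  # drop the first date, which has no previous day


print(value_change(df))
```

For the sample data this gives 4.0 and 0.0 for 123/abc, 4.0 and 6.0 for 456/abc, and 0.0 for both dates of the efg rows. Because each field is selected by label and the dates are sorted inside the function, shuffling the exported columns does not change the result.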

【Question Discussion】:

Tags: python pandas function custom-function

【Solution 1】:

First, here is a clean version of your DataFrame:

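import pandas as pd

# The column labels in the question are day-first (DD-MM-YYYY) strings
dates = pd.to_datetime(['31-12-2020', '01-01-2021', '02-01-2021'], format='%d-%m-%Y')

# Keys become the (Cust_ID, PROD_ID) rows after the final transpose (.T);
# the (field, date) MultiIndex becomes the columns
df = pd.DataFrame({('123', 'abc'): [10, 5, 10, 5.4, 11, 6],
                   ('123', 'efg'): [50, 53, 50, 53, 100, 53],
                   ('456', 'abc'): [10, 5.9, 10, 5.4, 10, 6],
                   ('456', 'efg'): [10, 53, 10, 53, 11, 53]},
                  index=pd.MultiIndex.from_product([dates, ['Quantity', 'Price']]).swaplevel()
                 ).T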

Data:

          Quantity      Price   Quantity      Price   Quantity      Price
        2020-12-31 2020-12-31 2021-01-01 2021-01-01 2021-01-02 2021-01-02
123 abc       10.0        5.0       10.0        5.4       11.0        6.0
    efg       50.0       53.0       50.0       53.0      100.0       53.0
456 abc       10.0        5.9       10.0        5.4       10.0        6.0
    efg       10.0       53.0       10.0       53.0       11.0       53.0
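
The question's date labels are day-first strings (DD-MM-YYYY). A label such as `02-01-2021` is ambiguous, and depending on the pandas version and arguments it may be parsed month-first as 1 February 2021. A minimal sketch of parsing the labels with an explicit `format` so every label is read day-first:

```python
import pandas as pd

labels = ["31-12-2020", "01-01-2021", "02-01-2021"]

# Explicit day-first format: no guessing for ambiguous labels like "02-01-2021"
dates = pd.to_datetime(labels, format="%d-%m-%Y")
print(dates.strftime("%Y-%m-%d").tolist())
# ['2020-12-31', '2021-01-01', '2021-01-02']
```

Passing `dayfirst=True` instead of `format` also works for these labels, but an explicit format additionally raises an error on any label that does not match it.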

Then you can shift the columns and compute the differences:

(df-df.shift(2, axis=1)).rename(mapper=lambda x: f'{x}_diff', axis='columns', level=0).dropna(axis=1)

Output:

        Quantity_diff Price_diff Quantity_diff Price_diff
           2021-01-01 2021-01-01    2021-01-02 2021-01-02
123 abc           0.0        0.4           1.0        0.6
    efg           0.0        0.0          50.0        0.0
456 abc           0.0       -0.5           0.0        0.6
    efg           0.0        0.0           1.0        0.0
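
The `shift(2, axis=1)` above is positional: each column is compared with the column two places to its left. That only yields per-field differences when every date's Quantity/Price pair is adjacent and in the same order, which the question says the export does not guarantee. A minimal sketch (hypothetical helper `shifted_diff`, two-row sample frame) that sorts the columns by label before shifting:

```python
import pandas as pd

# Sample frame shaped like the answer's df: level 0 = field, level 1 = date
dates = pd.to_datetime(["31-12-2020", "01-01-2021", "02-01-2021"], format="%d-%m-%Y")
columns = pd.MultiIndex.from_product([dates, ["Quantity", "Price"]]).swaplevel()
df = pd.DataFrame([[10, 5, 10, 5.4, 11, 6],
                   [50, 53, 50, 53, 100, 53]],
                  index=["abc", "efg"], columns=columns, dtype=float)

# Simulate an export whose columns arrive in a different order
shuffled = df.iloc[:, [3, 0, 5, 1, 4, 2]]


def shifted_diff(frame):
    """Hypothetical helper: per-field difference with the previous date."""
    # Sort by date (level 1) ascending, then field (level 0) descending, so every
    # date contributes a (Quantity, Price) pair in the same positions and
    # shift(2) lines each column up with the same field on the previous date.
    ordered = frame.sort_index(axis=1, level=[1, 0], ascending=[True, False])
    # The first date has no predecessor: its columns are all NaN and get dropped
    return (ordered - ordered.shift(2, axis=1)).dropna(axis=1)


print(shifted_diff(shuffled))
```

This still computes separate Quantity and Price differences, not the 0-or-Quantity*Price rule from the question.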

Combining it with the original DataFrame:

pd.concat([df,
           (df-df.shift(2, axis=1)).rename(mapper=lambda x: f'{x}_diff', axis='columns', level=0).dropna(axis=1)
           ], axis=1).sort_index(level=[1,0], ascending=[True, False], axis=1)

Output:

          Quantity      Price Quantity_diff   Quantity Price_diff      Price Quantity_diff   Quantity Price_diff      Price
        2020-12-31 2020-12-31    2021-01-01 2021-01-01 2021-01-01 2021-01-01    2021-01-02 2021-01-02 2021-01-02 2021-01-02
123 abc       10.0        5.0           0.0       10.0        0.4        5.4           1.0       11.0        0.6        6.0
    efg       50.0       53.0           0.0       50.0        0.0       53.0          50.0      100.0        0.0       53.0
456 abc       10.0        5.9           0.0       10.0       -0.5        5.4           0.0       10.0        0.6        6.0
    efg       10.0       53.0           0.0       10.0        0.0       53.0           1.0       11.0        0.0       53.0

【Discussion】:

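Thanks @mozway, but unfortunately this is not what I'm looking for. :( First, I need the new column to contain 0 when the quantity changes, and cur_Quant*cur_Price - prev_day_Quant*prev_day_Price otherwise. Second, the column names are reliable, but since the data is exported from a system that generates the columns in random order, the order may differ from month to month.
Then you should improve your question; unfortunately it is still not clear enough to me.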