Adding a calculated metric in a multiindex pandas dataframe

【Question title】: Adding a calculated metric in multiindex pandas dataframe
【Posted】: 2021-12-19 11:11:51
【Question description】:

I have a df:

date            category            subcategory         order_id        product_id
2021-05-04      A                   aa                  10              5
2021-06-04      A                   dd                  10              2
2021-05-06      B                   aa                  18              3
2021-07-06      A                   aa                  50              10
2021-07-06      C                   cc                  10              15
2021-07-05      A                   ff                  101             30
2021-10-04      D                   aa                  100             15

I am trying to calculate a weekly items-per-order metric, that is:

count(product_id) / count(unique(order_id))  # pseudocode
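For reference, the pseudocode maps onto pandas roughly as follows. This is only a sketch over the order_id and product_id values from the sample rows, with no weekly grouping yet; the names orders and items_per_order are chosen here for illustration.

```python
import pandas as pd

# order_id / product_id values copied from the sample df in the question
orders = pd.DataFrame({
    "order_id":   [10, 10, 18, 50, 10, 101, 100],
    "product_id": [5, 2, 3, 10, 15, 30, 15],
})

# count(product_id) / count(unique(order_id))
items_per_order = orders["product_id"].count() / orders["order_id"].nunique()
print(items_per_order)
```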

I tried:

a = a.set_index('date').groupby(['category','subcategory'])['order_id', 'product_id']
                       .resample('W-MON', label = 'left')
                       .agg({'order_id':pd.Series.nunique, 'product_id':pd.Series.count}).unstack()
a.fillna(0, inplace = True)

It returns the data duplicated twice:

                                order_id                                                product_id
                                order_id                    product_id                  order_id                    product_id
                date            2021-05-04  ... ...         2021-05-04  ... ...         2021-05-04  ... ...         2021-05-04  ... ...         
category        subcategory     
A               aa              1           ... ...         2           ... ...         1           ... ...         2           ... ...
A               dd              2           ... ...         1           ... ...         2           ... ...         1           ... ...

I am trying to create this structure:

                                order_id        product_id          product_per_order           order_id        product_id          product_per_order                                 
                date            2021-05-04                                                      2021-05-11                      
category        subcategory     
A               aa              1               2                   2/1                         2               3                   3/2
A               dd              2               1                   1/2                         0               0                   0 # & to avoid division by 0
...             ...

【Question comments】:

Tags: python pandas group-by

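【Solution 1】:

Remove the column list after groupby, then add the new column by division with DataFrame.assign, and finally reshape with unstack; if necessary, sort the datetime column level afterwards: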

a = (a.set_index('date')
     .groupby(['category','subcategory'])
     .resample('W-MON', label = 'left')
     .agg({'order_id':pd.Series.nunique, 'product_id':pd.Series.count})
     .assign(product_per_order = lambda x: x['product_id'].div(x['order_id']))
     .unstack()
     .fillna(0)
     .sort_index(level=1, axis=1, sort_remaining=False))
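
To make the chain above easy to try, here is a self-contained sketch that rebuilds the question's sample rows and applies the same steps. It assumes the date column is parsed with pd.to_datetime (resample needs a DatetimeIndex), and the names df and weekly are chosen here for illustration. The aggregations are passed as the strings "nunique" and "count" instead of pd.Series.nunique and pd.Series.count, which should behave the same way.

```python
import pandas as pd

# Sample rows from the question; dates parsed so that resample() can be used
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2021-05-04", "2021-06-04", "2021-05-06", "2021-07-06",
        "2021-07-06", "2021-07-05", "2021-10-04",
    ]),
    "category":    ["A", "A", "B", "A", "C", "A", "D"],
    "subcategory": ["aa", "dd", "aa", "aa", "cc", "ff", "aa"],
    "order_id":    [10, 10, 18, 50, 10, 101, 100],
    "product_id":  [5, 2, 3, 10, 15, 30, 15],
})

weekly = (
    df.set_index("date")
      .groupby(["category", "subcategory"])
      # weekly bins ending on Monday, labelled by the Monday that opens the bin
      .resample("W-MON", label="left")
      .agg({"order_id": "nunique", "product_id": "count"})
      # 0 / 0 in an empty week gives NaN here; fillna(0) below turns it into 0
      .assign(product_per_order=lambda x: x["product_id"].div(x["order_id"]))
      # move the date level into the columns: (metric, date)
      .unstack()
      .fillna(0)
      # order the columns by date, leaving the metric level unsorted
      .sort_index(level=1, axis=1, sort_remaining=False)
)

print(weekly)
```

With label='left', each row is reported under the Monday that opens its bin, so 2021-05-04 appears under 2021-05-03. Because weekly bins are closed on the right by default, a row dated exactly on a Monday (2021-07-05 here) falls into the bin that ends on that day.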

【Comments】: