Pandas Custom Cumulative Calculation Over Group By in DataFrame: Answers

【Question title】: Pandas Custom Cumulative Calculation Over Group By in DataFrame
【Posted】: 2021-07-12 23:41:23
【Question】:

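I'm trying to run a simple calculation on the values of each row within a group in a DataFrame, but I'm having trouble with the syntax. I think I'm particularly confused about what data object I should return, i.e. a DataFrame vs. a Series, etc.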

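For context, I have a series of stock values for each product I'm tracking, and I want to estimate the number of units sold with a custom function that basically does the following: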

# Because stock can go up and down, I'm looking to record the difference 
# when the stock is less than the previous stock number from the previous row.
# How do I access each row of the dataframe and then return the series I need?

def get_stock_sold(x):
    # Written in pseudo
    stock_sold = previous_stock_no - current_stock_no if current_stock_no < previous_stock_no else 0
    return pd.Series(stock_sold)

I then have the following DataFrame:

import pandas as pd

# 'order' is a date in the real dataset.

data = { 
    'id'            : ['1', '1', '1', '2', '2', '2'],
    'order'         : [1, 2, 3, 1, 2, 3],
    'current_stock' : [100, 150, 90, 50, 48, 30]
}

df = pd.DataFrame(data)
df = df.sort_values(by=['id', 'order'])
df['previous_stock'] = df.groupby('id')['current_stock'].shift(1)

I want to create a new column (stock_sold) and apply the logic above to each row of the grouped DataFrame object:

df['stock_sold'] = df.groupby('id').apply(get_stock_sold)

The desired output looks like this:

| id | order | current_stock | previous_stock | stock_sold |
|----|-------|---------------|----------------|------------|
| 1  | 1     | 100           | NaN            | 0          |
|    | 2     | 150           | 100.0          | 0          |
|    | 3     | 90            | 150.0          | 60         |
| 2  | 1     | 50            | NaN            | 0          |
|    | 2     | 48            | 50.0           | 2          |
|    | 3     | 30            | 48.0           | 18         |

【Comments】:

Tags: python pandas dataframe pandas-groupby custom-function

【Solution 1】:

Try:

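import numpy as np

# Previous row's stock within each product (NaN on a product's first row)
df["previous_stock"] = df.groupby("id")["current_stock"].shift()
# Stock went up (or there is no previous row, treated as 0): nothing sold.
# Otherwise the drop in stock is the estimated number sold.
df["stock_sold"] = np.where(
    df["current_stock"] > df["previous_stock"].fillna(0),
    0,
    df["previous_stock"] - df["current_stock"],
)
print(df)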

Prints:

  id  order  current_stock  previous_stock  stock_sold
0  1      1            100             NaN         0.0
1  1      2            150           100.0         0.0
2  1      3             90           150.0        60.0
3  2      1             50             NaN         0.0
4  2      2             48            50.0         2.0
5  2      3             30            48.0        18.0
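
Since the question asks what a custom function should return, here is a minimal sketch of the same calculation written with groupby(...).transform. It assumes the function is given one group's current_stock column as a Series and returns a Series of the same length; the reworked get_stock_sold signature and the alt variable are illustrative choices, not part of the original answer.

```python
import pandas as pd

data = {
    "id": ["1", "1", "1", "2", "2", "2"],
    "order": [1, 2, 3, 1, 2, 3],
    "current_stock": [100, 150, 90, 50, 48, 30],
}
df = pd.DataFrame(data).sort_values(by=["id", "order"])


def get_stock_sold(stock: pd.Series) -> pd.Series:
    """Take one group's stock column and return a Series of the same length."""
    drop = stock.shift(1) - stock        # previous minus current; NaN on the first row
    return drop.clip(lower=0).fillna(0)  # increases and the first row count as 0 sold


# transform calls the function once per group and aligns the result to df's index
df["stock_sold"] = (
    df.groupby("id")["current_stock"].transform(get_stock_sold).astype(int)
)

# Same result without a custom function: negate the per-group difference
alt = (-df.groupby("id")["current_stock"].diff()).clip(lower=0).fillna(0).astype(int)

print(df)
```

Because transform returns a result indexed like the original frame, it can be assigned straight to a new column, which sidesteps the index mismatch that assigning the result of groupby(...).apply often runs into.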

【Discussion】: