Posted: 2021-07-12 23:41:23
Question:
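I'm trying to run a simple calculation on the value in each row within each group of a dataframe, but I'm having trouble with the syntax. I think I'm particularly confused about which data object I should be returning, i.e. a DataFrame versus a Series, and so on.
For context, I have a series of stock values for each product I'm tracking, and I want to estimate the number of units sold with a custom function that basically does the following: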
# Because stock can go up and down, I'm looking to record the difference
# when the stock is less than the previous stock number from the previous row.
# How do I access each row of the dataframe and then return the series I need?
def get_stock_sold(x):
    # Written in pseudocode
    stock_sold = previous_stock_no - current_stock_no if current_stock_no < previous_stock_no else 0
    return pd.Series(stock_sold)
Then I have the following dataframe:
import pandas as pd

# 'order' is a date in the real dataset.
data = {
    'id': ['1', '1', '1', '2', '2', '2'],
    'order': [1, 2, 3, 1, 2, 3],
    'current_stock': [100, 150, 90, 50, 48, 30]
}
df = pd.DataFrame(data)
df = df.sort_values(by=['id', 'order'])
df['previous_stock'] = df.groupby('id')['current_stock'].shift(1)
I'd like to create a new column (stock_sold) and apply the logic above to each row within the grouped dataframe object:
df['stock_sold'] = df.groupby('id').apply(get_stock_sold)
The desired output looks like this:
| id | order | current_stock | previous_stock | stock_sold |
|----|-------|---------------|----------------|------------|
| 1 | 1 | 100 | NaN | 0 |
| | 2 | 150 | 100.0 | 0 |
| | 3 | 90 | 150.0 | 60 |
| 2 | 1 | 50 | NaN | 0 |
| | 2 | 48 | 50.0 | 2 |
| | 3 | 30 | 48.0 | 18 |
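For reference, the table above can be reproduced without a row-wise custom function. The following is only a minimal sketch, assuming the sample data from the question and assuming that a column-wise (vectorized) calculation is acceptable in place of `groupby(...).apply(...)`. It subtracts the current stock from the previous stock within each `id`, treats increases as 0, and treats the first row of each product (where there is no previous value) as 0.

```python
import pandas as pd

# Sample data copied from the question.
data = {
    'id': ['1', '1', '1', '2', '2', '2'],
    'order': [1, 2, 3, 1, 2, 3],
    'current_stock': [100, 150, 90, 50, 48, 30],
}
df = pd.DataFrame(data).sort_values(by=['id', 'order'])

# Previous row's stock within each product; NaN on each product's first row.
df['previous_stock'] = df.groupby('id')['current_stock'].shift(1)

# Drop in stock since the previous row.
# clip(lower=0) turns stock increases (negative differences) into 0,
# fillna(0) handles the first row of each product, which has no previous value.
df['stock_sold'] = (
    (df['previous_stock'] - df['current_stock'])
    .clip(lower=0)
    .fillna(0)
    .astype(int)
)

print(df)
```

Because the result is a Series aligned to the index of `df`, it can be assigned directly as a new column, which avoids the question of what shape `apply` should return.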
Discussion:
Tags: python pandas dataframe pandas-groupby custom-function