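The tricky part is working out what the previous month is. We do this by computing the start of the month for each date and then rolling it back by one month. Note that this also handles the January -> December of the previous year case.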
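To see the year rollover in isolation, here is a minimal check on a single hypothetical January date. It uses the same `relativedelta` helper as the code below.

```python
from datetime import datetime
from dateutil.relativedelta import relativedelta

# Hypothetical January date, chosen to exercise the year boundary
month_start = datetime(2021, 1, 15).replace(day=1)
# Rolling back one calendar month lands in December of the previous year
prev_month_start = month_start + relativedelta(months=-1)
print(month_start.date(), prev_month_start.date())
```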
We start by creating a sample dataframe and importing some useful modules:
```python
from io import StringIO
from datetime import datetime
from dateutil.relativedelta import relativedelta

import pandas as pd

data = StringIO(
"""
date|amount
2019-07-22|500
2019-07-25|200
2020-11-15|100
2020-11-06|900
2020-12-09|50
2020-12-21|600
""")
df = pd.read_csv(data, sep='|')
df['date'] = pd.to_datetime(df['date'])
df
```
We get:
```
        date  amount
0 2019-07-22     500
1 2019-07-25     200
2 2020-11-15     100
3 2020-11-06     900
4 2020-12-09      50
5 2020-12-21     600
```
Then we compute the start of each month and the start of the previous month using the datetime utilities:
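```python
# First day of the month containing each date
df['month_start'] = df['date'].apply(lambda d: datetime(year=d.year, month=d.month, day=1))
# Roll back one calendar month (January -> December of the previous year)
df['prev_month_start'] = df['month_start'].apply(lambda d: d + relativedelta(months=-1))
```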
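If you would rather avoid the row-by-row `apply`, a vectorized variant should produce the same two columns. This is a sketch that swaps in `Series.dt.to_period('M').dt.to_timestamp()` and `pd.DateOffset(months=1)` for `datetime`/`relativedelta`; the sample dates are made up for illustration.

```python
import pandas as pd

# Hypothetical sample dates, including a January to exercise the year rollover
dates = pd.Series(pd.to_datetime(["2020-01-15", "2020-11-06", "2020-12-21"]))

# to_period('M') truncates each date to its month; to_timestamp() returns the first day
month_start = dates.dt.to_period("M").dt.to_timestamp()
# DateOffset(months=1) subtracts one calendar month, so January rolls back to December
prev_month_start = month_start - pd.DateOffset(months=1)
print(pd.DataFrame({"month_start": month_start, "prev_month_start": prev_month_start}))
```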
Next we total the sales for each month by grouping on the month start:
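```python
# Total sales per month; prev_month_start is identical within a month, so 'first' is enough
ms_df = (df.drop(columns='date')
           .groupby('month_start')
           .agg({'prev_month_start': 'first', 'amount': 'sum'})
           .reset_index())
ms_df
```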
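The same aggregation can also be written with pandas named aggregation, which names the output columns explicitly and skips the `reset_index()`. A sketch, with a small stand-in for `df` built from the November and December rows of the sample data:

```python
import pandas as pd

# Stand-in for df after the month_start / prev_month_start step (2020 rows of the sample)
df = pd.DataFrame({
    "month_start": pd.to_datetime(["2020-11-01", "2020-11-01", "2020-12-01", "2020-12-01"]),
    "prev_month_start": pd.to_datetime(["2020-10-01", "2020-10-01", "2020-11-01", "2020-11-01"]),
    "amount": [100, 900, 50, 600],
})

# Named aggregation: output_column=(input_column, aggregation)
ms_df = df.groupby("month_start", as_index=False).agg(
    prev_month_start=("prev_month_start", "first"),
    amount=("amount", "sum"),
)
print(ms_df)
```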
So we get:
```
  month_start prev_month_start  amount
0  2019-07-01       2019-06-01     700
1  2020-11-01       2020-10-01    1000
2  2020-12-01       2020-11-01     650
```
Then we join (merge) `ms_df` with itself, matching `prev_month_start` on the left to `month_start` on the right:
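```python
# Self-join: each month's prev_month_start is looked up among the month_start values.
# how='left' keeps months with no sales in the previous month (amount_prev becomes NaN).
ms_df2 = ms_df.merge(ms_df, left_on='prev_month_start', right_on='month_start',
                     how='left', suffixes=('', '_prev'))
```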
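Since only one column is needed from the right-hand side, a lookup with `Series.map` should give the same `amount_prev` values without the extra `_prev` columns. This is an alternative sketch, not the approach used above; `amount_by_month` is a name introduced here for illustration.

```python
import pandas as pd

# Monthly totals shaped like ms_df above
ms_df = pd.DataFrame({
    "month_start": pd.to_datetime(["2019-07-01", "2020-11-01", "2020-12-01"]),
    "prev_month_start": pd.to_datetime(["2019-06-01", "2020-10-01", "2020-11-01"]),
    "amount": [700, 1000, 650],
})

# Lookup table: month start -> total for that month
amount_by_month = ms_df.set_index("month_start")["amount"]
# Look up each row's previous month; months with no recorded sales become NaN
ms_df["amount_prev"] = ms_df["prev_month_start"].map(amount_by_month)
print(ms_df[["month_start", "amount", "amount_prev"]])
```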
We are more or less there; now we tidy it up by dropping the redundant columns, adding a label, and so on:
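```python
# Human-readable month label such as 2020_12
ms_df2['label'] = ms_df2['month_start'].dt.strftime('%Y_%m')
# Drop the helper date columns coming from both sides of the merge
ms_df2 = ms_df2.drop(columns=['month_start', 'prev_month_start', 'month_start_prev', 'prev_month_start_prev'])
columns = ['label', 'amount', 'amount_prev']
ms_df2 = ms_df2[columns]
```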
We get:
| | label | amount | amount_prev |
|---:|--------:|---------:|--------------:|
| 0 | 2019_07 | 700 | nan |
| 1 | 2020_11 | 1000 | nan |
| 2 | 2020_12 | 650 | 1000 |
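This is also why a plain `shift(1)` on the monthly totals is not enough: it takes the previous *row*, so 2020_11 would be paired with 2019_07's 700 even though the months are not adjacent. The sketch below, using the totals from the table above, shows the naive shift and one way to mask it so that it matches the merge result.

```python
import pandas as pd

# Monthly totals from the table above; note the gap between 2019_07 and 2020_11
ms = pd.DataFrame({
    "label": ["2019_07", "2020_11", "2020_12"],
    "month_start": pd.to_datetime(["2019-07-01", "2020-11-01", "2020-12-01"]),
    "amount": [700, 1000, 650],
})

# Naive approach: value from the previous row, regardless of which month it was
ms["shifted"] = ms["amount"].shift(1)
# Keep the shifted value only when the previous row really is the previous calendar month
prev_row_month = ms["month_start"].shift(1)
is_prev_month = prev_row_month == ms["month_start"] - pd.DateOffset(months=1)
ms["amount_prev"] = ms["shifted"].where(is_prev_month)
print(ms[["label", "amount", "shifted", "amount_prev"]])
```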