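[Posted]: 2020-06-20 18:16:13
[Question]:
From this table, I am trying to fill in the missing dates using the minimum and maximum weekly dates available in the data frame, and then count how many times each category has 0 sales.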
import pandas as pd
df=pd.DataFrame({'category_id': ['aaa','aaa','aaa','aaa','bbb','bbb','bbb','ccc','ccc'],
'week': ['2015-01-05', '2015-01-12', '2015-01-19', '2015-01-26','2015-01-12', '2015-01-19', '2015-01-26','2015-01-05', '2015-01-12'],
'sales': [0,20,30,10,45,0,47,0,10]})
Step 1: add the missing weekly dates to every category and fill the sales for those dates with 0 (Q1: I am not sure how to get this df_add_missing_dates result).
# expected dates interpolation output
df_add_missing_dates=pd.DataFrame({'category_id': ['aaa','aaa','aaa','aaa','bbb','bbb','bbb','bbb','ccc','ccc','ccc','ccc'],
'week': ['2015-01-05', '2015-01-12', '2015-01-19', '2015-01-26',
'2015-01-05', '2015-01-12', '2015-01-19', '2015-01-26',
'2015-01-05', '2015-01-12', '2015-01-19', '2015-01-26'],
'sales': [0,20,30,10,
0,45,0,47,
0,10,0,0]})
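One possible way to get the Step 1 result is to reindex against the full category-by-week grid. This is only a sketch. It assumes that the weeks are exactly 7 days apart starting from the earliest date in the data and that each (category_id, week) pair occurs at most once. The names `all_weeks` and `full_index` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'category_id': ['aaa', 'aaa', 'aaa', 'aaa', 'bbb', 'bbb', 'bbb', 'ccc', 'ccc'],
    'week': ['2015-01-05', '2015-01-12', '2015-01-19', '2015-01-26',
             '2015-01-12', '2015-01-19', '2015-01-26',
             '2015-01-05', '2015-01-12'],
    'sales': [0, 20, 30, 10, 45, 0, 47, 0, 10],
})
df['week'] = pd.to_datetime(df['week'], format='%Y-%m-%d')

# Every week between the earliest and latest date (assumption: 7-day steps).
all_weeks = pd.date_range(df['week'].min(), df['week'].max(), freq='7D')

# Full grid: every category paired with every week.
full_index = pd.MultiIndex.from_product(
    [df['category_id'].unique(), all_weeks],
    names=['category_id', 'week'],
)

# Reindex on the (category_id, week) pair; pairs that were absent get sales = 0.
df_add_missing_dates = (
    df.set_index(['category_id', 'week'])
      .reindex(full_index, fill_value=0)
      .reset_index()
)
print(df_add_missing_dates)
```

Reindexing on both keys at once keeps the categories separate, which a reindex on the date alone cannot do.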
Step 2: count the number of weeks in which sales are 0 (Q2: how do I count the weeks with sales = 0 for each category?)
# expected final output
category_id | sales_0_count
aaa | 1
bbb | 2
ccc | 3
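The Step 2 count could be computed by flagging the zero-sales rows and summing the flags per category. This is a minimal sketch that starts from the expected Step 1 table given in the question.

```python
import pandas as pd

# Expected Step 1 table, copied from the question.
df_add_missing_dates = pd.DataFrame({
    'category_id': ['aaa'] * 4 + ['bbb'] * 4 + ['ccc'] * 4,
    'week': ['2015-01-05', '2015-01-12', '2015-01-19', '2015-01-26'] * 3,
    'sales': [0, 20, 30, 10,
              0, 45, 0, 47,
              0, 10, 0, 0],
})

# 1 where sales is zero, 0 otherwise; summing per category counts the zero weeks.
sales_0_count = (
    df_add_missing_dates['sales'].eq(0).astype(int)
    .groupby(df_add_missing_dates['category_id'])
    .sum()
    .rename('sales_0_count')
    .reset_index()
)
print(sales_0_count)
```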
Current code and logic:
# convert the week strings to datetime
df['week'] = pd.to_datetime(df['week'], format='%Y-%m-%d')
# find min/max weekly dates in the dataframe --> I couldn't add missing dates with 0 sales though
idx = pd.period_range(start=df.week.min(),end=df.week.max(),freq='W')
df = df.reindex(idx, fill_value=0).reset_index(drop=True)
df_add_missing_dates = df
# group by category to count how many times weekly sales is 0
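If only the final counts are needed, both steps might be combined by pivoting to a category-by-week table, where a missing week shows up as NaN and can be treated as 0 sales. This is a sketch of an alternative technique, not the approach in the code above. It assumes that every week in the range appears for at least one category (true for the sample data) and that each (category_id, week) pair is unique. The name `wide` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'category_id': ['aaa', 'aaa', 'aaa', 'aaa', 'bbb', 'bbb', 'bbb', 'ccc', 'ccc'],
    'week': ['2015-01-05', '2015-01-12', '2015-01-19', '2015-01-26',
             '2015-01-12', '2015-01-19', '2015-01-26',
             '2015-01-05', '2015-01-12'],
    'sales': [0, 20, 30, 10, 45, 0, 47, 0, 10],
})
df['week'] = pd.to_datetime(df['week'], format='%Y-%m-%d')

# One row per category, one column per week; absent pairs become NaN.
wide = df.pivot(index='category_id', columns='week', values='sales')

# Treat missing weeks as 0 sales, then count the zero cells in each row.
sales_0_count = (
    wide.fillna(0).eq(0).sum(axis=1)
        .rename('sales_0_count')
        .reset_index()
)
print(sales_0_count)
```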
[Comments]:
Tags: python pandas datetime pandas-groupby