Use:
import pandas as pd

# parse_dates converts the Date column to datetimes while reading
df = pd.read_csv('https://raw.githubusercontent.com/amanaroratc/hello-world/master/x_restock.csv',
                 parse_dates=['Date'])
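To see what parse_dates changes without downloading the file, here is a minimal sketch on an in-memory CSV. The two rows in csv_text and the name toy are made up for illustration; only the column names are taken from the real data.

```python
import io
import pandas as pd

# Hypothetical two-row sample with the same column names as the real CSV.
csv_text = (
    "Product_ID,Date,restocking_events\n"
    "1,2021-11-13,1\n"
    "1,2021-11-15,1\n"
)

# parse_dates turns the Date column into a datetime64 column instead of strings.
toy = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])
print(toy.dtypes)
```

With a real datetime column, Date.min(), Date.max() and pd.date_range behave as expected in the solutions that follow.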
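The first solution is to build a MultiIndex.from_product from every Product_ID and the complete range of datetimes between the minimal and maximal date, then pass it to DataFrame.reindex: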
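# all combinations of Product_ID and every day between the first and last date
mux = pd.MultiIndex.from_product([df['Product_ID'].unique(),
                                  pd.date_range(df.Date.min(), df.Date.max())],
                                 names=['Product_ID','Dates'])
# combinations missing from df are added with restocking_events set to 0
df1 = df.set_index(['Product_ID','Date']).reindex(mux, fill_value=0).reset_index()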
print (df1)
Product_ID Dates restocking_events
0 1004746 2021-11-13 0
1 1004746 2021-11-14 0
2 1004746 2021-11-15 0
3 1004746 2021-11-16 1
4 1004746 2021-11-17 0
... ... ...
3379 976460 2021-11-26 1
3380 976460 2021-11-27 0
3381 976460 2021-11-28 0
3382 976460 2021-11-29 0
3383 976460 2021-11-30 0
[3384 rows x 3 columns]
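The same reindex idea can be checked on a tiny frame where the result is easy to verify by eye. This is only a sketch: toy and filled are hypothetical names and the three rows are invented, not taken from the CSV.

```python
import pandas as pd

# Hypothetical toy data: product 1 restocked on the 13th and 15th, product 2 on the 14th.
toy = pd.DataFrame({
    'Product_ID': [1, 1, 2],
    'Date': pd.to_datetime(['2021-11-13', '2021-11-15', '2021-11-14']),
    'restocking_events': [1, 1, 1],
})

# Every product paired with every day between the global min and max date.
mux = pd.MultiIndex.from_product(
    [toy['Product_ID'].unique(),
     pd.date_range(toy['Date'].min(), toy['Date'].max())],
    names=['Product_ID', 'Date'])

# Pairs that are absent from toy are created and filled with 0.
filled = (toy.set_index(['Product_ID', 'Date'])
             .reindex(mux, fill_value=0)
             .reset_index())
print(filled)
```

Two products times three days gives six rows. Note that reindex expects each (Product_ID, Date) pair to appear only once; if the data may contain duplicated pairs, aggregating them first (for example with groupby and sum) is probably needed.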
Another idea is to use a helper DataFrame:
from itertools import product
dfdate=pd.DataFrame(product(df['Product_ID'].unique(),
pd.date_range(df.Date.min(), df.Date.max())),
columns=['Product_ID','Date'])
print (dfdate)
Product_ID Date
0 1004746 2021-11-13
1 1004746 2021-11-14
2 1004746 2021-11-15
3 1004746 2021-11-16
4 1004746 2021-11-17
... ...
3379 976460 2021-11-26
3380 976460 2021-11-27
3381 976460 2021-11-28
3382 976460 2021-11-29
3383 976460 2021-11-30
[3384 rows x 2 columns]
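# the left join keeps every (Product_ID, Date) pair; pairs without a match get NaN
# assigned to df3 so that the original df stays available for the next solution
df3 = dfdate.merge(df, how='left')
# replace the NaN values with 0 and restore the integer dtype
df3['restocking_events'] = df3['restocking_events'].fillna(0).astype(int)
print (df3)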
Product_ID Date restocking_events
0 1004746 2021-11-13 0
1 1004746 2021-11-14 0
2 1004746 2021-11-15 0
3 1004746 2021-11-16 1
4 1004746 2021-11-17 0
... ... ...
3379 976460 2021-11-26 1
3380 976460 2021-11-27 0
3381 976460 2021-11-28 0
3382 976460 2021-11-29 0
3383 976460 2021-11-30 0
[3384 rows x 3 columns]
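A small self-contained sketch of the helper-DataFrame approach, again on invented data. The names toy, grid and merged are hypothetical, and the join columns are passed explicitly with on= to make the matching keys visible.

```python
from itertools import product
import pandas as pd

# Hypothetical toy data: product 1 restocked on the 13th and 15th, product 2 on the 14th.
toy = pd.DataFrame({
    'Product_ID': [1, 1, 2],
    'Date': pd.to_datetime(['2021-11-13', '2021-11-15', '2021-11-14']),
    'restocking_events': [1, 1, 1],
})

# Helper frame holding every (product, day) combination.
grid = pd.DataFrame(
    list(product(toy['Product_ID'].unique(),
                 pd.date_range(toy['Date'].min(), toy['Date'].max()))),
    columns=['Product_ID', 'Date'])

# Left join keeps all rows of grid; unmatched rows get NaN, which becomes 0.
merged = grid.merge(toy, how='left', on=['Product_ID', 'Date'])
merged['restocking_events'] = merged['restocking_events'].fillna(0).astype(int)
print(merged)
```

Unlike reindex, the merge does not require unique (Product_ID, Date) pairs, but duplicated pairs in the data would then show up as duplicated rows in the result.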
Or, if you need consecutive datetimes per group (from each product's own first date to its own last date), use DataFrame.asfreq:
# df here is the original frame as read from the CSV
df2 = (df.set_index('Date')
         .groupby('Product_ID')['restocking_events']
         .apply(lambda x: x.asfreq('D', fill_value=0))
         .reset_index())
print (df2)
Product_ID Date restocking_events
0 112714 2021-11-15 1
1 112714 2021-11-16 1
2 112714 2021-11-17 0
3 112714 2021-11-18 1
4 112714 2021-11-19 0
... ... ...
2209 3630918 2021-11-25 0
2210 3630918 2021-11-26 0
2211 3630918 2021-11-27 0
2212 3630918 2021-11-28 0
2213 3630918 2021-11-29 1
[2214 rows x 3 columns]
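The difference from the first two solutions is that each product is only filled between its own first and last date. A minimal sketch on invented data (toy and per_group are hypothetical names) makes this visible:

```python
import pandas as pd

# Hypothetical toy data: product 1 spans Nov 13-15, product 2 spans Nov 14-17.
toy = pd.DataFrame({
    'Product_ID': [1, 1, 2, 2],
    'Date': pd.to_datetime(['2021-11-13', '2021-11-15',
                            '2021-11-14', '2021-11-17']),
    'restocking_events': [1, 1, 1, 1],
})

# asfreq runs once per product, so each group gets its own daily range.
per_group = (toy.set_index('Date')
                .groupby('Product_ID')['restocking_events']
                .apply(lambda s: s.asfreq('D', fill_value=0))
                .reset_index())
print(per_group)
```

Product 1 ends up with three rows and product 2 with four, so the result has seven rows rather than the ten a full product-by-date grid over Nov 13-17 would give. That is also why df2 above has fewer rows than df1.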