【Posted】: 2021-04-01 21:05:13
【Question】:
I have a dataframe as shown below:
import numpy as np
import pandas as pd

df = pd.DataFrame({'subject_id':[1,1,1,2,2,2],
                   'start_time':['2130-03-25 18:51:47','2130-04-23 18:51:47','2130-04-23 18:51:47','2120-01-11 18:51:47','2120-01-11 18:51:47','2120-04-28 18:51:47'],
                   'test_time':['2130-03-26 14:51:47','2130-04-24 18:51:47','2130-04-25 18:51:47','2121-02-26 18:51:47','2121-02-26 18:51:47','2120-04-28 19:51:47'],
                   'test':['test1','test2','test2','test2','test3','test3']})
df['start_time'] = pd.to_datetime(df['start_time'])
df['test_time'] = pd.to_datetime(df['test_time'])
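What I would like to do is:
a) Get the number of tests done for each subject in every 24-hour window counted from start_time. The time of each test is available in the test_time column.
Example: by 24 hours I mean 0-24 hours, 24-48 hours, 48-72 hours, and so on.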
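To make the windowing concrete, here is a minimal sketch of mapping elapsed hours to a 24-hour window with floor division. The `hours` values are made up for illustration, and treating each window as left-closed (exactly 24 hours lands in 24-48) is an assumption, since the question does not say which side the boundary belongs to.

```python
import pandas as pd

# Hypothetical elapsed times in hours (test_time minus start_time)
hours = pd.Series([0.0, 20.0, 24.0, 30.5, 48.0, 71.9])

# Floor division gives a window index: 0 for [0, 24), 1 for [24, 48), ...
window = (hours // 24).astype(int)

# Turn the window index into a readable label such as "24-48hours"
labels = (window * 24).astype(str) + "-" + ((window + 1) * 24).astype(str) + "hours"
print(labels.tolist())
```

This avoids listing every window by hand, so it also covers differences longer than 72 hours.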
I tried the following:
# elapsed time between start and test, expressed in hours
df['time_diff'] = (df.test_time - df.start_time) / pd.Timedelta(hours=1)
conditions = [
    (df['time_diff'] >= 0) & (df['time_diff'] <= 24),
    (df['time_diff'] > 24) & (df['time_diff'] <= 48),
    (df['time_diff'] > 48) & (df['time_diff'] <= 72)]
choices = ['0-24hrs','24-48hrs','48-72hrs']
# rows matching no condition (including negative differences) get the default label
df['op'] = np.select(conditions, choices, default='Greater than 3 days')
df.groupby(['subject_id','test','op'])['test'].count()
But the output above is not in the right format.
I expect my output to look like the following:
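The expected layout is not shown in the text of the question. As a rough sketch only, assuming the goal is a wide table with one row per subject and test and one column per window, the counts could be reshaped with `unstack`. The `pd.cut` bins are chosen here to mirror the right-closed conditions in the attempt above; that choice and the wide layout are both assumptions, not the asker's confirmed format.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "subject_id": [1, 1, 1, 2, 2, 2],
    "start_time": pd.to_datetime([
        "2130-03-25 18:51:47", "2130-04-23 18:51:47", "2130-04-23 18:51:47",
        "2120-01-11 18:51:47", "2120-01-11 18:51:47", "2120-04-28 18:51:47"]),
    "test_time": pd.to_datetime([
        "2130-03-26 14:51:47", "2130-04-24 18:51:47", "2130-04-25 18:51:47",
        "2121-02-26 18:51:47", "2121-02-26 18:51:47", "2120-04-28 19:51:47"]),
    "test": ["test1", "test2", "test2", "test2", "test3", "test3"],
})

# Elapsed hours between start and test
df["time_diff"] = (df["test_time"] - df["start_time"]) / pd.Timedelta(hours=1)

# Right-closed bins: [0, 24], (24, 48], (48, 72], (72, inf)
df["op"] = pd.cut(
    df["time_diff"],
    bins=[0, 24, 48, 72, np.inf],
    labels=["0-24hrs", "24-48hrs", "48-72hrs", "Greater than 3 days"],
    include_lowest=True,
).astype(str)  # plain strings, so only observed windows become columns

# Count rows per (subject, test, window), then spread the windows into columns
counts = (
    df.groupby(["subject_id", "test", "op"])
      .size()
      .unstack("op", fill_value=0)
)
print(counts)
```

Windows with no tests for a given subject and test show up as 0 rather than being dropped, which is usually easier to read than the long groupby output.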
【Comments】:
Tags: python pandas dataframe numpy pandas-groupby