【Question title】: Use pandas groupby to fetch frequency count based on time intervals
【Posted】: 2021-04-01 21:05:13
【Question】:

I have a dataframe like the one shown below

import numpy as np
import pandas as pd

df = pd.DataFrame({'subject_id':[1,1,1,2,2,2],
              'start_time':['2130-03-25 18:51:47','2130-04-23 18:51:47','2130-04-23 18:51:47','2120-01-11 18:51:47','2120-01-11 18:51:47','2120-04-28 18:51:47'],
              'test_time':['2130-03-26 14:51:47','2130-04-24 18:51:47','2130-04-25 18:51:47','2121-02-26 18:51:47','2121-02-26 18:51:47','2120-04-28 19:51:47'],
              'test':['test1','test2','test2','test2','test3','test3']})
df['start_time'] = pd.to_datetime(df['start_time'])
df['test_time'] = pd.to_datetime(df['test_time'])

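What I would like to do is

a) Get the number of tests done for each subject in every 24-hour window counted from start_time. The time of each test is in the test_time column.

Example: by 24 hours, I mean 0-24 hours, 24-48 hours, 48-72 hours, and so on.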
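
The windows described above can also be generated from a list of edges instead of being written out one condition at a time. The following is only a minimal sketch, assuming pd.cut with right-closed intervals is acceptable (so exactly 24 hours falls into the first window); the hours values are made-up sample data, not taken from the dataframe in the question.

```python
import pandas as pd

# Elapsed hours between start_time and test_time (hypothetical sample values).
hours = pd.Series([1.0, 20.0, 24.0, 48.0, 72.5])

# Edges at 0, 24, 48 and 72 hours. Intervals are right-closed, and
# include_lowest=True keeps a difference of exactly 0 h in the first window.
edges = [0, 24, 48, 72]
labels = ['0-24hrs', '24-48hrs', '48-72hrs']
window = pd.cut(hours, bins=edges, labels=labels, include_lowest=True)

# Anything outside the edges comes back as NaN; give it an explicit label.
window = window.cat.add_categories('Greater than 3 days').fillna('Greater than 3 days')

print(window.tolist())
```

Adding more windows then only requires extending edges and labels.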

I tried the following

df['time_diff'] = (df.test_time - df.start_time) / pd.Timedelta(hours=1)
conditions = [
    (df['time_diff'] >= 0) & (df['time_diff'] <= 24),
    (df['time_diff'] >24 ) & (df['time_diff'] <= 48),
    (df['time_diff'] > 48) & (df['time_diff'] <= 72)]
choices = ['0-24hrs','24-48hrs','48-72hrs']
df['op'] = np.select(conditions, choices, default='Greater than 3 days')
df.groupby(['subject_id','test','op'])['test'].count()

But the output of the above is not in the format I need.

I would like my output to look like the one shown below

【Comments】:

    Tags: python pandas dataframe numpy pandas-groupby


    【Solution 1】:

    You can add unstack, which moves the op level of the grouped result into columns:

    out = df.groupby(['subject_id','test','op'])['test'].count().unstack(fill_value=0).reset_index()
    out
    op  subject_id   test  0-24hrs  24-48hrs  Greater than 3 days
    0            1  test1        1         0                    0
    1            1  test2        1         1                    0
    2            2  test2        0         0                    1
    3            2  test3        1         0                    1
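
For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the dataframe from the question and produces the same wide table with pd.crosstab. Note that crosstab is a swapped-in alternative, not what the answer above uses; it is shown only because it counts and pivots in one call.

```python
import numpy as np
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    'subject_id': [1, 1, 1, 2, 2, 2],
    'start_time': ['2130-03-25 18:51:47', '2130-04-23 18:51:47', '2130-04-23 18:51:47',
                   '2120-01-11 18:51:47', '2120-01-11 18:51:47', '2120-04-28 18:51:47'],
    'test_time': ['2130-03-26 14:51:47', '2130-04-24 18:51:47', '2130-04-25 18:51:47',
                  '2121-02-26 18:51:47', '2121-02-26 18:51:47', '2120-04-28 19:51:47'],
    'test': ['test1', 'test2', 'test2', 'test2', 'test3', 'test3'],
})
df['start_time'] = pd.to_datetime(df['start_time'])
df['test_time'] = pd.to_datetime(df['test_time'])

# Same binning as in the question: elapsed hours, then a label per window.
df['time_diff'] = (df['test_time'] - df['start_time']) / pd.Timedelta(hours=1)
conditions = [
    (df['time_diff'] >= 0) & (df['time_diff'] <= 24),
    (df['time_diff'] > 24) & (df['time_diff'] <= 48),
    (df['time_diff'] > 48) & (df['time_diff'] <= 72),
]
choices = ['0-24hrs', '24-48hrs', '48-72hrs']
df['op'] = np.select(conditions, choices, default='Greater than 3 days')

# Rows: (subject_id, test) pairs; columns: one per window label; values: counts.
out = pd.crosstab([df['subject_id'], df['test']], df['op']).reset_index()
print(out)
```

As with unstack, only window labels that actually occur in the data become columns, which is why there is no 48-72hrs column for this sample.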
    

    【Discussion】:
