【Question title】:Dividing time intervals with multiple index into hourly buckets in Python
【Posted】:2019-11-01 19:17:24
【Question】:

Here is the code for a sample dataset that I have:

import pandas as pd

data = {'ID': [4, 4, 4, 4, 22, 22, 23, 25, 29],
        'Zone': [32, 34, 21, 34, 27, 29, 32, 75, 9],
        'checkin_datetime': ['04-01-2019 13:07', '04-01-2019 13:09', '04-01-2019 14:06',
                             '04-01-2019 14:55', '04-01-2019 20:23', '04-01-2019 21:38',
                             '04-01-2019 21:38', '04-01-2019 23:22', '04-02-2019 01:00'],
        'checkout_datetime': ['04-01-2019 13:09', '04-01-2019 13:12', '04-01-2019 14:07',
                              '04-01-2019 15:06', '04-01-2019 21:32', '04-01-2019 21:42',
                              '04-01-2019 21:45', '04-02-2019 00:23', '04-02-2019 06:15']
}

df = pd.DataFrame(data,columns= ['ID','Zone', 'checkin_datetime','checkout_datetime'])

df['checkout_datetime'] = pd.to_datetime(df['checkout_datetime'])
df['checkin_datetime'] = pd.to_datetime(df['checkin_datetime'])

Using this dataset, I am trying to create the following dataset:

                Checked_in_hour    ID    Zone    checked_in_minutes
                01-04-2019 13:00    4    32        2
                01-04-2019 13:00    4    34        3
                01-04-2019 14:00    4    21        1
                01-04-2019 14:00    4    34        5
                01-04-2019 15:00    4    34        6
                01-04-2019 20:00    22    27       37
                01-04-2019 20:00    22    27       8
                01-04-2019 20:00    22    27       37
                01-04-2019 21:00    22    29       4
                01-04-2019 21:00    23    32       7
                01-04-2019 23:00    25    75       38
                02-04-2019 00:00    25    75       24
                02-04-2019 01:00    29    9        60
                02-04-2019 02:00    29    9        60
                02-04-2019 03:00    29    9        60
                02-04-2019 04:00    29    9        60
                02-04-2019 05:00    29    9        60
                02-04-2019 06:00    29    9        16

The checked-in minutes are calculated as the difference between checkin_datetime and checkout_datetime, with the time grouped by hour and zone.
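To illustrate that calculation, the minutes one interval contributes to a given clock hour can be written as the overlap between the interval and that hour. The sketch below is only an illustration under that reading of the question; the helper name `minutes_in_hour` is hypothetical and does not appear in the original code.

```python
import pandas as pd

def minutes_in_hour(checkin, checkout, hour_start):
    """Minutes of the interval [checkin, checkout) inside the hour starting at hour_start.

    Hypothetical helper, shown only to illustrate the overlap calculation.
    """
    hour_end = hour_start + pd.Timedelta(hours=1)
    overlap = min(checkout, hour_end) - max(checkin, hour_start)
    # A negative overlap means the interval does not touch this hour at all
    return max(overlap.total_seconds(), 0) / 60

# The 14:55 -> 15:06 stay from the sample data spans two clock hours
checkin = pd.Timestamp('2019-04-01 14:55')
checkout = pd.Timestamp('2019-04-01 15:06')
print(minutes_in_hour(checkin, checkout, pd.Timestamp('2019-04-01 14:00')))  # 5.0
print(minutes_in_hour(checkin, checkout, pd.Timestamp('2019-04-01 15:00')))  # 6.0
```

The two results correspond to the 5-minute and 6-minute rows for ID 4, Zone 34 in the desired output.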

Here is my current code, which computes the result at the Checked_in_hour level. I need to add the Zone variable to it.

#working logic
# One row per minute between the earliest check-in and the latest check-out
df2 = pd.DataFrame(
    index=pd.date_range(
        start=df['checkin_datetime'].min(),
        end=df['checkout_datetime'].max(), freq='1min'),
    columns=['is_checked_in', 'ID'], data=0)

for index, row in df.iterrows():
    # .loc avoids chained assignment; label-based slices include both endpoints
    df2.loc[row['checkin_datetime']:row['checkout_datetime'], 'is_checked_in'] = 1
    df2.loc[row['checkin_datetime']:row['checkout_datetime'], 'ID'] = row['ID']

df3 = df2.resample('1h').aggregate({'is_checked_in': 'sum', 'ID': 'max'})

【Comments】:

    Tags: python pandas loops indexing


    【Solution 1】:

    I'm not sure how efficient this is, but it should work.

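    import pandas as pd
    from datetime import timedelta
    
    def group_into_hourly_buckets(df):
        # Split each check-in/check-out interval into whole minutes per clock hour
        df['duration'] = df['checkout_datetime'] - df['checkin_datetime']
        grouped_data = []
        for idx, row in df.iterrows():
            # total_seconds() also counts full days, which .seconds would ignore
            dur = int(row['duration'].total_seconds())//60
            start_time = row['checkin_datetime']
            hours_ = 0
            while dur > 0:
                _data = {}
                _data['Checked_in_hour'] = start_time.floor('h') + timedelta(hours=hours_)
                time_spent_in_window = min(dur, 60)
                if (hours_ == 0):
                    # Cap the first window at the minutes left before the next full hour.
                    # floor('h') + 1 hour (rather than ceil) keeps this at 60 when the
                    # check-in falls exactly on the hour.
                    next_hour = start_time.floor('h') + timedelta(hours=1)
                    time_spent_in_window = min(time_spent_in_window, int((next_hour - start_time).total_seconds())//60)
                _data['checked_in_minutes'] = time_spent_in_window
                _data['ID'] = row['ID']
                _data['Zone'] = row['Zone']
                dur -= time_spent_in_window
                hours_ += 1
                grouped_data.append(_data)
        return pd.DataFrame(grouped_data)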
    

    【Comments】:

    • Also, in the sample dataset, why are there 3 rows with the same Checked_in_hour, ID and Zone?
    • This works well, but I did find a situation where the loop does not work:

          ID     Zone  checkin_datetime     checkout_datetime
          14774  252   2019-06-01 00:02:43  2019-06-01 01:00:29
          14774  252   2019-06-01 01:51:12  2019-06-01 03:16:26
          14774  252   2019-06-01 03:21:11  2019-06-01 03:55:15
          14774  216   2019-06-01 03:55:15  2019-06-01 03:55:55

          Checked_in_hour      checked_in_minutes  ID     Zone
          2019-06-01 01:00:00  8                   14774  252
          2019-06-01 02:00:00  60                  14774  252
          2019-06-01 03:00:00  17                  14774  252
          2019-06-01 03:00:00  34                  14774  252

      In this case, the Zone 216 row was missed by the loop.
    • There was an error when I copied the data, apologies.
    • @Ani That is because the checked-in duration for Zone 216 is less than 1 minute.
    • I need to capture that as well, so how can I add decimal places to the minutes?
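One possible way to keep sub-minute stays is to compute each hour's overlap in seconds and divide by 60, instead of truncating to whole minutes first. The sketch below is not the answerer's code; `split_into_hourly_buckets` is a hypothetical helper, and it assumes the column names from the question (`ID`, `Zone`, `checkin_datetime`, `checkout_datetime`) with both datetime columns already parsed.

```python
import pandas as pd

def split_into_hourly_buckets(df):
    """Hypothetical helper: one row per (interval, clock hour), fractional minutes."""
    rows = []
    one_hour = pd.Timedelta(hours=1)
    for row in df.itertuples(index=False):
        hour_start = row.checkin_datetime.floor('h')
        while hour_start < row.checkout_datetime:
            # Overlap between the interval and the current clock hour
            start = max(row.checkin_datetime, hour_start)
            end = min(row.checkout_datetime, hour_start + one_hour)
            rows.append({
                'Checked_in_hour': hour_start,
                'ID': row.ID,
                'Zone': row.Zone,
                'checked_in_minutes': (end - start).total_seconds() / 60,
            })
            hour_start += one_hour
    return pd.DataFrame(rows)

# Last two rows of the example from the comments above
sample = pd.DataFrame({
    'ID': [14774, 14774],
    'Zone': [252, 216],
    'checkin_datetime': pd.to_datetime(['2019-06-01 03:21:11', '2019-06-01 03:55:15']),
    'checkout_datetime': pd.to_datetime(['2019-06-01 03:55:15', '2019-06-01 03:55:55']),
})
result = split_into_hourly_buckets(sample)
print(result)
```

With this approach the 40-second stay in Zone 216 is kept as roughly 0.67 minutes rather than being dropped. If fixed precision is wanted, the column can be rounded afterwards, for example with `result['checked_in_minutes'].round(2)`.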