Dividing time intervals with multiple index into hourly buckets in Python

【Question title】: Dividing time intervals with multiple index into hourly buckets in Python
【Posted】: 2019-11-01 19:17:24
【Question description】:

Here is the code for the sample dataset I have:

import pandas as pd

data = {
    'ID': [4, 4, 4, 4, 22, 22, 23, 25, 29],
    'Zone': [32, 34, 21, 34, 27, 29, 32, 75, 9],
    'checkin_datetime': ['04-01-2019 13:07', '04-01-2019 13:09', '04-01-2019 14:06',
                         '04-01-2019 14:55', '04-01-2019 20:23', '04-01-2019 21:38',
                         '04-01-2019 21:38', '04-01-2019 23:22', '04-02-2019 01:00'],
    'checkout_datetime': ['04-01-2019 13:09', '04-01-2019 13:12', '04-01-2019 14:07',
                          '04-01-2019 15:06', '04-01-2019 21:32', '04-01-2019 21:42',
                          '04-01-2019 21:45', '04-02-2019 00:23', '04-02-2019 06:15'],
}

df = pd.DataFrame(data, columns=['ID', 'Zone', 'checkin_datetime', 'checkout_datetime'])

# By default pandas parses these strings month-first, so '04-01-2019' is 1 April 2019
df['checkout_datetime'] = pd.to_datetime(df['checkout_datetime'])
df['checkin_datetime'] = pd.to_datetime(df['checkin_datetime'])

From this dataset I am trying to create the following dataset:

                Checked_in_hour    ID    Zone    checked_in_minutes
                01-04-2019 13:00    4    32        2
                01-04-2019 13:00    4    34        3
                01-04-2019 14:00    4    21        1
                01-04-2019 14:00    4    34        5
                01-04-2019 15:00    4    34        6
                01-04-2019 20:00    22    27       37
                01-04-2019 20:00    22    27       8
                01-04-2019 20:00    22    27       37
                01-04-2019 21:00    22    29       4
                01-04-2019 21:00    23    32       7
                01-04-2019 23:00    25    75       38
                02-04-2019 00:00    25    75       24
                02-04-2019 01:00    29    9        60
                02-04-2019 02:00    29    9        60
                02-04-2019 03:00    29    9        60
                02-04-2019 04:00    29    9        60
                02-04-2019 05:00    29    9        60
                02-04-2019 06:00    29    9        16

The checked-in minutes are computed by subtracting checkin_datetime from checkout_datetime, and the time is grouped by hour and zone.
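One way to read this rule is that a stay contributes to an hourly bucket exactly the overlap between the stay and that hour. A minimal sketch of that arithmetic, using the 14:55 to 15:06 row from the sample data; the helper name `minutes_in_hour` is hypothetical and does not come from the post:

```python
import pandas as pd

def minutes_in_hour(checkin, checkout, hour_start):
    """Whole minutes of the stay that fall inside the hour starting at hour_start."""
    hour_end = hour_start + pd.Timedelta(hours=1)
    overlap = min(checkout, hour_end) - max(checkin, hour_start)
    # A negative overlap means the stay does not touch this hour at all
    return max(int(overlap.total_seconds() // 60), 0)

checkin = pd.Timestamp('2019-04-01 14:55')
checkout = pd.Timestamp('2019-04-01 15:06')
print(minutes_in_hour(checkin, checkout, pd.Timestamp('2019-04-01 14:00')))  # 5
print(minutes_in_hour(checkin, checkout, pd.Timestamp('2019-04-01 15:00')))  # 6
```

The two results match the 14:00 and 15:00 rows for ID 4, Zone 34 in the expected output.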

This is the code I currently have. It computes the result at the Checked_in_hour level, and I need to add the Zone variable to it:

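# working logic
# One row per minute between the first check-in and the last check-out.
# pd.date_range replaces the DatetimeIndex(start=..., end=...) form,
# which newer pandas versions no longer accept.
df2 = pd.DataFrame(
    index=pd.date_range(
        start=df['checkin_datetime'].min(),
        end=df['checkout_datetime'].max(), freq='min'),
    columns=['is_checked_in', 'ID'], data=0)

for index, row in df.iterrows():
    # .loc avoids chained assignment, which can silently write to a copy
    df2.loc[row['checkin_datetime']:row['checkout_datetime'], 'is_checked_in'] = 1
    df2.loc[row['checkin_datetime']:row['checkout_datetime'], 'ID'] = row['ID']

df3 = df2.resample('h').aggregate({'is_checked_in': 'sum', 'ID': 'max'})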
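For illustration, the same minute-by-minute idea can carry Zone along if each stay is expanded into its own minutes rather than written onto one shared grid, where overlapping stays overwrite each other. This is only a sketch under assumptions: it uses three rows of the sample data, treats the checkout minute as not checked in, and the names `minutes` and `hourly` are made up here. It creates one row per checked-in minute, so it suits small datasets.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [4, 4, 22],
    'Zone': [32, 34, 27],
    'checkin_datetime': pd.to_datetime(
        ['2019-04-01 13:07', '2019-04-01 14:55', '2019-04-01 20:23']),
    'checkout_datetime': pd.to_datetime(
        ['2019-04-01 13:09', '2019-04-01 15:06', '2019-04-01 21:32']),
})

one_minute = pd.Timedelta(minutes=1)
parts = []
for row in df.itertuples(index=False):
    # Minutes from check-in up to, but not including, the checkout minute
    stay = pd.date_range(row.checkin_datetime,
                         row.checkout_datetime - one_minute, freq='min')
    parts.append(pd.DataFrame({'minute': stay, 'ID': row.ID, 'Zone': row.Zone}))
minutes = pd.concat(parts, ignore_index=True)

# Count the minutes per hour, ID and Zone
minutes['Checked_in_hour'] = minutes['minute'].dt.floor('h')
hourly = (
    minutes.groupby(['Checked_in_hour', 'ID', 'Zone'])
    .size()
    .reset_index(name='checked_in_minutes')
)
print(hourly)
```

For these three rows the counts come out as 2, 5, 6, 37 and 32 minutes for the 13:00, 14:00, 15:00, 20:00 and 21:00 buckets.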

【Question comments】:

Tags: python pandas loops indexing

【Solution 1】:

I'm not sure this is the most efficient approach, but it should work.

import pandas as pd
from datetime import timedelta

def group_into_hourly_buckets(df):
    # Split each stay at hour boundaries: one output row per hour touched,
    # carrying ID and Zone, counted in whole minutes.
    grouped_data = []
    for idx, row in df.iterrows():
        start_time = row['checkin_datetime']
        # total_seconds() also covers stays of a day or longer, which .seconds drops
        dur = int((row['checkout_datetime'] - start_time).total_seconds() // 60)
        hours_ = 0
        while dur > 0:
            _data = {}
            _data['Checked_in_hour'] = start_time.floor('h') + timedelta(hours=hours_)
            time_spent_in_window = min(dur, 60)
            if hours_ == 0:
                # Minutes left in the first hour. This is 60 for an on-the-hour
                # check-in, where ceil('H') - start_time would be 0 and would
                # shift every later bucket by one hour.
                time_spent_in_window = min(dur, 60 - start_time.minute)
            _data['checked_in_minutes'] = time_spent_in_window
            _data['ID'] = row['ID']
            _data['Zone'] = row['Zone']
            dur -= time_spent_in_window
            hours_ += 1
            grouped_data.append(_data)
    return pd.DataFrame(grouped_data)

【Comments】:

Also, in the sample dataset, why are there 3 rows with the same Checked_in_hour, ID and Zone?
This works well, but I did find a situation where the loop doesn't work.

Input:

    ID       Zone    checkin_datetime       checkout_datetime
    14774    252     2019-06-01 00:02:43    2019-06-01 01:00:29
    14774    252     2019-06-01 01:51:12    2019-06-01 03:16:26
    14774    252     2019-06-01 03:21:11    2019-06-01 03:55:15
    14774    216     2019-06-01 03:55:15    2019-06-01 03:55:55

Output:

    Checked_in_hour        checked_in_minutes    ID       Zone
    2019-06-01 01:00:00    8                     14774    252
    2019-06-01 02:00:00    60                    14774    252
    2019-06-01 03:00:00    17                    14774    252
    2019-06-01 03:00:00    34                    14774    252

In this case, ID 216 was missed by the loop.
That was an error when copying the data, apologies.
@Ani That is because the Checked_in_duration for ID 216 is less than 1 minute.
I need to capture that as well, so how can I add decimal places to the minutes?
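The thread does not answer this last question. One possible approach, offered as a sketch and not as the answerer's method, is to measure the overlap between each stay and each hour in seconds and only then convert to minutes, so stays shorter than a minute are kept. The function name `hourly_fractional_minutes` and the rounding to two decimals are assumptions made for this example.

```python
import pandas as pd

def hourly_fractional_minutes(df):
    """Split each stay at hour boundaries, keeping fractions of a minute."""
    one_hour = pd.Timedelta(hours=1)
    rows = []
    for row in df.itertuples(index=False):
        if row.checkout_datetime <= row.checkin_datetime:
            continue  # skip empty or inverted stays
        bucket = row.checkin_datetime.floor('h')
        while bucket < row.checkout_datetime:
            # Overlap of the stay with the hour [bucket, bucket + 1h)
            start = max(row.checkin_datetime, bucket)
            end = min(row.checkout_datetime, bucket + one_hour)
            rows.append({
                'Checked_in_hour': bucket,
                'ID': row.ID,
                'Zone': row.Zone,
                'checked_in_minutes': round((end - start).total_seconds() / 60, 2),
            })
            bucket += one_hour
    return pd.DataFrame(rows)

# The rows from the comment above
sample = pd.DataFrame({
    'ID': [14774, 14774, 14774, 14774],
    'Zone': [252, 252, 252, 216],
    'checkin_datetime': pd.to_datetime([
        '2019-06-01 00:02:43', '2019-06-01 01:51:12',
        '2019-06-01 03:21:11', '2019-06-01 03:55:15']),
    'checkout_datetime': pd.to_datetime([
        '2019-06-01 01:00:29', '2019-06-01 03:16:26',
        '2019-06-01 03:55:15', '2019-06-01 03:55:55']),
})
out = hourly_fractional_minutes(sample)
print(out)
```

With this data the 40-second stay in Zone 216 appears as 0.67 minutes in the 03:00 bucket. If several stays of the same ID and Zone fall into one hour, they stay on separate rows; summing them with `out.groupby(['Checked_in_hour', 'ID', 'Zone'], as_index=False)['checked_in_minutes'].sum()` would give one row per bucket.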