【Posted】: 2022-02-08 21:30:05
【Question】:
I have a pandas dataset of time ranges, and for each date I want to compute the overlap, in minutes, between the range (FROM_TIME - TO_TIME) and the range (23:00 - 07:00).
DATE FROM_TIME TO_TIME
2015-01-01 2354 0408
2015-01-02 0200 0741
2015-01-03 1800 0811
2015-01-04 0015 0756
2015-01-05 0024 0259
For example, the overlap on the first date is 254 min (4 h 14 min), and on the second it is 300 min (5 h). The expected output is:
DATE FROM_TIME TO_TIME intersection
2015-01-01 2354 0408 254.0
2015-01-02 0200 0741 300.0
2015-01-03 1800 0811 480.0
2015-01-04 0015 0756 405.0
2015-01-05 0024 0259 155.0
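The arithmetic behind the expected values can be sketched in plain Python by converting each HHMM string to minutes since midnight. This is a minimal sketch, not code from the question: the function name `night_overlap_minutes` and its parameters are hypothetical, and it assumes that a TO_TIME earlier than FROM_TIME means the range crosses midnight, and that no range is longer than 24 hours.

```python
DAY = 24 * 60  # minutes in a day


def night_overlap_minutes(from_hhmm, to_hhmm, night_start=23 * 60, night_end=7 * 60):
    """Overlap in minutes between [from, to) and the nightly 23:00-07:00 window.

    Assumption (not stated in the question): to < from means the range
    crosses midnight and ends on the following day.
    """
    start = int(from_hhmm[:2]) * 60 + int(from_hhmm[2:])
    end = int(to_hhmm[:2]) * 60 + int(to_hhmm[2:])
    if end < start:
        end += DAY  # push the end onto the next day

    total = 0
    # The range can touch the night that began the previous evening,
    # the night that begins this evening, or the one after that.
    for offset in (-DAY, 0, DAY):
        window_start = night_start + offset
        window_end = night_end + DAY + offset
        total += max(0, min(end, window_end) - max(start, window_start))
    return total


rows = [("2354", "0408"), ("0200", "0741"), ("1800", "0811"),
        ("0015", "0756"), ("0024", "0259")]
print([night_overlap_minutes(f, t) for f, t in rows])
# [254, 300, 480, 405, 155]
```

Working on an unrolled minute axis avoids comparing clock times that belong to different days, which is where a naive max/min comparison breaks down.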
I tried the following:
import pandas as pd

sample = {'Date': ['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-04', '2015-01-05'],
          'FROM_TIME': ['2354', '0200', '1800', '0015', '0024'],
          'TO_TIME': ['0408', '0741', '0811', '0756', '0259']}
dftest = pd.DataFrame.from_dict(sample)

def get_intersection(x):
    # Parse the HHMM strings; with no date given, all values land on 1900-01-01
    a = pd.to_datetime(x['FROM_TIME'], format='%H%M', errors='coerce')
    b = pd.to_datetime(x['TO_TIME'], format='%H%M', errors='coerce')
    # Bounds of the 23:00 - 07:00 window, parsed onto the same date
    c = pd.to_datetime("2300", format='%H%M')
    d = pd.to_datetime("0700", format='%H%M')
    latest_start = max(a, c)
    earliest_end = min(b, d)
    delta = pd.Timedelta(earliest_end - latest_start).seconds/60
    overlap = max(0, delta)
    return overlap

dftest['intersection'] = dftest.apply(get_intersection, axis=1)
dftest
Date FROM_TIME TO_TIME intersection
2015-01-01 2354 0408 254.0
2015-01-02 0200 0741 480.0
2015-01-03 1800 0811 480.0
2015-01-04 0015 0756 480.0
2015-01-05 0024 0259 239.0
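But the output is incorrect. I know this is because the max and min functions return the wrong time in some cases, but how can I compute the intersection in Python?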
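One way to see where the attempt above goes wrong is to reproduce the second row (0200 to 0741) by hand. Every value is parsed onto the same placeholder date, so the 23:00 bound sorts after the 07:00 bound and the difference comes out negative. The `.seconds` attribute of a negative Timedelta is the non-negative seconds component left after the days are normalized to -1, so `max(0, delta)` never clamps anything. The variable names below mirror the question; the snippet is only an illustration of that behaviour.

```python
import pandas as pd

fmt = "%H%M"
latest_start = max(pd.to_datetime("0200", format=fmt),
                   pd.to_datetime("2300", format=fmt))   # 23:00 wins
earliest_end = min(pd.to_datetime("0741", format=fmt),
                   pd.to_datetime("0700", format=fmt))   # 07:00 wins

delta = earliest_end - latest_start    # 07:00 minus 23:00 on the same date

print(delta.days)              # -1
print(delta.seconds / 60)      # 480.0, the wrong value seen in the output
print(delta.total_seconds())   # -57600.0, i.e. -16 hours
```

Using `total_seconds()` would expose the negative sign, but it would not by itself handle a window that crosses midnight.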
Edit
I changed the function to
def get_intersection(x):
    # DEPARTURE_TIME / ARRIVAL_TIME play the role of FROM_TIME / TO_TIME in the sample above
    departure_time = pd.to_datetime(x['DEPARTURE_TIME'], format='%H%M', errors='coerce')
    arrival_time = pd.to_datetime(x['ARRIVAL_TIME'], format='%H%M', errors='coerce')
    upper_time = pd.to_datetime("2300", format='%H%M')
    lower_time = pd.to_datetime("0700", format='%H%M')
    if departure_time > arrival_time:
        # The range crosses midnight
        latest_start = max(departure_time, upper_time)
        earliest_end = min(arrival_time, lower_time)
    else:
        if departure_time > lower_time:
            # Departure after 07:00 on the same day: treated as no overlap
            latest_start = lower_time
            earliest_end = lower_time
        else:
            latest_start = min(departure_time, upper_time)
            earliest_end = min(arrival_time, lower_time)
    delta = (earliest_end - latest_start).seconds/60
    print(f'departure_time = {departure_time}, arrival_time = {arrival_time}\nlatest_start = {latest_start}, earliest_end ={earliest_end}, delta = {delta}')
    overlap = max(0, delta)
    return overlap
and it seems to compute the result I want, although it is very slow on the dataset I actually want to process, which has millions of rows.
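Since the row-wise `apply` is the slow part, one possible direction is to do the same arithmetic on whole columns with NumPy. The following is a sketch under stated assumptions rather than a definitive answer: the helper names `to_minutes` and `night_overlap` are hypothetical, the columns are assumed to hold valid four-digit HHMM strings, a TO_TIME earlier than FROM_TIME is assumed to mean the range ends on the next day, and no range is assumed to exceed 24 hours.

```python
import numpy as np
import pandas as pd

DAY = 24 * 60                         # minutes in a day
NIGHT_START, NIGHT_END = 23 * 60, 7 * 60


def to_minutes(col):
    """Convert a column of HHMM strings to minutes since midnight."""
    v = col.astype("int64").to_numpy()
    return (v // 100) * 60 + v % 100


def night_overlap(df, from_col="FROM_TIME", to_col="TO_TIME"):
    start = to_minutes(df[from_col])
    end = to_minutes(df[to_col])
    # Assumption: an end before the start belongs to the following day
    end = np.where(end < start, end + DAY, end)

    total = np.zeros(len(df), dtype="int64")
    # Previous night, tonight, and the night after, on an unrolled minute axis
    for offset in (-DAY, 0, DAY):
        w_start = NIGHT_START + offset
        w_end = NIGHT_END + DAY + offset
        total += np.maximum(np.minimum(end, w_end) - np.maximum(start, w_start), 0)
    return total


dftest = pd.DataFrame({
    "Date": ["2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04", "2015-01-05"],
    "FROM_TIME": ["2354", "0200", "1800", "0015", "0024"],
    "TO_TIME": ["0408", "0741", "0811", "0756", "0259"],
})
dftest["intersection"] = night_overlap(dftest)
print(dftest["intersection"].tolist())
# [254, 300, 480, 405, 155]
```

The loop runs three times regardless of the number of rows, so the cost is a handful of array operations instead of one datetime parse and one `print` per row.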
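【Comments】:
-
Are the values of FROM_TIME and TO_TIME strings?
-
Yes, they are strings.
-
Can you provide the (correct) expected output? Are the times always on the same day? Do you want the absolute difference, or can it be negative?
-
I have provided the expected output. The times are on the same day. I want the absolute difference.
-
Aren't 2300 and 0700 swapped?
Tags: python pandas time python-datetime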