[Posted]: 2018-11-16 03:32:01
[Question]:
I have two data frames and want to match them by timestamp. For example:
A
Time X
0 05-01-2017 09:08 3
1 05-01-2017 09:09 6
2 07-01-2017 09:09 5
3 07-01-2017 09:19 4
4 07-01-2017 09:19 8
5 07-02-2017 09:19 7
6 07-02-2017 09:19 5
B
Time Y
0 06-01-2017 14:45 1
1 04-01-2017 03:31 9
2 07-01-2017 03:31 4
3 07-01-2017 14:57 5
4 09-01-2017 14:57 7
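There is too much data to compare every item in df_A with every item in df_B. Instead, I want to find every match that falls within a controlled time threshold, for example 2 days. That is:
dT = Time A - Time B (in days)
-2 < dT < 2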
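To make the condition concrete, here is a minimal sketch of the check for a single pair of timestamps, using the first rows of A and B. The variable names `time_a`, `time_b` and `is_match` are illustrative and not part of the original code.

```python
import pandas as pd

# First row of A and first row of B from the tables above
# (dates are written day-month-year in the tables).
time_a = pd.Timestamp(2017, 1, 5, 9, 8)
time_b = pd.Timestamp(2017, 1, 6, 14, 45)

# Dividing a Timedelta by one day gives the difference as a float in days.
dT = (time_a - time_b) / pd.Timedelta(days=1)

# Strict inequality on both sides, as in -2 < dT < 2.
is_match = -2 < dT < 2
print(round(dT, 1), is_match)
```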
The result should be:
C
Index A  Time A            X  Index B  Time B            Y  dT
0        05-01-2017 09:08  3  0        06-01-2017 14:45  1  -1.2
0        05-01-2017 09:08  3  1        04-01-2017 03:31  9  1.2
0        05-01-2017 09:08  3  2        07-01-2017 03:31  4  -1.8
1        05-01-2017 09:09  6  0        06-01-2017 14:45  1  -1.2
1        05-01-2017 09:09  6  1        04-01-2017 03:31  9  1.2
1        05-01-2017 09:09  6  2        07-01-2017 03:31  4  -1.8
2        07-01-2017 09:09  5  0        06-01-2017 14:45  1  0.8
2        07-01-2017 09:09  5  2        07-01-2017 03:31  4  0.2
2        07-01-2017 09:09  5  3        07-01-2017 14:57  5  -0.2
3        07-01-2017 09:19  4  0        06-01-2017 14:45  1  0.8
3        07-01-2017 09:19  4  2        07-01-2017 03:31  4  0.2
3        07-01-2017 09:19  4  3        07-01-2017 14:57  5  -0.2
4        07-01-2017 09:19  8  0        06-01-2017 14:45  1  0.8
4        07-01-2017 09:19  8  2        07-01-2017 03:31  4  0.2
4        07-01-2017 09:19  8  3        07-01-2017 14:57  5  -0.2
5        07-02-2017 09:19  7
6        07-02-2017 09:19  5
                              4        09-01-2017 14:57  7
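The layout of C (matched pairs first, then the rows of A and B that found no partner) can be pinned down with a brute-force sketch on a three-row and two-row subset of the sample data. This compares every pair, so it only illustrates the intended semantics and is not suitable for large inputs. The column names `Index_A` and `Index_B` are assumptions chosen to mirror the table headers, and `merge(how='cross')` assumes pandas 1.2 or later.

```python
import pandas as pd

# Small subset of the sample data, enough to show all three kinds of rows.
df_A = pd.DataFrame({
    'Time_A': pd.to_datetime(['2017-01-05 09:08', '2017-01-07 09:19', '2017-02-07 09:19']),
    'X': [3, 4, 7],
})
df_B = pd.DataFrame({
    'Time_B': pd.to_datetime(['2017-01-06 14:45', '2017-01-09 14:57']),
    'Y': [1, 7],
})

# Keep the original row labels as ordinary columns.
a = df_A.rename_axis('Index_A').reset_index()
b = df_B.rename_axis('Index_B').reset_index()

# Every A row paired with every B row, then filtered to |dT| < 2 days.
pairs = a.merge(b, how='cross')
pairs['dT'] = (pairs['Time_A'] - pairs['Time_B']) / pd.Timedelta(days=1)
matched = pairs[pairs['dT'].abs() < 2]

# Rows that never matched are appended with the other side left empty.
only_a = a[~a['Index_A'].isin(matched['Index_A'])]
only_b = b[~b['Index_B'].isin(matched['Index_B'])]
df_C = pd.concat([matched, only_a, only_b], ignore_index=True)
print(df_C)
```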
I tried the following code, but it does not work:
import pandas as pd
import datetime as dt
from datetime import timedelta
# Data
df_A = pd.DataFrame({'X':[3,6,5,4,8,7,5], 'Time_A': [dt.datetime(2017,1,5,9,8), dt.datetime(2017,1,5,9,9), dt.datetime(2017,1,7,9,19), dt.datetime(2017,1,7,9,19), dt.datetime(2017,1,7,9,19), dt.datetime(2017,2,7,9,19), dt.datetime(2017,2,7,9,19)]})
df_B = pd.DataFrame({'Y':[1,9,4,5,7], 'Time_B': [dt.datetime(2017,1,6,14,45), dt.datetime(2017,1,4,3,31), dt.datetime(2017,1,7,3,31), dt.datetime(2017,1,7,14,57), dt.datetime(2017,1,9,14,57)]})
# Match
def slice_datetime(Time, window):
    return (Time + timedelta(hours=window)).strftime('%Y-%m-%d %H:%m')
lst = []
for Time in df_A[['X', 'Time_A']].iterrows():
    tmp = df_B.ix[slice_datetime(Time,-48):slice_datetime(Time,48)] # Define the time threshold (hours)
    if not tmp.empty:
        _match = pd.DataFrame()
        for Time_A, (X, Y, Time_B) in tmp.iterrows():
            lst.append([X, Y, Time_A, Time_B])
df_C = pd.DataFrame(lst, columns = ['X', 'Y', 'Time_A', 'Time_B'])
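Several things are likely to go wrong in the attempt above: `iterrows()` yields `(index, row)` tuples, so `Time` is not a timestamp; `%m` in the format string is the month, while minutes are `%M`; `df_B` is indexed by integers rather than by time, so label slicing with date strings cannot select a time window; and `.ix` has been removed from recent pandas versions. One possible way to get the matched pairs without comparing every row of A against every row of B is to sort B by time and use a binary search for each window. The following is a sketch under those assumptions, not a definitive solution; `match_within`, `Index_A` and `Index_B` are hypothetical names introduced here.

```python
import numpy as np
import pandas as pd

# Sample data copied from the code in the question.
df_A = pd.DataFrame({
    'X': [3, 6, 5, 4, 8, 7, 5],
    'Time_A': pd.to_datetime([
        '2017-01-05 09:08', '2017-01-05 09:09', '2017-01-07 09:19',
        '2017-01-07 09:19', '2017-01-07 09:19', '2017-02-07 09:19',
        '2017-02-07 09:19',
    ]),
})
df_B = pd.DataFrame({
    'Y': [1, 9, 4, 5, 7],
    'Time_B': pd.to_datetime([
        '2017-01-06 14:45', '2017-01-04 03:31', '2017-01-07 03:31',
        '2017-01-07 14:57', '2017-01-09 14:57',
    ]),
})


def match_within(df_a, df_b, threshold):
    """Hypothetical helper: pair rows whose times differ by less than threshold."""
    a = df_a.rename_axis('Index_A').reset_index()
    b = (df_b.rename_axis('Index_B').reset_index()
             .sort_values('Time_B', ignore_index=True))

    times_a = a['Time_A'].to_numpy(dtype='datetime64[ns]')
    times_b = b['Time_B'].to_numpy(dtype='datetime64[ns]')
    delta = threshold.to_timedelta64()

    # For each Time_A, find the slice [lo, hi) of the sorted Time_B values
    # lying strictly inside (Time_A - threshold, Time_A + threshold).
    lo = np.searchsorted(times_b, times_a - delta, side='right')
    hi = np.searchsorted(times_b, times_a + delta, side='left')
    counts = hi - lo

    # Expand every [lo, hi) slice into explicit row positions.
    a_pos = np.repeat(np.arange(len(a)), counts)
    starts = np.repeat(lo - (np.cumsum(counts) - counts), counts)
    b_pos = np.arange(counts.sum()) + starts

    out = pd.concat(
        [a.iloc[a_pos].reset_index(drop=True),
         b.iloc[b_pos].reset_index(drop=True)],
        axis=1,
    )
    out['dT'] = (out['Time_A'] - out['Time_B']) / pd.Timedelta(days=1)
    return out


df_C = match_within(df_A, df_B, pd.Timedelta(days=2))
print(df_C.round({'dT': 1}))
```

The sort costs O(m log m) and each lookup O(log m), so the work beyond that is proportional to the number of matches rather than to the product of the two lengths. Rows of A or B with no partner are not included here; they would need to be appended separately if the full layout of C is required.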
[Comments]:
Tags: python dataframe match timedelta threshold