【Posted】: 2014-04-28 16:57:39
【Question】:
**Updated code based on the provided answer** The implemented solution did not subset the original DataFrame.
In [1]: thresh_eval.head()
Out[1]:
                     WDIR  WSPD  GDR   GST  GTIME
TX_DTTM
2010-01-01 05:50:00   235  10.9  238  13.4    540
2010-01-02 00:20:00   329  10.6  NaN   NaN    NaN
2010-01-02 00:30:00   329  10.8  NaN   NaN    NaN
2010-01-02 00:40:00   329  12.1  NaN   NaN    NaN
2010-01-02 00:50:00   332  12.2  330  14.8     46
In [2]: len(thresh_eval)
Out[2]: 5503
In [3]: unique(thresh_eval.index.date)
Out[3]:
array([datetime.date(2010, 1, 1), datetime.date(2010, 1, 2),
datetime.date(2010, 1, 3), datetime.date(2010, 1, 4),
datetime.date(2010, 1, 6), datetime.date(2010, 1, 8),
datetime.date(2010, 1, 9), datetime.date(2010, 1, 12),
datetime.date(2010, 1, 16), datetime.date(2010, 1, 17),
datetime.date(2010, 1, 18), datetime.date(2010, 1, 21),
datetime.date(2010, 1, 22), datetime.date(2010, 1, 23),
datetime.date(2010, 1, 24), datetime.date(2010, 1, 25),
datetime.date(2010, 1, 26), datetime.date(2010, 1, 27),
datetime.date(2010, 1, 29), datetime.date(2010, 1, 30),
datetime.date(2010, 1, 31), datetime.date(2010, 2, 1),
datetime.date(2010, 2, 2), datetime.date(2010, 2, 3),
datetime.date(2010, 2, 4), datetime.date(2010, 2, 5),
datetime.date(2010, 2, 6), datetime.date(2010, 2, 7),
datetime.date(2010, 2, 9), datetime.date(2010, 2, 10),
datetime.date(2010, 2, 11), datetime.date(2010, 2, 12),
datetime.date(2010, 2, 13), datetime.date(2010, 2, 14),
datetime.date(2010, 2, 15), datetime.date(2010, 2, 16),
datetime.date(2010, 2, 17), datetime.date(2010, 2, 18),
datetime.date(2010, 2, 22), datetime.date(2010, 2, 25),
datetime.date(2010, 2, 26), datetime.date(2010, 2, 27),
datetime.date(2010, 2, 28), datetime.date(2010, 3, 2),
datetime.date(2010, 3, 3), datetime.date(2010, 3, 12),
datetime.date(2010, 3, 13), datetime.date(2010, 3, 14),
datetime.date(2010, 3, 15), datetime.date(2010, 3, 18),
datetime.date(2010, 3, 21), datetime.date(2010, 3, 22),
datetime.date(2010, 3, 23), datetime.date(2010, 3, 26),
datetime.date(2010, 3, 27), datetime.date(2010, 3, 28),
datetime.date(2010, 3, 29), datetime.date(2010, 3, 30),
datetime.date(2010, 4, 9), datetime.date(2010, 4, 17),
datetime.date(2010, 4, 18), datetime.date(2010, 4, 25),
datetime.date(2010, 4, 26), datetime.date(2010, 4, 27),
datetime.date(2010, 4, 28), datetime.date(2010, 5, 3),
datetime.date(2010, 5, 8), datetime.date(2010, 5, 9),
datetime.date(2010, 5, 17), datetime.date(2010, 5, 24),
datetime.date(2010, 5, 25), datetime.date(2010, 5, 26),
datetime.date(2010, 6, 2), datetime.date(2010, 6, 3),
datetime.date(2010, 6, 6), datetime.date(2010, 6, 7),
datetime.date(2010, 6, 16), datetime.date(2010, 6, 28),
datetime.date(2010, 7, 2), datetime.date(2010, 7, 3),
datetime.date(2010, 7, 10), datetime.date(2010, 7, 16),
datetime.date(2010, 7, 22), datetime.date(2010, 7, 26),
datetime.date(2010, 7, 28), datetime.date(2010, 7, 30),
datetime.date(2010, 8, 1), datetime.date(2010, 8, 7),
datetime.date(2010, 8, 23), datetime.date(2010, 8, 24),
datetime.date(2010, 9, 2), datetime.date(2010, 9, 12),
datetime.date(2010, 9, 27), datetime.date(2010, 9, 29),
datetime.date(2010, 9, 30), datetime.date(2010, 10, 2),
datetime.date(2010, 10, 3), datetime.date(2010, 10, 15),
datetime.date(2010, 10, 16), datetime.date(2010, 10, 25),
datetime.date(2010, 10, 26), datetime.date(2010, 10, 27),
datetime.date(2010, 10, 29), datetime.date(2010, 11, 2),
datetime.date(2010, 11, 3), datetime.date(2010, 11, 4),
datetime.date(2010, 11, 5), datetime.date(2010, 11, 6),
datetime.date(2010, 11, 7), datetime.date(2010, 11, 9),
datetime.date(2010, 11, 12), datetime.date(2010, 11, 16),
datetime.date(2010, 11, 17), datetime.date(2010, 11, 26),
datetime.date(2010, 11, 27), datetime.date(2010, 11, 28),
datetime.date(2010, 11, 29), datetime.date(2010, 11, 30),
datetime.date(2010, 12, 1), datetime.date(2010, 12, 2),
datetime.date(2010, 12, 4), datetime.date(2010, 12, 5),
datetime.date(2010, 12, 6), datetime.date(2010, 12, 7),
datetime.date(2010, 12, 11), datetime.date(2010, 12, 12),
datetime.date(2010, 12, 13), datetime.date(2010, 12, 14),
datetime.date(2010, 12, 16), datetime.date(2010, 12, 17),
datetime.date(2010, 12, 18), datetime.date(2010, 12, 19),
datetime.date(2010, 12, 20), datetime.date(2010, 12, 22),
datetime.date(2010, 12, 23), datetime.date(2010, 12, 24),
datetime.date(2010, 12, 26), datetime.date(2010, 12, 27),
datetime.date(2010, 12, 28)], dtype=object)
In [4]: ais.head()
Out[4]:
                         MMSI        LAT         LON  COURSE_OVER_GROUND  NAV_STATUS  POS_ACCURACY  RATE_OF_TURN  SPEED_OVER_GROUND  HEADING
TX_DTTM
2010-01-01 00:00:19  12345678  32.834746  -79.929589                1820           0             0           128                 71      NaN
2010-01-01 00:00:29  12345678  32.834384  -79.929602                1832           0             0           128                 71      NaN
2010-01-01 00:00:40  12345678  32.834058  -79.929619                1836           0             0           128                 70      NaN
2010-01-01 00:00:50  12345678  32.833703  -79.929647                1847           0             0           128                 70      NaN
2010-01-01 00:01:00  12345678  32.833386  -79.929689                1897           0             0           128                 69      NaN
In [5]: unique(ais.index.date)
Out[5]:
array([datetime.date(2010, 1, 1), datetime.date(2010, 1, 4),
datetime.date(2010, 1, 5), datetime.date(2010, 1, 6),
datetime.date(2010, 1, 7), datetime.date(2010, 1, 8),
datetime.date(2010, 1, 9), datetime.date(2010, 1, 10),
datetime.date(2010, 1, 11), datetime.date(2010, 1, 12),
datetime.date(2010, 1, 13), datetime.date(2010, 1, 14),
datetime.date(2010, 1, 15), datetime.date(2010, 1, 16),
datetime.date(2010, 1, 17), datetime.date(2010, 1, 18),
datetime.date(2010, 1, 19), datetime.date(2010, 1, 20),
datetime.date(2010, 1, 21), datetime.date(2010, 1, 22),
datetime.date(2010, 1, 23), datetime.date(2010, 1, 24),
datetime.date(2010, 1, 25), datetime.date(2010, 1, 26),
datetime.date(2010, 1, 27), datetime.date(2010, 1, 28),
datetime.date(2010, 1, 29), datetime.date(2010, 1, 30),
datetime.date(2010, 1, 31), datetime.date(2010, 2, 1)], dtype=object)
In [6]: len(ais)
Out[6]: 2750499
In [7]: ais[Index(ais.index.date).isin(Index(thresh_eval.index.date))]
Out[7]:
                         MMSI        LAT         LON  COURSE_OVER_GROUND  NAV_STATUS  POS_ACCURACY  RATE_OF_TURN  SPEED_OVER_GROUND  HEADING
TX_DTTM
2010-01-01 00:00:19  12345678  32.834746  -79.929589                1820           0             0           128                 71      NaN
2010-01-01 00:00:29  12345678  32.834384  -79.929602                1832           0             0           128                 71      NaN
2010-01-01 00:00:40  12345678  32.834058  -79.929619                1836           0             0           128                 70      NaN
2010-01-01 00:00:50  12345678  32.833703  -79.929647                1847           0             0           128                 70      NaN
2010-01-01 00:01:00  12345678  32.833386  -79.929689                1897           0             0           128                 69      NaN
2010-01-01 00:01:06  12345678  32.833106  -79.929757                1934           0             0           128                 69      NaN
2010-01-01 00:01:16  12345678  32.832830  -79.929850                1978           0             0           128                 69      NaN
2010-01-01 00:01:26  12345678  32.832495  -79.929990                2010           0             0           128                 69      NaN
In [8]: len(ais)
Out[8]: 2750499
In [9]: unique(ais.index.date)
Out[9]:
array([datetime.date(2010, 1, 1), datetime.date(2010, 1, 4),
datetime.date(2010, 1, 5), datetime.date(2010, 1, 6),
datetime.date(2010, 1, 7), datetime.date(2010, 1, 8),
datetime.date(2010, 1, 9), datetime.date(2010, 1, 10),
datetime.date(2010, 1, 11), datetime.date(2010, 1, 12),
datetime.date(2010, 1, 13), datetime.date(2010, 1, 14),
datetime.date(2010, 1, 15), datetime.date(2010, 1, 16),
datetime.date(2010, 1, 17), datetime.date(2010, 1, 18),
datetime.date(2010, 1, 19), datetime.date(2010, 1, 20),
datetime.date(2010, 1, 21), datetime.date(2010, 1, 22),
datetime.date(2010, 1, 23), datetime.date(2010, 1, 24),
datetime.date(2010, 1, 25), datetime.date(2010, 1, 26),
datetime.date(2010, 1, 27), datetime.date(2010, 1, 28),
datetime.date(2010, 1, 29), datetime.date(2010, 1, 30),
datetime.date(2010, 1, 31), datetime.date(2010, 2, 1)], dtype=object)
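A possible explanation for the unchanged length above, offered as a sketch rather than a confirmed diagnosis: boolean indexing returns a new DataFrame and leaves `ais` untouched, so the expression in `In [7]` would need to be assigned to a name before its length is checked. The small frames below are made-up stand-ins for `ais` and `thresh_eval`, and the name `ais_subset` is hypothetical.

```python
import pandas as pd

# Made-up stand-ins for the real frames (values are illustrative only).
ais = pd.DataFrame(
    {"MMSI": [12345678] * 4},
    index=pd.to_datetime([
        "2010-01-01 00:00:19",
        "2010-01-01 00:00:29",
        "2010-01-04 10:00:00",
        "2010-01-05 12:00:00",
    ]),
)
thresh_eval = pd.DataFrame(
    {"WSPD": [10.9, 10.6]},
    index=pd.to_datetime(["2010-01-01 05:50:00", "2010-01-02 00:20:00"]),
)

# Boolean mask: True where the row's calendar date also appears in thresh_eval.
mask = pd.Index(ais.index.date).isin(pd.Index(thresh_eval.index.date))

# Indexing returns a NEW frame; `ais` itself is not modified,
# so the result has to be bound to a name (hypothetical: ais_subset).
ais_subset = ais[mask]

print(len(ais), len(ais_subset))
```

With this stand-in data only the two rows dated 2010-01-01 survive the filter, while `len(ais)` stays the same because the original frame was never reassigned.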
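**Original question:** I am trying to subset a DataFrame based on a comparison between its datetime index and the datetime index of another DataFrame. df1 is a DataFrame holding a downsampled time series that serves as the filter. df2 is the DataFrame of records to be filtered; it has a higher time resolution, with multiple records for each date that appears in df1: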
In [1]: df1
Out[1]:
                 WSPD        cd
date
2010-07-10  11.325645  0.000019
2010-08-23  12.258462  0.000019
2010-11-09  10.771429  0.000019
2010-11-12  10.650000  0.000019
2010-11-16  11.939535  0.000019
...
In [2]: df2
Out[2]:
                            ID   Latitude   Longitude  Course  RateOfTurn
TimeStamp
2010-06-26 22:36:11  311425000  32.832500  -79.929000       3           0
2010-06-26 22:36:21  311425000  32.832845  -79.929037       3           0
2010-06-26 22:36:32  311425000  32.833333  -79.929000       3           0
2010-06-26 22:36:42  311425000  32.833666  -79.929000       3           0
2010-07-10 07:37:21  548723000  32.832333  -79.929000     1.0           0
2010-07-10 07:37:31  548723000  32.832666  -79.929000     1.0           0
2010-07-10 07:37:40  548723000  32.833000  -79.929000     2.0           0
2010-07-10 07:37:51  548723000  32.833333  -79.929000     1.0           0
2010-07-10 07:38:04  548723000  32.833666  -79.929000     0.0           0
2010-08-23 09:29:48  311425000  32.832590  -79.928985     0.0           0
2010-08-23 09:30:00  311425000  32.833053  -79.928970     1.0           0
2010-08-23 09:30:10  311425000  32.833443  -79.928957     1.0           0
2010-08-23 09:30:18  311425000  32.833746  -79.928944     2.0           0
...
In [3]: matched_dates = []  # renamed from `list` to avoid shadowing the built-in
   ...: for item in df2.index.date:
   ...:     if item in df1.index.date:
   ...:         matched_dates.append(item)
In [4]: matched_dates
Out[4]: [datetime.date(2010, 8, 23),
datetime.date(2010, 8, 23),
datetime.date(2010, 8, 23),
datetime.date(2010, 8, 23),
datetime.date(2010, 7, 10),
datetime.date(2010, 7, 10),
datetime.date(2010, 7, 10),
datetime.date(2010, 7, 10),
datetime.date(2010, 7, 10)]
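I am losing everything except the index values. What I really want is the subset of records from df2, with all of their data, whose datetimes match df1 at daily frequency, for example: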
2010-07-10 07:37:21 548723000 32.832333 -79.929000 1.0 0
2010-07-10 07:37:31 548723000 32.832666 -79.929000 1.0 0
2010-07-10 07:37:40 548723000 32.833000 -79.929000 2.0 0
2010-07-10 07:37:51 548723000 32.833333 -79.929000 1.0 0
2010-07-10 07:38:04 548723000 32.833666 -79.929000 0.0 0
2010-08-23 09:29:48 311425000 32.832590 -79.928985 0.0 0
2010-08-23 09:30:00 311425000 32.833053 -79.928970 1.0 0
2010-08-23 09:30:10 311425000 32.833443 -79.928957 1.0 0
2010-08-23 09:30:18 311425000 32.833746 -79.928944 2.0 0
Any help would be greatly appreciated!
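One way the desired subset could be obtained, sketched under assumptions: truncate both indexes to midnight with `DatetimeIndex.normalize()`, test membership with `isin`, and use the resulting boolean mask to select whole rows of `df2`. The frames below are cut-down copies of the examples in the question, and `mask` and `subset` are hypothetical names.

```python
import pandas as pd

# Cut-down copy of the daily filter frame from the question.
df1 = pd.DataFrame(
    {"WSPD": [11.325645, 12.258462, 10.771429]},
    index=pd.to_datetime(["2010-07-10", "2010-08-23", "2010-11-09"]),
)
df1.index.name = "date"

# Cut-down copy of the high-resolution frame to be filtered.
df2 = pd.DataFrame(
    {
        "ID": [311425000, 548723000, 548723000, 311425000],
        "Latitude": [32.832500, 32.832333, 32.832666, 32.832590],
    },
    index=pd.to_datetime([
        "2010-06-26 22:36:11",
        "2010-07-10 07:37:21",
        "2010-07-10 07:37:31",
        "2010-08-23 09:29:48",
    ]),
)
df2.index.name = "TimeStamp"

# normalize() sets every timestamp to 00:00:00, so the comparison
# happens at daily resolution.
mask = df2.index.normalize().isin(df1.index.normalize())

# Selecting with the mask keeps every column of the matching rows.
subset = df2[mask]
print(subset)
```

The loop in the question collects only the `datetime.date` values, which is why the other columns are lost; a boolean mask records which row positions match, so indexing `df2` with it returns complete records.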
【Discussion】:
Tags: python pandas time-series subset