[Posted]: 2022-11-21 20:04:00
[Question]:
I have a DataFrame like this:
gpi_data[['sig','hourtime']]
Out[28]:
sig hourtime
datetime_doy
2007-01-02 -8.963545 2007-01-02 09:20:11.249998
2007-01-03 -8.671357 2007-01-03 10:39:31.874991
2007-01-03 -8.996480 2007-01-03 20:22:59.999006
2007-01-04 -8.835958 2007-01-04 10:18:56.249024
2007-01-05 -8.785034 2007-01-05 21:21:39.374002
... ...
2019-12-30 -8.529724 2019-12-30 20:23:01.874996
2019-12-30 -8.563781 2019-12-30 20:48:28.125016
2019-12-30 -8.504211 2019-12-30 21:23:44.999996
2019-12-31 -8.460620 2019-12-31 09:39:31.873999
2019-12-31 -8.230092 2019-12-31 10:18:58.125014
[7983 rows x 2 columns]
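I want to compute the mean of sig for each morning and each evening, based on hourtime. By morning I mean data observed around 10:00:00, and by evening I mean data observed around 22:00:00. If a day has no morning or evening value, it should be filled with np.nan. For example, on 2007-01-01 there is no morning or evening value of sig, so both are filled with np.nan. On 2007-01-02 there is only a morning value, so the evening value for 2007-01-02 is filled with np.nan.
Specifically, for 2019-12-30 there are 3 evening values, at 2019-12-30 20:23:01.874996, 2019-12-30 20:48:28.125016 and 2019-12-30 21:23:44.999996, so we need the mean of -8.529724, -8.563781 and -8.504211. The same applies to the last two data points, on the morning of 2019-12-31: they need to be averaged, and the evening of 2019-12-31 is then filled with np.nan.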
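As a quick check of the arithmetic described above, the two means can be computed directly from the values listed in the question. The variable names are only illustrative.

```python
# Evening observations on 2019-12-30 (values copied from the question).
evening_2019_12_30 = [-8.529724, -8.563781, -8.504211]
# Morning observations on 2019-12-31 (values copied from the question).
morning_2019_12_31 = [-8.460620, -8.230092]

evening_mean = sum(evening_2019_12_30) / len(evening_2019_12_30)
morning_mean = sum(morning_2019_12_31) / len(morning_2019_12_31)

print(round(evening_mean, 6))  # -8.532572
print(round(morning_mean, 6))  # -8.345356
```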
So ideally the final result would be:
gpi_data[['sig','hourtime']]
Out[28]:
sig hourtime
datetime_doy
2007-01-01 nan 2007-01-01 10:00:00
2007-01-01 nan 2007-01-01 22:00:00
2007-01-02 -8.963545 2007-01-02 09:20:11.249998
2007-01-02 nan 2007-01-02 22:00:00
2007-01-03 -8.671357 2007-01-03 10:39:31.874991
2007-01-03 -8.996480 2007-01-03 20:22:59.999006
2007-01-04 -8.835958 2007-01-04 10:18:56.249024
2007-01-04 nan 2007-01-04 22:00:00
2007-01-05 nan 2007-01-05 10:00:00
2007-01-05 -8.785034 2007-01-05 21:21:39.374002
... ...
2019-12-30 -8.532572 2019-12-30 22:00:00
2019-12-31 -8.345356 2019-12-31 09:39:31.873999
2019-12-31 nan 2019-12-31 22:00:00
It would also be fine if all hourtime values were rounded to 10:00:00 or 22:00:00, as shown below:
gpi_data[['sig','hourtime']]
Out[28]:
sig hourtime
datetime_doy
2007-01-01 nan 2007-01-01 10:00:00
2007-01-01 nan 2007-01-01 22:00:00
2007-01-02 -8.963545 2007-01-02 10:00:00
2007-01-02 nan 2007-01-02 22:00:00
2007-01-03 -8.671357 2007-01-03 10:00:00
2007-01-03 -8.996480 2007-01-03 22:00:00
2007-01-04 -8.835958 2007-01-04 10:00:00
2007-01-04 nan 2007-01-04 22:00:00
2007-01-05 nan 2007-01-05 10:00:00
2007-01-05 -8.785034 2007-01-05 22:00:00
... ...
2019-12-30 -8.532572 2019-12-30 22:00:00
2019-12-31 -8.345356 2019-12-31 10:00:00
2019-12-31 nan 2019-12-31 22:00:00
How can I do this? Can anyone help me? Thanks!
[Comments]:
Tags: python pandas dataframe group-by pandas-resample
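A minimal sketch of one possible approach to the rounded variant, not a definitive solution: assign each observation to the 10:00:00 or 22:00:00 slot of its own calendar day, average per slot, then reindex onto a complete 12-hour grid so that missing slots become np.nan. The helper name `half_day_means` and the 16:00 cutoff `split_hour` are assumptions made for illustration (the question does not say where morning ends), and the small `gpi_data` frame below is rebuilt from the rows shown in the question rather than the real data.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; the real gpi_data has 7983 rows.
gpi_data = pd.DataFrame({
    "sig": [-8.963545, -8.671357, -8.996480, -8.835958, -8.785034,
            -8.529724, -8.563781, -8.504211, -8.460620, -8.230092],
    "hourtime": pd.to_datetime([
        "2007-01-02 09:20:11.249998", "2007-01-03 10:39:31.874991",
        "2007-01-03 20:22:59.999006", "2007-01-04 10:18:56.249024",
        "2007-01-05 21:21:39.374002", "2019-12-30 20:23:01.874996",
        "2019-12-30 20:48:28.125016", "2019-12-30 21:23:44.999996",
        "2019-12-31 09:39:31.873999", "2019-12-31 10:18:58.125014",
    ]),
})
gpi_data.index = pd.Index(gpi_data["hourtime"].dt.normalize(), name="datetime_doy")


def half_day_means(df, start, end, split_hour=16):
    """Mean of sig per 10:00 / 22:00 slot for every day from start to end.

    split_hour is a hypothetical cutoff: observations before it count as
    morning (10:00 slot), the rest as evening (22:00 slot) of the same day.
    """
    hourtime = df["hourtime"]
    slot_hour = np.where(hourtime.dt.hour.to_numpy() < split_hour, 10, 22)
    slot = hourtime.dt.normalize().to_numpy() + pd.to_timedelta(slot_hour, unit="h").to_numpy()

    # Group on plain arrays so the duplicated datetime_doy index is not involved.
    tmp = pd.DataFrame({"hourtime": slot, "sig": df["sig"].to_numpy()})
    means = tmp.groupby("hourtime")["sig"].mean()

    # Complete grid: 10:00 and 22:00 for every day in the requested range.
    grid = pd.date_range(
        pd.Timestamp(start) + pd.Timedelta(hours=10),
        pd.Timestamp(end) + pd.Timedelta(hours=22),
        freq=pd.Timedelta(hours=12),
    )
    out = means.reindex(grid).rename_axis("hourtime").reset_index()
    out.index = pd.Index(out["hourtime"].dt.normalize(), name="datetime_doy")
    return out[["sig", "hourtime"]]


result = half_day_means(gpi_data, "2007-01-01", "2007-01-05")
print(result)
```

For the full data, calling `half_day_means(gpi_data, "2007-01-01", "2019-12-31")` would cover the whole period. Note that under this assumption an observation shortly after midnight is counted toward the 10:00:00 slot of its own day, not the previous evening; if that matters, the slot assignment would need to change.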