【Question title】:Pandas: calculate the morning averaged values or afternoon averaged values
【Posted】:2022-11-21 20:04:00
【Problem description】:

I have a dataframe like this:

gpi_data[['sig','hourtime']]
Out[28]: 
                   sig                   hourtime
datetime_doy                                     
2007-01-02   -8.963545 2007-01-02 09:20:11.249998
2007-01-03   -8.671357 2007-01-03 10:39:31.874991
2007-01-03   -8.996480 2007-01-03 20:22:59.999006
2007-01-04   -8.835958 2007-01-04 10:18:56.249024
2007-01-05   -8.785034 2007-01-05 21:21:39.374002
               ...                        ...
2019-12-30   -8.529724 2019-12-30 20:23:01.874996
2019-12-30   -8.563781 2019-12-30 20:48:28.125016
2019-12-30   -8.504211 2019-12-30 21:23:44.999996
2019-12-31   -8.460620 2019-12-31 09:39:31.873999
2019-12-31   -8.230092 2019-12-31 10:18:58.125014

[7983 rows x 2 columns]

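I want to calculate the average for each morning and each afternoon based on hourtime. By morning I mean data observed around 10:00:00, and by afternoon (evening) data observed around 22:00:00. If a day has no morning or evening value, fill it with np.nan. For example, on 2007-01-01 we have no morning or evening values of sig, so we fill both with np.nan. On 2007-01-02 we only have a morning value, so we fill the evening value of 2007-01-02 with np.nan. Specifically, for 2019-12-30 we have 3 evening values, observed at 2019-12-30 20:23:01.874996, 2019-12-30 20:48:28.125016 and 2019-12-30 21:23:44.999996, so we need the average of -8.529724, -8.563781 and -8.504211. The same applies to the last two data points on the morning of 2019-12-31: we need to average them and then fill np.nan for the evening of 2019-12-31.

So ideally the final result is: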

gpi_data[['sig','hourtime']]
Out[28]: 
                   sig                   hourtime
datetime_doy                                     
2007-01-01   nan       2007-01-01 10:00:00
2007-01-01   nan       2007-01-01 22:00:00
2007-01-02   -8.963545 2007-01-02 09:20:11.249998
2007-01-02   nan       2007-01-02 22:00:00
2007-01-03   -8.671357 2007-01-03 10:39:31.874991
2007-01-03   -8.996480 2007-01-03 20:22:59.999006
2007-01-04   -8.835958 2007-01-04 10:18:56.249024
2007-01-04   nan       2007-01-04 22:00:00
2007-01-05   nan       2007-01-05 10:00:00
2007-01-05   -8.785034 2007-01-05 21:21:39.374002
               ...                        ...
2019-12-30   -8.532572 2019-12-30 22:00:00
2019-12-31   -8.345356 2019-12-31 09:39:31.873999
2019-12-31   nan       2019-12-31 22:00:00

It is also fine if all hourtime values are rounded to 10:00:00 or 22:00:00, as follows:

gpi_data[['sig','hourtime']]
Out[28]: 
                   sig                   hourtime
datetime_doy                                     
2007-01-01   nan       2007-01-01 10:00:00
2007-01-01   nan       2007-01-01 22:00:00
2007-01-02   -8.963545 2007-01-02 10:00:00
2007-01-02   nan       2007-01-02 22:00:00
2007-01-03   -8.671357 2007-01-03 10:00:00
2007-01-03   -8.996480 2007-01-03 22:00:00
2007-01-04   -8.835958 2007-01-04 10:00:00
2007-01-04   nan       2007-01-04 22:00:00
2007-01-05   nan       2007-01-05 10:00:00
2007-01-05   -8.785034 2007-01-05 22:00:00
               ...                        ...
2019-12-30   -8.532572 2019-12-30 22:00:00
2019-12-31   -8.345356 2019-12-31 10:00:00
2019-12-31   nan       2019-12-31 22:00:00

How can I do this? Can anyone help me? Thanks!

【Question comments】:

    Tags: python pandas dataframe group-by pandas-resample


    【Solution 1】:

    Use cut to define thresholds for the 10 and 22 labels; here 12 and 23 hours are used.
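
To see how these bins map hours to the two labels, here is a minimal sketch on a handful of made-up hour values (the `hours` series is only an illustration, not data from the question). It also shows that, with the default right-closed bins, hour 0 falls outside `(0, 12]` and becomes NaN unless `include_lowest=True` is passed.

```python
import pandas as pd

# Illustrative hours only; not taken from the question's data.
hours = pd.Series([0, 9, 12, 13, 20, 23])

# Default bins are right-closed: (0, 12] -> 10 and (12, 23] -> 22.
labels = pd.cut(hours, bins=[0, 12, 23], labels=[10, 22])

# include_lowest=True also closes the first bin on the left, so hour 0 gets label 10.
labels_incl = pd.cut(hours, bins=[0, 12, 23], labels=[10, 22], include_lowest=True)

print(pd.DataFrame({"hour": hours, "default": labels, "include_lowest": labels_incl}))
```

Whether hour 0 matters depends on the data; the observations in the question cluster around 10:00 and 22:00, so it may never occur there.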

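    Then create a MultiIndex with MultiIndex.from_product spanning the minimal and maximal years, aggregate with mean, add the missing combinations with Series.reindex, and finally create the hourtime column: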

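    import pandas as pd

    # Label each observation by its hour: [0, 12] -> 10 (morning), (12, 23] -> 22 (evening)
    df['hourtime'] = pd.cut(df['hourtime'].dt.hour, bins=[0,12,23], labels=[10,22],
                            include_lowest=True)
    
    # Every calendar day from the first to the last year in the data, with both slots
    start = pd.Timestamp(year=df.index.year.min(), month=1, day=1)
    end = pd.Timestamp(year=df.index.year.max(), month=12, day=31)
    mux = pd.MultiIndex.from_product([pd.date_range(start, end), [10,22]],
                                      names=['datetime_doy','h'])
    
    # Mean per (day, slot); reindex adds the missing combinations as NaN
    df = df.groupby([df.index, 'hourtime'])['sig'].mean().reindex(mux).reset_index(level=1)
    # Rebuild hourtime as the day plus 10 or 22 hours
    df['hourtime'] = df.index + pd.to_timedelta(df.pop('h'), unit='h')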
    print (df)
                       sig            hourtime
    datetime_doy                              
    2007-01-01         NaN 2007-01-01 10:00:00
    2007-01-01         NaN 2007-01-01 22:00:00
    2007-01-02   -8.963545 2007-01-02 10:00:00
    2007-01-02         NaN 2007-01-02 22:00:00
    2007-01-03   -8.671357 2007-01-03 10:00:00
                   ...                 ...
    2019-12-29         NaN 2019-12-29 22:00:00
    2019-12-30         NaN 2019-12-30 10:00:00
    2019-12-30   -8.532572 2019-12-30 22:00:00
    2019-12-31   -8.345356 2019-12-31 10:00:00
    2019-12-31         NaN 2019-12-31 22:00:00
    
    [9496 rows x 2 columns]
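
For anyone who wants to try the approach without the full dataset, the following is a self-contained sketch that uses only the last five rows shown in the question (timestamps truncated to whole seconds). The three-day date range and the names `slot`, `days`, `means` and `out` are assumptions made for illustration; the steps mirror the ones above rather than replace them.

```python
import pandas as pd

# Last five rows from the question, timestamps truncated to whole seconds.
df = pd.DataFrame({
    "sig": [-8.529724, -8.563781, -8.504211, -8.460620, -8.230092],
    "hourtime": pd.to_datetime([
        "2019-12-30 20:23:01", "2019-12-30 20:48:28", "2019-12-30 21:23:44",
        "2019-12-31 09:39:31", "2019-12-31 10:18:58",
    ]),
})
# Index by calendar day, as in the question's datetime_doy index.
df = df.set_index(df["hourtime"].dt.normalize().rename("datetime_doy"))

# Label each observation as morning (10) or evening (22) by its hour.
slot = pd.cut(df["hourtime"].dt.hour, bins=[0, 12, 23], labels=[10, 22],
              include_lowest=True).astype(int)

# Every (day, slot) pair in an assumed three-day range, observed or not.
days = pd.date_range("2019-12-29", "2019-12-31", name="datetime_doy")
mux = pd.MultiIndex.from_product([days, [10, 22]], names=["datetime_doy", "h"])

# Mean per (day, slot); reindex inserts NaN for the pairs with no data.
means = df["sig"].groupby([df.index, slot.to_numpy()]).mean().reindex(mux)

# Turn the slot level back into a full timestamp column.
out = means.reset_index(level="h")
out["hourtime"] = out.index + pd.to_timedelta(out.pop("h").to_numpy(), unit="h")
print(out)
```

On these rows the 2019-12-30 evening slot averages to -8.532572 and the 2019-12-31 morning slot to -8.345356, in line with the output printed above, while the other four slots are NaN.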
    

    【Comments】:
