【Question Title】: Boolean mask on timeseries with different frequencies
【Posted】: 2020-11-09 20:15:23
【Question】:

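I am trying to mask some time-series data sampled every 6 hours using a boolean data frame with a daily frequency. The result should stay at the 6-hourly frequency. The function below gives me what I want, but it is extremely slow, and I'm sure there is a better way to do this. I'm an absolute beginner...

Any help on how to do this better is much appreciated...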

import pandas as pd

# Create the 6-hourly data
df = pd.DataFrame({'a': [0,1,10,2,5,4,7,5],'b': [0,10,100,20,50,40,70,50], 'date':
                                       [pd.to_datetime('2017-04-01 00:00:00'),
                                        pd.to_datetime('2017-04-01 06:00:00'),
                                        pd.to_datetime('2017-04-01 12:00:00'),
                                        pd.to_datetime('2017-04-01 18:00:00'),
                                        pd.to_datetime('2017-04-02 00:00:00'),
                                        pd.to_datetime('2017-04-02 06:00:00'),
                                        pd.to_datetime('2017-04-02 12:00:00'),
                                        pd.to_datetime('2017-04-02 18:00:00')]})
df = df.set_index('date')

# Create the daily boolean mask
bool_df = pd.DataFrame({'a': [True,False],'b': [False,True], 'date':
                                       [pd.to_datetime('2017-04-01'),
                                        pd.to_datetime('2017-04-02')]})
bool_df = bool_df.set_index('date')

def mask_result(mask, dataframe):
    # Treat +/-inf (from dividing by False, i.e. 0) as missing values
    pd.set_option('use_inf_as_na', True)
    # Empty frame with the same columns, used to collect the results
    df_result = dataframe[:0]
    ratio_resamp = dataframe.resample('d')

    # Iterate over the 6-hourly rows one calendar day at a time
    for name, ratio in ratio_resamp:
        day = name.strftime('%Y-%m-%d')
        # This day's mask row as an array of shape (1, n_columns)
        boolmask = mask[mask.index == day].values
        # Dividing by True (1) keeps a value; dividing by False (0) gives inf or NaN
        df_result = pd.concat([df_result, ratio / boolmask])

    return df_result

mask_result(bool_df, df)
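Since the question asks for a faster version, here is a minimal vectorized sketch, assuming the mask index holds one midnight timestamp per day and both indexes are sorted. It upsamples the daily mask onto the 6-hourly index with reindex(..., method="ffill") and applies it with DataFrame.where, which keeps values where the mask is True and puts NaN elsewhere, so neither the per-day loop nor the division trick is needed. The function name mask_result_vectorized is hypothetical.

```python
import pandas as pd

# Rebuild the question's sample data: 6-hourly values and a daily boolean mask.
idx = pd.date_range("2017-04-01", periods=8, freq="6h")
df = pd.DataFrame({"a": [0, 1, 10, 2, 5, 4, 7, 5],
                   "b": [0, 10, 100, 20, 50, 40, 70, 50]}, index=idx)
bool_df = pd.DataFrame({"a": [True, False], "b": [False, True]},
                       index=pd.to_datetime(["2017-04-01", "2017-04-02"]))


def mask_result_vectorized(mask, dataframe):
    """Hypothetical helper: apply a daily boolean mask to higher-frequency data."""
    # Upsample the mask onto the data's timestamps: each timestamp takes the
    # most recent daily mask row at or before it (forward fill).
    aligned = mask.reindex(dataframe.index, method="ffill")
    # Keep values where the aligned mask is True, NaN where it is False.
    return dataframe.where(aligned)


result = mask_result_vectorized(bool_df, df)
print(result)
```

Note that forward filling also carries the last daily row onto any 6-hourly timestamps that fall after the final mask date; if that is not wanted, look the mask up by calendar day instead, as the merge-based answer does.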

【Discussion】:

    Tags: python pandas dataframe time-series


    【Answer 1】:

    If you merge the original data with the mask data on the calendar date, you can then apply the mask column by column with np.where.

    import pandas as pd
    import numpy as np
    df = pd.DataFrame({'a': [0,1,10,2,5,4,7,5],'b': [0,10,100,20,50,40,70,50], 'date':
                                           [pd.to_datetime('2017-04-01 00:00:00'),
                                            pd.to_datetime('2017-04-01 06:00:00'),
                                            pd.to_datetime('2017-04-01 12:00:00'),
                                            pd.to_datetime('2017-04-01 18:00:00'),
                                            pd.to_datetime('2017-04-02 00:00:00'),
                                            pd.to_datetime('2017-04-02 06:00:00'),
                                            pd.to_datetime('2017-04-02 12:00:00'),
                                            pd.to_datetime('2017-04-02 18:00:00')]}) 
    
    bool_df = pd.DataFrame({'a_mask': [True,False],'b_mask': [False,True], 'date_mask':
                                           [pd.to_datetime('2017-04-01'),
                                            pd.to_datetime('2017-04-02')]})  
    
    df = df.merge(bool_df, left_on=pd.to_datetime(df.date.dt.date), right_on='date_mask')
    df['a'] = np.where(df['a_mask'], df['a'], np.nan)
    df['b'] = np.where(df['b_mask'], df['b'], np.nan)
    
    df[['date', 'a','b']]
    

    Output

              date             a       b
    0   2017-04-01 00:00:00  0.0     NaN
    1   2017-04-01 06:00:00  1.0     NaN
    2   2017-04-01 12:00:00 10.0     NaN
    3   2017-04-01 18:00:00  2.0     NaN
    4   2017-04-02 00:00:00  NaN    50.0
    5   2017-04-02 06:00:00  NaN    40.0
    6   2017-04-02 12:00:00  NaN    70.0
    7   2017-04-02 18:00:00  NaN    50.0
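The merge-then-np.where pattern above needs one np.where line per value column. A minimal sketch of the same idea for all value columns at once, assuming the mask columns are named <column>_mask as in the answer: look up each row's mask by its calendar day with reindex on date.dt.normalize() (midnight of that day), strip the _mask suffix so the columns line up, and apply everything in one DataFrame.where call. The names daily, row_mask, value_cols and out are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [0, 1, 10, 2, 5, 4, 7, 5],
    "b": [0, 10, 100, 20, 50, 40, 70, 50],
    "date": pd.date_range("2017-04-01", periods=8, freq="6h"),
})
bool_df = pd.DataFrame({"a_mask": [True, False], "b_mask": [False, True],
                        "date_mask": pd.to_datetime(["2017-04-01", "2017-04-02"])})

# Index the mask by day, then pick one mask row per data row using the
# row's date truncated to midnight (the same key the merge above builds).
daily = bool_df.set_index("date_mask")
row_mask = daily.reindex(index=df["date"].dt.normalize())

# Assumed naming convention: "<column>_mask" masks "<column>".
row_mask = row_mask.rename(columns=lambda c: c.removesuffix("_mask"))
# Give the mask the same row labels as df so the two align row for row.
row_mask.index = df.index

value_cols = list(row_mask.columns)
out = df[["date"]].join(df[value_cols].where(row_mask))
print(out)
```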
    
