【Question title】: Facebook NeuralProphet - adding holidays
【Posted】: 2022-01-12 22:08:07
【Question】:

I have a single, general dataset for my forecasts that contains data from countries around the world:

    ds                    y    country_id
    01/01/2021 09:00:00   5.0  1
    01/01/2021 09:10:00   5.2  1
    01/01/2021 09:20:00   5.4  1
    01/01/2021 09:30:00   6.1  1
    01/01/2021 09:00:00   2.0  2
    01/01/2021 09:10:00   2.2  2
    01/01/2021 09:20:00   2.4  2
    01/01/2021 09:30:00   3.1  2
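The sample above is effectively two time series stacked together, because every timestamp occurs once per country. The pandas check below rebuilds those rows and confirms this. It assumes the dates are day-first (DD/MM/YYYY); the question does not say, and 01/01 parses the same either way. This matters because a single NeuralProphet series generally expects each `ds` value to appear only once, and events are matched to rows by date alone.

```python
import pandas as pd

# The sample rows from the question: both countries share the same timestamps.
df = pd.DataFrame({
    "ds": ["01/01/2021 09:00:00", "01/01/2021 09:10:00",
           "01/01/2021 09:20:00", "01/01/2021 09:30:00"] * 2,
    "y": [5.0, 5.2, 5.4, 6.1, 2.0, 2.2, 2.4, 3.1],
    "country_id": [1, 1, 1, 1, 2, 2, 2, 2],
})
# Assumption: day-first dates (DD/MM/YYYY).
df["ds"] = pd.to_datetime(df["ds"], format="%d/%m/%Y %H:%M:%S")

# Over the whole frame, each timestamp appears once per country ...
dup_overall = df["ds"].duplicated().any()
# ... but within each country the timestamps are unique.
unique_per_country = df.groupby("country_id")["ds"].apply(lambda s: s.is_unique).all()
print(dup_overall, unique_per_country)
```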



    import pandas as pd

    playoffs = pd.DataFrame({
      'holiday': 'playoff',
      'ds': pd.to_datetime(['2008-01-13', '2009-01-03', '2010-01-16',
                            '2010-01-24', '2010-02-07', '2011-01-08',
                            '2013-01-12', '2014-01-12', '2014-01-19',
                            '2014-02-02', '2015-01-11', '2016-01-17',
                            '2016-01-24', '2016-02-07']),
      'lower_window': 0,
      'upper_window': 1,
    })
    superbowls = pd.DataFrame({
      'holiday': 'superbowl',
      'ds': pd.to_datetime(['2010-02-07', '2014-02-02', '2016-02-07']),
      'lower_window': 0,
      'upper_window': 1,
    })
    holidays = pd.concat((playoffs, superbowls))
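The frames above use Prophet's holiday layout, with columns `holiday`, `ds`, `lower_window` and `upper_window`. NeuralProphet's events workflow instead works with an events DataFrame that has only `event` and `ds` columns; the windows are passed to `add_events`. The sketch below performs that conversion. The helper `to_neuralprophet_events` is hypothetical. The `add_events(name, lower_window=..., upper_window=...)` call, shown only in a comment, follows the NeuralProphet events tutorial, so check it against your installed version.

```python
import pandas as pd

# Prophet-style holiday frames, shortened from the question.
playoffs = pd.DataFrame({
    "holiday": "playoff",
    "ds": pd.to_datetime(["2008-01-13", "2009-01-03", "2010-01-16"]),
    "lower_window": 0,
    "upper_window": 1,
})
superbowls = pd.DataFrame({
    "holiday": "superbowl",
    "ds": pd.to_datetime(["2010-02-07", "2014-02-02", "2016-02-07"]),
    "lower_window": 0,
    "upper_window": 1,
})
holidays = pd.concat((playoffs, superbowls), ignore_index=True)


def to_neuralprophet_events(prophet_holidays):
    """Hypothetical helper: split a Prophet holidays frame into
    (events_df with 'event'/'ds' columns, {event_name: window dict})."""
    events_df = (
        prophet_holidays.rename(columns={"holiday": "event"})[["event", "ds"]]
        .reset_index(drop=True)
    )
    # Assumes all rows of one holiday share the same window; keep the first.
    windows = (
        prophet_holidays.groupby("holiday")[["lower_window", "upper_window"]]
        .first()
        .to_dict(orient="index")
    )
    return events_df, windows


events_df, windows = to_neuralprophet_events(holidays)
print(events_df)
print(windows)
# Registering each event with its own window might then look like:
# for name, w in windows.items():
#     m.add_events(name, lower_window=int(w["lower_window"]),
#                  upper_window=int(w["upper_window"]))
```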

Now I want to add the holidays to the model:

    m = NeuralProphet(holidays=holidays)
    m.add_country_holidays(country_name='US')
    m.fit(df)

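  1. How do I add holidays for several countries with add_country_holidays (m.add_country_holidays)?
  2. How do I add country-specific holidays to the holidays data?
  3. Do I need to build a separate model for each country? Or is one model for the whole dataset fine, with regressors added afterwards? Any suggestions?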
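For question 3, one straightforward option is to treat each country as its own series and give each one its own events, so that one country's holiday never marks another country's rows. The sketch below is pandas only. The helper `split_by_country`, the country-to-event mapping and the example dates are hypothetical. The NeuralProphet calls (`add_events`, `create_df_with_events`, `fit`) appear only in comments and follow the library's events tutorial.

```python
import pandas as pd


def split_by_country(df, events_by_country):
    """Hypothetical helper: return {country_id: (history_df, events_df)}.

    Each pair could be fed to its own model, so events of one country
    never touch the rows of another country.
    """
    empty_events = pd.DataFrame({"event": pd.Series(dtype=str),
                                 "ds": pd.Series(dtype="datetime64[ns]")})
    per_country = {}
    for country_id, group in df.groupby("country_id"):
        history_df = group[["ds", "y"]].sort_values("ds").reset_index(drop=True)
        events_df = events_by_country.get(country_id, empty_events)
        per_country[country_id] = (history_df, events_df)
    return per_country


df = pd.DataFrame({
    "ds": pd.to_datetime(["2021-01-01 09:00", "2021-01-01 09:10"] * 2),
    "y": [5.0, 5.2, 2.0, 2.2],
    "country_id": [1, 1, 2, 2],
})
# Assumed mapping: country 1 = US, country 2 = Canada (illustrative dates).
events_by_country = {
    1: pd.DataFrame({"event": ["holiday_US"], "ds": pd.to_datetime(["2021-07-04"])}),
    2: pd.DataFrame({"event": ["holiday_CA"], "ds": pd.to_datetime(["2021-07-01"])}),
}
for country_id, (history_df, events_df) in split_by_country(df, events_by_country).items():
    # With NeuralProphet installed, each pair would go to its own model, e.g.:
    # m = NeuralProphet(); m.add_events(events_df["event"].unique().tolist())
    # m.fit(m.create_df_with_events(history_df, events_df), freq="10min")
    print(country_id, len(history_df), events_df["event"].tolist())
```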

【Question discussion】:

    Tags: facebook-prophet prophet


    【Answer 1】:

    Here is one possible solution:

    Program:

    # NOTE 1: tested on google colab
    
    # Un-comment the following (!pip) line if you need to install the libraries 
    # on google colab notebook:
    
    #!pip install neuralprophet pandas numpy holidays
    
    import pandas as pd
    import numpy as np
    import holidays
    from neuralprophet import NeuralProphet
    import datetime
    
    
    # NOTE 2: Most of the code comes from:
    # https://neuralprophet.com/html/events_holidays_peyton_manning.html
    
    # Context:
    # We will use the time series of the log daily page views of the Wikipedia
    # page for Peyton Manning (former American football quarterback) as an example.
    # During playoffs and Super Bowls, Peyton Manning's wiki page is viewed more
    # frequently. We would like to see whether country-specific holidays also
    # have an influence.
    
    # First, we load the data:
    
    data_location = "https://raw.githubusercontent.com/ourownstory/neuralprophet-data/main/datasets/"
    df = pd.read_csv(data_location + "wp_log_peyton_manning.csv")
    
    # To simulate your case, we add a country_id column filled with random values {1,2}
    # Let's assume US=1 and Canada=2
    
    # (numpy is already imported above.)
    np.random.seed(0)
    df['country_id'] = np.random.randint(1, 2 + 1, len(df))
    
    print("The dataframe we are working on:")
    print(df.head())
    
    
    # We would like to add holidays for the US and Canada to see whether holidays
    # influence the number of daily views of Manning's wiki page.
    
    # The data in df starts in 2007 and ends in 2016:
    StartingYear=2007
    LastYear=2016
    #  Holidays for US and Canada:
    US_holidays = holidays.US(years=[year for year in range(StartingYear, LastYear+1)])
    CA_holidays = holidays.CA(years=[year for year in range(StartingYear, LastYear+1)])
    
    # Collect the matching (ds, event) rows in lists and build the DataFrames once
    # at the end (DataFrame.append was deprecated and removed in pandas 2.0).
    rows_US = []
    rows_CA = []
    for i in df.index:
        # Convert the 'YYYY-MM-DD' date string to a datetime object:
        datetimeobj = datetime.datetime(*[int(x) for x in df.at[i, 'ds'].split('-')])
        # If the day is a holiday in the US, record it as a holiday_US event:
        if df.at[i, 'country_id'] == 1 and datetimeobj in US_holidays:
            rows_US.append({'ds': df.at[i, 'ds'], 'event': 'holiday_US'})
        # If the day is a holiday in Canada, record it as a holiday_CA event:
        if df.at[i, 'country_id'] == 2 and datetimeobj in CA_holidays:
            rows_CA.append({'ds': df.at[i, 'ds'], 'event': 'holiday_CA'})

    holidays_US = pd.DataFrame(rows_US, columns=['ds', 'event'])
    holidays_CA = pd.DataFrame(rows_CA, columns=['ds', 'event'])
    
    # Now we can drop the country_id in df:
    df.drop('country_id', axis=1, inplace=True)
    
    
    print("Days in df that are holidays in the US:")
    print(holidays_US.head())
    print()
    print("Days in df that are holidays in Canada:")
    print(holidays_CA.head())
    
    
    # user specified events
    # history events
    playoffs = pd.DataFrame({
        'event': 'playoff',
        'ds': pd.to_datetime([
            '2008-01-13', '2009-01-03', '2010-01-16',
            '2010-01-24', '2010-02-07', '2011-01-08',
            '2013-01-12', '2014-01-12', '2014-01-19',
            '2014-02-02', '2015-01-11', '2016-01-17',
            '2016-01-24', '2016-02-07',
        ]),
    })
    
    superbowls = pd.DataFrame({
        'event': 'superbowl',
        'ds': pd.to_datetime([
            '2010-02-07', '2012-02-05', '2014-02-02', 
            '2016-02-07',
        ]),
    })
    
    
    # Create the events_df:
    events_df = pd.concat((playoffs, superbowls, holidays_US, holidays_CA))
    
    # Create neural network and fit:
    # NeuralProphet Object
    m = NeuralProphet(loss_func="MSE")
    m = m.add_events("playoff")
    m = m.add_events("superbowl")
    m = m.add_events("holiday_US")
    m = m.add_events("holiday_CA")
    
    
    # create the data df with events
    history_df = m.create_df_with_events(df, events_df)
    
    # fit the model
    metrics = m.fit(history_df, freq="D")
    
    # forecast with events known ahead
    future = m.make_future_dataframe(df=history_df, events_df=events_df, periods=365, n_historic_predictions=len(df))
    forecast = m.predict(df=future)
    
    
    fig = m.plot(forecast)
    fig_param = m.plot_parameters()
    fig_comp = m.plot_components(forecast)
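The row-by-row loop in the program can also be written with vectorized pandas operations, which avoids a Python-level loop over every day of the series. The sketch below assumes `holiday_dates` is an iterable of `datetime.date` objects; for a `holidays.US(...)` object, that would be its keys. The function name `country_events` is hypothetical, and the small DataFrame uses toy values.

```python
import datetime

import pandas as pd


def country_events(df, holiday_dates, country_id, event_name):
    """Hypothetical vectorized replacement for the row loop: return the days of
    `country_id` that fall on one of `holiday_dates`, labelled `event_name`."""
    days = pd.to_datetime(df["ds"])
    mask = (df["country_id"] == country_id) & days.dt.date.isin(set(holiday_dates))
    return pd.DataFrame({"ds": days[mask], "event": event_name}).reset_index(drop=True)


# Small stand-in for the Peyton Manning frame (toy values).
df = pd.DataFrame({
    "ds": ["2007-12-24", "2007-12-25", "2008-06-30", "2008-07-01"],
    "y": [8.1, 8.3, 7.9, 7.7],
    "country_id": [1, 1, 2, 2],
})
us_dates = [datetime.date(2007, 12, 25)]  # e.g. list(US_holidays.keys())
ca_dates = [datetime.date(2008, 7, 1)]    # e.g. list(CA_holidays.keys())

events = pd.concat(
    [country_events(df, us_dates, 1, "holiday_US"),
     country_events(df, ca_dates, 2, "holiday_CA")],
    ignore_index=True,
)
print(events)
```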
    

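    Results: The results (see the PARAMETERS plot) seem to indicate that a holiday comes with fewer views in both the US and Canada. Does that make sense? Maybe... People on holiday apparently have more interesting things to do than browse Manning's wiki page :-) I don't know.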

    Output of the program:

    The dataframe we are working on:
               ds       y  country_id
    0  2007-12-10  9.5908           1
    1  2007-12-11  8.5196           2
    2  2007-12-12  8.1837           2
    3  2007-12-13  8.0725           1
    4  2007-12-14  7.8936           2
    Days in df that are holidays in the US:
               ds       event
    0  2007-12-25  holiday_US
    1  2008-01-21  holiday_US
    2  2008-07-04  holiday_US
    3  2008-11-27  holiday_US
    4  2008-12-25  holiday_US
    
    Days in df that are holidays in Canada:
               ds       event
    0  2008-01-01  holiday_CA
    1  2008-02-18  holiday_CA
    2  2008-08-04  holiday_CA
    3  2008-09-01  holiday_CA
    4  2008-10-13  holiday_CA
    
    INFO - (NP.utils.set_auto_seasonalities) - Disabling daily seasonality. Run NeuralProphet with daily_seasonality=True to override this.
    INFO - (NP.config.set_auto_batch_epoch) - Auto-set batch_size to 32
    INFO - (NP.config.set_auto_batch_epoch) - Auto-set epochs to 138
    
    88%
    241/273 [00:02<00:00, 121.69it/s]
    
    INFO - (NP.utils_torch.lr_range_test) - lr-range-test results: steep: 3.36E-02, min: 1.51E+00
    
    88%
    241/273 [00:02<00:00, 123.87it/s]
    
    INFO - (NP.utils_torch.lr_range_test) - lr-range-test results: steep: 3.36E-02, min: 1.63E+00
    
    89%
    242/273 [00:02<00:00, 121.58it/s]
    
    INFO - (NP.utils_torch.lr_range_test) - lr-range-test results: steep: 3.62E-02, min: 2.58E+00
    INFO - (NP.forecaster._init_train_loader) - lr-range-test selected learning rate: 3.44E-02
    Epoch[138/138]: 100%|██████████| 138/138 [00:29<00:00,  4.74it/s, MSELoss=0.012, MAE=0.344, RMSE=0.478, RegLoss=0]
    

    Figures (images not reproduced here): Forecast, Parameters, Components.

    【Answer discussion】:
