【Question title】: ARIMA / SARIMAX forecasting unusual values
【Posted】: 2020-04-18 04:40:24
【Question】:

These are series of values taken every hour over 30 days. I collected them into one set per hour, as follows:

{'date':
['2019-11-09','2019-11-10','2019-11-11','2019-11-12','2019-11-13','2019-11-14','2019-11-15','2019-11-16','2019-11-17','2019-11-18','2019-11-19','2019-11-20','2019-11-21','2019-11-22','2019-11-23','2019-11-24','2019-11-25','2019-11-26','2019-11-27','2019-11-28','2019-11-29','2019-11-30','2019-12-01','2019-12-02','2019-12-03','2019-12-04','2019-12-05','2019-12-06','2019-12-07','2019-12-08'],
'hora0':[111666.5,121672.91666666667,87669.33333333333,89035.58333333333,91707.91666666667,94449.33333333333,103476.91666666667,123271.5,133306.58333333334,103149.91666666667,106310.25,91830.25,77733.75,96823.25,102880.25,118383.33333333333,95076.66666666667,93561.83333333333,97651.58333333333,112180.0,118051.75,135456.0,149553.0,125797.25,126098.0,128603.75,84631.08333333333,85683.16666666667,96377.16666666667,113161.16666666667],
'hora2':[83768.83333333333,83319.58333333333,72922.75,71893.75,73933.0,76598.83333333333,81021.75,93588.83333333333,94514.08333333333,87147.66666666667,91464.08333333333,74022.41666666667,63709.166666666664,75939.33333333333,79904.16666666667,84435.33333333333,76736.0,85237.33333333333,79162.75,91729.58333333333,99081.58333333333,106440.41666666667,112064.66666666667,111635.58333333333,110168.58333333333,111241.25,62634.083333333336,68203.33333333333,71515.16666666667,80674.66666666667]}
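If the raw readings live in a single series with timestamps, the per-hour grouping above could be produced with pandas. This is a minimal sketch, not the asker's code: `group_by_hour` is a hypothetical helper, it assumes a `DatetimeIndex`, and it averages readings that fall in the same hour of the same day (an assumption for illustration).

```python
import pandas as pd


def group_by_hour(readings: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: one row per day, one column per hour of the day.

    Assumes `readings` has a DatetimeIndex. Readings falling in the same
    hour of the same day are averaged (an assumption for illustration).
    """
    df = readings.to_frame("value")
    df["date"] = df.index.strftime("%Y-%m-%d")
    df["hour"] = df.index.hour
    table = df.pivot_table(index="date", columns="hour", values="value", aggfunc="mean")
    # Name the columns hora0, hora1, ... like the dict in the question
    table.columns = [f"hora{h}" for h in table.columns]
    return table


# Usage with two days of synthetic hourly readings (0.0, 1.0, ..., 47.0)
idx = pd.date_range("2019-11-09", periods=48, freq="60min")
readings = pd.Series(range(48), index=idx, dtype=float)
by_hour = group_by_hour(readings)
```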

The series have similar distributions:

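The AIC (Akaike Information Criterion) value is used to compare forecasting models with each other. This is the code I use to fit a range of ARIMA models and find the one with the lowest AIC: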

import pandas as pd
from itertools import product
from warnings import filterwarnings
from statsmodels.tsa.arima_model import ARIMA  # legacy API, removed in statsmodels 0.13

def AIC_iteration_i(train):
    filterwarnings("ignore")
    # X = df2.values
    history = [x for x in train.iloc[:, 0]]
    p = d = q = range(0, 6)
    pdq = list(product(p, d, q))
    aic_results = []
    parameter = []
    for param in pdq:
        try:
            model = ARIMA(history, order=param)
            results = model.fit(disp=0)
            # Uncomment the line below to print each (p,d,q) order and its AIC
            # print('ARIMA{} - AIC:{}'.format(param, results.aic))
            aic_results.append(results.aic)
            parameter.append(param)
        except Exception:
            # Skip orders that fail to fit (e.g. non-stationary or non-invertible)
            continue
    d = dict(ARIMA=parameter, AIC=aic_results)
    results_table = pd.DataFrame(dict([(k, pd.Series(v)) for k, v in d.items()]))
    # (p,d,q) order with the minimum AIC
    order = results_table.loc[results_table['AIC'].idxmin(), 'ARIMA']
    return order
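For reference, the quantity this loop minimizes is the standard AIC, where $k$ is the number of estimated parameters and $\hat{L}$ is the maximized likelihood of the fitted model; lower is better. One caveat worth keeping in mind: AIC values are only directly comparable between models fitted to the same observations, and in general models with different $d$ have their likelihood computed on differently differenced data.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
```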

For every series, the (p, d, q) order with the lowest AIC comes out the same: (0, 2, 1).
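To make the selected order concrete: in (0, 2, 1), p = 0 means no autoregressive terms, d = 2 means the series is differenced twice before modelling, and q = 1 means one moving-average term. A small NumPy sketch of the second difference on the first four hora0 values (each differencing step drops one observation):

```python
import numpy as np

# First four hora0 values from the question
y = np.array([111666.5, 121672.91666666667, 87669.33333333333, 89035.58333333333])

first_diff = np.diff(y)        # y[t] - y[t-1]             -> 3 values
second_diff = np.diff(y, n=2)  # y[t] - 2*y[t-1] + y[t-2]  -> 2 values

# Differencing twice is the same as differencing the first difference
assert np.allclose(second_diff, np.diff(first_diff))
```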

I get the forecasts with the code below, but the results for hour 2 are not right:

import math
from statsmodels.tsa.arima_model import ARIMA  # legacy API, removed in statsmodels 0.13

# Time series hora0.iloc[:,0] and hora1.iloc[:,0] from a pandas DataFrame
trained = list(hora0.iloc[:, 0])

# Order obtained above: (0, 2, 1)
orders = order

size = math.ceil(len(trained) * .8)
train, test = [trained[i] for i in range(size)], [trained[i] for i in range(size, len(trained))]
predictions = []
predictionslower = []
predictionsupper = []
for k in range(len(test)):
    # NOTE: the model is fitted on `trained`, which already contains the test
    # values; a strict walk-forward validation would start from `train` instead.
    model = ARIMA(trained, order=orders)
    model_fit = model.fit(disp=0)
    # Legacy API: one-step forecast() returns (forecast, stderr, conf_int)
    forecast, stderr, conf_int = model_fit.forecast()
    yhat = forecast[0]
    yhatlower = conf_int[0][0]
    yhatupper = conf_int[0][1]
    predictions.append(yhat)
    predictionslower.append(yhatlower)
    predictionsupper.append(yhatupper)
    obs = test[k]
    trained.append(obs)
# error = mean_squared_error(test, predictions)
predictions

Predictions:

hour0: [113815.15072419723, 128600.77967037176, 131580.85654685542, 83200.24743417211, 83167.65192576911, 95062.06180437957]
hour1: [79564.70753715932, 112491.2694928094, 114410.34654966182, 60882.18766484651, nan, nan]
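One thing worth checking in the loop above: the model is refitted on `trained`, which was built from the whole series and therefore already contains the test values. A one-step-ahead walk-forward validation usually starts from the training portion only and reveals each test observation after forecasting it. A minimal plain-Python sketch, where `forecast_one` is a hypothetical callback standing in for "fit ARIMA on `history` and return the one-step forecast":

```python
import math
from typing import Callable, List, Sequence, Tuple


def walk_forward(series: Sequence[float],
                 forecast_one: Callable[[List[float]], float],
                 train_frac: float = 0.8) -> Tuple[List[float], List[float]]:
    """One-step-ahead walk-forward validation.

    Each forecast only sees observations that come before the step it predicts.
    """
    size = math.ceil(len(series) * train_frac)
    history = list(series[:size])  # start from the training portion only
    test = list(series[size:])
    predictions = []
    for obs in test:
        predictions.append(forecast_one(history))
        history.append(obs)  # reveal the true value only after forecasting it
    return test, predictions


def last_value(history: List[float]) -> float:
    # Hypothetical stand-in forecaster; replace with an ARIMA fit + forecast
    return history[-1]


test, preds = walk_forward([float(v) for v in range(10)], last_value)
```

With this structure, `preds[i]` is always a forecast of `test[i]`, so the two lists can be compared directly (e.g. with a mean squared error).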

I also checked the AIC for series 2 with pmdarima, which fits SARIMAX models, and it returns the same order. Can anyone shed some light on this?

【Question discussion】:

    Tags: python-3.x time-series regression arima pyramid-arima


    【Solution 1】:

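    The values for hour 2 (and for the other hours as well) form a non-stationary time series. To remove the non-stationarity, we can apply differencing or a natural-log transform to the original data: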

    hora2 = np.log(hora2)
    
    {'date':['2019-11-09','2019-11-10','2019-11-11','2019-11-12','2019-11-13','2019-11-14','2019-11-15','2019-11-16','2019-11-17','2019-11-18','2019-11-19','2019-11-20','2019-11-21','2019-11-22','2019-11-23','2019-11-24','2019-11-25','2019-11-26','2019-11-27','2019-11-28','2019-11-29','2019-11-30','2019-12-01','2019-12-02','2019-12-03','2019-12-04','2019-12-05','2019-12-06','2019-12-07','2019-12-08'],
    'hora2':[11.3358163,11.33043889,11.19715594,11.18294461,11.21091456,11.24633712,11.30247292,11.44666635,11.45650413,11.37535928,11.42370164,11.21212325,11.06208373,11.23769005,11.28858328,11.34374123,11.24812624,11.3531948,11.27926114,11.42660022,11.50369886,11.57534064,11.62683136,11.62299513,11.60976705,11.61945655,11.04506487,11.13024872,11.17766483,11.29817989]}
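A self-contained NumPy sketch of this transform, using the first five hora2 values from the question: `np.log` must receive the array itself rather than the string `'hora2'`, and `np.exp` undoes it.

```python
import numpy as np

# First five hora2 values from the question
hora2 = np.array([83768.83333333333, 83319.58333333333, 72922.75, 71893.75, 73933.0])

log_hora2 = np.log(hora2)     # natural log, applied element-wise
restored = np.exp(log_hora2)  # inverse transform back to the original scale
```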
    

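    Once I had the order with the minimum AIC (Akaike Information Criterion) for each "horaX" series and used it in ARIMA(trained, order=orders), some series still returned NaN values in the forecast. For those, I had to take the order with the second or third smallest AIC instead. I then took the forecast results and applied the exponential function (the inverse of the log) to recover the original values.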

    {'hora2':[11.6948938,12.00191037,11.81401922,11.77476296,11.83965601,11.89443423]}
    
    hora2 = np.exp(hora2)
    
    {'hora2':[119957.62142129,163066.00981609,135133.60347713,129931.53854787,138642.78415756,146449.24980086]}
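The fallback described in this answer (moving to the order with the second or third smallest AIC when a forecast contains NaN) could be automated along these lines. This is a plain-Python sketch under stated assumptions: `forecast_with` is a hypothetical callback that fits the model with a given order and returns its forecasts, and `ranked` could be built as `list(zip(parameter, aic_results))` from the lists in the question's `AIC_iteration_i`.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Order = Tuple[int, int, int]


def first_order_without_nan(
    ranked: Sequence[Tuple[Order, float]],
    forecast_with: Callable[[Order], List[float]],
) -> Optional[Tuple[Order, List[float]]]:
    """Try orders from lowest to highest AIC and return the first one whose
    forecasts contain no NaN, together with those forecasts."""
    for order, _aic in sorted(ranked, key=lambda item: item[1]):
        preds = forecast_with(order)
        if not any(math.isnan(p) for p in preds):
            return order, preds
    return None  # every candidate produced NaN


# Hypothetical AIC values and forecasts, for illustration only
ranked = [((0, 2, 1), 500.0), ((1, 1, 0), 510.0), ((0, 1, 1), 505.0)]
fake_forecasts = {
    (0, 2, 1): [1.0, float("nan")],
    (0, 1, 1): [1.0, 2.0],
    (1, 1, 0): [3.0, 4.0],
}
best = first_order_without_nan(ranked, lambda order: fake_forecasts[order])
```

Here the lowest-AIC order (0, 2, 1) is skipped because its second forecast is NaN, and (0, 1, 1), the next lowest, is returned.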
    

    The forecast results for the test data are shown in the figure:
