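【Posted】: 2018-07-05 21:09:17
【Question】:
I have built a method for creating an error correction model (ECM) that is the average of several ECMs. To do this, I use the lm() function in R to create multiple lm objects, each representing an ECM, and then average the coefficients across those objects to get the final model. The lm objects represent ECMs because I transform the data into the form an ECM requires before running lm() on it.
I also use backward selection based on AIC to eliminate variables I don't need. The process seems to work fine when building the ECM. However, when I create a data frame whose column names match the coefficients in the model and pass it to predict(), I get an error saying that a required variable is missing from the data. That variable is not included in the final model, so it should not be needed for prediction. So why does predict() look for it? What am I doing wrong?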
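For reference, the regression that each lm() call fits, and the averaging step, can be written roughly as follows. This is a sketch in my own notation: the symbols β, γ, θ and the index m are assumptions for illustration and are not taken from the ecm package. Here x^tr are the columns of xtr, x^eq are the columns of xeq, and k is the number of partitions.

```latex
% One ECM fit: differenced response on differenced xtr, lagged xeq and lagged y
\Delta y_t = \beta_0
  + \sum_i \beta_i \, \Delta x^{\mathrm{tr}}_{i,t}
  + \sum_j \gamma_j \, x^{\mathrm{eq}}_{j,t-1}
  + \gamma_y \, y_{t-1}
  + \varepsilon_t

% Final model: each coefficient is the mean of its k estimates
\bar{\theta} = \frac{1}{k} \sum_{m=1}^{k} \hat{\theta}^{(m)}
```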
#Load data
library(ecm)
data(Wilshire)
trn <- Wilshire[Wilshire$date<='2015-12-01',]
y <- trn$Wilshire5000
xeq <- xtr <- trn[c('CorpProfits', 'FedFundsRate', 'UnempRate')]
#Function to split data into k partitions and build k models, each on a (k-1)/k subset of the data
avelm <- function(formula, data, k = 5, seed = 5, ...) {
  # Fit once on all rows to get an lm object that will hold the averaged coefficients
  lmall <- lm(formula, data, ...)
  modellist <- 1:k
  set.seed(seed)
  # Fit k models, each leaving out a random 1/k sample of the rows
  models <- lapply(modellist, function(i) {
    tstIdx <- sample(nrow(data), 1/k * nrow(data))
    trn <- data[-tstIdx, ]
    lm(as.formula(formula), data = trn)
  })
  lmnames <- names(lmall$coefficients)
  # Average each coefficient across the k fits
  lmall$coefficients <- rowMeans(as.data.frame(sapply(models, function(m) coef(m))))
  names(lmall$coefficients) <- lmnames
  lmall$fitted.values <- predict(lmall, data)
  # formula is passed as a string such as "dy ~ ."; keep the part before "~"
  target <- trimws(gsub("~.*$", "", formula))
  lmall$residuals <- data[, target] - lmall$fitted.values
  return(lmall)
}
#Function to create an ECM using backwards selection based on AIC (leveraged avelm function above)
aveecmback <- function (y, xeq, xtr, k = 5, seed = 5, ...) {
  # Equilibrium terms: each xeq column lagged by one period
  xeqnames <- names(xeq)
  xeqnames <- paste0(xeqnames, "Lag1")
  xeq <- as.data.frame(xeq)
  xeq <- rbind(rep(NA, ncol(xeq)), xeq[1:(nrow(xeq) - 1), ])
  # Transient terms: first difference of each xtr column
  xtrnames <- names(xtr)
  xtrnames <- paste0("delta", xtrnames)
  xtr <- as.data.frame(xtr)
  xtr <- data.frame(apply(xtr, 2, diff, 1))
  yLag1 <- y[1:(length(y) - 1)]
  x <- cbind(xtr, xeq[complete.cases(xeq), ])
  x <- cbind(x, yLag1)
  names(x) <- c(xtrnames, xeqnames, "yLag1")
  x$dy <- diff(y, 1)
  formula <- "dy ~ ."
  model <- avelm(formula, data = x, k = k, seed = seed, ...)
  fullAIC <- partialAIC <- AIC(model)
  # Backward selection: keep dropping the term with the lowest drop1() AIC
  while (partialAIC <= fullAIC) {
    d1 <- drop1(model)
    candidates <- !grepl("none|yLag1", rownames(d1))  # never drop <none> or yLag1
    todrop <- rownames(d1)[candidates][which.min(d1$AIC[candidates])]
    # Dropped terms are removed by appending " - term" to the "dy ~ ." string
    formula <- paste0(formula, " - ", todrop)
    model <- avelm(formula, data = x, k = k, seed = seed, ...)
    partialAIC <- AIC(model)
    if (partialAIC < fullAIC && length(rownames(drop1(model))) > 2) {
      fullAIC <- partialAIC
    }
  }
  return(model)
}
finalmodel <- aveecmback(y, xeq, xtr)
print(finalmodel)
Call:
lm(formula = formula, data = data)

Coefficients:
     (Intercept)  deltaCorpProfits    deltaUnempRate   CorpProfitsLag1             yLag1
       -0.177771          0.012733         -1.204489          0.002046         -0.024294
#Create data frame to predict on
set.seed(2)
df <- data.frame(deltaCorpProfits=rnorm(5), deltaUnempRate=rnorm(5), CorpProfitsLag1=rnorm(5), yLag1=rnorm(5))
predict(finalmodel, df)
Error in eval(predvars, data, env) : object 'deltaFedFundsRate' not found
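The same behaviour seems to show up in a much smaller setting, so here is a minimal sketch on made-up data. The columns y, a and b are hypothetical and have nothing to do with the Wilshire data. It fits a model with a "`. - b`" formula (the same pattern my backward selection builds), checks which variables the stored terms still refer to, and then tries a refit with an explicit formula. The refit via reformulate() is only a possible workaround under the assumption that the dot-minus formula is the cause; I have not verified it on the averaged ECM above.

```r
# Toy data; the column names y, a, b are made up for illustration.
set.seed(1)
d <- data.frame(y = rnorm(20), a = rnorm(20), b = rnorm(20))

# Same pattern as "dy ~ . - deltaFedFundsRate": expand the dot, then subtract b.
m <- lm(y ~ . - b, data = d)

# b has no coefficient in the fitted model ...
coef_names <- names(coef(m))

# ... but the right-hand side of the stored terms still mentions it.
needed <- all.vars(delete.response(terms(m)))

# predict() builds a model frame from the stored terms, so newdata
# without a column b raises an error; capture the message instead of stopping.
res <- tryCatch(predict(m, data.frame(a = 1)),
                error = function(e) conditionMessage(e))

# Possible workaround (an assumption, not verified on the ECM above):
# refit with an explicit formula built from the surviving term labels.
explicit <- reformulate(attr(terms(m), "term.labels"), response = "y")
m2 <- lm(explicit, data = d)
p2 <- predict(m2, data.frame(a = 1))
```

If this is the cause, then in aveecmback the final formula would presumably need to list the remaining terms explicitly instead of being "dy ~ ." followed by a chain of subtractions.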
【Discussion】: