我将给出一个使用lm() 模型的解决方案,因为我没有安装roll 包。
首先,我假设您能够将数据帧存储在列表中。在这种情况下,您可以使用 lapply() 对要应用 fun1() 函数的每个数据框进行迭代。
因此,从概念上讲,我们将有两个由lapply() 调用的不同函数,一个在外层 上迭代数据帧,一个在内部级别迭代分析的数据帧的列。内部函数将是您的 fun1() 函数,但为了更清楚起见,我将其称为 fun_inner()。
因此,假设您有一个名为 list_of_dfs 的数据框列表,那么以下函数将允许您执行所需的操作(您只需调整 fun_inner() 函数以调用 roll_lm() 函数) :
#' Fits a predictive model using each column of a data frame as target variable
#'
#' @param df data frame with the data to fit.
#' @param target_columns name or index of the column(s) in data frame \code{df}
#' to be used as target variables, one at a time.
#' @param pred_columns name or index of the column(s) in data frame \code{df}
#' to be used as predictor variables.
#' @param fun_model function to be called that fits the model \code{y ~ X}
#' where \code{y} is each variable taken from the \code{target_columns} of the input data frame
#' and \code{X} is the set of predictor variables taken from the \code{pred_columns}
#' of the input data frame.
#'
#' @return a list containing the result of the model fit to each target variable.
fun_outer_iterate_on_dfs <- function(df, target_columns, pred_columns, fun_model) {
lapply(df[, target_columns, drop=FALSE], fun_model, as.matrix(df[, pred_columns, drop=FALSE]))
}
#' Fits an lm( y ~ X ) model. \code{y} is a numeric vector and \code{X} is a matrix.
fun_inner_fit_model <- function(y, X) {
lm( y ~ X )
}
这是一个用例示例:
set.seed(1717)
nobs = 10
# List containing the data frames to be used in the model fits
list_of_dfs[[1]] = data.frame(RunTime=rnorm(nobs), y1=rnorm(nobs), y2=rnorm(nobs))
list_of_dfs[[2]] = data.frame(RunTime=rnorm(nobs), y1=rnorm(nobs), y2=rnorm(nobs))
list_of_dfs[[3]] = data.frame(RunTime=rnorm(nobs), y1=rnorm(nobs), y2=rnorm(nobs))
test_models_on_each_df <- lapply(list_of_dfs, fun_outer_iterate_on_dfs, c("y1", "y2"),
"RunTime", fun_inner_fit_model)
注意我们如何将更多参数(除了第一个参数)传递给lapply() 调用的函数,只需将它们列在函数名之后(在本例中是在fun_outer_iterate_on_dfs 之后)。
传递给fun_outer_iterate_on_dfs() 函数的列名也可以是列索引。
上面的代码给出了类似的内容:
[[1]]
[[1]]$y1
Call:
lm(formula = y ~ X)
Coefficients:
(Intercept) X
-0.05994 -0.11727
[[1]]$y2
Call:
lm(formula = y ~ X)
Coefficients:
(Intercept) X
0.02854 -0.08574
[[2]]
[[2]]$y1
Call:
lm(formula = y ~ X)
Coefficients:
(Intercept) X
-0.23479 -0.01973
[[2]]$y2
Call:
lm(formula = y ~ X)
Coefficients:
(Intercept) X
0.07248 -0.33088
[[3]]
[[3]]$y1
Call:
lm(formula = y ~ X)
Coefficients:
(Intercept) X
-0.3087 -0.1191
[[3]]$y2
Call:
lm(formula = y ~ X)
Coefficients:
(Intercept) X
0.1765 0.5085
我们看到两个回归适合三个数据帧中的每一个,一个在目标变量 y1 上,一个在目标变量 y2 上。
最后,如果您已经将数据框存储为不同的对象,您可以使用以下代码 sn-p 将数据框存储在列表中,假设所有数据框名称都跟在后面模式Sum_* 并且工作区中定义的所有具有此模式的对象都是感兴趣的数据框:
#' Stores objects in the workspace whose name satisfies a given pattern in a list
#'
#' @param pattern regular expression to be satisfied by the name of the object to store in the list.
#' @return a named list whose elements are the objects found in the parent environment satisfying the given pattern.
#' The name of each element is the name of the object.
store_objects_in_list <- function(pattern) {
object_names = ls(pattern=pattern, envir=parent.frame())
list_with_objects = lapply(object_names, get)
names(list_with_objects) = object_names
return(list_with_objects)
}
list_of_dfs <- store_objects_in_list("^Sum_")