R - How to rotate/interchange predictors in a regression model (a not-stepwise approach): answers

【Question title】: R - How to rotate/interchange predictors in a regression model (a not-stepwise approach)
【Posted】: 2020-01-04 22:16:44
【Question】:

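I have a dataset with a range of predictors that should be used one at a time in a simple (albeit multivariate) regression model. I can't work out whether I need to loop over the (names of the) predictors, or whether something like lapply() would suffice.

A function needs an argument before it can produce output, but I don't know how to work a for loop into a given model formula.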

Some data
---
df <- data.frame(y1=runif(100,1,10),
    y2=runif(100,1,10),
    x1= runif(100,1,5),
    x2= runif(100,1,5), 
    x3= runif(100,1,5))

Y = cbind( df$y1 , df$y2 )

I have a feeling it should look something like this:

list_pred <- for ( x in 1:colnames(pred)) {
  print(paste(x))
}

But the for loop refuses to cooperate, which makes me think I may have to write a function that takes the lm() arguments.

not_stepwise <- matrix( 0 , predictor , 1 ) # pre-allocation?
for (x in 1:predictor) {
 lm.dd <- lm( Y ~ [x] , data = df ] )
}

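At this point I don't know where to look; Google and Stack Overflow only offer fairly broad information on this (apart from the statistical implications, which I have already covered).

Update: To clarify, I am looking for an overview of the R² values of the models themselves (and/or of the significant predictors), to determine whether a model has any significant predictors at all, i.e. whether it is a meaningful model.

Update 2: What my dataset looks like (without the DVs)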

'data.frame':   100 obs. of  35 variables:
 $ Minuten             : int  72 30 102 212 37 57 120 146 143 189 ...
 $ Teamsize            : int  3 3 4 3 2 4 5 6 5 3 ...
 $ Exp                 : num  6.67 6.67 5.5 5.33 10.5 ...
 $ Chirurg1            : int  10 10 1 2 4 2 3 3 2 9 ...
 $ Chirurg2            : int  11 11 2 NA NA NA NA NA 9 2 ...
 $ NG                  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ NG.Ratio            : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Specialisme         : int  2 2 1 3 1 3 1 1 3 3 ...
 $ Observaties         : int  43 21 55 132 22 90 90 64 100 129 ...
 $ UniqueCom           : int  9 6 9 12 4 9 9 12 16 12 ...
 $ G.Ratio             : num  0.333 0.667 0.25 0.667 0.5 ...
 $ Bewustwording       : int  1 0 1 1 0 0 0 0 1 1 ...
 $ Confrontatie        : int  0 1 0 2 0 0 0 1 1 2 ...
 $ Confrontatie.Outside: int  0 0 0 0 0 0 0 0 0 0 ...
 $ Coordinerend        : int  1 3 6 17 2 4 10 6 14 9 ...
 $ Delegerend          : int  6 3 2 22 0 9 6 1 15 11 ...
 $ Goedaardig          : int  3 0 5 4 0 7 3 2 9 1 ...
 $ Grappig             : int  0 1 0 0 0 2 0 1 1 1 ...
 $ Hofmaken            : int  0 0 0 0 0 1 1 2 1 0 ...
 $ Instruerend         : int  9 0 7 13 0 7 3 9 7 13 ...
 $ Onderwijzend        : int  6 5 3 21 9 2 14 5 8 22 ...
 $ Ontbindend          : int  1 1 0 0 1 0 1 1 2 1 ...
 $ Protest             : int  0 0 0 0 0 0 0 0 1 0 ...
 $ Reactief            : int  0 0 0 0 0 0 0 0 1 0 ...
 $ Respons.Negatief    : int  0 0 1 1 0 0 1 1 0 0 ...
 $ Respons.Neutraal    : int  0 0 0 0 0 0 0 0 0 2 ...
 $ Respons.Positief    : int  1 0 1 2 1 1 0 1 2 8 ...
 $ Sign.out            : int  1 0 1 1 0 1 0 1 1 0 ...
 $ Time.out            : int  0 0 0 1 0 0 0 0 0 0 ...
 $ Volgzaam            : int  0 0 0 0 0 0 0 0 1 0 ...
 $ Vragend             : int  0 0 0 3 0 0 1 0 1 1 ...
 $ rank_sum            : int  27 11 24 80 12 33 37 25 58 65 ...
 $ rank_sum.60s        : num  0.375 0.367 0.235 0.377 0.324 ...
 $ ranking             : int  43 56 46 11 55 37 35 45 21 17 ...
 $ ranking.60s         : int  30 34 72 29 49 1 58 92 21 41 ...

【Comments】:

Tags: r regression data-modeling

【Solution 1】:

A first, simple solution

# Generate a dataset
X <- data.frame(matrix(runif(1000), ncol=20))
y <- rnorm(nrow(X))
dts <- data.frame(y, X)

lms <- vector(ncol(X), mode="list")
k <- 1
for (x in names(X)) {
   # Create formula with the k-th x variable
   frml <- as.formula(paste0("y ~", x))
   # Use the formula in a linear model
   lms[[k]] <- lm(frml, data=dts)
   k <- k+1
}
# This is the output of the linear model with the 15-th x variable
summary(lms[[15]])
# A matrix with R-squared and adjusted R-squared
r2 <- function(x) c(summary(x)$r.squared, summary(x)$adj.r.squared)
t(sapply(lms, r2))
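
The question uses two response columns (y1 and y2) and asks whether something like lapply() would be enough. The following is only a sketch under assumptions, not part of the original answer: it recreates data shaped like the question's example, fits one ordinary lm() per response/predictor pair, and stores the R² values in a data frame. The names responses, predictors and grid are illustrative.

```r
# Simulated data with the same layout as the question's example
set.seed(42)
df <- data.frame(y1 = runif(100, 1, 10),
                 y2 = runif(100, 1, 10),
                 x1 = runif(100, 1, 5),
                 x2 = runif(100, 1, 5),
                 x3 = runif(100, 1, 5))

responses  <- c("y1", "y2")        # assumed response columns
predictors <- c("x1", "x2", "x3")  # assumed predictor columns

# Every response/predictor pair, one row per model
grid <- expand.grid(response = responses, predictor = predictors,
                    stringsAsFactors = FALSE)

# Fit each single-predictor model and keep its R-squared
grid$r.squared <- mapply(function(resp, pred) {
  fit <- lm(as.formula(paste(resp, "~", pred)), data = df)
  summary(fit)$r.squared
}, grid$response, grid$predictor, USE.NAMES = FALSE)

grid
```

Fitting each response separately keeps every model a plain lm object, so summary(fit)$r.squared can be read off directly.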

A more elegant and flexible solution

R2 <- function(x, data) {
     frml <- as.formula(paste0("y ~", paste(unlist(x), collapse="+"))) 
     lmfit <- lm(frml, data=data)
     lmsum <- summary(lmfit)
     data.frame(R2=lmsum$r.squared, adj.R2=lmsum$adj.r.squared)
}
R2 <- Vectorize(R2, "x")

# The R-squared for all the univariate models
R2(names(X), dts)

# The R-squared for all the bivariate models 
k <- 2   
xcouples <- apply(combn(names(X), k), 2, list)
names(xcouples) <- lapply(xcouples, function(x) paste(unlist(x), collapse="_"))
t(R2(xcouples, dts))
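
The update to the question also asks which predictors are significant. As a rough sketch only (not the answerer's code), the single-predictor models can be summarised in one table with R², adjusted R² and the p-value of the slope. The data are simulated with the same layout as dts above so that the block runs on its own; the name overview is illustrative.

```r
# Simulated data with the same layout as the answer's dts
set.seed(123)
X <- data.frame(matrix(runif(1000), ncol = 20))
y <- rnorm(nrow(X))
dts <- data.frame(y, X)

# One row per single-predictor model
overview <- do.call(rbind, lapply(names(X), function(x) {
  s <- summary(lm(as.formula(paste("y ~", x)), data = dts))
  data.frame(predictor     = x,
             r.squared     = s$r.squared,
             adj.r.squared = s$adj.r.squared,
             p.value       = coef(s)[x, "Pr(>|t|)"])  # p-value of the slope
}))

# Smallest p-values first
overview[order(overview$p.value), ]
```

With many predictors tested one by one, a few small p-values are expected by chance alone, so the table is better read as a screening aid than as evidence of a meaningful model.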

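【Comments】:

Works like a charm. I just forgot that I need to check the R² values: is there a way to create a list containing the models and their R² via the for loop?
Thanks for your reply. For most of your examples above, the matrix returns NULL. summary(lm_dd$r.squared[[15]]) also returns NULL, whereas without $r.squared it works fine. Your second example gives 'data' must be a data.frame, not a matrix or an array. Your X corresponds to a subset of mine without the DVs; I changed some names to make it work in my script, but to no avail.
It should still be a data.frame, though; your example with simulated data works perfectly.
@fleems The correct syntax is summary(lm_dd)$r.squared.
@fleems Please share the output of dput(df) for your complete dataset, e.g. using pastebin.com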

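【Solution 2】:

To obtain the regression output for every possible model, including every combination of several predictors in the dataset, the code below may help.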

# To find all combinations of the predictors.

predictors <- names(df)[-1]
all_comb <- sapply(seq(predictors) ,function(i) {t(combn(predictors,i))})


# Calculating the regression outputs and putting into a list called result.

result <- list()

for (x in seq_along(all_comb)) {
  for (i in seq_len(nrow(all_comb[[x]]))) {
    name <- paste(all_comb[[x]][i,], collapse = '_')
    group <- paste0("Y ~ ",paste0(all_comb[[x]][i,],collapse =" + "))
    result[[name]] <- lm(group, data =df )
  }
}

Calling result gives:

...
  ...

$x1_x3

Call:
lm(formula = group, data = df)

Coefficients:
(Intercept)           x1           x3  
     6.6647      -0.3864      -0.0954  


$x2_x3

Call:
lm(formula = group, data = df)

Coefficients:
(Intercept)           x2           x3  
     5.3037       0.1438      -0.1459  


$x1_x2_x3

Call:
lm(formula = group, data = df)

Coefficients:
(Intercept)           x1           x2           x3  
    6.16101     -0.39160      0.15794     -0.07796

Data:

df <- data.frame(Y=runif(100,1,10),
    x1= runif(100,1,5),
    x2= runif(100,1,5), 
    x3= runif(100,1,5))
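
If only the R² values of all combinations are needed rather than the full model objects, they can be gathered into one data frame. This is a minimal sketch under assumptions, not the answerer's code: the response is assumed to be a column called Y as in the data above, and combos and r2_table are illustrative names.

```r
set.seed(1)
df <- data.frame(Y  = runif(100, 1, 10),
                 x1 = runif(100, 1, 5),
                 x2 = runif(100, 1, 5),
                 x3 = runif(100, 1, 5))

predictors <- setdiff(names(df), "Y")

# All non-empty subsets of the predictors, as a flat list of character vectors
combos <- unlist(lapply(seq_along(predictors), function(k) {
  combn(predictors, k, simplify = FALSE)
}), recursive = FALSE)

# One row per model with its R-squared and adjusted R-squared
r2_table <- do.call(rbind, lapply(combos, function(vars) {
  s <- summary(lm(reformulate(vars, response = "Y"), data = df))
  data.frame(model         = paste(vars, collapse = " + "),
             r.squared     = s$r.squared,
             adj.r.squared = s$adj.r.squared)
}))

r2_table
```

The number of models is 2^p - 1 for p predictors: 7 here, but roughly 3.4 × 10^10 for the 35 columns listed in the question, so an exhaustive search is only practical for a small, pre-selected set of predictors.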

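【Comments】:

I'm not really looking for all combinations, but from a data-driven perspective it could be interesting. We can always set up a new experiment. Do you have a solution for creating a list that shows the models together with their R² values, within this all-possible-combinations framework?
@fleems Instead of result[[name]] <- lm(group, data =df ) use result[[name]] <- summary(lm(group, data =df ))$r.squared
Thanks for your reply. When running your function (very clean), I do get Error in model.frame.default(formula = group, data = df, drop.unused.levels = TRUE) : variable lengths differ (found for 'x1'). I only swapped the subset (your df) for my own subset, which is a data.frame. I removed the [-1] because my Y is not in the first column, and running all_comb <- sapply(seq(predictors) ,function(i) {t(combn(predictors,i))}) does not finish, as if I were missing a bracket or something.
The dependent variable Y is specified in this part: group <- paste0("Y ~ ",paste0(all_comb[[x]][i,],collapse =" + ")). So check whether your data contains Y as the dependent variable. Of course, you can change it as needed. The error you get basically says that the numbers of rows are not the same. So somehow nrow(Y) and nrow(x1), nrow(x2) ... don't match!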
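
One plausible way to end up with this error, shown purely as an illustration with made-up data: when Y is not a column of the data frame passed to lm(), R falls back to an object called Y in the formula's environment, and that object may have a different number of rows than the data frame. The sizes 100 and 80 below are assumptions chosen to trigger the mismatch.

```r
set.seed(1)
df <- data.frame(x1 = runif(100, 1, 5),
                 x2 = runif(100, 1, 5))

# A response that lives outside df and has fewer observations
Y <- runif(80, 1, 10)

# lm() takes x1 from df and Y from the workspace; capture the error message
msg <- tryCatch({
  lm(Y ~ x1, data = df)
  "no error"
}, error = function(e) conditionMessage(e))
msg

# Keeping the response in the same data frame avoids the mismatch
df$Y <- runif(nrow(df), 1, 10)
fit <- lm(Y ~ x1, data = df)
nobs(fit)
```

Checking "Y" %in% names(df) before running the loop is a quick way to confirm that the response really comes from the data frame.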