lm for predicting the values of the dependent variable in R

【Question title】: lm for predicting the values of the dependent variable in R
【Posted】: 2018-01-26 18:31:20
【Question description】:

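I have a data frame with an arbitrary number of quantitative variables. I need to write a function that fits lm to predict the values of the dependent variable. As predictors I only want to use the variables whose p.value > 0.05 (from shapiro.test, as in my code below). The function should return, as a vector, the coefficients of the linear regression built on the selected predictors only. If the data contain no such predictors, the function should emit the warning "There are no normal variables in the data". I wrote the function, but it does not work.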

smart_lm <-  function(x) {
  sl <- apply(x[2:dim(x)[2]], 2, function(x) shapiro.test(x)$p.value)
  my_reg <- lm(as.formula(paste("x[[1]]~",paste(x[2:dim(x)[2]], collapse = "+"))))
  return(ifelse(sl[sl > 0.05], my_reg, "There are no normal variables in the data"))
}

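【Question comments】:

When asking for help, you should include a simple reproducible example with sample input and the desired output, so that possible solutions can be tested and verified.
Hint: lm is certainly more expensive than ifelse or if/else. Can't you reverse the logic in your function?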
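
One way to read this hint, as a minimal sketch rather than a definitive implementation: run the Shapiro-Wilk tests first and call lm() only when at least one predictor passes. The function name smart_lm_coef, the alpha argument and the toy data frame are assumptions made for illustration; they do not appear in the original posts. The first column is assumed to be the dependent variable, as in the question's code.

```r
# Hypothetical helper, not from the original thread
smart_lm_coef <- function(DF, alpha = 0.05) {
    # Shapiro-Wilk p-value for every column except the response (column 1)
    p <- vapply(DF[-1], function(x) shapiro.test(x)$p.value, numeric(1))
    keep <- names(p)[p > alpha]
    # Check first, fit second: skip lm() entirely when nothing passes
    if (length(keep) == 0) {
        warning("There are no normal variables in the data")
        return(invisible(NULL))
    }
    # Regress the response on the selected predictors only and
    # return the coefficients as a named numeric vector
    coef(lm(reformulate(keep, response = names(DF)[1]), data = DF))
}

# Deterministic toy data: normal quantiles pass the test, exponential ones do not
n <- 50
toy <- data.frame(norm_x = qnorm(ppoints(n)), skew_x = qexp(ppoints(n)))
toy <- cbind(y = 1 + 2 * toy$norm_x + sin(seq_len(n)), toy)

res <- smart_lm_coef(toy)   # coefficients for (Intercept) and norm_x only
```

Calling smart_lm_coef(toy[c("y", "skew_x")]) takes the early-exit branch: it raises the warning and returns NULL invisibly without fitting any model.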

Tags: r apply lm

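【Solution 1】:

If I understand correctly, the following should do it.
Note that ifelse is not needed, and that the formula uses the dot . to stand for every column of the data argument that is not already named in the formula, that is, every column other than the response.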

set.seed(1665)    # Make the results reproducible
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n, 2, 6)
x3 <- rexp(n)
y <- x1 + x2 + x3 + rnorm(n)

dat <- data.frame(y, x1, x2, x3)

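smart_lm <- function(DF){
    # Shapiro-Wilk p-value for each candidate predictor (every column but the first)
    sl <- sapply(DF[-1], function(x) shapiro.test(x)$p.value)
    # Keep the response (column 1) plus the predictors with p-value > 0.05
    DF2 <- DF[, c(1, which(sl > 0.05) + 1), drop = FALSE]
    # Build `<response> ~ .` so that the dot expands to the kept predictors only
    lm(reformulate(".", response = names(DF2)[1]), data = DF2)
}

smart_lm(dat)
# With this seed the fit is y ~ x1 + x2, so the printed coefficients are
# (Intercept), x1 and x2; x3 (exponential) is dropped by the Shapiro-Wilk filter.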
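
To see what the dot expands to, here is a small sketch on a made-up data frame (the names toy, a and b are illustrative assumptions, not from the original posts). It also shows that the dot only skips columns that appear by name in the formula: writing the response as an expression such as toy[[1]] leaves y among the predictors.

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(
    y = c(1.2, 2.3, 2.9, 4.1, 5.2, 5.8),
    a = c(1, 2, 3, 4, 5, 6),
    b = c(2, 1, 4, 3, 6, 5)
)

# `y ~ .` is shorthand for `y ~ a + b` here
fit_dot      <- lm(y ~ ., data = toy)
fit_explicit <- lm(y ~ a + b, data = toy)
same_fit <- isTRUE(all.equal(coef(fit_dot), coef(fit_explicit)))
same_fit   # TRUE

# With an expression on the left-hand side, y is not recognised as the
# response by name, so the dot still includes it as a predictor
dot_names <- names(coef(lm(toy[[1]] ~ ., data = toy)))
dot_names  # names: "(Intercept)", "y", "a", "b"
```

Naming the response column in the formula, for example via reformulate() or as.formula(), avoids regressing a variable on itself.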

【Comments】: