Place results of predict() in a for loop inside a list: answers

【Question title】: Place results of predict() in a for loop inside a list
【Posted】: 2018-12-27 23:29:13
【Question】:

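Suppose I want to run a linear regression on the mtcars dataset several times, each time on a different sample. The idea is that each iteration of a for loop fits the regression on a new sample and stores the result of predict() for that sample. A small example of a single run: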

## Fit the model once on a sample and use it to predict on the full dataset:
Sample_Size <- 10
Sample <- mtcars[sample(nrow(mtcars), Sample_Size), ]
Model <- lm(formula = mpg ~ wt, data = Sample)
Predictions <- predict(Model, newdata = mtcars)
## Gives a one-row matrix with the predicted mpg for each car:
Predictions <- t(Predictions)

This produces:

> Predictions
     Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive Hornet Sportabout
[1,]  25.80494      23.89161   28.05592       21.34051          19.65228
       Valiant Duster 360 Merc 240D Merc 230 Merc 280 Merc 280C Merc 450SE
 [1,] 19.50221   18.67685  21.52809 21.82822 19.65228  19.65228   14.92523
     Merc 450SL Merc 450SLC Cadillac Fleetwood Lincoln Continental
 [1,]   17.47633    17.10117           6.071394            4.765828

 .... and so on for other cars

I would like to run this procedure several times in a for loop, drawing a different sample each time, and store each run's Predictions as one row of a data frame.

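Suppose I run the model on two different samples. Each row of the resulting data frame should hold the result above for one sample, for example: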

     Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive Hornet Sportabout
 [1,]  25.80494      23.89161   28.05592       21.34051          19.65228
 [2,]  22.80492      22.89147   28.05532       21.34231          20.65290
       Valiant Duster 360 Merc 240D Merc 230 Merc 280 Merc 280C Merc 450SE
 [1,] 19.50221   18.67685  21.52809 21.82822 19.65228  19.65228   14.92523
 [2,] 21.83492   23.84147  29.02532 21.34231 20.35290  18.45228   13.92523

 ... and so on for other cars.

Any ideas on how to do this? I have put something together, but it either throws an error or stores only the last result... What am I missing here?

Here is what I have so far:

### Inside a for loop, to get a dataframe of Predictions:

Bootstrap_times <- 2
Sample_Size <- 10
Predictions <- list()
Results <-vector ("list",Bootstrap_times)## Stores the Predictions for each run

for(i in 1:Bootstrap_times){
### Take a sample
Sample[[i]] <- mtcars[sample(nrow(mtcars), Sample_Size), ]
### Do the regression on the sample
Model[[i]] <- lm(formula = mpg ~ wt, data = Sample[[i]])
### Perform the predict() on the sample
Predictions[[i]] <- predict(Model[[i]],newdata=mtcars)
### put the result as a line on the dataframe Results
Predictions[[i]] <- t(Predictions[[i]])
return(Predictions)
}

However, I keep getting:

[[<-.data.frame(*tmp*, i, value = list(mpg = c(13.3, 10.4, : replacement has 10 rows, data has 0

【Comments】:

Thanks, I realized something was missing, namely the Sample_Size parameter. I will add it now, cheers

Tags: r for-loop linear-regression

【Answer 1】:

I prefer to use magic_for(), but you can just as easily do this in base R.

Here is an example:

Bootstrap_times <- 2
Sample_Size     <- 10

Sample      <- mtcars[sample(nrow(mtcars), Sample_Size), ]
Model       <- lm(formula = mpg ~ wt, data = Sample)
Predictions <- predict(Model,newdata=mtcars)
## You like how I line up arrows, right?
Predictions <- t(Predictions)


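## Initialize Sample and Model as lists so that [[i]] indexes list elements
Sample      <- list()
Model       <- list()
Predictions <- list()
Results     <- vector("list", Bootstrap_times) ## Stores the Predictions for each run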

magicfor::magic_for()
for(i in 1:Bootstrap_times){
  ### Take a sample
  Sample[[i]] <- mtcars[sample(nrow(mtcars), Sample_Size), ]
  ### Do the regression on the sample
  Model[[i]] <- lm(formula = mpg ~ wt, data = Sample[[i]])
  ### Perform the predict() on the sample

  put(predict(Model[[i]],newdata=mtcars))
}

tmp<-magicfor::magic_result_as_dataframe()

tmp

   i predict(Model[[i]],newdata=mtcars)
1  1                          22.858806
2  2                          20.922763
3  1                          25.136504
4  2                          18.341372
5  1                          16.633098
6  2                          16.481252
7  1                          15.646096
8  2                          18.531180
9  1                          18.834873
10 2                          16.633098
11 1                          16.633098
12 2                          11.849933
13 1                          14.431324
14 2                          14.051708
15 1                           2.890988
16 2                           1.569924
17 1                           2.169717
18 2                          26.047583
19 1                          30.489093
20 2                          28.818782
21 1                          24.035616
22 2                          16.025712
23 1                          16.671060
24 2                          13.596168
25 1                          13.558206
26 2                          28.059549
27 1                          26.503122
28 2                          31.263511
29 1                          18.683026
30 2                          21.719957
31 1                          15.646096
32 2                          21.644034
33 1                          22.978374
34 2                          21.584264
35 1                          24.618503
36 2                          19.725450
37 1                          18.495353
38 2                          18.386011
39 1                          17.784630
40 2                          19.862128
41 1                          20.080812
42 2                          18.495353
43 1                          18.495353
44 2                          15.051081
45 1                          16.909894
46 2                          16.636540
47 1                           8.599905
48 2                           7.648629
49 1                           8.080530
50 2                          25.274555
51 1                          28.472808
52 2                          27.270046
53 1                          23.825774
54 2                          18.057985
55 1                          18.522689
56 2                          16.308514
57 1                          16.281178
58 2                          26.723336
59 1                          25.602581
60 2                          29.030452
61 1                          19.971470
62 2                          22.158309
63 1                          17.784630
64 2                          22.103638
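
The answer mentions that base R works just as well. A minimal sketch of that route, assuming the goal is one row per run and one column per car as described in the question; predict_once is a hypothetical helper name and the seed is only there to make the run repeatable:

```r
# Sketch: one row of predictions per bootstrap sample, base R only.
set.seed(42)  # assumed seed, not part of the original answer

bootstrap_times <- 2
sample_size <- 10

# Hypothetical helper: fit on a random sample, then predict mpg for every car.
predict_once <- function(n) {
  rows <- sample(nrow(mtcars), n)
  fit <- lm(mpg ~ wt, data = mtcars[rows, ])
  predict(fit, newdata = mtcars)  # named numeric vector, one value per car
}

# Collect one prediction vector per run, then stack them row-wise.
runs <- lapply(seq_len(bootstrap_times), function(i) predict_once(sample_size))
results <- as.data.frame(do.call(rbind, runs))

dim(results)  # rows = runs, columns = cars
```

Because each run returns a complete vector, nothing has to be indexed into a pre-existing Sample or Model object inside the loop, which is where the original attempt ran into trouble.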

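【Comments】:

Thanks! I seem to be having trouble with the magic_for() function... Are you sure your syntax is correct?
@Arrebimbomalho Yes, I am sure. Can you tell me what you ran and what the error was? You installed it first, right? I am guessing you need to install it with install.packages(). See also cran.r-project.org/web/packages/magicfor/index.html
Oops, sorry, that was my first time pasting code in the comments. Live and learn, I guess :) Cheers
It works now! Great! I just pasted your code in directly and it worked. Not to push my luck, but do you know of a neat way to place each run's predictions under the car names? :) Thanks a lot!
@Arrebimbomalho You can put anything you want in put, like put(prediction = predict(..pseudocode...), carnames = names(...), etc, etc). No problem. Cheers and good luck. Please remember to click the green check mark :)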

【Answer 2】:

My version:

# load data
data(mtcars)
N <- nrow(mtcars)

# bootstrap parameters
sample_size <- 10
bootstrap_times <- 20

# create empty storage matrix of results
# one row per bootstrap sample, one column per car (predicted mpg)
res_mat <- matrix(NA, nrow=bootstrap_times, ncol=N)
colnames(res_mat) <- rownames(mtcars)

# do bootstrap
for (i in seq(bootstrap_times)) {
    this_sample <- sample(N, sample_size, replace=FALSE)
    reg_result  <- lm(mpg ~ wt, data=mtcars[this_sample,])
    res_mat[i,] <- predict(reg_result, mtcars)
}

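【Comments】:

Thanks. I tried your solution, but it seems to fill the matrix with nothing but NA; for some reason the values are not being updated
I think I know what it is... reg_result inside the for loop is probably missing an index. But how do I declare it outside the loop?
I just copy-pasted the code above into a fresh R session and got the expected result. Sorry, I don't know what to tell you.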
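
One way to check the NA report above is to build the matrix without pre-allocating it and then test it for missing values. This is only a sketch under assumptions: the seed, the vapply() formulation and the summary variables pred_mean and pred_sd are not from the answer.

```r
# Sketch: build the result matrix in one call and verify it has no NA.
set.seed(1)  # assumed seed, only for repeatability

n_cars <- nrow(mtcars)
sample_size <- 10
bootstrap_times <- 20

# vapply() returns one column per run; transpose to get one row per run.
res_mat <- t(vapply(
  seq_len(bootstrap_times),
  function(i) {
    rows <- sample(n_cars, sample_size)
    fit <- lm(mpg ~ wt, data = mtcars[rows, ])
    predict(fit, newdata = mtcars)
  },
  numeric(n_cars)
))
colnames(res_mat) <- rownames(mtcars)

# Fails loudly if any cell was left unfilled.
stopifnot(!anyNA(res_mat))

# Per-car summary across runs (hypothetical follow-up, not in the answer).
pred_mean <- colMeans(res_mat)
pred_sd <- apply(res_mat, 2, sd)
```

If a pre-allocated matrix still contains only NA after the loop, the loop body most likely never ran or wrote to a different object, so checking with anyNA() right after the loop narrows that down.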

【Answer 3】:

Here is a tidyverse approach using nested data.frames:

library(tidyverse)

Bootstrap_times <- 2
Sample_Size <- 10

Predictions <- data.frame(SampleID = 1:Bootstrap_times) %>%
  group_by(SampleID) %>%
  nest() %>%
  mutate(data = data %>% map(~mtcars[sample(nrow(mtcars), Sample_Size), ]),
         Model = data %>% map(~lm(formula = mpg ~ wt, data = .)),
         Predictions = map2(Model, data, ~predict(.x, newdata = .y))) %>%
  select(SampleID, Predictions) %>%
  unnest(cols = Predictions)

Result:

# A tibble: 20 x 2
   SampleID Predictions
      <int>       <dbl>
 1        1        22.7
 2        1        16.2
 3        1        19.7
 4        1        21.5
 5        1        18.7
 6        1        17.4
 7        1        23.3
 8        1        10.7
 9        1        18.8
10        1        19.8
11        2        11.4
12        2        19.6
13        2        11.7
14        2        18.1
15        2        21.1
16        2        18.6
17        2        16.2
18        2        23.5
19        2        19.7
20        2        20.7

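The advantage of this approach is that it is easy to extract other information from the model (using broom) and combine everything into a single data.frame output: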

library(broom)

data.frame(SampleID = 1:Bootstrap_times) %>%
  group_by(SampleID) %>%
  nest() %>%
  mutate(data = data %>% map(~mtcars[sample(nrow(mtcars), Sample_Size), ]),
         Model = data %>% map(~lm(formula = mpg ~ wt, data = .) %>% augment())) %>%
  select(-data) %>%
  unnest(cols = Model)

Result:

# A tibble: 20 x 11
   SampleID .rownames            mpg    wt .fitted .se.fit .resid  .hat .sigma  .cooksd .std.resid
      <int> <chr>              <dbl> <dbl>   <dbl>   <dbl>  <dbl> <dbl>  <dbl>    <dbl>      <dbl>
 1        1 Dodge Challenger    15.5  3.52   17.2    0.689 -1.72  0.106   2.15 0.0442      -0.862 
 2        1 Datsun 710          22.8  2.32   23.5    0.940 -0.655 0.198   2.24 0.0148      -0.346 
 3        1 Cadillac Fleetwood  10.4  5.25    8.24   1.52   2.16  0.515   1.93 1.15         1.47  
 4        1 Merc 450SE          16.4  4.07   14.4    0.863  2.04  0.167   2.10 0.112        1.06  
 5        1 Ford Pantera L      15.8  3.17   19.0    0.672 -3.24  0.101   1.85 0.147       -1.62  
 6        1 Lotus Europa        30.4  1.51   27.6    1.39   2.75  0.432   1.79 1.14         1.73  
 7        1 Volvo 142E          21.4  2.78   21.1    0.751  0.334 0.126   2.26 0.00207      0.169 
 8        1 Merc 280C           17.8  3.44   17.6    0.678  0.163 0.103   2.26 0.000378     0.0812
 9        1 Mazda RX4 Wag       21    2.88   20.6    0.724  0.428 0.117   2.25 0.00308      0.215 
10        1 Camaro Z28          13.3  3.84   15.6    0.773 -2.26  0.134   2.06 0.102       -1.15  
11        2 Merc 280            19.2  3.44   19.7    1.09  -0.470 0.108   3.53 0.00138     -0.151 
12        2 Toyota Corolla      33.9  1.84   28.2    1.65   5.66  0.251   2.52 0.658        1.98  
13        2 Hornet Sportabout   18.7  3.44   19.7    1.09  -0.970 0.108   3.51 0.00588     -0.311 
14        2 Mazda RX4 Wag       21    2.88   22.7    1.07  -1.69  0.106   3.47 0.0173      -0.540 
15        2 Chrysler Imperial   14.7  5.34    9.50   2.42   5.20  0.539   2.02 3.15         2.32  
16        2 Camaro Z28          13.3  3.84   17.5    1.26  -4.23  0.145   3.08 0.163       -1.39  
17        2 Valiant             18.1  3.46   19.6    1.09  -1.46  0.110   3.48 0.0136      -0.469 
18        2 Porsche 914-2       26    2.14   26.6    1.43  -0.611 0.188   3.52 0.00490     -0.205 
19        2 Merc 280C           17.8  3.44   19.7    1.09  -1.87  0.108   3.45 0.0219      -0.600 
20        2 Lotus Europa        30.4  1.51   30.0    1.91   0.441 0.335   3.52 0.00677      0.164

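Notes:

With this approach you do not even need the prediction step (unless you are predicting on new data), because you already have the .fitted values from augment.

The predictions in the first and second outputs differ because no seed was set.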
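
Both notes can be illustrated in base R. The sketch below is an assumption-laden example rather than part of the answer: fit_on_sample is a hypothetical helper and the seed value 123 is arbitrary.

```r
# Sketch: a fixed seed makes the sample (and so the fit) repeatable,
# and predict() without newdata matches the fitted values.
sample_size <- 10

# Hypothetical helper: set the seed, draw a sample, fit, return the model.
fit_on_sample <- function(seed) {
  set.seed(seed)
  rows <- sample(nrow(mtcars), sample_size)
  lm(mpg ~ wt, data = mtcars[rows, ])
}

fit_a <- fit_on_sample(123)
fit_b <- fit_on_sample(123)

# Same seed, same sample, same coefficients.
same_fit <- isTRUE(all.equal(coef(fit_a), coef(fit_b)))

# Without newdata, predict() returns the fitted values for the training rows,
# which is the quantity augment() reports in .fitted.
same_fitted <- isTRUE(all.equal(predict(fit_a), fitted(fit_a)))
```

Calling set.seed() once before either pipeline in the answer should make its output repeatable across runs in the same way.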

【Comments】:

Thanks! A very in-depth answer, I appreciate it. Please leave it up so I can use it later :)