如何使用R中的复制功能来重复功能答案

【问题标题】：How to use the replicate function in R to repeat the function如何使用R中的复制功能来重复功能
【发布时间】：2020-06-18 03:39:01
【问题描述】：

我在使用replicate 重复该功能时遇到问题。

我尝试使用引导程序来适应一个二次模型，使用浓度作为预测变量，Total_lignin 作为响应，并报告最大值的估计值和相应的标准误差。

我的想法是创建一个名为 bootFun 的函数，它基本上在 for 循环的一次迭代中完成所有事情。 bootFun 只接受了预测变量的数据集和使用的响应（两个变量名都用引号括起来）。

但是，SD 为 0，不正确。不知道哪里错了。你能帮我解决一下吗？

# Load the libraries
library(dplyr)
library(tidyverse)

# Read the .csv and only use M.giganteus and S.ravennae.
dat <- read_csv('concentration.csv') %>% 
  filter(variety == 'M.giganteus' | variety == 'S.ravennae') %>% 
  arrange(variety)
# Check the data
head(dat)

# sample size
n <- nrow(dat)

# A function to do one iteration
bootFun <- function(dat, pred, resp){
  # Draw the sample size from the dataset
  sample <- sample_n(dat, n, replace = TRUE)

  # A quadratic model fit
  formula <- paste0('resp', '~', 'pred', '+', 'I(pred^2)')
  fit <- lm(formula, data = sample)

  # Derive the max of the value of concentration
  max <- -fit$coefficients[2]/(2*fit$coefficients[3])

  return(max)
}

max <- bootFun(dat = dat, pred = 'concentration', resp = 'Total_lignin' )

# Iterated times
N <- 5000

# Use 'replicate' function to do a loop
maxs <- replicate(N, max)

# An estimate of the max of predictor and corresponding SE
mean(maxs)
sd(maxs)

【问题讨论】：

欢迎您！请提供dput(dat)，不用说我们无法访问您的 .csv 文件。在此处阅读我们如何提出可重复的问题：stackoverflow.com/questions/5963269/…
您正在复制 single 函数调用的结果 N 次，而不是复制函数调用 N 次。
maxs <- replicate(N, bootFun (...))
看到这个SO post。

标签： r loops replicate

【解决方案1】：

基础包boot，函数boot，可以减轻重复调用引导函数的工作。第一个参数必须是数据集，第二个参数是索引参数，用户没有设置，其他参数也可以传递给它。在这种情况下，其他参数是预测变量和响应名称。

library(boot)

bootFun <- function(dat, indices, pred, resp){
  # Draw the sample size from the dataset
  dat.sample <- dat[indices, ]

  # A quadratic model fit
  formula <- paste0(resp, '~', pred, '+', 'I(', pred, '^2)')
  formula <- as.formula(formula)
  fit <- lm(formula, data = dat.sample)

  # Derive the max of the value of concentration
  max <- -fit$coefficients[2]/(2*fit$coefficients[3])

  return(max)
}

N <- 5000
set.seed(1234)  # Make the bootstrap results reproducible

results <- boot(dat, bootFun, R = N, pred = 'concentration', resp = 'Total_lignin')

results
#
#ORDINARY NONPARAMETRIC BOOTSTRAP
#
#
#Call:
#boot(data = dat, statistic = bootFun, R = N, pred = "concentration", 
#    resp = "Total_lignin")
#
#
#Bootstrap Statistics :
#      original        bias    std. error
#t1* -0.4629808 -0.0004433889  0.03014259
#


results$t0  # this is the statistic, not bootstrapped
#concentration 
#   -0.4629808

mean(results$t)  # bootstrap value
#[1] -0.4633233

请注意，要拟合多项式，函数poly 比明确地逐项写下多项式要简单得多。

formula <- paste0(resp, '~ poly(', pred, ',2, raw = TRUE)')

检查自举统计的分布。

op <- par(mfrow = c(1, 2))
hist(results$t)
qqnorm(results$t)
qqline(results$t)
par(op)

测试数据

set.seed(2020)  # Make the results reproducible
x <- cumsum(rnorm(100))
y <- x + x^2 + rnorm(100)
dat <- data.frame(concentration = x, Total_lignin = y)

【讨论】：