Identical values generated from random samples from a uniform distribution in dplyr

【Question title】: Identical values generated from random samples from a uniform distribution in dplyr
【Posted】: 2020-06-04 09:07:26
【Question description】:

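This is a follow-up to a previous question. My question was not fully laid out, so it was not fully answered in my last post. Forgive me, I am new to Stack Overflow.

My professor assigned a problem set in which we are required to use dplyr and other tidyverse packages. I am well aware that most (if not all) of the tasks I am attempting can be done in base R, but that would not follow my instructions.

First, we were asked to generate 1000 random samples from a uniform distribution: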

2a.  Create a new tibble called uniformDf containing a variable called unifSamples that contains 10000 random samples from a uniform distribution.  You should use the runif() function to create the uniform samples. {r 2a}

uniformDf <- tibble(unifSamples = runif(1000))

So far, so good.

We were then asked to loop 1000 times, each time drawing 20 random samples from this tibble, computing their mean, and saving it to a tibble:

2c.  Now let's loop through 1000 times, sampling 20 values from a uniform distribution and computing the mean of the sample, saving this mean to a variable called sampMean within a tibble called uniformSampleMeans. {r 2c}

unif_sample_size = 20 # sample size
n_samples = 1000 # number of samples

# set up a data frame to contain the results
uniformSampleMeans <- tibble(sampMean=rep(NA,n_samples))

# loop through all samples.  for each one, take a new random sample, 
# compute the mean, and store it in the data frame

for (i in 1:n_samples){
  uniformSampleMeans$sampMean[i] <- uniformDf %>%
  sample_n(unif_sample_size) %>%
  summarize(sampMean = mean(sampMean))
}

This all runs fine, or so I believed until I looked at my uniformSampleMeans tibble. It looks like this:

1   0.471271611726843
2   0.471271611726843
3   0.471271611726843
4   0.471271611726843
5   0.471271611726843
6   0.471271611726843
7   0.471271611726843
...
1000    0.471271611726843

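All the values are identical! Does anyone know why my output looks like this? I would not be so concerned if they varied by +/- 0.000x, since this is a distribution from 0 to 1, but the values are identical even out to 15 decimal places! Any help is greatly appreciated!

【Question discussion】: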

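You have sampMean = mean(sampMean). You have not shown where the sampMean object is created, but it looks like a fixed value generated outside the for loop. It should be sampMean = mean(unifSamples).
OK, thank you! I realize now what a silly mistake that was.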
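
For reference, the correction suggested in the comment above could be applied to the original loop roughly as follows. This is only a sketch: the set.seed() call, the NA_real_ placeholder, and the final pull() step (which extracts the number from the one-row tibble that summarize() returns) are additions that do not appear in the question.

```r
library(dplyr)
library(tibble)

set.seed(1)  # assumption: seed added only so the sketch is reproducible
uniformDf <- tibble(unifSamples = runif(1000))

unif_sample_size <- 20  # sample size
n_samples <- 1000       # number of samples

# numeric placeholder column to hold the results
uniformSampleMeans <- tibble(sampMean = rep(NA_real_, n_samples))

for (i in seq_len(n_samples)) {
  uniformSampleMeans$sampMean[i] <- uniformDf %>%
    sample_n(unif_sample_size) %>%
    # average the unifSamples column of the freshly drawn rows
    summarize(sampMean = mean(unifSamples)) %>%
    # summarize() returns a one-row tibble; pull() extracts the number
    pull(sampMean)
}
```

With mean(unifSamples), each iteration averages the 20 rows just drawn, so the stored means should differ from one iteration to the next.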

Tags: r for-loop dplyr tidyverse uniform-distribution

【Solution 1】:

The following selects unif_sample_size random rows and returns the mean of their unifSamples values:

library(dplyr)
uniformDf %>% sample_n(unif_sample_size) %>% pull(unifSamples) %>% mean
#[1] 0.5563638

To do this n times, wrap the expression in replicate():

n <- 10
replicate(n, uniformDf %>%
               sample_n(unif_sample_size) %>%
               pull(unifSamples) %>% mean)
#[1] 0.5070833 0.5259541 0.5617969 0.4695862 0.5030998 0.5745950 0.4688153 0.4914363 0.4449804 0.5202964
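
To get the result in the shape the assignment asks for, the replicate() call can be placed directly inside tibble(). The sketch below assumes the names from the question (uniformSampleMeans, sampMean, n_samples); the set.seed() call is an addition for reproducibility only.

```r
library(dplyr)
library(tibble)

set.seed(42)  # assumption: seed added only so the sketch is reproducible
uniformDf <- tibble(unifSamples = runif(1000))

unif_sample_size <- 20  # sample size
n_samples <- 1000       # number of samples

# replicate() re-evaluates the expression n_samples times,
# so every mean comes from a fresh random draw of 20 rows
uniformSampleMeans <- tibble(
  sampMean = replicate(
    n_samples,
    uniformDf %>%
      sample_n(unif_sample_size) %>%
      pull(unifSamples) %>%
      mean()
  )
)
```

This avoids pre-allocating a placeholder column and indexing into it, at the cost of not being an explicit for loop.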
