【Question title】:How can I take a subsample having almost the same mean and standard deviation as the population?
【Posted】:2021-09-16 17:31:51
【Question description】:

Suppose this is my data:

> length <- rep(11:17, 200)
> mean(length)
[1] 14
> sd(length)
[1] 2.001

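How can I draw a random subsample from the data (length) so that its mean and standard deviation are almost the same as those of the full data?

【Question comments】:

    Tags: r mean resampling population


    【Solution 1】:

    You can repeatedly draw samples from length until you have found enough samples that meet your requirements. It's not pretty, but it works.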

    length <- rep(11:17, 200)
    
    # save mean and sd the subsamples should have
    aimed_mean <- mean(length)
    aimed_sd <- sd(length)
    
    # set number of replications / iterations
    n_replication <- 1000
    
    # set size of sample
    size_sample <- 40
    
    # set desired number of samples
    n_sample <- 3
    
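    # set the largest absolute deviation from mean and sd you will accept
    # (1.5 is very loose for this data: nearly every sample passes; lower it for a closer match)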
    deviation_mean <- 1.5
    deviation_sd <- 1.5
    
    # create empty container for resulting samples
    samples <- vector("list", n_sample)
    
    # Repeatedly sample from length
    i <- 0
    sample_count <- 0
    
    repeat {
      
      i <- i+1
      
      # take a sample from length
      sample_length <- sample(length, size_sample)
      
      # keep the sample when it is close enough in both mean and sd
      if(abs(aimed_mean - mean(sample_length)) < deviation_mean &&
         abs(aimed_sd - sd(sample_length)) < deviation_sd){
        
        sample_count <- sample_count + 1
        samples[[sample_count]] <- sample_length
        
      }
      
      # stop when enough samples are found or the iteration limit is reached
      if(i == n_replication || sample_count == n_sample){
        break
      }
      
    }
    
    # drop unused slots if fewer than n_sample samples were accepted
    samples <- samples[seq_len(sample_count)]
    
    # your samples
    samples
    
    # test whether it worked
    lapply(samples, function(x){abs(mean(x)-aimed_mean)<deviation_mean})
    lapply(samples, function(x){abs(sd(x)-aimed_sd)<deviation_sd})
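
The same accept/reject idea can be wrapped in a reusable function. The following is only a sketch: the function name `draw_matching_samples`, its arguments, the seed, and the 0.1 tolerances are illustrative assumptions, not part of the original answer.

```r
# Hypothetical helper (name and arguments are assumptions for illustration).
# Draws random subsamples of `x` and keeps those whose mean and sd are
# within the given tolerances of the mean and sd of `x`.
draw_matching_samples <- function(x, size, n_samples, tol_mean, tol_sd,
                                  max_iter = 10000) {
  target_mean <- mean(x)
  target_sd <- sd(x)
  accepted <- vector("list", n_samples)
  count <- 0
  for (i in seq_len(max_iter)) {
    candidate <- sample(x, size)  # sampling without replacement
    if (abs(mean(candidate) - target_mean) < tol_mean &&
        abs(sd(candidate) - target_sd) < tol_sd) {
      count <- count + 1
      accepted[[count]] <- candidate
      if (count == n_samples) break
    }
  }
  # return only the accepted samples (possibly fewer than n_samples)
  accepted[seq_len(count)]
}

set.seed(42)  # arbitrary seed, only for reproducibility
length <- rep(11:17, 200)
result <- draw_matching_samples(length, size = 40, n_samples = 3,
                                tol_mean = 0.1, tol_sd = 0.1)

# inspect how close each accepted sample is
sapply(result, mean)
sapply(result, sd)
```

Tighter tolerances reject more candidates, so `max_iter` may need to grow; if the limit is reached first, the function returns however many samples were accepted.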
    

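    【Comments】:

    • Thank you very much @tc90kk, it works very well.
    • I'm glad it worked for you. Since I spent some time writing the code, it would be great if you could accept the answer to your question. :-)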