Randomly pick rows from a data.frame based on a column value: answers

【Question title】: Randomly pick rows from a data.frame based on a column value
【Posted】: 2021-10-28 21:10:51
【Question】:

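I was wondering if there is a way to make my expand.grid() output show an unequal number of rows for each unique study value (currently, each unique study value has 4 rows)?

For example, can we randomly pick some or all of the rows of study == 1, some or all of the rows of study == 2, and some or all of the rows of study == 3?

The number of rows picked from each study is completely random.

This is a toy study; a functional answer is highly appreciated.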

library(dplyr)
(data <- expand.grid(study = 1:3, outcome = rep(1:2,2)))
arrange(data, study, outcome)
#   study outcome
#1      1       1
#2      1       1
#3      1       2
#4      1       2 #--- Up to here study == 1
#5      2       1
#6      2       1
#7      2       2
#8      2       2 #--- Up to here study == 2
#9      3       1
#10     3       1
#11     3       2
#12     3       2 #--- Up to here study == 3

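【Comments】:

How do you decide the randomness for each study? Can you show some example input and the corresponding output for the function you expect?
@RonakShah, that is random too. You can pick all of the rows, or just a few of the existing rows in a study.

Tags: r dataframe function random dplyr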

【Solution 1】:

For each study, you can draw one random number between 1 and the group size n(), then sample that many rows.

library(dplyr)

data %>% group_by(study) %>% sample_n(sample(n(), 1)) %>% ungroup()
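
The same idea can be sketched without dplyr, which may help show what happens inside each group. This is only an illustration: it assumes the question's data was built with study = 1:3 (as the printed output suggests), and the seed, the helper name pick_some and the variable names are made up for the example.

```r
# Rebuild the toy data: 3 studies with 4 rows each (assumes study = 1:3)
data <- expand.grid(study = 1:3, outcome = rep(1:2, 2))

set.seed(1)  # arbitrary seed, only so the draw can be repeated

# Hypothetical helper: keep a random number of rows (1..nrow) from one group
pick_some <- function(d) {
  k <- sample.int(nrow(d), 1)                 # how many rows to keep
  d[sample.int(nrow(d), k), , drop = FALSE]   # which rows to keep
}

# Split by study, sample inside each piece, then stack the pieces again
result <- do.call(rbind, lapply(split(data, data$study), pick_some))

# Rows kept per study; each count lies between 1 and 4
table(result$study)
```

Because the size is drawn separately inside every group, the counts per study can differ from one another and from one run to the next.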

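【Discussion】:

It looks like the sample_n() family has a "superseded" lifecycle status. Can we use slice_sample() instead?
No. slice_sample() needs n or prop to be fixed; they cannot vary dynamically across groups. I opened an issue about this last year (github.com/tidyverse/dplyr/issues/5478), but the dplyr developers see it differently.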
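
If avoiding the superseded sample_n() matters, one possible workaround is to choose row positions with slice(), whose expressions are evaluated once per group so that n() is available. This is a sketch rather than something proposed in the thread; it assumes the question's data uses study = 1:3, and the seed and the name picked are illustrative.

```r
library(dplyr)

# Rebuild the toy data: 3 studies with 4 rows each (assumes study = 1:3)
data <- expand.grid(study = 1:3, outcome = rep(1:2, 2))

set.seed(2021)  # arbitrary seed, only so the draw can be repeated

picked <- data %>%
  group_by(study) %>%
  # inner sample.int(): random size in 1..n()
  # outer sample.int(): that many distinct row positions within the group
  slice(sample.int(n(), sample.int(n(), 1))) %>%
  ungroup()

# Rows kept per study
count(picked, study)
```

Here slice() only receives integer positions, so the per-group randomness lives entirely in the two sample.int() calls.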

【Solution 2】:

If I understand correctly, this should work:

data %>% 
  #Grouping by the variable study
  group_by(study) %>% 
  #Sampling 3 observations for each study
  sample_n(size = 3)
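
A quick way to see what this returns is to count the rows per study. The sketch below assumes the question's data uses study = 1:3; the name fixed is illustrative. With size = 3, every study contributes exactly 3 rows, so the group sizes are equal rather than random.

```r
library(dplyr)

# Rebuild the toy data: 3 studies with 4 rows each (assumes study = 1:3)
data <- expand.grid(study = 1:3, outcome = rep(1:2, 2))

fixed <- data %>%
  group_by(study) %>%
  sample_n(size = 3) %>%  # always 3 rows per study
  ungroup()

# Every study has n == 3
count(fixed, study)
```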
