Extract all possible combinations of rows with unique values in a variable: answers

【Question title】: Extract all possible combinations of rows with unique values in a variable
【Posted】: 2023-02-01 00:36:18
【Question description】:

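I am trying to run a meta-analysis on a dataset in which several authors contributed more than one study, which could introduce bias. I therefore want to extract every possible combination of rows in which each author appears exactly once.

Sample data: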

sample <- data.frame(Author = c('a','a','b','b','c'),
                     Year = c('2020','2016', '2020','2010','2005'),
                     Value = c(3,1,2,4,5),
                     UniqueName = c('a 2020', 'a 2016', 'b 2020', 'b 2010', 'c 2005'))

Sample:

  Author Year Value UniqueName
1      a 2020     3     a 2020
2      a 2016     1     a 2016
3      b 2020     2     b 2020
4      b 2010     4     b 2010
5      c 2005     5     c 2005

I would like to extract all possible combinations of rows (4 possibilities in this example) in which each author appears exactly once.

> output1
  Author Year Value UniqueName
1      a 2020     3     a 2020
2      b 2020     2     b 2020
3      c 2005     5     c 2005


> output2
  Author Year Value UniqueName
1      a 2016     1     a 2016
2      b 2020     2     b 2020
3      c 2005     5     c 2005


> output3
  Author Year Value UniqueName
1      a 2016     1     a 2016
2      b 2010     4     b 2010
3      c 2005     5     c 2005


> output4
  Author Year Value UniqueName
1      a 2020     3     a 2020
2      b 2010     4     b 2010
3      c 2005     5     c 2005

In the end, I will run the analysis on each of these 4 extracted data frames, but I don't know how to obtain them in a less manual way.

【Question discussion】:

Tags: r dataframe dplyr tidyverse metafor

【Solution 1】:

There may be a less convoluted way, but I think I have a working solution.

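The idea is to split your data frame by author and use expand.grid to enumerate every combination of per-author row indices, so that each combination picks exactly one row per author. lapply then turns each combination of row indices into a data.frame, giving a list of data.frames.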
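
As a small illustration of the expand.grid step in isolation, here is a sketch in which the per-author row indices are typed by hand to match the sample data (authors a and b have two rows each, author c has one). The name `idx` is only illustrative.

```r
# Hand-written per-author row indices matching the sample data:
# two candidate rows for a, two for b, one for c.
idx <- expand.grid(a = 1:2, b = 1:2, c = 1L)

# Each row of idx is one way of choosing a single row per author.
idx
nrow(idx)  # 2 * 2 * 1 = 4 combinations
```

The first column varies fastest, which determines the order in which the combinations are listed.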

Here is the code:

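# Split the data frame into one piece per author
splitsample <- split(sample, sample$Author)
# Every combination of within-author row indices (one column per author)
outputs_rows <- expand.grid(lapply(splitsample, function(x) seq_len(nrow(x))))
names_authors <- colnames(outputs_rows)
# Build one data frame per combination by stacking the chosen row of each author
outputs <- lapply(seq_len(nrow(outputs_rows)),
                  function(row) {
                    df <- data.frame()
                    for (aut in names_authors) {
                      df <- rbind(df, splitsample[[aut]][outputs_rows[row, aut], ])
                    }
                    return(df)
                  })
outputs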

The result looks like this:

> outputs
[[1]]
  Author Year Value UniqueName
1      a 2020     3     a 2020
3      b 2020     2     b 2020
5      c 2005     5     c 2005

[[2]]
  Author Year Value UniqueName
2      a 2016     1     a 2016
3      b 2020     2     b 2020
5      c 2005     5     c 2005

[[3]]
  Author Year Value UniqueName
1      a 2020     3     a 2020
4      b 2010     4     b 2010
5      c 2005     5     c 2005

[[4]]
  Author Year Value UniqueName
2      a 2016     1     a 2016
4      b 2010     4     b 2010
5      c 2005     5     c 2005
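
A more compact variant of the same idea, offered as a sketch rather than as the method above: split the row numbers of the original data frame by author, so that each combination can be extracted with a single indexing step instead of a loop with rbind. The names `rows_by_author`, `combos`, and `mean_values` are illustrative, and the mean of `Value` is only a placeholder for whatever per-subset analysis is actually needed.

```r
sample <- data.frame(Author = c('a','a','b','b','c'),
                     Year = c('2020','2016','2020','2010','2005'),
                     Value = c(3,1,2,4,5),
                     UniqueName = c('a 2020','a 2016','b 2020','b 2010','c 2005'))

# Row numbers of `sample`, grouped by author
rows_by_author <- split(seq_len(nrow(sample)), sample$Author)

# One row per combination; each cell is a row number of `sample`
combos <- expand.grid(rows_by_author)

# Sanity check: the number of combinations is the product of the group sizes
stopifnot(nrow(combos) == prod(lengths(rows_by_author)))

# Build one data frame per combination by direct row indexing
outputs <- lapply(seq_len(nrow(combos)),
                  function(i) sample[unlist(combos[i, ], use.names = FALSE), ])

# Placeholder analysis (an assumption, not part of the question):
# mean of Value in each extracted data frame
mean_values <- sapply(outputs, function(d) mean(d$Value))
```

Because the number of combinations is the product of the per-author row counts, it grows quickly when many authors have several studies, so it may be worth checking `nrow(combos)` before building the list.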

I hope this helps.

【Discussion】: