【Question title】: R - generate all possible pairwise combinations of binary vectors
【Posted】: 2018-01-17 02:37:44
【Question】:

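I am looking for a smart way to generate all pairwise combinations of two vectors of length n, where each vector has exactly one non-zero value.

Right now I am doing something quite desperate, looping through each combination: n

This is what I am after, e.g. for n=3: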

     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0  

[1,]    1    0    0
[2,]    0    0    1

[1,]    0    1    0
[2,]    1    0    0

[1,]    0    1    0
[2,]    0    0    1

[1,]    0    0    1
[2,]    1    0    0

[1,]    0    0    1
[2,]    0    1    0

Many thanks for your help.

【Question comments】:

Tags: r combinations combinatorics binary-data

【Solution 1】:

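The astute reader will notice that this problem reduces to: "How do I generate all pairwise permutations of the powers of 2?" Viewed this way, we can avoid dealing with binary vectors at first and leave that for the last step.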
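To make the reduction concrete, here is a minimal base R sketch showing that each power of 2 maps to a vector with a single 1. The names `powers` and `oneHot` are only illustrative and do not come from the original answer.

```r
n <- 3
powers <- 2^(0:(n - 1))   # 1, 2, 4: each has exactly one bit set

# intToBits() returns 32 bits, least significant first; keep the first n.
oneHot <- t(sapply(powers, function(x) as.integer(intToBits(x))[1:n]))
oneHot
```

Row i of `oneHot` has its 1 in column i, so permuting the powers of 2 is the same as permuting these rows.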

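Using the base R function intToBits, this answer to the question How to convert integer numbers into binary vector?, and any function that can generate permutations of a given length (many packages offer one, e.g. gtools::permutations, RcppAlgos::permuteGeneral, and arrangements::permutations), we can get the desired result in one line.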

library(gtools)
t(sapply(t(gtools::permutations(3, 2, 2^(0:2))),  
         function(x) {as.integer(intToBits(x))})[1:3, ])

      [,1] [,2] [,3]
 [1,]    1    0    0
 [2,]    0    1    0

 [3,]    1    0    0
 [4,]    0    0    1

 [5,]    0    1    0
 [6,]    1    0    0

 [7,]    0    1    0
 [8,]    0    0    1

 [9,]    0    0    1
[10,]    1    0    0

[11,]    0    0    1
[12,]    0    1    0

Generalizing is easy.

bitPairwise <- function(numBits, groupSize) {
    t(sapply(t(gtools::permutations(numBits, groupSize, 2^(0:(numBits-1)))), 
                 function(x) {as.integer(intToBits(x))})[1:numBits, ])
}

 bitPairwise(numBits = 6, groupSize = 3)[1:12, ]
      [,1] [,2] [,3] [,4] [,5] [,6]
 [1,]    1    0    0    0    0    0
 [2,]    0    1    0    0    0    0
 [3,]    0    0    1    0    0    0

 [4,]    1    0    0    0    0    0
 [5,]    0    1    0    0    0    0
 [6,]    0    0    0    1    0    0

 [7,]    1    0    0    0    0    0
 [8,]    0    1    0    0    0    0
 [9,]    0    0    0    0    1    0

[10,]    1    0    0    0    0    0
[11,]    0    1    0    0    0    0
[12,]    0    0    0    0    0    1

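Update

I am posting this only to point out how @Suren's answer can be made correct.

The OP is looking for permutations, not combinations.

From the conversation in the comments, you can see that @Suren's solution does not give correct results as the group size increases ("I am also trying to get groups of three instead of 2 (or any number)" and "this is cutting off some solutions").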

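It appears that @Suren's answer gives correct results with g = 2. This is because the permutations of 1:n choose 2 are equal to the combinations of 1:n choose 2 together with the combinations of n:1 choose 2 (note that 1:n is reversed). That is exactly what @Suren's answer does (i.e. it generates the combinations choose 2, writes them in reverse order, then combines the two sets).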
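The g = 2 identity can be checked in base R. The sketch below, with n = 4 picked arbitrarily, compares "combinations plus their reversals" against a brute-force list of ordered pairs of distinct elements. The helper `key` is a hypothetical name used only to compare the two sets regardless of row order.

```r
n <- 4
combs <- t(combn(n, 2))          # each row is one combination with i < j
reversed <- combs[, 2:1]         # the same pairs with the order swapped
fromCombn <- rbind(combs, reversed)

# Brute force: every ordered pair of distinct elements.
grid <- as.matrix(expand.grid(first = 1:n, second = 1:n))
allPairs <- grid[grid[, 1] != grid[, 2], ]

# Collapse each row to a string so the two sets can be compared.
key <- function(m) sort(unname(apply(m, 1, paste, collapse = "-")))
identical(key(fromCombn), key(allPairs))
```

The same trick does not carry over to g > 2, because a group of three or more elements has more orderings than just "forward" and "reversed".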

## original version
surenFun <- function(n, g) {
    m <- combn(n, g)
    mm <- as.numeric(m)
    mat <- matrix(0, nrow = g * ncol(m), ncol = n)
    mat[ cbind(1:nrow(mat), mm)] <- 1
    soln <- rbind(mat, mat[nrow(mat):1, ])
    split(data.frame(soln), rep(1:(nrow(soln)/g), each=g))
}

## Here is the corrected version
surenFunCorrected <- function(n, g) {
    ## changed combn to gtools::permutations or any other
    ## similar function that can generate permutations
    m <- gtools::permutations(n, g)
    ## you must transpose m
    mm <- as.numeric(t(m))
    ## change ncol(m) to nrow(m)
    mat <- matrix(0, nrow = g * nrow(m), ncol = n)
    mat[ cbind(1:nrow(mat), mm)] <- 1
    ## removed soln
    split(data.frame(mat), rep(1:(nrow(mat)/g), each=g))
}

Using the example given by the OP, it gives the same results in a different order:

## The order is slightly different
match(surenFunCorrected(3, 2), surenFun(3, 2))
[1] 1 2 6 3 5 4

all(surenFunCorrected(3, 2) %in% surenFun(3, 2))
[1] TRUE

all(surenFun(3, 2) %in% surenFunCorrected(3, 2))
[1] TRUE

Let's test it with g = 3 and n = 4.

## N.B. all of the original output is
## contained in the corrected output
all(surenFun(4, 3) %in% surenFunCorrected(4, 3))
[1] TRUE

## However, there are 16 results
## not returned in the original
leftOut <- which(!(surenFunCorrected(4, 3) %in% surenFun(4, 3)))
leftOut
[1]  3  5  6  7  8  9 11 12 13 14 16 17 18 19 20 22

## E.g. 3 examples that were left out
surenFunCorrected(4, 3)[leftOut[c(1,8,16)]]
$`3`
  X1 X2 X3 X4
7  1  0  0  0
8  0  0  1  0
9  0  1  0  0

$`12`
   X1 X2 X3 X4
34  0  1  0  0
35  0  0  0  1
36  0  0  1  0

$`22`
   X1 X2 X3 X4
64  0  0  0  1
65  0  1  0  0
66  0  0  1  0
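The count of 16 missing results is consistent with simple arithmetic: there are n!/(n-g)! ordered groups in total, while combinations plus their reversals yield only 2 * choose(n, g) of them. A small sketch, with illustrative variable names:

```r
n <- 4
g <- 3
nPerms <- factorial(n) / factorial(n - g)   # ordered groups: 24
nOriginal <- 2 * choose(n, g)               # combinations plus reversals: 8
nPerms - nOriginal                          # groups left out: 16
```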

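【Comments】:

@user971102, this answer is more academic than anything. It could easily be turned into a viable solution for larger numbers by explicitly creating a vector of all zeros with a 1 in the nth position (rather than creating powers of 2 and converting them to binary vectors). I will update with such a solution when I get a chance.
Is it possible to include a REPLACEABLE condition? For example, {1,0,0}, {1,0,0} is also a valid solution. How can we include such cases?
I got the answer. I just need to set repeats.allowed = TRUE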
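The two ideas in these comments (placing the 1s directly instead of going through powers of 2, and allowing repeats) could be sketched in base R as follows. This is not the author's promised update: `oneHotGroups` and its `repeats` argument are hypothetical names, and the approach enumerates all n^g tuples before filtering, so it only suits small inputs.

```r
oneHotGroups <- function(n, g, repeats = FALSE) {
    # All g-tuples of positions 1..n, repetition included.
    idx <- unname(as.matrix(expand.grid(rep(list(seq_len(n)), g))))
    if (!repeats) {
        # Keep only tuples whose positions are all distinct.
        keep <- apply(idx, 1, function(r) !anyDuplicated(r))
        idx <- idx[keep, , drop = FALSE]
    }
    basis <- diag(n)   # row i is the vector with a single 1 in position i
    # One g x n matrix per tuple.
    lapply(seq_len(nrow(idx)), function(k) basis[idx[k, ], , drop = FALSE])
}

length(oneHotGroups(3, 2))                   # groups without repeats
length(oneHotGroups(3, 2, repeats = TRUE))   # groups with repeats allowed
```

The groups come out in expand.grid order, which differs from the order gtools::permutations produces.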

【Solution 2】:

Something like this?

n <- 3
g <- 2 # g must be < n 
m <- combn(n, g)
mm <- as.numeric(m)
mat <- matrix(0, nrow = g * ncol(m), ncol = n)
mat[ cbind(1:nrow(mat), mm)] <- 1

mat
#       [,1] [,2] [,3]
#[1,]    1    0    0
#[2,]    0    1    0

#[3,]    1    0    0
#[4,]    0    0    1

#[5,]    0    1    0
#[6,]    0    0    1

# mat is half the answer :)
# the other half is
mat[nrow(mat):1, ]

#      [,1] [,2] [,3]
#[1,]    0    0    1
#[2,]    0    1    0

#[3,]    0    0    1
#[4,]    1    0    0

#[5,]    0    1    0
#[6,]    1    0    0

soln <- rbind(mat, mat[nrow(mat):1, ])

# as suggested by the OP to split the soln 
d <- split(data.frame(soln), rep(1:(nrow(soln)/g), each=g))

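【Comments】:

Thanks a lot, that's it. The data can then be split into a list of pairs with: d=split(data.frame(soln),rep(1:(nrow(soln)/2),each=2)). I wonder whether there is a way to do this in base R, without a package?
Oops, it seems the combn function is also in utils. So combinat is not needed.
I am also trying to get groups of three instead of 2 (or any number), just missing the final number for setting up the matrix, but almost there: groups = 3; n
Do you mean like (groups of) three rows, where each row has one non-zero element?
Exactly (or groups of any predetermined number of rows), as long as the 1s do not overlap in any row and the combinations are not repeated. Your code works for this too, since it now overestimates the number of rows and removes the duplicates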
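For groups of any size using only base R, one option is to keep the combn approach from this answer and expand each combination into all of its orderings. The sketch below is an assumption about how that might look, not code from the thread; `perms` and `orderedGroups` are hypothetical helpers.

```r
# All orderings of a vector, by recursion (base R only).
perms <- function(v) {
    if (length(v) <= 1) return(matrix(v, nrow = 1))
    do.call(rbind, lapply(seq_along(v), function(i) cbind(v[i], perms(v[-i]))))
}

orderedGroups <- function(n, g) {
    combs <- combn(n, g, simplify = FALSE)        # utils::combn
    idx <- do.call(rbind, lapply(combs, perms))   # one row per ordered group
    mm <- as.numeric(t(idx))                      # positions of the 1s, group by group
    mat <- matrix(0, nrow = length(mm), ncol = n)
    mat[cbind(seq_along(mm), mm)] <- 1
    # Split into a list of groups, g rows each.
    split(data.frame(mat), rep(seq_len(nrow(idx)), each = g))
}

d3 <- orderedGroups(4, 3)
length(d3)
```

Within each group the 1s never share a column, because every group is built from g distinct positions.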