[Question Title]: Find two disjoint pairs of pairs that sum to the same vector
[Posted]: 2014-01-15 14:05:21
[Question Description]:

This is a follow-up to Find two pairs of pairs that sum to the same value.

I have a random two-dimensional array which I make with

import numpy as np
from itertools import combinations
m = 50
n = 50
A = np.random.randint(2, size=(m, n))

I would like to determine whether the matrix has two disjoint pairs of columns that sum to the same column vector, and I am looking for a fast way to do it. In the previous question, ((0,1), (0,2)) was acceptable as a pair of column-index pairs, but here it is not, because index 0 occurs in both pairs.

The accepted answer to the previous question is so cleverly optimized that, unfortunately, I can't see how to make this seemingly simple change to it. (I am interested in columns rather than rows in this question, but I can always just work with A.transpose().)
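For reference, here is a straightforward hash-based check (my own sketch, not the accepted answer's vectorized approach): group column pairs by their sum vector, and look for two index-disjoint pairs in any group. The function name is illustrative.

```python
import numpy as np
from itertools import combinations

def has_disjoint_equal_pairs(A):
    """Return True if two disjoint pairs of columns of A sum to the same vector."""
    sums = {}
    n = A.shape[1]
    for i, j in combinations(range(n), 2):
        key = tuple(A[:, i] + A[:, j])
        # check every earlier pair with the same sum vector for disjointness
        for p, q in sums.setdefault(key, []):
            if len({i, j, p, q}) == 4:   # all four column indices distinct
                return True
        sums[key].append((i, j))
    return False
```

This is O(n^2) pairs with an O(m) hash per pair, which is fine for moderate sizes but lacks the early-exit column pruning of the answer below.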

Here is some code that shows the intent by brute-force testing all 4 x 4 arrays.

n = 4
nxn = np.arange(n*n).reshape(n, -1)
count = 0
for i in xrange(2**(n*n)):
    A = (i >> nxn) % 2
    p = 1
    for firstpair in combinations(range(n), 2):
        for secondpair in combinations(range(n), 2):
            if firstpair < secondpair and not set(firstpair) & set(secondpair):
                if np.array_equal(A[firstpair[0]] + A[firstpair[1]],
                                  A[secondpair[0]] + A[secondpair[1]]):
                    if p:
                        count += 1
                        p = 0
print count

This should output 3136.
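As an aside, all pairwise row (or column) sums of a single matrix can be formed in one vectorized step with fancy indexing; this is the basic operation that the answer below builds on. Variable names here are my own:

```python
import numpy as np
from itertools import combinations

A = np.random.randint(2, size=(5, 6))
idx = np.array(list(combinations(range(6), 2)))   # all 15 column-index pairs
pair_sums = A[:, idx[:, 0]] + A[:, idx[:, 1]]     # shape (5, 15): one sum vector per pair
```

Finding duplicate columns of `pair_sums` subject to the disjointness constraint is then the remaining problem.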

[Question Discussion]:

  • What is wrong with my solution?

Tags: python numpy


[Solution 1]:

Here is my solution, which can be extended to do what I believe you want. It isn't entirely clear, though: one can get an arbitrary number of row pairs that sum to the same vector, and within them there may be unique subsets of rows that also sum to the same thing. For example:

Given this set of row pairs that sum to the same total

[[19 19 30 30]
 [11 16 11 16]]

a unique subset exists within these rows that could still be considered valid; but should it be?

[[19 30]
 [16 11]]

In any case, I expect these details are easy to deal with, given the code below.

import numpy as np

n = 20
#also works for non-square A
A = np.random.randint(2, size=(n*6,n)).astype(np.int8)
##A = np.array( [[0, 0, 0], [1, 1, 1], [1, 1 ,1]], np.uint8)
##A = np.zeros((6,6))
#force the inclusion of some hits, to keep our algorithm on its toes
##A[0] = A[1]


def base_pack_lazy(a, base, dtype=np.uint64):
    """
    pack the last axis of an array as minimal base representation
    lazily yields packed columns of the original matrix
    """
    a = np.ascontiguousarray( np.rollaxis(a, -1))
    packing = int(np.dtype(dtype).itemsize * 8 / (float(base) / 2))
    for columns in np.array_split(a, (len(a)-1)//packing+1):
        R = np.zeros(a.shape[1:], dtype)
        for col in columns:
            R *= base
            R += col
        yield R

def unique_count(a):
    """returns counts of unique elements"""
    unique, inverse = np.unique(a, return_inverse=True)
    count = np.zeros(len(unique), np.int64)
    np.add.at(count, inverse, 1)        #note; this scatter operation requires numpy 1.8; use a sparse matrix otherwise!
    return unique, count, inverse

def voidview(arr):
    """view the last axis of an array as a void object. can be used as a faster form of lexsort"""
    return np.ascontiguousarray(arr).view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1]))).reshape(arr.shape[:-1])


def has_identical_row_sums_lazy(A, combinations_index):
    """
    compute the existence of combinations of rows summing to the same vector,
    given an nxm matrix A and an index matrix specifying all combinations

    naively, we need to compute the sum of each row combination at least once, giving n^3 computations
    however, this isn't strictly required; we can lazily consider the columns, giving an early exit opportunity
    all nicely vectorized of course
    """

    multiplicity, combinations = combinations_index.shape
    #list of indices into combinations_index, denoting possibly interacting combinations
    active_combinations = np.arange(combinations, dtype=np.uint32)
    #keep all packed columns; we might need them later
    columns = []

    for packed_column in base_pack_lazy(A, base=multiplicity+1):       #loop over packed cols
        columns.append(packed_column)
        #compute rowsums only for a fixed number of columns at a time.
        #this is O(n^2) rather than O(n^3), and after considering the first column,
        #we can typically already exclude almost all combinations
        partial_rowsums = sum(packed_column[I[active_combinations]] for I in combinations_index)
        #find duplicates in this column
        unique, count, inverse = unique_count(partial_rowsums)
        #prune those combinations which we can exclude as having different sums, based on columns inspected thus far
        active_combinations = active_combinations[count[inverse] > 1]
        #early exit; no pairs
        if len(active_combinations)==0:
            return False

    """
    we now have a small set of relevant combinations, but we have lost the details of their particulars
    to see which combinations of rows do sum to the same value, we do need to consider rows as a whole
    we can simply apply the same mechanism, but for all columns at the same time,
    but only for the selected subset of row combinations known to be relevant
    """
    #construct full packed matrix
    B = np.ascontiguousarray(np.vstack(columns).T)
    #perform all relevant sums, over all columns
    rowsums = sum(B[I[active_combinations]] for I in combinations_index)
    #find the unique rowsums, by viewing rows as a void object
    unique, count, inverse = unique_count(voidview(rowsums))
    #every retained combination should indeed have a duplicate sum; otherwise our pruning was wrong
    assert(np.all(count>1))

    #loop over all sets of rows that sum to an identical unique value
    for i in xrange(len(unique)):
        #set of indexes into combinations_index;
        #note that there may be more than two combinations that sum to the same value; we grab them all here
        combinations_group = active_combinations[inverse==i]
        #associated row-combinations
        #array of shape=(multiplicity,group_size)
        row_combinations = combinations_index[:,combinations_group]

        #if no duplicate rows involved, we have a match
        if len(np.unique(row_combinations[:,[0,-1]])) == multiplicity*2:
            print row_combinations
            return True

    #none of the identical rowsums met the uniqueness criteria
    return False


def has_identical_triple_row_sums(A):
    n = len(A)
    idx = np.array( [(i,j,k)
        for i in xrange(n)
            for j in xrange(n)
                for k in xrange(n)
                    if i<j and j<k], dtype=np.uint16)
    idx = np.ascontiguousarray( idx.T)
    return has_identical_row_sums_lazy(A, idx)

def has_identical_double_row_sums(A):
    n = len(A)
    idx = np.array(np.tril_indices(n,-1), dtype=np.int32)
    return has_identical_row_sums_lazy(A, idx)


from time import clock
t = clock()
for i in xrange(1):
##    print has_identical_double_row_sums(A)
    print has_identical_triple_row_sums(A)
print clock()-t

EDIT: cleaned up the code
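As a minimal illustration of the voidview trick used above: viewing each row's bytes as a single opaque void scalar lets np.unique compare whole rows at once, which is how duplicate rowsums are detected after packing. The toy data here is my own:

```python
import numpy as np

rows = np.array([[1, 2], [3, 4], [1, 2]], dtype=np.int64)
# view each row's bytes as one void scalar, so np.unique treats rows as elements
v = np.ascontiguousarray(rows).view(
    np.dtype((np.void, rows.dtype.itemsize * rows.shape[-1]))
).reshape(rows.shape[:-1])
unique, inverse = np.unique(v, return_inverse=True)
counts = np.bincount(inverse)
# rows 0 and 2 are identical, so one unique row occurs twice
```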

[Discussion]:

  • In the case of the 2 x 4 array [[19 19 30 30], [11 16 11 16]], columns 0 and 3 sum to the same vector as columns 1 and 2. It makes sense that the code doesn't consider the rows for this problem, since the definition of my problem requires at least 4 rows.
  • I don't follow your comment. In fact, summing all the indices in each of the 4 columns gives the same total. Still, it seems to me that the 2x2 subset [[19 30] [16 11]] fits the definition of the combinations you are interested in, doesn't it?
  • Sorry, there must have been a misunderstanding. We need to find two disjoint pairs of columns. Say we find columns 0 and 1 in one set and columns 2 and 3 in the other. We then take the vector sum of columns 0 and 1, which in this case gives the vector [39,27]. Doing the same for columns 2 and 3 gives [60,27], and we find that [39,27] != [60,27]. However, had we picked columns 0 and 3 along with columns 1 and 2, we would have gotten [49,27] for both vectors, which is exactly what we want to find. With only 2 columns, we cannot form two disjoint subsets of size 2.
  • The table I wrote contains indices into A. Recall that in this code A only takes the values 0 and 1, so the vector sums can only be in [2,1,0], not [39,27]. There does indeed seem to be a big misunderstanding between us. Have you tried running the code and looking at the output?
  • As you say, it should only contain 0s and 1s. I was just using the example you gave in your answer, but I see now that I completely misunderstood what was in it! Let me run your code and take a look.