Python / numpy中的n维网格答案

【问题标题】：n dimensional grid in Python / numpyPython / numpy中的n维网格
【发布时间】：2018-03-15 10:11:21
【问题描述】：

我有一个未知数量的变量n，其范围从 0 到 1，有一些已知的步骤 s，条件是它们总和为 1。我想创建一个所有组合的矩阵。例如，如果n=3 和s=0.33333 则网格将是（顺序不重要）：

0.00, 0.00, 1.00
0.00, 0.33, 0.67
0.00, 0.67, 0.33
0.00, 1.00, 0.00
0.33, 0.00, 0.67
0.33, 0.33, 0.33
0.33, 0.67, 0.00
0.67, 0.00, 0.33
0.67, 0.33, 0.00
1.00, 0.00, 0.00

我怎样才能为任意n 做到这一点？

【问题讨论】：

s = 1/n 总是正确的还是只是在示例中？
@jdehesa：只是一个例子

标签： python numpy

【解决方案1】：

这是使用itertools.combinations的直接方法：

>>> import itertools as it
>>> import numpy as np
>>> 
>>> # k is 1/s
>>> n, k = 3, 3
>>> 
>>> combs = np.array((*it.combinations(range(n+k-1), n-1),), int)
>>> (np.diff(np.c_[np.full((len(combs),), -1), combs, np.full((len(combs),), n+k-1)]) - 1) / k
array([[0.        , 0.        , 1.        ],
       [0.        , 0.33333333, 0.66666667],
       [0.        , 0.66666667, 0.33333333],
       [0.        , 1.        , 0.        ],
       [0.33333333, 0.        , 0.66666667],
       [0.33333333, 0.33333333, 0.33333333],
       [0.33333333, 0.66666667, 0.        ],
       [0.66666667, 0.        , 0.33333333],
       [0.66666667, 0.33333333, 0.        ],
       [1.        , 0.        , 0.        ]])

如果担心速度，itertools.combinations 可以替换为numpy implementation。

【讨论】：

这个解决方案是相当快的，但结合你的 NumPy 组合算法我认为它是最快的，特别是对于 n 和 k 的值。

【解决方案2】：

编辑

这里有一个更好的解决方案。它基本上partitions将步数转化为变量的数量来生成所有的有效组合：

def partitions(n, k):
    if n < 0:
        return -partitions(-n, k)
    if k <= 0:
        raise ValueError('Number of partitions must be positive')
    if k == 1:
        return np.array([[n]])
    ranges = np.array([np.arange(i + 1) for i in range(n + 1)])
    parts = ranges[-1].reshape((-1, 1))
    s = ranges[-1]
    for _ in range(1, k - 1):
        d = n - s
        new_col = np.concatenate(ranges[d])
        parts = np.repeat(parts, d + 1, axis=0)
        s = np.repeat(s, d + 1) + new_col
        parts = np.append(parts, new_col.reshape((-1, 1)), axis=1)
    return np.append(parts, (n - s).reshape((-1, 1)), axis=1)

def make_grid_part(n, step):
    num_steps = round(1.0 / step)
    return partitions(num_steps, n) / float(num_steps)

print(make_grid_part(3, 0.33333))

输出：

array([[ 0.        ,  0.        ,  1.        ],
       [ 0.        ,  0.33333333,  0.66666667],
       [ 0.        ,  0.66666667,  0.33333333],
       [ 0.        ,  1.        ,  0.        ],
       [ 0.33333333,  0.        ,  0.66666667],
       [ 0.33333333,  0.33333333,  0.33333333],
       [ 0.33333333,  0.66666667,  0.        ],
       [ 0.66666667,  0.        ,  0.33333333],
       [ 0.66666667,  0.33333333,  0.        ],
       [ 1.        ,  0.        ,  0.        ]])

比较：

%timeit make_grid_part(5, .1)
>>> 338 µs ± 2.25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit make_grid_simple(5, .1)
>>> 26.4 ms ± 806 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

make_grid_simple 如果再往前推一点，实际上会耗尽内存。

这是一种简单的方法：

def make_grid_simple(n, step):
    num_steps = round(1.0 / step)
    vs = np.meshgrid(*([np.linspace(0, 1, num_steps + 1)] * n))
    all_combs = np.stack([v.flatten() for v in vs], axis=1)
    return all_combs[np.isclose(all_combs.sum(axis=1), 1)]

print(make_grid_simple(3, 0.33333))

输出：

[[ 0.          0.          1.        ]
 [ 0.33333333  0.          0.66666667]
 [ 0.66666667  0.          0.33333333]
 [ 1.          0.          0.        ]
 [ 0.          0.33333333  0.66666667]
 [ 0.33333333  0.33333333  0.33333333]
 [ 0.66666667  0.33333333  0.        ]
 [ 0.          0.66666667  0.33333333]
 [ 0.33333333  0.66666667  0.        ]
 [ 0.          1.          0.        ]]

但是，这并不是最有效的方法，因为它只是简单地进行所有可能的组合，然后只选择加起来为 1 的组合，而不是一开始就只生成正确的组合。对于小步长，可能会导致内存成本过高。

【讨论】：

根据问题每行的总和应该是1。
@AnkurAnkan 啊对了，我没看错，谢谢，我改一下……
make_grid_part 似乎有问题。试试make_grid_part(4, 0.5)，看看结果的下半部分。

【解决方案3】：

假设它们总是加起来为 1，如你所说：

import itertools

def make_grid(n):   
  # setup all possible values in one position
  p = [(float(1)/n)*i for i in range(n+1)]

  # combine values, filter by sum()==1
  return [x for x in itertools.product(p, repeat=n) if sum(x) == 1]

print(make_grid(n=3))

#[(0.0, 0.0, 1.0),
# (0.0, 0.3333333333333333, 0.6666666666666666),
# (0.0, 0.6666666666666666, 0.3333333333333333),
# (0.0, 1.0, 0.0),
# (0.3333333333333333, 0.0, 0.6666666666666666),
# (0.3333333333333333, 0.3333333333333333, 0.3333333333333333),
# (0.3333333333333333, 0.6666666666666666, 0.0),
# (0.6666666666666666, 0.0, 0.3333333333333333),
# (0.6666666666666666, 0.3333333333333333, 0.0),
# (1.0, 0.0, 0.0)]

【讨论】：

【解决方案4】：

我们可以认为这是一个问题，将一些固定数量的事物（在这种情况下为 1/s，并使用sum_left 参数表示）在给定数量的 bin 之间（在这种情况下为 n）。我能想到的最有效的方法是使用递归：

In [31]: arr = []
In [32]: def fun(n, sum_left, arr_till_now):
    ...:     if n==1:
    ...:         n_arr = list(arr_till_now)
    ...:         n_arr.append(sum_left)
    ...:         arr.append(n_arr)
    ...:     else:
    ...:         for i in range(sum_left+1):
    ...:             n_arr = list(arr_till_now)
    ...:             n_arr.append(i)
    ...:             fun(n-1, sum_left-i, n_arr)

这将给出如下输出：

In [36]: fun(n, n, [])
In [37]: arr
Out[37]: 
[[0, 0, 3],
 [0, 1, 2],
 [0, 2, 1],
 [0, 3, 0],
 [1, 0, 2],
 [1, 1, 1],
 [1, 2, 0],
 [2, 0, 1],
 [2, 1, 0],
 [3, 0, 0]]

现在我可以将其转换为 numpy 数组以进行元素乘法：

In [39]: s = 0.33
In [40]: arr_np = np.array(arr)
In [41]: arr_np * s
Out[41]: 
array([[ 0.        ,  0.        ,  0.99999999],
       [ 0.        ,  0.33333333,  0.66666666],
       [ 0.        ,  0.66666666,  0.33333333],
       [ 0.        ,  0.99999999,  0.        ],
       [ 0.33333333,  0.        ,  0.66666666],
       [ 0.33333333,  0.33333333,  0.33333333],
       [ 0.33333333,  0.66666666,  0.        ],
       [ 0.66666666,  0.        ,  0.33333333],
       [ 0.66666666,  0.33333333,  0.        ],
       [ 0.99999999,  0.        ,  0.        ]])

【讨论】：

【解决方案5】：

此方法也适用于任意总和 (total)：

import numpy as np
import itertools as it
import scipy.special

n = 3
s = 1/3.
total = 1.00

interval = int(total/s)
n_combs = scipy.special.comb(n+interval-1, interval, exact=True)
counts = np.zeros((n_combs, n), dtype=int)

def count_elements(elements, n):
    count = np.zeros(n, dtype=int)
    for elem in elements:
        count[elem] += 1
    return count

for i, comb in enumerate(it.combinations_with_replacement(range(n), interval)):
    counts[i] = count_elements(comb, n)

ratios = counts*s
print(ratios)

【讨论】：