How to randomly assign values row-wise in a numpy array: answers

【Question title】: How to randomly assign values row-wise in a numpy array
【Posted】: 2016-12-27 05:52:39
【Question description】:

My google-fu has failed me! I have a 10x10 numpy array initialized to 0 as follows:

arr2d = np.zeros((10,10))

For each row in arr2d, I want to set 3 random columns to 1. I can do this with a loop as follows:

for row in arr2d:
    rand_cols = np.random.randint(0,9,3)
    row[rand_cols] = 1

Output:

array([[ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,  1.,  0.],
   [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  1.,  0.],
   [ 0.,  0.,  1.,  0.,  1.,  1.,  0.,  0.,  0.,  0.],
   [ 0.,  0.,  0.,  0.,  1.,  1.,  1.,  0.,  0.,  0.],
   [ 1.,  0.,  0.,  1.,  1.,  0.,  0.,  0.,  0.,  0.],
   [ 1.,  0.,  1.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
   [ 0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  1.,  0.],
   [ 0.,  0.,  1.,  0.,  1.,  0.,  0.,  0.,  1.,  0.],
   [ 1.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,  0.,  0.],
   [ 0.,  1.,  0.,  0.,  1.,  0.,  0.,  1.,  0.,  0.]])

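Is there a way to use numpy or array indexing/slicing to get the same result in a more Pythonic/elegant way (ideally in 1 or 2 lines of code)?

【Question discussion】:

Did you notice that one of the rows has only two 1s? That happens whenever randint(0, 9, 3) generates a sample with repeated values. Is that what you want?
So, did any of the solutions work for you?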
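
The duplicate issue raised in the first comment can be checked with a small sketch. It relies only on the documented behavior of np.random.randint (values are drawn with replacement and the upper bound is excluded); the seed, the sample size, and the variable names are illustrative assumptions.

```python
import numpy as np

np.random.seed(0)  # illustrative seed, only to make the run repeatable

# randint(0, 9, 3) draws from 0..8 (9 is excluded) and may repeat values
draws = np.random.randint(0, 9, size=(10000, 3))
max_col = draws.max()  # can never reach column 9

# Flag every draw in which at least two of the three indices coincide
has_repeat = np.array([len(set(row)) < 3 for row in draws])
repeat_fraction = has_repeat.mean()
print(max_col, repeat_fraction)
```

With 9 possible values, the chance that three draws are all distinct is 9*8*7/9^3, so roughly 31% of rows are expected to end up with fewer than three 1s, and the last column is never selected.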

Tags: python numpy vectorization

【Solution 1】:

I'm not sure how well this performs, but it is fairly concise.

arr2d[:, :3] = 1
# map() is lazy in Python 3, so loop explicitly to shuffle each row in place
for row in arr2d:
    np.random.shuffle(row)

【Discussion】:

This selects the same columns for every row.
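
Whether the rows end up with different columns depends on each row being shuffled independently. As a sketch of one way to do that without a Python-level loop, assuming NumPy 1.20 or later (where Generator.permuted is available); the variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng()  # Generator API, available since NumPy 1.17

arr2d = np.zeros((10, 10))
arr2d[:, :3] = 1  # three ones at the start of every row

# permuted shuffles the entries of each row independently along axis=1
arr2d = rng.permuted(arr2d, axis=1)

row_sums = arr2d.sum(axis=1)  # every row should still contain exactly three ones
```

Note that Generator.shuffle and np.random.shuffle on a 2-D array reorder whole rows instead, which would leave the ones in the same columns for every row.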

【Solution 2】:

Use the answers from this question to generate non-repeating random numbers. You can use either random.sample from Python's random module or np.random.choice.

So, with just a small change to your code:

>>> import numpy as np
>>> for row in arr2d:
...     rand_cols = np.random.choice(range(10), 3, replace=False)
...     # Or the python standard lib alternative (use `import random`)
...     # rand_cols = random.sample(range(10), 3)
...     row[rand_cols] = 1
...
>>> arr2d
array([[ 0.,  0.,  0.,  0.,  0.,  1.,  1.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  1.,  1.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  1.,  0.],
       [ 0.,  0.,  1.,  1.,  0.,  0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  1.,  1.],
       [ 0.,  0.,  1.,  1.,  0.,  0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  1.,  1.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  1.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  1.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  1.,  1.,  0.,  0.,  0.,  0.,  1.,  0.]])

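I don't think you can really use column slicing here to set the values to 1 unless you generate the random array from scratch, because the column indices are random for each row. For readability, it is probably best left as a loop.

【Discussion】:

FYI: you can use numpy.random.choice(10, size=3, replace=False) to generate non-repeating random numbers. This is described in one of the answers to the question you linked.
@WarrenWeckesser I did notice that, but I didn't include it because it was the second result. I'll add it as an alternative. Thanks!
In fact, in hindsight, it is better to use only np.random, to avoid having two very similar imports, which could be quite confusing.

【Solution 3】:

After initializing arr2d with arr2d = np.zeros((10,10)), you can use a vectorized two-liner like this: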

# Generate random unique 3 column indices for 10 rows
idx = np.random.rand(10,10).argsort(1)[:,:3]

# Assign them into initialized array
arr2d[np.arange(10)[:,None],idx] = 1

Or cram everything into a one-liner, if you prefer it that way:

arr2d[np.arange(10)[:,None],np.random.rand(10,10).argsort(1)[:,:3]] = 1

Sample run:

In [11]: arr2d = np.zeros((10,10))  # Initialize array

In [12]: idx = np.random.rand(10,10).argsort(1)[:,:3]

In [13]: arr2d[np.arange(10)[:,None],idx] = 1

In [14]: arr2d # Verify by manual inspection
Out[14]: 
array([[ 0.,  1.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  1.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  1.,  0.,  1.],
       [ 0.,  1.,  1.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  1.,  1.,  0.,  0.,  0.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  1.,  0.,  0.,  0.,  1.,  0.,  1.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  1.,  0.],
       [ 1.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  1.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  1.]])

In [15]: arr2d.sum(1) # Verify by counting ones in each row
Out[15]: array([ 3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.])
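
To see why the two-liner above yields exactly three distinct columns per row, here is a sketch that splits it into steps. The names n_rows, n_cols, k, perm, and rows are illustrative and not part of the original answer.

```python
import numpy as np

n_rows, n_cols, k = 10, 10, 3  # illustrative sizes matching the question

# Argsorting each row of random floats yields a random permutation of 0..n_cols-1
perm = np.random.rand(n_rows, n_cols).argsort(axis=1)  # shape (10, 10)
idx = perm[:, :k]  # shape (10, 3); the first k entries of a permutation are distinct

# A (10, 1) column of row numbers broadcasts against the (10, 3) column indices,
# so element (i, j) of the index pair addresses arr2d[i, idx[i, j]]
rows = np.arange(n_rows)[:, None]

arr2d = np.zeros((n_rows, n_cols))
arr2d[rows, idx] = 1
```

Because each row of perm is a permutation, no column index can repeat within a row, which is what the loop based on randint could not guarantee.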

Note: if you are after performance, I would suggest the np.argpartition based approach listed in this other post.
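
The linked post is not reproduced here, so the following is only a sketch of what an np.argpartition based variant might look like, not necessarily the code from that post. The names n_rows, n_cols, and k are illustrative.

```python
import numpy as np

n_rows, n_cols, k = 10, 10, 3  # illustrative sizes matching the question

# argpartition only guarantees that the k smallest entries of each row come
# first (in arbitrary order), which avoids fully sorting every row
idx = np.argpartition(np.random.rand(n_rows, n_cols), k, axis=1)[:, :k]

arr2d = np.zeros((n_rows, n_cols))
arr2d[np.arange(n_rows)[:, None], idx] = 1
```

This sketch requires k to be smaller than n_cols, since argpartition rejects a kth value that is out of bounds for the axis.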

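【Discussion】:

Really cool. There is a lot of cleverness packed into those two lines.
@zarak The original idea came from this post: stackoverflow.com/a/29156976/3293881. Speedups over the loop-based approach are listed here: stackoverflow.com/a/31958263/3293881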