[Posted]: 2012-01-19 03:41:47
[Question]:
Suppose I have a NumPy array, a:
a = np.array([
[1, 2, 3],
[2, 3, 4]
])
I want to add a column of zeros to get an array, b:
b = np.array([
[1, 2, 3, 0],
[2, 3, 4, 0]
])
How can I do this easily in NumPy?
[Comments]:
np.r_[ ... ] and np.c_[ ... ]
are useful alternatives to vstack and hstack,
with square brackets [] instead of round ().
A couple of examples:
: import numpy as np
: N = 3
: A = np.eye(N)
: np.c_[ A, np.ones(N) ] # add a column
array([[ 1., 0., 0., 1.],
[ 0., 1., 0., 1.],
[ 0., 0., 1., 1.]])
: np.c_[ np.ones(N), A, np.ones(N) ] # or two
array([[ 1., 1., 0., 0., 1.],
[ 1., 0., 1., 0., 1.],
[ 1., 0., 0., 1., 1.]])
: np.r_[ A, [A[1]] ] # add a row
array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.],
[ 0., 1., 0.]])
: # not np.r_[ A, A[1] ]
: np.r_[ A[0], 1, 2, 3, A[1] ] # mix vecs and scalars
array([ 1., 0., 0., 1., 2., 3., 0., 1., 0.])
: np.r_[ A[0], [1, 2, 3], A[1] ] # lists
array([ 1., 0., 0., 1., 2., 3., 0., 1., 0.])
: np.r_[ A[0], (1, 2, 3), A[1] ] # tuples
array([ 1., 0., 0., 1., 2., 3., 0., 1., 0.])
: np.r_[ A[0], 1:4, A[1] ] # same, 1:4 == arange(1,4) == 1,2,3
array([ 1., 0., 0., 1., 2., 3., 0., 1., 0.])
(The reason for square brackets [] instead of round () is that Python expands e.g. 1:4 inside square brackets -- the wonders of overloading.)
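Applied to the array from the question, this could look like the minimal sketch below. The explicit dtype on the zeros is my own addition, not part of the answer: without it the zeros are float and the whole result would be promoted to float.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [2, 3, 4]])

# np.c_ treats a 1-D argument as a column, so a length-2 vector of zeros
# becomes the new last column of the 2x3 array.
zeros_col = np.zeros(a.shape[0], dtype=a.dtype)  # dtype kept integer on purpose
b = np.c_[a, zeros_col]
print(b)
```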
[Discussion]:
np.c_[ * iterable ]; see expression-lists.
I think a more straightforward solution, and one that is faster to boot, is to do the following:
import numpy as np
N = 10
a = np.random.rand(N,N)
b = np.zeros((N,N+1))
b[:,:-1] = a
Timings:
In [23]: N = 10
In [24]: a = np.random.rand(N,N)
In [25]: %timeit b = np.hstack((a,np.zeros((a.shape[0],1))))
10000 loops, best of 3: 19.6 us per loop
In [27]: %timeit b = np.zeros((a.shape[0],a.shape[1]+1)); b[:,:-1] = a
100000 loops, best of 3: 5.62 us per loop
[Discussion]:
Change a = np.random.rand((N,N)) to a = np.random.rand(N,N)
Use numpy.append:
>>> a = np.array([[1,2,3],[2,3,4]])
>>> a
array([[1, 2, 3],
[2, 3, 4]])
>>> z = np.zeros((2,1), dtype=np.int64)
>>> z
array([[0],
[0]])
>>> np.append(a, z, axis=1)
array([[1, 2, 3, 0],
[2, 3, 4, 0]])
[Discussion]:
append actually just calls concatenate
One way, using hstack, is:
b = np.hstack((a, np.zeros((a.shape[0], 1), dtype=a.dtype)))
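[Discussion]:
The dtype parameter is not needed, and not even allowed. While your solution is elegant enough, take care not to use it if you need to "append" to an array frequently. If you cannot create the whole array at once and fill it in later, build a list of arrays and hstack them all in one go.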
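The "collect first, hstack once" advice from the comment above might be sketched as follows. The column values (0, 7, 9) are made up purely for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [2, 3, 4]])

# Collect the pieces in a plain Python list; appending to a list is cheap.
pieces = [a]
for value in (0, 7, 9):  # hypothetical values for the new columns
    pieces.append(np.full((a.shape[0], 1), value, dtype=a.dtype))

# A single hstack copies the data once, instead of once per appended column.
b = np.hstack(pieces)
print(b)
```

Each separate hstack call allocates a new array and copies everything into it, so doing it once at the end avoids repeated copying.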
I was also interested in this question and compared the speed of
numpy.c_[a, a]
numpy.stack([a, a]).T
numpy.vstack([a, a]).T
numpy.ascontiguousarray(numpy.stack([a, a]).T)
numpy.ascontiguousarray(numpy.vstack([a, a]).T)
numpy.column_stack([a, a])
numpy.concatenate([a[:,None], a[:,None]], axis=1)
numpy.concatenate([a[None], a[None]], axis=0).T
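For any input vector a, they all do the same thing. Timings for growing a:
Note that all non-contiguous variants (in particular stack/vstack) end up faster than all contiguous variants. If you require contiguity, column_stack (for its clarity and speed) appears to be a good option.
Code to reproduce the plot: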
import numpy as np
import perfplot
b = perfplot.bench(
setup=np.random.rand,
kernels=[
lambda a: np.c_[a, a],
lambda a: np.ascontiguousarray(np.stack([a, a]).T),
lambda a: np.ascontiguousarray(np.vstack([a, a]).T),
lambda a: np.column_stack([a, a]),
lambda a: np.concatenate([a[:, None], a[:, None]], axis=1),
lambda a: np.ascontiguousarray(np.concatenate([a[None], a[None]], axis=0).T),
lambda a: np.stack([a, a]).T,
lambda a: np.vstack([a, a]).T,
lambda a: np.concatenate([a[None], a[None]], axis=0).T,
],
labels=[
"c_",
"ascont(stack)",
"ascont(vstack)",
"column_stack",
"concat",
"ascont(concat)",
"stack (non-cont)",
"vstack (non-cont)",
"concat (non-cont)",
],
n_range=[2 ** k for k in range(23)],
xlabel="len(a)",
)
b.save("out.png")
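The contiguity distinction drawn in this answer can be checked directly on a small input. This is only a sketch for inspecting the memory layout, not a benchmark; the variable names are mine.

```python
import numpy as np

a = np.arange(5.0)

via_stack = np.stack([a, a]).T             # transposed view of a (2, 5) array
via_column_stack = np.column_stack([a, a])  # freshly built (5, 2) array

# Both hold the same values...
print(np.array_equal(via_stack, via_column_stack))

# ...but only column_stack returns a C-contiguous result; the transposed
# view is laid out column by column in memory.
print(via_stack.flags["C_CONTIGUOUS"], via_column_stack.flags["C_CONTIGUOUS"])
```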
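[Discussion]:
stack, hstack, vstack, column_stack and dstack are all helper functions built on top of np.concatenate. By tracing through the definition of stack, I found that np.stack([a,a]) calls np.concatenate([a[None], a[None]], axis=0). It might be nice to add np.concatenate([a[None], a[None]], axis=0).T to the perfplot, to show that np.concatenate can always be at least as fast as its helper functions.
c_ and column_stack
I find the following the most elegant: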
b = np.insert(a, 3, values=0, axis=1) # Insert values before column 3
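An advantage of insert is that it also lets you insert columns (or rows) at other positions inside the array. Also, instead of inserting a single value, you can just as easily insert a whole vector, for instance to duplicate the last column:
b = np.insert(a, insert_index, values=a[:,2], axis=1)
This results in:
array([[1, 2, 3, 3],
[2, 3, 4, 4]])
As for timing, insert may be slower than JoshAdel's solution: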
In [1]: N = 10
In [2]: a = np.random.rand(N,N)
In [3]: %timeit b = np.hstack((a, np.zeros((a.shape[0], 1))))
100000 loops, best of 3: 7.5 µs per loop
In [4]: %timeit b = np.zeros((a.shape[0], a.shape[1]+1)); b[:,:-1] = a
100000 loops, best of 3: 2.17 µs per loop
In [5]: %timeit b = np.insert(a, 3, values=0, axis=1)
100000 loops, best of 3: 10.2 µs per loop
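[Discussion]:
insert(a, -1, ...) to append a column. I guess I'll just prepend it instead.
Use a.shape[axis] to get the size along that axis in order to append a row or column. I.e. to append a row, you do np.insert(a, a.shape[0], 999, axis=0), and for a column, you do np.insert(a, a.shape[1], 999, axis=1).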
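A short sketch of the difference discussed in these comments: an index of -1 inserts before the last column, while a.shape[1] appends after it. The variable names are mine.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [2, 3, 4]])

# Index a.shape[1] is one past the last column, so the zeros go at the end.
appended = np.insert(a, a.shape[1], 0, axis=1)

# Index -1 means "before the last column", which is not an append.
before_last = np.insert(a, -1, 0, axis=1)

print(appended)
print(before_last)
```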
I think:
np.column_stack((a, np.zeros(np.shape(a)[0])))
is more elegant.
[Discussion]:
Assuming M is a (100,3) ndarray and y is a (100,) ndarray, append can be used as follows:
M=numpy.append(M,y[:,None],1)
The trick is to use
y[:, None]
This converts y to a (100, 1) 2D array.
M.shape
now gives
(100, 4)
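For a runnable version of the above, here is a small sketch. The contents of M and y are placeholders chosen only so the shapes match the description.

```python
import numpy as np

M = np.zeros((100, 3))  # placeholder data with the shapes described above
y = np.ones(100)

# Indexing with None (an alias of np.newaxis) adds a length-1 axis,
# turning the (100,) vector into a (100, 1) column.
column = y[:, None]
print(column.shape)

M = np.append(M, column, axis=1)
print(M.shape)
```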
[Discussion]:
np.concatenate also works
>>> a = np.array([[1,2,3],[2,3,4]])
>>> a
array([[1, 2, 3],
[2, 3, 4]])
>>> z = np.zeros((2,1))
>>> z
array([[ 0.],
[ 0.]])
>>> np.concatenate((a, z), axis=1)
array([[ 1., 2., 3., 0.],
[ 2., 3., 4., 0.]])
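Note that the result above is float because np.zeros defaults to float64. If the integer dtype from the question should be preserved, one option (my own variation, not part of the answer) is to pass the dtype explicitly:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [2, 3, 4]])

z_float = np.zeros((2, 1))               # default dtype is float64
z_int = np.zeros((2, 1), dtype=a.dtype)  # matches the integer input

promoted = np.concatenate((a, z_float), axis=1)  # int + float gives float64
kept = np.concatenate((a, z_int), axis=1)        # stays integer

print(promoted.dtype, kept.dtype)
```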
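[Discussion]:
np.concatenate seems to be 3 times faster than np.hstack. In my experiments, np.concatenate was also faster than manually copying the matrices into an empty matrix. This is consistent with Nico Schlömer's answer below.
NumPy's np.append function takes three parameters: the first two are 2D NumPy arrays, and the third is an axis parameter indicating which axis to append along: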
import numpy as np
x = np.array([[1,2,3], [4,5,6]])
print("Original x:")
print(x)
y = np.array([[1], [1]])
print("Original y:")
print(y)
print("y appended to x on axis of 1:")
print(np.append(x, y, axis=1))
Prints:
Original x:
[[1 2 3]
[4 5 6]]
Original y:
[[1]
[1]]
y appended to x on axis of 1:
[[1 2 3 1]
[4 5 6 1]]
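[Discussion]:
I like JoshAdel's answer because of its focus on performance. A minor performance improvement is to avoid the overhead of initializing with zeros that are only going to be overwritten. This makes a measurable difference when N is large: empty is used instead of zeros, and the column of zeros is written as a separate step: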
In [1]: import numpy as np
In [2]: N = 10000
In [3]: a = np.ones((N,N))
In [4]: %timeit b = np.zeros((a.shape[0],a.shape[1]+1)); b[:,:-1] = a
1 loops, best of 3: 492 ms per loop
In [5]: %timeit b = np.empty((a.shape[0],a.shape[1]+1)); b[:,:-1] = a; b[:,-1] = np.zeros((a.shape[0],))
1 loops, best of 3: 407 ms per loop
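[Discussion]:
b[:,-1] = 0. Also, for very large arrays, the performance difference compared with np.insert() becomes negligible, which might make np.insert() more desirable thanks to its succinctness.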
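Putting the empty-based approach together with the scalar assignment from the comment gives something like the sketch below. The small array size is chosen for illustration only.

```python
import numpy as np

a = np.ones((4, 3))

# np.empty allocates without initializing, so every element must be
# written before the array is read.
b = np.empty((a.shape[0], a.shape[1] + 1), dtype=a.dtype)
b[:, :-1] = a  # copy the original columns
b[:, -1] = 0   # the scalar is broadcast down the last column

print(b)
```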
np.insert also serves the purpose.
matA = np.array([[1,2,3],
[2,3,4]])
idx = 3
new_col = np.array([0, 0])
np.insert(matA, idx, new_col, axis=1)
array([[1, 2, 3, 0],
[2, 3, 4, 0]])
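It inserts values, here new_col, before a given index, here idx, along one axis. In other words, the newly inserted values occupy column idx and shift whatever was originally at idx and after it backward.
[Discussion]:
insert is not in-place, as one might assume from the function's name (see the documentation linked in the answer).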
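The point about insert not working in place is easy to confirm: the original array is left untouched and a new array is returned, so the result has to be assigned. A minimal check (variable names are mine):

```python
import numpy as np

mat_a = np.array([[1, 2, 3],
                  [2, 3, 4]])

result = np.insert(mat_a, 3, 0, axis=1)

print(mat_a.shape)   # the input still has 3 columns
print(result.shape)  # the returned copy has 4
```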
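A bit late to the party, but nobody has posted this answer yet, so for the sake of completeness: you can do this with a list comprehension on a plain Python list: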
source = a.tolist()
result = [row + [0] for row in source]
b = np.array(result)
[Discussion]:
To me, the following approach looks quite intuitive and simple.
zeros = np.zeros((2,1)) #2 is a number of rows in your array.
b = np.hstack((a, zeros))
[Discussion]:
In my case, I had to add a column of ones to a NumPy array
X = array([ 6.1101, 5.5277, ... ])
X.shape => (97,)
X = np.concatenate((np.ones((m,1), dtype=int), X.reshape(m,1)), axis=1)
After that, X.shape => (97, 2)
array([[ 1. , 6.1101],
[ 1. , 5.5277],
...
[Discussion]:
There is a function specifically for this. It is called numpy.pad
a = np.array([[1,2,3], [2,3,4]])
b = np.pad(a, ((0, 0), (0, 1)), mode='constant', constant_values=0)
print(repr(b))
# array([[1, 2, 3, 0],
#        [2, 3, 4, 0]])
Here is what it says in the docstring:
Pads an array.
Parameters
----------
array : array_like of rank N
Input array
pad_width : {sequence, array_like, int}
Number of values padded to the edges of each axis.
((before_1, after_1), ... (before_N, after_N)) unique pad widths
for each axis.
((before, after),) yields same before and after pad for each axis.
(pad,) or int is a shortcut for before = after = pad width for all
axes.
mode : str or function
One of the following string values or a user supplied function.
'constant'
Pads with a constant value.
'edge'
Pads with the edge values of array.
'linear_ramp'
Pads with the linear ramp between end_value and the
array edge value.
'maximum'
Pads with the maximum value of all or part of the
vector along each axis.
'mean'
Pads with the mean value of all or part of the
vector along each axis.
'median'
Pads with the median value of all or part of the
vector along each axis.
'minimum'
Pads with the minimum value of all or part of the
vector along each axis.
'reflect'
Pads with the reflection of the vector mirrored on
the first and last values of the vector along each
axis.
'symmetric'
Pads with the reflection of the vector mirrored
along the edge of the array.
'wrap'
Pads with the wrap of the vector along the axis.
The first values are used to pad the end and the
end values are used to pad the beginning.
<function>
Padding function, see Notes.
stat_length : sequence or int, optional
Used in 'maximum', 'mean', 'median', and 'minimum'. Number of
values at edge of each axis used to calculate the statistic value.
((before_1, after_1), ... (before_N, after_N)) unique statistic
lengths for each axis.
((before, after),) yields same before and after statistic lengths
for each axis.
(stat_length,) or int is a shortcut for before = after = statistic
length for all axes.
Default is ``None``, to use the entire axis.
constant_values : sequence or int, optional
Used in 'constant'. The values to set the padded values for each
axis.
((before_1, after_1), ... (before_N, after_N)) unique pad constants
for each axis.
((before, after),) yields same before and after constants for each
axis.
(constant,) or int is a shortcut for before = after = constant for
all axes.
Default is 0.
end_values : sequence or int, optional
Used in 'linear_ramp'. The values used for the ending value of the
linear_ramp and that will form the edge of the padded array.
((before_1, after_1), ... (before_N, after_N)) unique end values
for each axis.
((before, after),) yields same before and after end values for each
axis.
(constant,) or int is a shortcut for before = after = end value for
all axes.
Default is 0.
reflect_type : {'even', 'odd'}, optional
Used in 'reflect', and 'symmetric'. The 'even' style is the
default with an unaltered reflection around the edge value. For
the 'odd' style, the extented part of the array is created by
subtracting the reflected values from two times the edge value.
Returns
-------
pad : ndarray
Padded array of rank equal to `array` with shape increased
according to `pad_width`.
Notes
-----
.. versionadded:: 1.7.0
For an array with rank greater than 1, some of the padding of later
axes is calculated from padding of previous axes. This is easiest to
think about with a rank 2 array where the corners of the padded array
are calculated by using padded values from the first axis.
The padding function, if used, should return a rank 1 array equal in
length to the vector argument with padded values replaced. It has the
following signature::
padding_func(vector, iaxis_pad_width, iaxis, kwargs)
where
vector : ndarray
A rank 1 array already padded with zeros. Padded values are
vector[:pad_tuple[0]] and vector[-pad_tuple[1]:].
iaxis_pad_width : tuple
A 2-tuple of ints, iaxis_pad_width[0] represents the number of
values padded at the beginning of vector where
iaxis_pad_width[1] represents the number of values padded at
the end of vector.
iaxis : int
The axis currently being calculated.
kwargs : dict
Any keyword arguments the function requires.
Examples
--------
>>> a = [1, 2, 3, 4, 5]
>>> np.pad(a, (2,3), 'constant', constant_values=(4, 6))
array([4, 4, 1, 2, 3, 4, 5, 6, 6, 6])
>>> np.pad(a, (2, 3), 'edge')
array([1, 1, 1, 2, 3, 4, 5, 5, 5, 5])
>>> np.pad(a, (2, 3), 'linear_ramp', end_values=(5, -4))
array([ 5, 3, 1, 2, 3, 4, 5, 2, -1, -4])
>>> np.pad(a, (2,), 'maximum')
array([5, 5, 1, 2, 3, 4, 5, 5, 5])
>>> np.pad(a, (2,), 'mean')
array([3, 3, 1, 2, 3, 4, 5, 3, 3])
>>> np.pad(a, (2,), 'median')
array([3, 3, 1, 2, 3, 4, 5, 3, 3])
>>> a = [[1, 2], [3, 4]]
>>> np.pad(a, ((3, 2), (2, 3)), 'minimum')
array([[1, 1, 1, 2, 1, 1, 1],
[1, 1, 1, 2, 1, 1, 1],
[1, 1, 1, 2, 1, 1, 1],
[1, 1, 1, 2, 1, 1, 1],
[3, 3, 3, 4, 3, 3, 3],
[1, 1, 1, 2, 1, 1, 1],
[1, 1, 1, 2, 1, 1, 1]])
>>> a = [1, 2, 3, 4, 5]
>>> np.pad(a, (2, 3), 'reflect')
array([3, 2, 1, 2, 3, 4, 5, 4, 3, 2])
>>> np.pad(a, (2, 3), 'reflect', reflect_type='odd')
array([-1, 0, 1, 2, 3, 4, 5, 6, 7, 8])
>>> np.pad(a, (2, 3), 'symmetric')
array([2, 1, 1, 2, 3, 4, 5, 5, 4, 3])
>>> np.pad(a, (2, 3), 'symmetric', reflect_type='odd')
array([0, 1, 1, 2, 3, 4, 5, 5, 6, 7])
>>> np.pad(a, (2, 3), 'wrap')
array([4, 5, 1, 2, 3, 4, 5, 1, 2, 3])
>>> def pad_with(vector, pad_width, iaxis, kwargs):
... pad_value = kwargs.get('padder', 10)
... vector[:pad_width[0]] = pad_value
... vector[-pad_width[1]:] = pad_value
... return vector
>>> a = np.arange(6)
>>> a = a.reshape((2, 3))
>>> np.pad(a, 2, pad_with)
array([[10, 10, 10, 10, 10, 10, 10],
[10, 10, 10, 10, 10, 10, 10],
[10, 10, 0, 1, 2, 10, 10],
[10, 10, 3, 4, 5, 10, 10],
[10, 10, 10, 10, 10, 10, 10],
[10, 10, 10, 10, 10, 10, 10]])
>>> np.pad(a, 2, pad_with, padder=100)
array([[100, 100, 100, 100, 100, 100, 100],
[100, 100, 100, 100, 100, 100, 100],
[100, 100, 0, 1, 2, 100, 100],
[100, 100, 3, 4, 5, 100, 100],
[100, 100, 100, 100, 100, 100, 100],
[100, 100, 100, 100, 100, 100, 100]])
[Discussion]:
I like this:
new_column = np.zeros((len(a), 1))
b = np.block([a, new_column])
[Discussion]: